Add runtime evidence policy and remove misleading test checks by SSobol77 · Pull Request #115 · SSobol77/ecli

SSobol77 · 2026-06-23T23:25:24Z

This PR adds runtime evidence discipline for ECLI development and removes/reworks misleading tests that provided false confidence.

Changes:

adds Makefile clean-logs target for fresh runtime log evidence
keeps logs/.gitkeep and logs/README.md tracked while ignoring runtime log artifacts
documents that runtime/TUI fixes require fresh logs plus manual smoke evidence
documents that tests are regression guards, not runtime truth for TUI behavior
rewrites selected weak UI tests to exercise more realistic dispatch/thread behavior
removes a no-op privileged action test that did not exercise project behavior
updates AGENTS.md, CODEX.md, CLAUDE.md and runtime agent docs with the evidence rule
removes scripts/clean_logs.sh to preserve the no-shell-scripts packaging contract

Validation:

make clean-logs
uv run ruff check src tests
uv run ruff format --check src tests
uv run pytest -q tests/services/test_privileged_action_service.py
uv run pytest -q tests/ui/test_pysh_console_panel.py tests/ui/test_terminal_panel_reservation.py
uv run pytest -q tests/packaging/test_packaging_release_contract.py::test_no_shell_wrappers_remain_under_scripts
uv run pytest -q tests/packaging/test_scripts_python_migration_contract.py::test_no_shell_files_remain_under_scripts
uv run pytest -q tests/packaging/test_scripts_python_migration_contract.py::test_live_contracts_do_not_reference_removed_script_shell_files
uv run pytest -q tests/extensions tests/packaging tests/docs

Non-goals:

no SafeRotatingFileHandler implementation
no F4 Diagnostics work
no F7 AI work
no F10 File Browser changes
no F11 PySH changes
no TextMate changes
no extension asset changes

Summary by CodeRabbit

Documentation
- Added comprehensive testing reality and evidence policy for runtime and UI validation.
- Updated runtime engineering guidance for logging and issue reproduction.
Tests
- Improved test reliability by replacing synchronous mocks with real async polling.
- Enhanced test assertions to verify actual behavior rather than mock calls.
Chores
- Added make clean-logs command for clearing runtime logs before testing sessions.
- Refined log directory management and git ignore patterns.

coderabbitai · 2026-06-23T23:26:47Z

Warning

Review limit reached

@SSobol77, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 55 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed7a770a-b0b3-40e2-bd7b-84a2d03964bd

📥 Commits

Reviewing files that changed from the base of the PR and between fbc6b8e and 9253311.

📒 Files selected for processing (2)

.gitignore
Makefile

📝 Walkthrough

Walkthrough

Introduces a make clean-logs Makefile target and propagates a "fresh logs required before runtime triage" evidence policy across all agent instruction files (AGENTS.md, CLAUDE.md, CODEX.md, .claude/agents/, .codex/). Adds docs/testing-reality-policy.md, restructures .gitignore for the logs/ directory, and rewrites two over-mocked UI tests to use real threading and interaction-based assertions.

Changes

Runtime Evidence Policy, clean-logs Tooling, and Test Realism

Layer / File(s)	Summary
`clean-logs` target and logs directory hygiene `Makefile`, `.gitignore`, `logs/README.md`	Adds `make clean-logs` that purges runtime log files and recreates `logs/editor.log`. Updates `.gitignore` to ignore all of `/logs/**` while keeping `.gitkeep` and `README.md`. Adds `logs/README.md` describing the directory as a volatile evidence buffer and removes old `logs/README-logs.md` content.
Testing-reality policy document `docs/testing-reality-policy.md`	Creates the canonical policy defining runtime acceptance requirements, valid vs. invalid mock usage, keep/rewrite/delete criteria for tests, and an audit ledger of specific test actions taken in this PR.
AGENTS.md full rewrite `AGENTS.md`	Replaces the entire file with a new "ECLI — Agentic AI Instructions" document covering role/source-of-truth discipline, build system preflight, artifact contract with checksum verification, runtime architecture threading invariants and blast-radius list, log-evidence rules, bug-fix reporting requirements, and a communication checklist.
Propagate `clean-logs` and evidence policy to agent instruction files `.claude/agents/runtime-engineer.md`, `.codex/roles/runtime-engineer.md`, `.codex/runbooks/runtime.md`, `CLAUDE.md`, `CODEX.md`	Adds `make clean-logs` prerequisites and fresh-log/manual-smoke evidence requirements to all agent and Codex instruction files.
UI test rewrites: real threading and interaction-based assertions `tests/ui/test_pysh_console_panel.py`, `tests/ui/test_terminal_panel_reservation.py`, `tests/services/test_privileged_action_service.py`	Replaces `_ImmediateThread` monkeypatching in `test_pysh_console_external_command_flow` with a real deadline-driven polling loop. Rewrites `test_f11_opens_pysh_console_panel_action` to use a new `_KeyBinderEditor` test double driven via `KeyBinder.handle_input(275)`. Removes the stale `test_all_test_artifacts_remain_under_logs` test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

SSobol77/ecli#96: Introduced the PySH Console Panel and its F11 wiring in KeyBinder and Ecli, which is the direct subject of the rewritten test_f11_opens_pysh_console_panel_action test in this PR.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the primary changes: establishing a runtime evidence policy and removing misleading test checks that incorrectly suggest runtime fixes when only tests pass.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch issue/test-reality-audit-remove-misleading-mocks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@logs/README.md`:
- Line 9: The README.md file at line 9 references the script
`scripts/clean_logs.sh` which is being removed in this PR as part of the
no-shell-scripts packaging contract. Update the line to reference the make
command instead by replacing the text that says "Run `scripts/clean_logs.sh`
before manual smoke tests and debug sessions" with "Run `make clean-logs` before
manual smoke tests and debug sessions" to maintain an accurate reference in the
documentation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2adc362d-954a-4817-bfaf-579e454b1d97

📥 Commits

Reviewing files that changed from the base of the PR and between bea8ba6 and fbc6b8e.

📒 Files selected for processing (14)

.claude/agents/runtime-engineer.md
.codex/roles/runtime-engineer.md
.codex/runbooks/runtime.md
.gitignore
AGENTS.md
CLAUDE.md
CODEX.md
Makefile
docs/testing-reality-policy.md
logs/README-logs.md
logs/README.md
tests/services/test_privileged_action_service.py
tests/ui/test_pysh_console_panel.py
tests/ui/test_terminal_panel_reservation.py

💤 Files with no reviewable changes (2)

logs/README-logs.md
tests/services/test_privileged_action_service.py

sonarqubecloud · 2026-06-23T23:41:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

SSobol77 added 2 commits June 24, 2026 00:33

chore: add runtime log cleanup evidence discipline

d6bfe22

test: add runtime evidence policy and remove misleading checks

fbc6b8e

coderabbitai Bot requested changes Jun 23, 2026

View reviewed changes

Comment thread logs/README.md Outdated

test: add runtime evidence policy and remove misleading checks

9253311

coderabbitai Bot approved these changes Jun 23, 2026

View reviewed changes

SSobol77 merged commit 08c6a3b into main Jun 23, 2026
6 checks passed

SSobol77 deleted the issue/test-reality-audit-remove-misleading-mocks branch June 23, 2026 23:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add runtime evidence policy and remove misleading test checks#115

Add runtime evidence policy and remove misleading test checks#115
SSobol77 merged 3 commits into
mainfrom
issue/test-reality-audit-remove-misleading-mocks

SSobol77 commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SSobol77 commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 23, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

SSobol77 commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading