Skip to content

Add runtime evidence policy and remove misleading test checks#115

Merged
SSobol77 merged 3 commits into
mainfrom
issue/test-reality-audit-remove-misleading-mocks
Jun 23, 2026
Merged

Add runtime evidence policy and remove misleading test checks#115
SSobol77 merged 3 commits into
mainfrom
issue/test-reality-audit-remove-misleading-mocks

Conversation

@SSobol77

@SSobol77 SSobol77 commented Jun 23, 2026

Copy link
Copy Markdown
Owner

This PR adds runtime evidence discipline for ECLI development and removes/reworks misleading tests that provided false confidence.

Changes:

  • adds Makefile clean-logs target for fresh runtime log evidence
  • keeps logs/.gitkeep and logs/README.md tracked while ignoring runtime log artifacts
  • documents that runtime/TUI fixes require fresh logs plus manual smoke evidence
  • documents that tests are regression guards, not runtime truth for TUI behavior
  • rewrites selected weak UI tests to exercise more realistic dispatch/thread behavior
  • removes a no-op privileged action test that did not exercise project behavior
  • updates AGENTS.md, CODEX.md, CLAUDE.md and runtime agent docs with the evidence rule
  • removes scripts/clean_logs.sh to preserve the no-shell-scripts packaging contract

Validation:

  • make clean-logs
  • uv run ruff check src tests
  • uv run ruff format --check src tests
  • uv run pytest -q tests/services/test_privileged_action_service.py
  • uv run pytest -q tests/ui/test_pysh_console_panel.py tests/ui/test_terminal_panel_reservation.py
  • uv run pytest -q tests/packaging/test_packaging_release_contract.py::test_no_shell_wrappers_remain_under_scripts
  • uv run pytest -q tests/packaging/test_scripts_python_migration_contract.py::test_no_shell_files_remain_under_scripts
  • uv run pytest -q tests/packaging/test_scripts_python_migration_contract.py::test_live_contracts_do_not_reference_removed_script_shell_files
  • uv run pytest -q tests/extensions tests/packaging tests/docs

Non-goals:

  • no SafeRotatingFileHandler implementation
  • no F4 Diagnostics work
  • no F7 AI work
  • no F10 File Browser changes
  • no F11 PySH changes
  • no TextMate changes
  • no extension asset changes

Summary by CodeRabbit

  • Documentation

    • Added comprehensive testing reality and evidence policy for runtime and UI validation.
    • Updated runtime engineering guidance for logging and issue reproduction.
  • Tests

    • Improved test reliability by replacing synchronous mocks with real async polling.
    • Enhanced test assertions to verify actual behavior rather than mock calls.
  • Chores

    • Added make clean-logs command for clearing runtime logs before testing sessions.
    • Refined log directory management and git ignore patterns.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@SSobol77, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 55 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ed7a770a-b0b3-40e2-bd7b-84a2d03964bd

📥 Commits

Reviewing files that changed from the base of the PR and between fbc6b8e and 9253311.

📒 Files selected for processing (2)
  • .gitignore
  • Makefile
📝 Walkthrough

Walkthrough

Introduces a make clean-logs Makefile target and propagates a "fresh logs required before runtime triage" evidence policy across all agent instruction files (AGENTS.md, CLAUDE.md, CODEX.md, .claude/agents/, .codex/). Adds docs/testing-reality-policy.md, restructures .gitignore for the logs/ directory, and rewrites two over-mocked UI tests to use real threading and interaction-based assertions.

Changes

Runtime Evidence Policy, clean-logs Tooling, and Test Realism

Layer / File(s) Summary
clean-logs target and logs directory hygiene
Makefile, .gitignore, logs/README.md
Adds make clean-logs that purges runtime log files and recreates logs/editor.log. Updates .gitignore to ignore all of /logs/** while keeping .gitkeep and README.md. Adds logs/README.md describing the directory as a volatile evidence buffer and removes old logs/README-logs.md content.
Testing-reality policy document
docs/testing-reality-policy.md
Creates the canonical policy defining runtime acceptance requirements, valid vs. invalid mock usage, keep/rewrite/delete criteria for tests, and an audit ledger of specific test actions taken in this PR.
AGENTS.md full rewrite
AGENTS.md
Replaces the entire file with a new "ECLI — Agentic AI Instructions" document covering role/source-of-truth discipline, build system preflight, artifact contract with checksum verification, runtime architecture threading invariants and blast-radius list, log-evidence rules, bug-fix reporting requirements, and a communication checklist.
Propagate clean-logs and evidence policy to agent instruction files
.claude/agents/runtime-engineer.md, .codex/roles/runtime-engineer.md, .codex/runbooks/runtime.md, CLAUDE.md, CODEX.md
Adds make clean-logs prerequisites and fresh-log/manual-smoke evidence requirements to all agent and Codex instruction files.
UI test rewrites: real threading and interaction-based assertions
tests/ui/test_pysh_console_panel.py, tests/ui/test_terminal_panel_reservation.py, tests/services/test_privileged_action_service.py
Replaces _ImmediateThread monkeypatching in test_pysh_console_external_command_flow with a real deadline-driven polling loop. Rewrites test_f11_opens_pysh_console_panel_action to use a new _KeyBinderEditor test double driven via KeyBinder.handle_input(275). Removes the stale test_all_test_artifacts_remain_under_logs test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • SSobol77/ecli#96: Introduced the PySH Console Panel and its F11 wiring in KeyBinder and Ecli, which is the direct subject of the rewritten test_f11_opens_pysh_console_panel_action test in this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the primary changes: establishing a runtime evidence policy and removing misleading test checks that incorrectly suggest runtime fixes when only tests pass.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue/test-reality-audit-remove-misleading-mocks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@logs/README.md`:
- Line 9: The README.md file at line 9 references the script
`scripts/clean_logs.sh` which is being removed in this PR as part of the
no-shell-scripts packaging contract. Update the line to reference the make
command instead by replacing the text that says "Run `scripts/clean_logs.sh`
before manual smoke tests and debug sessions" with "Run `make clean-logs` before
manual smoke tests and debug sessions" to maintain an accurate reference in the
documentation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2adc362d-954a-4817-bfaf-579e454b1d97

📥 Commits

Reviewing files that changed from the base of the PR and between bea8ba6 and fbc6b8e.

📒 Files selected for processing (14)
  • .claude/agents/runtime-engineer.md
  • .codex/roles/runtime-engineer.md
  • .codex/runbooks/runtime.md
  • .gitignore
  • AGENTS.md
  • CLAUDE.md
  • CODEX.md
  • Makefile
  • docs/testing-reality-policy.md
  • logs/README-logs.md
  • logs/README.md
  • tests/services/test_privileged_action_service.py
  • tests/ui/test_pysh_console_panel.py
  • tests/ui/test_terminal_panel_reservation.py
💤 Files with no reviewable changes (2)
  • logs/README-logs.md
  • tests/services/test_privileged_action_service.py

Comment thread logs/README.md Outdated
@SSobol77 SSobol77 merged commit 08c6a3b into main Jun 23, 2026
6 checks passed
@sonarqubecloud

Copy link
Copy Markdown

@SSobol77 SSobol77 deleted the issue/test-reality-audit-remove-misleading-mocks branch June 23, 2026 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant