
Fix submission cleanup: recover all non-terminal states, not just Running #2454

Draft
ObadaS wants to merge 2 commits into develop from fix/cleanup-all-non-terminal-states


ObadaS commented Jul 2, 2026

Collaborator

Original PR #2414

Description

Fixes a bug where submissions stuck in non-terminal states (Submitted, Preparing, Scoring) would hang forever instead of being recovered by the cleanup task.

Problem: The submission_status_cleanup() task only recovered submissions stuck in the Running state. Submissions that never reached Running (stuck in Submitted, Preparing, or Scoring) were never cleaned up.

Root cause:

  • Cleanup only checked for Running status
  • No fallback for submissions without started_when (those that never reached Running)

Solution:

  1. Extend cleanup to cover all non-terminal states: Submitted, Preparing, Running, Scoring
  2. Use created_when as fallback when started_when is null
  3. All non-terminal submissions are now recovered once 24h + execution_time_limit have elapsed since that reference time
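
The three steps above can be sketched in plain Python. This is a minimal illustration, not the code in tasks.py: the `Submission` dataclass, the `is_stuck` and `cleanup` helpers, and the assumption that execution_time_limit is a number of seconds stored on the submission are all hypothetical stand-ins. Only the status names, the `started_when`/`created_when` fallback, the 24h + execution_time_limit deadline, and the Failed outcome come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Statuses listed in this PR as non-terminal.
NON_TERMINAL_STATUSES = ("Submitted", "Preparing", "Running", "Scoring")
GRACE_PERIOD = timedelta(hours=24)


@dataclass
class Submission:
    """Hypothetical stand-in for the real Django model."""
    status: str
    created_when: datetime
    started_when: Optional[datetime] = None
    execution_time_limit: int = 600  # assumed to be in seconds


def is_stuck(sub: Submission, now: datetime) -> bool:
    """Return True if a non-terminal submission is past its deadline."""
    if sub.status not in NON_TERMINAL_STATUSES:
        return False
    # Fallback: submissions that never reached Running have no started_when.
    reference_time = sub.started_when if sub.started_when else sub.created_when
    deadline = reference_time + GRACE_PERIOD + timedelta(seconds=sub.execution_time_limit)
    return now > deadline


def cleanup(subs: List[Submission], now: datetime) -> List[Submission]:
    """Mark every stuck submission as Failed and return the ones changed."""
    recovered = []
    for sub in subs:
        if is_stuck(sub, now):
            sub.status = "Failed"
            recovered.append(sub)
    return recovered
```

With this rule, a Submitted submission created 25 hours ago with a 600-second limit is past its deadline (24h 10min) and is marked Failed, while a Preparing submission created one hour ago is left alone.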

Code changes:

  • src/apps/competitions/tasks.py:

    • Extended non_terminal_statuses list to include all non-terminal states (Submitted, Preparing, Running, Scoring)
    • Added created_when fallback: reference_time = started_when if started_when else created_when
  • src/apps/competitions/tests/test_submissions.py:

    • Added 4 unit tests for new states (Submitted, Preparing, Scoring)
    • Added negative test for recent submissions
    • All tests pass
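
The shape of these tests can be sketched with the standard unittest module. This is an assumption-laden illustration rather than the tests added in test_submissions.py: `should_recover` is a hypothetical pure-function stand-in for the check inside submission_status_cleanup(), and the 600-second limit is an arbitrary example value.

```python
import unittest
from datetime import datetime, timedelta, timezone

NON_TERMINAL = {"Submitted", "Preparing", "Running", "Scoring"}


def should_recover(status, created_when, started_when, limit_seconds, now):
    """Hypothetical stand-in for the deadline check in the cleanup task."""
    if status not in NON_TERMINAL:
        return False
    reference_time = started_when if started_when else created_when
    return now > reference_time + timedelta(hours=24, seconds=limit_seconds)


class CleanupRuleTests(unittest.TestCase):
    NOW = datetime(2026, 7, 3, 12, 0, tzinfo=timezone.utc)

    def test_old_non_terminal_submissions_are_recovered(self):
        # Created 48h ago and never started: well past 24h + limit.
        created = self.NOW - timedelta(hours=48)
        for status in ("Submitted", "Preparing", "Scoring"):
            with self.subTest(status=status):
                self.assertTrue(should_recover(status, created, None, 600, self.NOW))

    def test_recent_non_terminal_submissions_are_kept(self):
        # Negative case: created 1h ago, so the deadline has not passed.
        created = self.NOW - timedelta(hours=1)
        for status in sorted(NON_TERMINAL):
            with self.subTest(status=status):
                self.assertFalse(should_recover(status, created, None, 600, self.NOW))


def run_suite():
    """Run the sketch's tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanupRuleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Using subTest keeps one test method per behaviour while still reporting which status failed, which mirrors the positive and negative cases described above.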

Issues this PR resolves

Fixes #2413

Background

This bug was discovered during the EEG Foundation Challenge incident analysis where submissions were observed stuck in non-Running states for extended periods with no recovery mechanism.

Checklist for hand testing

  • Create a competition with at least one phase
  • Submit submissions and verify they get stuck when compute_worker is stopped
  • Age submissions past the cleanup deadline (older than 24h + execution_time_limit)
  • Run cleanup task: docker compose exec django python manage.py shell -c "from competitions.tasks import submission_status_cleanup; submission_status_cleanup()"
  • Verify all stuck submissions marked as Failed

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCI tests are passing
  • Ready to merge

hananechrif and others added 2 commits July 3, 2026 12:03
…ning

Problem:
- submission_status_cleanup() only recovered Running submissions
- Submissions stuck in Submitted, Preparing, or Scoring would hang forever
- No fallback for submissions that never reached Running (started_when null)

Solution:
- Extend cleanup to cover all non-terminal states: Submitted, Preparing, Running, Scoring
- Use created_when as fallback when started_when is null
- All non-terminal submissions now recovered after 24h + execution_time_limit

Changes:
- src/apps/competitions/tasks.py:
  * Extended non_terminal_statuses list to include all states
  * Added created_when fallback logic for reference_time
  * Cleaned up comments per Codabench guidelines

- src/apps/competitions/tests/test_submissions.py:
  * Added 4 unit tests covering Submitted, Preparing, Scoring states
  * Added negative test for recent non-terminal submissions
  * Cleaned up docstrings (removed M3 references)

- tests/k6/:
  * run_cleanup_test.sh: End-to-end orchestrator
  * test_stuck_submissions.js: K6 recovery verification
  * test_cleanup_conservation.js: K6 conservation harness
  * README_cleanup_tests.md: Test documentation
  * All files cleaned up (removed M3 references per guidelines)

Tests validate:
- All non-terminal states recovered after deadline
- Recent submissions NOT cleaned up
- 100% conservation rate

Fixes #2413
@ObadaS ObadaS force-pushed the fix/cleanup-all-non-terminal-states branch from 14f57bb to 626e678 Compare July 3, 2026 10:04

Development

Successfully merging this pull request may close these issues.

Submission cleanup only recovers Running submissions, not Submitted/Preparing/Scoring
