feat: add TrainerRank #731

Closed

bradhilton wants to merge 115 commits into main from feat/trainer-rank-gdn-tree

@bradhilton (Collaborator)

Summary

  • add art.megatron.trainer_rank plus a minimal dev/trainer_rank.py torchrun demo
  • add shared-prefix packing/tree helpers and unify GDN execution around the generic tree path
  • add TrainerRank request-head support for target logprobs, multi-target labels, top-k, logits, and hidden states
  • add topology/perf/parity dev harnesses and unit/integration coverage
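The summary does not say how the shared-prefix packing/tree helpers lay out their data. As a rough illustration of the general technique only, here is a minimal sketch that assumes the common layout: one prompt prefix is stored once, its completions are appended after it, and a boolean tree attention mask lets each completion see the prefix and its own earlier tokens but never a sibling. Position ids restart after the prefix for every completion, so each branch sees the same positions it would have in a flattened (unpacked) sequence. The function name `pack_shared_prefix` and its return layout are hypothetical and are not the API of `art.megatron` or this PR.

```python
from typing import List, Tuple


def pack_shared_prefix(
    prefix: List[int], completions: List[List[int]]
) -> Tuple[List[int], List[int], List[int], List[List[bool]]]:
    """Hypothetical sketch: pack one shared prefix plus several completions.

    Returns (tokens, segment, positions, mask):
      segment[i]  is -1 for prefix tokens, k for tokens of completion k
      positions[i] is the position id the token would have unpacked
      mask[i][j]  is True if token i may attend to token j
    """
    tokens = list(prefix)
    segment = [-1] * len(prefix)
    positions = list(range(len(prefix)))
    for k, comp in enumerate(completions):
        tokens.extend(comp)
        segment.extend([k] * len(comp))
        # Each completion continues right after the prefix, as if unpacked.
        positions.extend(range(len(prefix), len(prefix) + len(comp)))

    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only tokens at or before i
            # Visible: any prefix token, or an earlier token of the same branch.
            if segment[j] == -1 or segment[j] == segment[i]:
                mask[i][j] = True
    return tokens, segment, positions, mask


tokens, segment, positions, mask = pack_shared_prefix([1, 2], [[3, 4], [5]])
```

For the prefix `[1, 2]` with completions `[3, 4]` and `[5]`, the packed sequence is `[1, 2, 3, 4, 5]` with positions `[0, 1, 2, 3, 2]`, and token `5` attends only to `1`, `2`, and itself. The prefix is stored and processed once instead of once per completion, which is the usual motivation for this kind of packing.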

Validation

  • uv run ruff check src/art/megatron/context_parallel/builder.py src/art/megatron/shared_prefix_state.py tests/unit/test_shared_prefix_attention_builder.py dev/trainer_rank_perf.py
  • uv run pytest tests/unit/test_shared_prefix_packing.py tests/unit/test_shared_prefix_tree.py tests/unit/test_shared_prefix_grad_parity.py tests/unit/test_trainer_rank_validation.py tests/unit/test_shared_prefix_attention_builder.py (34 passed, 8 skipped locally; Megatron-only attention builder passes on H200)
  • H200: shared-prefix attention builder 7 passed
  • H200: GDN CP packed correctness 7 passed, 2 skipped
  • H200: real GDN/native FLA CP 4 passed, 2 skipped
  • H200: Qwen35 full-model CP1 packed vs flattened 1 passed
  • H200: TrainerRank topology matrix 120/120 passed across DP/TP/CP <= 4 and depths 0..3
  • H200 35B/A3B CP=4 EP=4 perf guards: Austin 198k, depth-3 random, no-sharing 90k, mixed hidden/logits/top-k outputs

bradhilton changed the title from "feat: add TrainerRank and generic tree GDN" to "feat: add TrainerRank" on Jun 22, 2026
bradhilton force-pushed the feat/trainer-rank-gdn-tree branch from 2cecc5a to 94afb0f on June 22, 2026 19:01
bradhilton force-pushed the feat/trainer-rank-gdn-tree branch 5 times, most recently from ed32a0c to 6443d99 on June 25, 2026 16:52
bradhilton force-pushed the feat/trainer-rank-gdn-tree branch from 6443d99 to fa09126 on June 25, 2026 16:58
@bradhilton (Collaborator, Author)

Superseded by the split draft PRs: #739 for the Austin-facing Megatron/CP/GDN core changes, and #740 for the dependent art.trainer_rank API layer.

bradhilton closed this on Jun 25, 2026