Pull requests: NVIDIA/TransformerEngine

Pull requests list

Changed VERSION to 2.18.0.dev0 urgent
#3140 opened Jun 23, 2026 by KshitijLakhani Collaborator
4 of 13 tasks
Update FE to 1.25 2.17
#3139 opened Jun 22, 2026 by cyanguwa Collaborator
8 of 13 tasks
[Common] Fix Build: Remove nproc from parallel make for NCCL EP build
#3138 opened Jun 22, 2026 by phu0ngng Collaborator
7 of 13 tasks
[Common] Experimental CuTeDSL MXFP8 backends in C++ via TVM-FFI
#3137 opened Jun 21, 2026 by kainzhong Collaborator Draft
13 tasks
[Common/PyTorch] Grouped-quantize kernels for 1D and 2D FP8 block-scaling FP8 MoE performance
#3135 opened Jun 17, 2026 by denera Collaborator
8 of 13 tasks
Single-launch CUTLASS grouped GEMM for per-tensor NVFP4 community-contribution PRs from external contributor outside the core maintainers, representing community-driven work.
#3134 opened Jun 17, 2026 by cael-ling Contributor
9 of 13 tasks
Enable NVFP4 RHT amax for grouped SReLU MLP community-contribution
#3133 opened Jun 16, 2026 by sraman-rgb Contributor
13 tasks
[Common] Support scaled & clamped swiglu, srelu for BF16 community-contribution
#3132 opened Jun 16, 2026 by zhongbozhu Collaborator
13 tasks
[torch.compile] Bunch of small changes needed for enabling torch.compile
#3130 opened Jun 15, 2026 by pggPL Collaborator
8 of 13 tasks
feat: add SM_121 (GB10 consumer Blackwell) support for FA4 community-contribution
#3125 opened Jun 12, 2026 by TyGu1
Avoid unpickling the extra state when not needed
#3123 opened Jun 12, 2026 by ptrendx Member
2 of 6 tasks
docs(readme): update latest news
#3121 opened Jun 11, 2026 by sbhavani Collaborator
6 of 13 tasks
TE EP integration to MoEBlock
#3116 opened Jun 10, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Collective Gemm test fixes
#3115 opened Jun 10, 2026 by jberchtold-nvidia Collaborator
13 tasks
Abstract CUDA hardcodes into configurable te_device_type / te_platform community-contribution
#3113 opened Jun 10, 2026 by lxd-cumt
Add entrypoint for flagos multi-backend plugin system community-contribution
#3107 opened Jun 9, 2026 by lxd-cumt
[PyTorch][torch.compile] Decouple amax reduction group from the quantizer
#3104 opened Jun 8, 2026 by pggPL Collaborator
4 of 13 tasks
Quantization support for GroupedTensor: FP8 per-tensor community-contribution
#3102 opened Jun 7, 2026 by int-smart Contributor
11 of 13 tasks
Introduce Mega-C++ to reduce CPU overhead community-contribution
#3099 opened Jun 6, 2026 by zhongbozhu Collaborator
3 of 16 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py community-contribution
#3095 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches community-contribution
#3093 opened Jun 5, 2026 by cael-ling Contributor
3 of 13 tasks