
[Common] Fix Build: Remove nproc from parallel make for NCCL EP build#3138

Open
phu0ngng wants to merge 1 commit into
NVIDIA:main from
phu0ngng:te_ep/fix_build


@phu0ngng

Collaborator

Description

Remove the explicit nproc job count from the parallel make invocation that builds the NCCL EP library.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
phu0ngng changed the title from "[Common]Fix Build: Remove nproc from parallel make for NCCL EP build" to "[Common] Fix Build: Remove nproc from parallel make for NCCL EP build" on Jun 22, 2026
@phu0ngng

Collaborator Author

/te-ci L0

@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Contributor

Greptile Summary

This PR removes the explicit parallelism limit (-j <nproc>) from the make invocation that builds the NCCL EP submodule (libnccl_ep.a), replacing it with a bare -j (unlimited jobs). The nproc variable derived from os.cpu_count() or 8 is also removed as it is no longer referenced.

  • The bare -j flag tells GNU make to run as many jobs in parallel as the dependency graph allows, with no upper bound. This pattern appears in some CI/CD setups, and it removes the reliance on os.cpu_count(), which reports the total number of logical CPUs on the host (or None, hence the former "or 8" fallback) rather than the number the process can actually use, so it can overstate the available CPUs in containerized runtimes.
  • The rest of the NCCL EP build logic — gencode stamp checking, make clean on gencode change, and library path discovery — is unchanged.
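The containerized-runtime point can be checked directly. This minimal sketch (the helper names are hypothetical) compares os.cpu_count() with the size of the process's CPU affinity mask. os.sched_getaffinity exists only on some platforms such as Linux, so the sketch falls back to os.cpu_count() elsewhere; neither value reflects cgroup CPU quotas such as a Docker --cpus limit.

```python
import os


def logical_cpus() -> int:
    """Total logical CPUs on the machine, as reported by os.cpu_count()."""
    # os.cpu_count() may return None; mirror the former setup.py fallback of 8.
    return os.cpu_count() or 8


def usable_cpus() -> int:
    """CPUs this process is allowed to run on (affinity mask), where supported."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # No affinity API on this platform (e.g. macOS, Windows): fall back.
    return logical_cpus()


if __name__ == "__main__":
    print(f"os.cpu_count():    {logical_cpus()}")
    print(f"usable (affinity): {usable_cpus()}")
```

On a host where the container is pinned to a subset of cores, the two numbers differ; under a CFS quota alone they may both report the full host count.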

Confidence Score: 4/5

The change is safe to merge; it removes a single variable and drops the upper bound on make parallelism, with no impact on the correctness of the build output.

The only open question is whether unbounded NVCC parallelism could exhaust memory on large machines or constrained containers. The rest of the build logic is untouched and the change is minimal in scope.

Files needing attention: setup.py, specifically the make -j call at line 279 and whether the caller's environment enforces any memory or concurrency ceiling.
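A memory-aware bound is one way to address this open question. The sketch below is hypothetical: memory_capped_jobs and the 4 GiB-per-job budget are illustrative assumptions rather than measured NVCC usage for this build, and free memory is read with os.sysconf("SC_AVPHYS_PAGES"), which is available on Linux but not on every platform (the helper returns None there and only the CPU bound applies).

```python
import os
from typing import Optional

# Assumed memory budget per concurrent NVCC job (illustrative, not measured).
ASSUMED_BYTES_PER_JOB = 4 * 1024**3


def available_memory_bytes() -> Optional[int]:
    """Best-effort free physical memory via POSIX sysconf; None if unsupported."""
    try:
        free = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows (no os.sysconf) or platforms lacking the name
    return free if free > 0 else None


def memory_capped_jobs(cpus: int, free_bytes: Optional[int],
                       bytes_per_job: int = ASSUMED_BYTES_PER_JOB) -> int:
    """Hypothetical helper: min(cpus, jobs that fit in free memory), never below 1."""
    jobs = max(1, cpus)
    if free_bytes is not None:
        jobs = min(jobs, max(1, free_bytes // bytes_per_job))
    return jobs


if __name__ == "__main__":
    n = memory_capped_jobs(os.cpu_count() or 8, available_memory_bytes())
    print(["make", "-j", str(n), "-C", "contrib/nccl_ep", "lib"])
```

SC_AVPHYS_PAGES counts only free pages (not reclaimable page cache) and ignores cgroup memory limits, so in containers the estimate may still need an explicit override.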

Important Files Changed

| Filename | Overview |
|---|---|
| setup.py | Removes the explicit CPU-count limit from the make -j call in the NCCL EP submodule build, switching from bounded (-j <nproc>) to unbounded (-j) parallelism. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_nccl_ep_submodule called] --> B[Resolve CUDA arch list from NVTE_CUDA_ARCHS]
    B --> C{SM >= 90 found?}
    C -- No --> D[Raise RuntimeError]
    C -- Yes --> E[Build gencode string]
    E --> F[Set env: NVCC_GENCODE, NCCL_HOME, NCCL_EP_BUILDDIR]
    F --> G{libnccl_ep.a exists AND gencode unchanged?}
    G -- Yes --> K[Return nccl_home - skip build]
    G -- No --> H{gencode changed since last build?}
    H -- Yes --> I[make -C contrib/nccl_ep clean]
    I --> J
    H -- No --> J["make -j -C contrib/nccl_ep lib (unbounded parallelism)"]
    J --> L[Write gencode stamp]
    L --> K
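The flowchart's control flow can be expressed as a small Python sketch. This is an illustrative reconstruction from the diagram, not the actual setup.py code: gencode_for, build_lib, the .gencode_stamp file name, the semicolon-separated arch-list format (e.g. "80;90") and the exact -gencode flag layout are all assumptions, and treating a missing stamp as "not changed" (so no make clean) is a guess. The make runner is injected so the logic can be exercised without a CUDA toolchain.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List

STAMP_NAME = ".gencode_stamp"  # hypothetical stamp file name


def gencode_for(archs: str) -> str:
    """Turn an assumed "80;90;100a"-style arch list into nvcc flags for SM >= 90."""
    flags = []
    for arch in filter(None, (a.strip() for a in archs.split(";"))):
        major = re.match(r"\d+", arch)
        if major and int(major.group()) >= 90:
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    if not flags:
        raise RuntimeError("NCCL EP needs at least one SM >= 90 architecture")
    return " ".join(flags)


def build_lib(build_dir: Path, lib: Path, gencode: str,
              run: Callable[[List[str]], object] = subprocess.check_call) -> bool:
    """Mirror the flowchart: skip, clean + build, or build; True if make ran."""
    stamp = build_dir / STAMP_NAME
    previous = stamp.read_text() if stamp.exists() else None
    if lib.exists() and previous == gencode:
        return False  # library present and built for the same archs: skip
    if previous is not None and previous != gencode:
        run(["make", "-C", "contrib/nccl_ep", "clean"])  # arch set changed
    run(["make", "-j", "-C", "contrib/nccl_ep", "lib"])  # bare -j, as in this PR
    build_dir.mkdir(parents=True, exist_ok=True)
    stamp.write_text(gencode)  # record the gencode used for this build
    return True


if __name__ == "__main__":
    def print_cmd(cmd: List[str]) -> None:
        # Print commands instead of running make, so no CUDA toolchain is needed.
        print(" ".join(cmd))

    with tempfile.TemporaryDirectory() as d:
        built = build_lib(Path(d), Path(d) / "libnccl_ep.a", gencode_for("80;90"), print_cmd)
        print("built:", built)
```

In a real build the runner would also need the working directory and environment, for example functools.partial(subprocess.check_call, cwd=str(nccl_root), env=env).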

Reviews (1): Last reviewed commit: "remove manually nproc"

Comment thread setup.py
Comment on lines 278 to 282
 subprocess.check_call(
-    ["make", "-j", str(nproc), "-C", "contrib/nccl_ep", "lib"],
+    ["make", "-j", "-C", "contrib/nccl_ep", "lib"],
     cwd=str(nccl_root),
     env=env,
 )

Contributor

P2 Using make -j with no job limit means GNU make will spawn as many concurrent compiler processes as the dependency graph allows. NVCC compilations are memory-intensive (each process can consume several GB), so on high-core-count machines or in memory-constrained containers this can trigger OOM kills or cause the build to hang. A common middle ground is to honour a job count already set in the environment (for example via MAKEFLAGS or MAKE_JOBS) and fall back to a bounded default only when none is present.

Suggested change
+nproc = int(os.environ.get("MAKE_JOBS", os.cpu_count() or 8))
 subprocess.check_call(
-    ["make", "-j", "-C", "contrib/nccl_ep", "lib"],
+    ["make", "-j", str(nproc), "-C", "contrib/nccl_ep", "lib"],
     cwd=str(nccl_root),
     env=env,
 )
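The middle ground described in the comment (respect a job count already set in the environment, otherwise use a capped default) can be sketched as below. Assumptions: make_jobs_args and the cap of 32 are hypothetical; MAKE_JOBS is the variable from the suggestion, not a GNU make convention; the MAKEFLAGS check is a simplified token scan for -j/--jobs; and inheriting a -j from MAKEFLAGS only works if the env passed to subprocess still contains MAKEFLAGS.

```python
import os
from typing import List, Mapping

DEFAULT_JOB_CAP = 32  # hypothetical ceiling for the CPU-count fallback


def make_jobs_args(env: Mapping[str, str], cap: int = DEFAULT_JOB_CAP) -> List[str]:
    """Return the -j arguments to pass to make (possibly none)."""
    # 1. A -j/--jobs already in MAKEFLAGS: pass nothing and let make inherit it.
    for tok in env.get("MAKEFLAGS", "").split():
        if tok.startswith("-j") or tok == "--jobs" or tok.startswith("--jobs="):
            return []
    # 2. Explicit MAKE_JOBS override, ignored unless it is a positive integer.
    try:
        jobs = int(env.get("MAKE_JOBS", "").strip())
    except ValueError:
        jobs = 0
    if jobs > 0:
        return ["-j", str(jobs)]
    # 3. Capped fallback based on the CPU count (None -> 8, as in setup.py).
    return ["-j", str(min(os.cpu_count() or 8, cap))]


if __name__ == "__main__":
    print(["make", *make_jobs_args(os.environ), "-C", "contrib/nccl_ep", "lib"])
```

The call site would then build its argument list as ["make", *make_jobs_args(env), "-C", "contrib/nccl_ep", "lib"], so an explicit MAKEFLAGS setting is never overridden by a command-line -j.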


Labels

None yet

Projects

None yet


1 participant