feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference single & multi-rank (NeighborGraph PR-B) by wanghan-iapcm · Pull Request #5604 · deepmodeling/deepmd-kit

wanghan-iapcm · 2026-06-29T14:46:42Z

NeighborGraph PR-B — graph `.pt2` export, compiled training, and C++ inference (single & multi-rank)

This PR spans the full PR-B: B1 (Python: graph .pt2 export + compiled training on the graph lower), B2 (C++ single-rank inference of the graph .pt2, dynamic edge axis), and B3 (C++/LAMMPS multi-rank). Built on the merged PR-A (#5583). Scope: dpa1, attn_layer=0, pt_expt.

B1 — graph `.pt2` export + compiled training (Python)

forward_common_lower_graph_exportable trace target; serialization.py graph export branch (lower_kind="graph", lower_input_kind metadata); _eval_model_graph DeepEval dispatch (parity vs eager dpa1 1e-10 pbc+nopbc).
Compiled training retargeted to the graph lower so eager == compiled (the MUST-FIX) → force_legacy_descriptor deleted. Root cause was a real dpa1 call_graph autograd detach bug (xp.asarray(tebd, device=) drops the tebd-net gradient under torch); fixed.

B2 — C++ graph ingestion (dynamic edge axis, single-rank)

Graph .pt2 uses a dynamic edge axis (Dim("nedge", min=2)) — one artifact evals any system size (proven across 56- and 380-edge systems at 1e-10), no C++ capacity ceiling.
C++ DeepPotPTExpt: lower_input_is_graph_ + run_model_graph (NeighborGraph ABI: atype, n_node, edge_index, edge_vec, edge_mask, …) + buildGraphTensors (mirrors the refactor(pt/dpa4): use edge schema for dpa4 and ignore sel #5562 edge path; node types from atype_ext); remap_graph_outputs_to_dense_keys (single-rank).
gtest: 5 cases × {double,float} = 10/10 (build-nlist parity, dynamic-E 2nd size, ago>0, tiny system, atomic-overload). The review process caught two bugs that would otherwise have shipped: an ago>0 heap-OOB (by inspection) and a public-vs-internal output-key mismatch (at runtime).

B3 — multi-rank C++ / LAMMPS (non-MP)

dpa1 is non-message-passing ⇒ multi-rank needs NO border_op/with-comm artifact (that is a message-passing concern, deferred to PR-G). Multi-rank reuses the same single-rank graph .pt2, fed an extended-region graph (buildGraphTensors(fold_to_local=false), N=nall, ghost node types from atype_ext incl. halo), with owned energy = sum(atom_energy[0:nloc]) and the extended force folded to owners through the existing dense select_map reverse-comm. The fail-fast for graph && multi_rank && has_message_passing is retained.
Validated locally on multi-CPU (no GPU needed for correctness): test_lammps_dpa1_graph_pt2.py — single-rank vs reference, mpirun -n 2 ≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an empty-subdomain (nloc=0) corner. Single-rank gtests stay 10/10 (multi-rank is purely additive). Multi-rank matched single-rank on the first run.

Tests / known limitations

Per-task + whole-phase reviews all Ready-to-merge.
pt_expt-only; dpa1 (non-MP) only. Follow-ons: PR-C vesin/nv O(N) builders (carry-all builders still use nonzero, eager-only), PR-D attention, PR-E angles, PR-F jax graph force, PR-G dpa2/3 message-passing (forward halo + with-comm). CUDA multi-rank unvalidated locally. Carried code-cleanup follow-ups: a ~60-line DRY duplication in training.py; the multi-rank atomic output branch has no direct gtest (covered indirectly by the mpirun per-atom-virial assertion, since a single-process gtest can't set nprocs>1).

Summary by CodeRabbit

New Features
- Added support for exporting and running graph-based model archives.
- Enabled graph-form inference in Python, training, and C++/LAMMPS paths.
- Added broader support for dynamic system sizes and static neighbor capacities.
Bug Fixes
- Improved gradient handling so type embeddings train correctly in graph mode.
- Fixed tracing/export behavior for symbolic dimensions and padding-related indexing.
- Preserved consistent output keys between graph and dense model formats.
Tests
- Added coverage for graph export, metadata, inference parity, training consistency, and LAMMPS single- and multi-rank runs.

… graph .pt2 export - Add make_model.forward_common_lower_graph_exportable: make_fx trace of forward_common_lower_graph with edge_vec as the autograd leaf; symbolic tracing enabled via two data-dependent shape fixes. - Add EnergyModel.forward_lower_graph_exportable: wraps the above with a second make_fx pass that translates internal keys to public names (atom_energy, energy, force, virial, atom_virial). - Fix edge_transform_output.py: replace int(n_node.sum()) with next(iter(fit_ret.values())).shape[0] (static shape attr, trace-safe); thread node_capacity through edge_energy_deriv -> edge_force_virial so that segment_sum sizes are static under make_fx symbolic mode. - Add edge_energy_deriv node_capacity param (None = eager fallback). - Test: source/tests/pt_expt/model/test_graph_export.py — TDD RED->GREEN, verifies traced module reproduces eager energy_redu (rtol=1e-10).

… do_atomic_virial branches), dedup key-translation, drop redundant detach

…r_graph Adds test_graph_static_capacity.py (4 tests): shape (2,64), real-edge count matches dynamic, real prefix identity + masked tail, overflow raises. No source change needed -- builder.py already threads layout.edge_capacity into pad_and_guard_edges.

…raph .pt2 metadata for the C++ hub (B2)

…pbc, 1e-10) Add a graph-form .pt2 DeepEval dispatch and parity test (NeighborGraph PR-B Phase B1.4). DeepEval: route lower_input_kind=="graph" archives through a new _eval_model_graph that builds a carry-all NeighborGraph (padded to the static edge_capacity baked in metadata), feeds the positional schema (atype, n_node, edge_index, edge_vec, edge_mask, fparam, aparam, charge_spin) to the AOTI forward, and reshapes the LOCAL public outputs by category (no communicate_extended_output, since the graph path is ghost-free). The dense (nlist) path is untouched. Fix dynamic-shape generalization of the graph .pt2 export: under symbolic make_fx/torch.export, int() on a SymInt SPECIALIZES the axis, baking the trace-time sample size (N=14, nf=2) into the autograd-derived force/energy/ virial outputs. Drop int() on node_capacity (edge_force_virial) and on n_node.shape[0] (fit_output_to_model_output_graph), and derive nf-1 in frame_id_from_n_node as a runtime 0-d tensor instead of xp.asarray(shape-1). The exported graph now generalizes across arbitrary N and nf. Test: dpa1(attn_layer=0) energy model exported with lower_kind="graph", evaluated through the pt_expt DeepPot and compared to the eager dense (sel-capped, neighbor_graph_method="legacy") reference at rtol=atol=1e-10 for pbc and nopbc. The fixture is a sparse 8-atom cluster and asserts non-binding sel (max neighbors < 30) so the carry-all/sel-capped neighbor sets coincide (anti-vacuity).

…); drop force_legacy_descriptor Retarget _CompiledModel to compile forward_common_lower_graph for graph-eligible descriptors (dpa1 attn_layer==0), gated by the same mixed_types()+uses_graph_lower() predicate the eager default-flip uses; se_e2_a/dpa2/dpa3 keep compiling the dense forward_lower. _trace_and_compile_graph builds a synthetic NeighborGraph with prime-distinct nf/N/E axes (no make_fx duck-shape merge) and edge_vec as the autograd leaf; _forward_graph builds the carry-all graph eagerly and unravels flat (N,*) node outputs to (nf,nloc,*). cpp.simdlen=0 for the graph compile avoids an inductor CPU scatter-vectorizer crash on the per-frame virial atomic_add. Also fixes an eager autograd bug in dpa1 call_graph: xp.asarray(type_embedding, device=dev) DETACHES under torch, so the type-embedding weights never trained in the graph path (grad None despite a real finite-diff dependency). make_fx traced through it, so compiled != eager and the optimizer diverged after step 0. Use type_embedding directly (mirrors the dense path); the tebd net now trains and eager==compiled to 1e-10 across the varying-natoms trajectory. Drops the force_legacy_descriptor workaround + uses_graph_lower monkeypatch.

… + legacy-gate assumption in the graph compile path

for more information, see https://pre-commit.ci

coderabbitai · 2026-06-29T14:49:11Z

📝 Walkthrough

Walkthrough

Adds a graph-native NeighborGraph lowering path across export, inference, training compilation, C++ runtime, and tests. The PR also updates dpmodel and neighbor-graph helpers to preserve gradients and symbolic shapes during graph tracing.

Changes

Graph-form lower-forward export and runtime

Layer / File(s)	Summary
Trace-safe graph utilities `deepmd/dpmodel/descriptor/dpa1.py`, `deepmd/dpmodel/utils/neighbor_graph/derivatives.py`, `deepmd/dpmodel/utils/neighbor_graph/graph.py`	`type_embedding` is used directly in graph descriptor paths; `node_capacity` is kept symbolic in `edge_force_virial`; `frame_id_from_n_node` clamps using a runtime tensor instead of `n_node.shape[0]`.
Trace-stable edge transforms `deepmd/pt_expt/model/edge_transform_output.py`	`edge_energy_deriv` gains `node_capacity`, and graph fitting/output shaping uses trace-stable node-axis sizing for frame indexing and backward scatter sizing.
Graph export model entry points `deepmd/pt_expt/model/ener_model.py`, `deepmd/pt_expt/model/make_model.py`	Energy export key translation is centralized, graph-local export keys are added, and `forward_common_lower_graph_exportable` traces the graph forward into an exportable module.
Graph export metadata and DeepEval dispatch `deepmd/pt_expt/utils/serialization.py`, `deepmd/pt_expt/infer/deep_eval.py`	`deserialize_to_file` accepts `lower_kind`, graph exports build dynamic graph inputs and store `lower_input_kind`, and DeepEval routes graph archives through `_eval_model_graph` with graph-key output mapping.
GRAPH compiled lower `deepmd/pt_expt/train/training.py`	Graph eligibility detection, graph tracing/compilation helpers, a cached graph forward path, and graph-shaped output unraveling are added to compiled training execution.
C++ graph runtime path `source/api_cc/include/commonPT.h`, `source/api_cc/include/DeepPotPTExpt.h`, `source/api_cc/src/DeepPotPTExpt.cc`	C++ support adds graph tensor packing, graph model execution, graph metadata dispatch, dense-key remapping, and graph-path cache handling.
Graph export and runtime tests `source/tests/common/dpmodel/test_graph_static_capacity.py`, `source/tests/pt_expt/model/test_graph_export.py`, `source/tests/pt_expt/infer/test_graph_deepeval.py`, `source/tests/pt_expt/utils/test_graph_pt2_metadata.py`, `source/tests/pt_expt/test_training.py`, `source/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc`, `source/tests/infer/gen_dpa1.py`, `source/lmp/tests/test_lammps_dpa1_graph_pt2.py`	Tests cover static edge capacity, graph export tracing, metadata, DeepEval parity, C++ inference, graph artifact generation, the compiled varying-natoms graph path, and LAMMPS graph `.pt2` inference.

Sequence Diagram(s)

sequenceDiagram
  participant deserialize_to_file
  participant forward_lower_graph_exportable
  participant _eval_model
  participant _eval_model_graph
  participant exported_module
  deserialize_to_file->>forward_lower_graph_exportable: trace graph lower
  deserialize_to_file->>deserialize_to_file: record lower_input_kind
  _eval_model->>_eval_model: branch on lower_input_kind
  _eval_model->>_eval_model_graph: graph archive path
  _eval_model_graph->>exported_module: forward(atype, n_node, edge_index, edge_vec, edge_mask, ...)
  exported_module-->>_eval_model_graph: graph outputs

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

perf: DPA4 .pt2 LAMMPS (C++ AOTI) inference 3-6x slower than Python (ASE) due to CPU-side edge construction and data marshalling #5595: Adds graph-schema DeepPotPTExpt and commonPT paths (compute, buildGraphTensors, run_model_graph) that touch the same C++ inference pipeline.

Possibly related PRs

deepmodeling/deepmd-kit#5284: Extends the pt_expt DeepEval inference pipeline that this PR routes through _eval_model_graph for graph-form archives.
deepmodeling/deepmd-kit#5391: Touches the same deepmd/pt_expt/infer/deep_eval.py execution path used here for model evaluation.
deepmodeling/deepmd-kit#5457: Modifies the same serialization and training compilation machinery that now supports lower_kind="graph".

Suggested labels

Core

Suggested reviewers

OutisLi
njzjz

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the PR’s main scope: graph-native DPA1, .pt2 export, compiled training, and C++ inference across single- and multi-rank paths.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

+    DeepPot,
+)
+from deepmd.pt_expt.utils.env import (
+    DEVICE,


+    if task_buf_order:
+        try:
+            _fitting = model.get_fitting_net()
+        except AttributeError:


+            pass
+        try:
+            _atomic_model = model.atomic_model
+        except AttributeError:


+        _dim_fp = model.get_dim_fparam()
+        if _dim_fp > 1:
+            _forbidden.add(_dim_fp)
+    except Exception:


+        _dim_ap = model.get_dim_aparam()
+        if _dim_ap > 1:
+            _forbidden.add(_dim_ap)
+    except Exception:


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

deepmd/pt_expt/utils/serialization.py (1)
750-789: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject unknown lower_kind values instead of falling back to nlist.

A typo such as lower_kind="grpah" currently reaches the dense path and records lower_input_kind="nlist", producing the wrong artifact without any signal.
Proposed fix
 def deserialize_to_file(
@@
     """
+    if lower_kind not in {"nlist", "graph"}:
+        raise ValueError(
+            f"Unknown lower_kind {lower_kind!r}; expected 'nlist' or 'graph'."
+        )
     if model_file.endswith(".pt2"):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/serialization.py` around lines 750 - 789, The
deserialize entrypoint in serialization.py currently accepts any lower_kind
value and silently falls back to the nlist path, which can produce the wrong
artifact for typos. Update deserialize_to_file and the lower-path selection
logic in _deserialize_to_file_pt2 / _deserialize_to_file_pte to validate
lower_kind explicitly against the supported values ("nlist" and "graph") and
raise a clear error for anything else instead of defaulting to nlist, so the
recorded lower_input_kind always matches the requested schema.
source/tests/pt_expt/test_training.py (1)
1355-1374: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert that the DPA1 no-attn case really takes the graph lower.

These changes now document graph-lower routing, but the test still only proves eager/compiled parity. If _CompiledModel silently falls back to the dense lower and the sampled systems stay non-binding, this case still passes, so the dispatch change introduced by this PR is untested. Please add an explicit assertion for the DPA1 no-attn branch that the compiled model is graph-eligible / using the graph lower, while keeping the dense expectation for the other descriptors.

Also applies to: 1391-1392, 1466-1479
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt_expt/test_training.py` around lines 1355 - 1374, The current
test only checks eager/compiled parity in _check_varying_natoms, so the DPA1
no-attn routing to the graph lower is not explicitly verified. Add an assertion
in the DPA1 branch that the compiled model is graph-eligible and actually uses
forward_common_lower_graph, while preserving the dense forward_lower expectation
for the non-eligible descriptors. Extend the same explicit dispatch check in the
related test sections that cover the same descriptor routing.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 968-970: The graph-compiled forward path is dropping the
do_atomic_virial request, so atom_virial cannot be returned from compiled
models. Update forward and _forward_graph in training.py to thread
do_atomic_virial through the graph-eligible dispatch, and make
_trace_and_compile_graph cache/trace separately based on that flag so the
compiled lower path matches forward_common_lower_graph behavior.
- Around line 1283-1293: The reshaping logic in the result loop currently relies
on the `N != nframes` heuristic, which fails for `nloc == 1` and leaves
node-level outputs flat. Update the `training.py` postprocessing in the
`result.items()` loop to use explicit key categories from the graph/lower-level
public keys instead of shape inference, and reshape known node-level outputs
like `atom_energy` and `force` to `(nf, 1, *)` even when `N == nframes`.

In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 447-448: The dynamic-shape spec in _build_graph_dynamic_shapes
still only marks the frame axis for aparam, so its atom axis remains fixed to
the sample nloc. Update the aparam entry in the shape mapping to include both
the frame axis and the atom axis as dynamic, matching the intended (nf, nloc,
nda) behavior. Keep the change localized to the graph shape spec alongside the
existing fparam and aparam handling so exported graphs no longer specialize to a
single atom count.

---

Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 750-789: The deserialize entrypoint in serialization.py currently
accepts any lower_kind value and silently falls back to the nlist path, which
can produce the wrong artifact for typos. Update deserialize_to_file and the
lower-path selection logic in _deserialize_to_file_pt2 /
_deserialize_to_file_pte to validate lower_kind explicitly against the supported
values ("nlist" and "graph") and raise a clear error for anything else instead
of defaulting to nlist, so the recorded lower_input_kind always matches the
requested schema.

In `@source/tests/pt_expt/test_training.py`:
- Around line 1355-1374: The current test only checks eager/compiled parity in
_check_varying_natoms, so the DPA1 no-attn routing to the graph lower is not
explicitly verified. Add an assertion in the DPA1 branch that the compiled model
is graph-eligible and actually uses forward_common_lower_graph, while preserving
the dense forward_lower expectation for the non-eligible descriptors. Extend the
same explicit dispatch check in the related test sections that cover the same
descriptor routing.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 58e2933c-1a05-4d30-bdfa-0cec147b8f6a

📥 Commits

Reviewing files that changed from the base of the PR and between 58bef11 and b046874.

📒 Files selected for processing (14)

deepmd/dpmodel/descriptor/dpa1.py
deepmd/dpmodel/utils/neighbor_graph/derivatives.py
deepmd/dpmodel/utils/neighbor_graph/graph.py
deepmd/pt_expt/infer/deep_eval.py
deepmd/pt_expt/model/edge_transform_output.py
deepmd/pt_expt/model/ener_model.py
deepmd/pt_expt/model/make_model.py
deepmd/pt_expt/train/training.py
deepmd/pt_expt/utils/serialization.py
source/tests/common/dpmodel/test_graph_static_capacity.py
source/tests/pt_expt/infer/test_graph_deepeval.py
source/tests/pt_expt/model/test_graph_export.py
source/tests/pt_expt/test_training.py
source/tests/pt_expt/utils/test_graph_pt2_metadata.py

coderabbitai · 2026-06-29T14:59:17Z

+            return self._forward_graph(
+                coord, atype, box, fparam, aparam, charge_spin, nframes, nloc, rcut
+            )


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Preserve do_atomic_virial in the graph compiled path.

forward(..., do_atomic_virial=True) is dropped when graph-eligible models dispatch to _forward_graph, and _trace_and_compile_graph() always traces do_atomic_virial=False. That makes the compiled graph path unable to return atom_virial even though forward_common_lower_graph supports it.

Consider threading this flag through _forward_graph() and compiling/cache-keying the graph lower by the requested atomic-virial mode.

Also applies to: 743-759, 1223-1230

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/train/training.py` around lines 968 - 970, The graph-compiled forward path is dropping the do_atomic_virial request, so atom_virial cannot be returned from compiled models. Update forward and _forward_graph in training.py to thread do_atomic_virial through the graph-eligible dispatch, and make _trace_and_compile_graph cache/trace separately based on that flag so the compiled lower path matches forward_common_lower_graph behavior.

coderabbitai · 2026-06-29T14:59:17Z

+        for key, val in result.items():
+            # ``N != nframes`` distinguishes node-level keys (lead dim N) from
+            # frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1,
+            # N == nframes, so node-level keys are NOT unravelled and stay
+            # (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms
+            # trainer (nloc >> 1); a single-atom-per-frame system would need an
+            # explicit per-key category check instead of the shape heuristic.
+            if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes:
+                out[key] = val.reshape(nframes, nloc, *val.shape[1:])
+            else:
+                out[key] = val


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use key categories instead of the N != nframes heuristic.

For nloc == 1, node-level outputs like atom_energy and force remain flat (nf, *) instead of (nf, 1, *), breaking the forward output contract for valid single-atom systems. The graph lower already emits public keys, so reshape known node-level keys explicitly.

Proposed fix

+ node_level_keys = {"atom_energy", "force", "atom_virial", "mask"} for key, val in result.items(): - # ``N != nframes`` distinguishes node-level keys (lead dim N) from - # frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1, - # N == nframes, so node-level keys are NOT unravelled and stay - # (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms - # trainer (nloc >> 1); a single-atom-per-frame system would need an - # explicit per-key category check instead of the shape heuristic. - if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes: + if ( + key in node_level_keys + and val is not None + and val.shape[:1] == torch.Size([N]) + ): out[key] = val.reshape(nframes, nloc, *val.shape[1:]) else: out[key] = val

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for key, val in result.items():

# ``N != nframes`` distinguishes node-level keys (lead dim N) from

# frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1,

# N == nframes, so node-level keys are NOT unravelled and stay

# (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms

# trainer (nloc >> 1); a single-atom-per-frame system would need an

# explicit per-key category check instead of the shape heuristic.

if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes:

out[key] = val.reshape(nframes, nloc, *val.shape[1:])

else:

out[key] = val

node_level_keys = {"atom_energy", "force", "atom_virial", "mask"}

for key, val in result.items():

if (

key in node_level_keys

and val is not None

and val.shape[:1] == torch.Size([N])

):

out[key] = val.reshape(nframes, nloc, *val.shape[1:])

else:

out[key] = val

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/train/training.py` around lines 1283 - 1293, The reshaping logic in the result loop currently relies on the `N != nframes` heuristic, which fails for `nloc == 1` and leaves node-level outputs flat. Update the `training.py` postprocessing in the `result.items()` loop to use explicit key categories from the graph/lower-level public keys instead of shape inference, and reshape known node-level outputs like `atom_energy` and `force` to `(nf, 1, *)` even when `N == nframes`.

coderabbitai · 2026-06-29T14:59:17Z

+        {0: nframes_dim} if fparam is not None else None,  # fparam: (nf, ndf)
+        {0: nframes_dim} if aparam is not None else None,  # aparam: (nf, nloc, nda)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

set -euo pipefail # Locate the target file and nearby dynamic-shape/export logic. git ls-files deepmd/pt_expt/utils/serialization.py ast-grep outline deepmd/pt_expt/utils/serialization.py --view expanded # Inspect the relevant region around the cited lines. sed -n '400,480p' deepmd/pt_expt/utils/serialization.py # Find graph-export and dynamic-shape helpers. rg -n "_build_graph_dynamic_shapes|deserialize_to_file|lower_kind|aparam|nloc_dim|nframes_dim" deepmd/pt_expt/utils/serialization.py deepmd/pt_expt -S

Repository: deepmodeling/deepmd-kit

Length of output: 38920

🏁 Script executed:

set -euo pipefail # Inspect tests/examples that cover graph export or aparam-specific behavior. rg -n "dim_aparam|aparam|torch.export|dynamic_shapes|graph export|deserialize_to_file" deepmd -S --glob '!**/dist/**' --glob '!**/build/**'

Repository: deepmodeling/deepmd-kit

Length of output: 50380

🏁 Script executed:

set -euo pipefail # Read the graph-export branch and the graph sample-input builder with line numbers. sed -n '313,420p' deepmd/pt_expt/utils/serialization.py printf '\n---\n' sed -n '850,960p' deepmd/pt_expt/utils/serialization.py

Repository: deepmodeling/deepmd-kit

Length of output: 7322

🏁 Script executed:

set -euo pipefail # Pull just the aparam-related graph-export docs/comments from the inference side for contract context. sed -n '1230,1260p' deepmd/pt_expt/infer/deep_eval.py

Repository: deepmodeling/deepmd-kit

Length of output: 1284

Make graph aparam’s atom axis dynamic
_build_graph_dynamic_shapes still leaves aparam.shape[1] static, so graph exports specialize it to the sample nloc=7. That breaks dim_aparam > 0 exports when runtime atom counts differ. Update the graph dynamic-shape spec for aparam to include the atom axis.

Proposed fix

nframes_dim = torch.export.Dim("nframes", min=1) n_node_total_dim = torch.export.Dim("n_node_total", min=1) + nloc_dim = torch.export.Dim("nloc", min=1) return ( {0: n_node_total_dim}, # atype: (N,) {0: nframes_dim}, # n_node: (nf,) @@ - {0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda) + {0: nframes_dim, 1: nloc_dim} if aparam is not None else None, # aparam

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

{0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf)

{0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda)

nloc_dim = torch.export.Dim("nloc", min=1)

{0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf)

{0: nframes_dim, 1: nloc_dim} if aparam is not None else None, # aparam

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/utils/serialization.py` around lines 447 - 448, The dynamic-shape spec in _build_graph_dynamic_shapes still only marks the frame axis for aparam, so its atom axis remains fixed to the sample nloc. Update the aparam entry in the shape mapping to include both the frame axis and the atom axis as dynamic, matching the intended (nf, nloc, nda) behavior. Keep the change localized to the graph shape spec alongside the existing fparam and aparam handling so exported graphs no longer specialize to a single atom count.

codecov · 2026-06-29T15:47:54Z

Codecov Report

❌ Patch coverage is 65.74074% with 148 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.34%. Comparing base (5082854) to head (b25fdfc).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
...rce/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc	6.36%	103 Missing ⚠️
deepmd/pt_expt/train/training.py	80.89%	34 Missing ⚠️
deepmd/pt_expt/utils/serialization.py	89.33%	8 Missing ⚠️
deepmd/pt_expt/infer/deep_eval.py	91.17%	3 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (5082854) and HEAD (b25fdfc). Click for more details.

HEAD has 30 uploads less than BASE

Flag BASE (5082854) HEAD (b25fdfc)

56 26

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5604      +/-   ##
==========================================
- Coverage   82.37%   73.34%   -9.04%     
==========================================
  Files         902      880      -22     
  Lines      101527    96521    -5006     
  Branches     4056     3390     -666     
==========================================
- Hits        83630    70790   -12840     
- Misses      16432    24938    +8506     
+ Partials     1465      793     -672

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

… static edge_capacity The graph-form .pt2 export now marks the edge axis dynamic (Dim("nedge", min=2)) instead of baking a static E_max=ceil(1.25*nloc*nnei) capacity. The AOTI artifact accepts any system size with no capacity ceiling. - _build_graph_dynamic_shapes: edge_index dim1 / edge_vec dim0 / edge_mask dim0 are now dynamic; mirrors the dense Dim("nnei", min=...) precedent. - _trace_and_export graph branch: drop the metadata["edge_capacity"] write; apply _strip_shape_assertions to neutralise the SIGFPE-prone deferred shape guards on the dynamic E axis (same handling the spin dense path uses). - deep_eval._eval_model_graph: build the carry-all graph at its tight edge count (no edge_capacity padding). - test_graph_deepeval: eval TWO different-size systems (8- and 20-atom, 56 vs 380 real edges) through the SAME exported artifact; both match eager dense dpa1 at 1e-10 pbc+nopbc, both non-binding. The 20-atom system (380 > 263) would have overflowed the B1 static artifact -> the RED. - test_graph_pt2_metadata: graph metadata no longer carries edge_capacity.

…e graph (dynamic-E) caller + edge-axis safety

…) generator Section B of gen_dpa1.py produces deeppot_dpa1_graph.pt2 (lower_kind="graph", do_atomic_virial=True) and the accompanying deeppot_dpa1_graph.expected sidecar. Reference values come from an independent nlist .pt2 eval (NOT the graph .pt2) so the C++ gtest (B2.5) validates the graph AOTI path against a known-good reference. Sanity check: graph .pt2 vs nlist ref force diff 1.3e-18 (machine precision; sel=30 >> actual neighbors, so both paths see identical neighbor sets). Forces non-degenerate: max |F| ~5.3e-4 (PBC). Config: type_map=[O,H], sel=30, rcut=6.0, attn_layer=0, neuron=[2,4,8], axis_neuron=4, fitting neuron=[5,5,5], resnet_dt=True, seed=1 — mirrors DPA1_CONFIG in source/tests/pt_expt/utils/test_graph_pt2_metadata.py.

B2.2: read lower_input_kind="graph" -> lower_input_is_graph_. B2.3: run_model_graph with NeighborGraph AOTI input order (atype, n_node, edge_index, edge_vec, edge_mask, [fparam], [aparam], [charge_spin]); no coord / edge_scatter_index. B2.4: GraphTensorPack + buildGraphTensors (delegates to createEdgeTensors+compactEdgeTensors for the rcut filter, dynamic edge count and 2 masked dummy edges; drops edge_index_ext, adds n_node=[nloc], node types from atype_ext[0:nloc]) + compute_inner & standalone dispatch branches. Multi-rank graph fails fast (PR-B3).

… path Local std::vector<int64_t> mapping was declared in compute_inner and populated only inside if(ago==0). The graph branch called buildGraphTensors with this local vector on every step, causing an OOB heap read on ago>0 (mapping.size()==0). Fix: promote mapping to a member mapping_ (parallel to mapping_tensor) so it persists across steps. Edge-path (createEdgeTensors) and dense-path (mapping_tensor) are unaffected in behavior; only the vector source changes from local to member.

…xtraction Add test_deeppot_dpa1_graph_ptexpt.cc — the first runtime exercise of the NeighborGraph .pt2 C++ ingestion (B2.2-4). Four cases x {double,float}: build-nlist parity vs the committed .expected, a second 12-atom system through the same model (dynamic edge axis), the LAMMPS InputNlist+ago=0/ago=1 path (compute_inner + cached mapping_), and a tiny no-edge system (nedge_min=2 guard). The first run surfaced a B2.2-4 bug: the graph forward emits LOCAL flat-N PUBLIC keys (atom_energy/energy/force/virial/atom_virial) but compute()'s output extraction read the dense INTERNAL keys (energy_redu/energy_derv_r/...), so output_map["energy_redu"].view() threw on an undefined tensor. The graph branch had only ever been compiled, never run. Fix: remap_graph_outputs_to_dense_keys() in commonPT.h, called after extract_outputs in both compute overloads (gated on lower_input_is_graph_). Rewrites the public keys into the dense internal-key layout; per-atom force/atom_virial are local (nloc) and zero-padded up to nall so the existing fold-back is a no-op on ghost rows (local rows already carry the folded ghost contributions). Dense/edge paths untouched. gen_dpa1.py now persists the dense nlist-ref .pt2 (deeppot_dpa1_graph_nlist_ref.pt2) as a live graph-vs-dense oracle for the dynamic-edge cases.

…ring (nlist .pt2, 1e-5)

for more information, see https://pre-commit.ci

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

source/api_cc/src/DeepPotPTExpt.cc (1)

508-539: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Require lmp_list.mapping for graph artifacts with ghosts.

The graph branch uses mapping_ in buildGraphTensors to fold ghost neighbors onto local owners, but the current guard only requires an atom map for has_message_passing_ models. A DPA1 graph artifact can therefore run PBC LAMMPS without atom_modify map yes, fall back to identity mapping, and feed invalid/wrong folded graph indices.

Proposed fix

   if (lower_input_is_graph_ && multi_rank) {
     throw deepmd::deepmd_exception(
         "Multi-rank graph (NeighborGraph) .pt2 inference is not yet "
         "supported (PR-B3). Run single-rank, or use a dense/edge .pt2 for "
         "multi-rank LAMMPS.");
   }
+  if (lower_input_is_graph_ && nghost > 0 && !atom_map_present) {
+    throw deepmd::deepmd_exception(
+        "Graph (NeighborGraph) .pt2 inference with ghost atoms requires "
+        "`atom_modify map yes` so ghost neighbors can be folded onto their "
+        "local owners. Re-run with an atom map enabled, or use a dense .pt2.");
+  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 508 - 539, The
graph-artifact path in DeepPotPTExpt::buildGraphTensors is using mapping_ to
fold ghost neighbors, but the current validation only enforces atom-map presence
for has_message_passing_ models. Add a guard for lower_input_is_graph_ with
ghosts so graph artifacts also require lmp_list.mapping (or atom_modify map yes)
before compute(), and fail fast when it is missing instead of falling back to
identity mapping.

deepmd/pt_expt/utils/serialization.py (1)

793-799: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Reject unknown lower_kind values before dispatch.

A typo currently falls through to the dense nlist export while recording nlist metadata, producing a valid-looking artifact with the wrong ABI.

Proposed fix

     """
+    if lower_kind not in {"nlist", "graph"}:
+        raise ValueError(
+            f"Unsupported lower_kind={lower_kind!r}; expected 'nlist' or 'graph'."
+        )
     if model_file.endswith(".pt2"):
         _deserialize_to_file_pt2(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/serialization.py` around lines 793 - 799, The dispatch
in serialization.py currently lets unknown lower_kind values fall through to the
dense nlist export path, which can produce a misleading artifact with incorrect
ABI metadata. Add validation before the model_file.endswith(".pt2") branch so
only the expected lower_kind values are accepted, and raise an error for
anything else. Keep the check close to the dispatch logic in the
deserialization/export flow that calls _deserialize_to_file_pt2 and
_deserialize_to_file_pte.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 793-799: The dispatch in serialization.py currently lets unknown
lower_kind values fall through to the dense nlist export path, which can produce
a misleading artifact with incorrect ABI metadata. Add validation before the
model_file.endswith(".pt2") branch so only the expected lower_kind values are
accepted, and raise an error for anything else. Keep the check close to the
dispatch logic in the deserialization/export flow that calls
_deserialize_to_file_pt2 and _deserialize_to_file_pte.

In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 508-539: The graph-artifact path in
DeepPotPTExpt::buildGraphTensors is using mapping_ to fold ghost neighbors, but
the current validation only enforces atom-map presence for has_message_passing_
models. Add a guard for lower_input_is_graph_ with ghosts so graph artifacts
also require lmp_list.mapping (or atom_modify map yes) before compute(), and
fail fast when it is missing instead of falling back to identity mapping.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c4b21156-2b33-4392-b6b0-4819a492ea77

📥 Commits

Reviewing files that changed from the base of the PR and between b046874 and b25fdfc.

📒 Files selected for processing (9)

deepmd/pt_expt/infer/deep_eval.py
deepmd/pt_expt/utils/serialization.py
source/api_cc/include/DeepPotPTExpt.h
source/api_cc/include/commonPT.h
source/api_cc/src/DeepPotPTExpt.cc
source/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc
source/tests/infer/gen_dpa1.py
source/tests/pt_expt/infer/test_graph_deepeval.py
source/tests/pt_expt/utils/test_graph_pt2_metadata.py

🚧 Files skipped from review as they are similar to previous changes (2)

deepmd/pt_expt/infer/deep_eval.py
source/tests/pt_expt/infer/test_graph_deepeval.py

…st (B2.5 follow-ups)

…-comm; no with-comm)

wanghan-iapcm · 2026-06-30T00:37:39Z

B3: non-MP multi-rank C++ inference (single-rank ≡ multi-rank)

Adds the multi-rank graph inference path for dpa1 (commits 36b767b9..92c35a6b1).

Key design (dpa1 is non-message-passing): multi-rank reuses the same single-rank graph .pt2 fed an extended-region graph (buildGraphTensors(fold_to_local=false), N=nall, ghost node types from atype_ext incl. halo), owned energy = sum(atom_energy[0:nloc]), extended force (N=nall) folded to owners via the existing dense reverse-comm (select_map + LAMMPS reverse comm). No border_op/with-comm artifact — that is for message-passing descriptors (DPA2/3, a later PR). The fail-fast for graph && multi_rank && has_message_passing is retained.

Validation (local, multi-CPU — no GPU needed for correctness): source/lmp/tests/test_lammps_dpa1_graph_pt2.py — single-rank vs reference, and mpirun -n 2 ≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an empty-subdomain (nloc=0) corner. Existing single-rank gtests (10/10) unchanged. Multi-rank matched single-rank on the first run — the design correction (non-MP needs no with-comm) held under real MPI domain decomposition.

Known limits: pt_expt-only; dpa1 (non-MP) only — DPA2/3 multi-rank (forward halo / message passing) is a follow-on; CUDA multi-rank unvalidated locally.

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

source/api_cc/src/DeepPotPTExpt.cc (1)

568-585: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Require a real atom map before folded single-rank graph inference.

buildGraphTensors(..., fold_to_local=true) still needs mapping_ to collapse ghost neighbors onto owner indices. Falling back to mapping_[ii] = ii here means a single-rank graph run with ghosts but without InputNlist.mapping can feed ghost indices into a graph whose n_node is only nloc, so this becomes a silent wrong-answer path instead of a fail-fast.

Suggested fix

     } else {
+      if (lower_input_is_graph_ && !multi_rank && nghost > 0) {
+        throw deepmd::deepmd_exception(
+            "Single-rank NeighborGraph .pt2 inference with ghosts requires "
+            "`atom_modify map yes` in the LAMMPS input (so "
+            "InputNlist.mapping is populated).");
+      }
       // Identity fallback.  The fail-fast above guarantees we only
       // reach this branch when one of these is true:
       //   - The model is non-message-passing (mapping is unused).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 568 - 585, The fallback
identity mapping in DeepPotPTExpt::buildGraphTensors is unsafe for folded
single-rank graph inference because it can let ghost indices slip into a graph
built with n_node based on nloc. Remove this silent fallback for the
fold_to_local path and make the code fail fast unless a real InputNlist.mapping
is present; keep the identity-only behavior restricted to cases where mapping is
truly unused, such as non-message-passing or zero-ghost paths. Use the existing
symbols buildGraphTensors, mapping_, and mapping_tensor to locate the branch and
enforce the check before constructing the tensor.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py`:
- Around line 67-74: The fallback in the test setup is turning missing graph
artifacts into None, but the module still continues into comparisons and
indexing in the test helpers, causing TypeError instead of skipping. Update
setup_module() in test_lammps_dpa1_graph_pt2.py to check that both generated
artifacts exist before writing fixtures, and if either is absent, skip the
module/tests early; ensure the same guard covers the later pbc-related setup
used by the test functions that read expected_e, expected_f, and expected_v.
- Around line 293-301: The empty-subdomain test is not actually enforcing the
documented `processors 2 1 1` decomposition, so `_run_mpi_subprocess()` may pick
a different 2-rank layout and miss the zero-atom rank case. Update
`test_pair_deepmd_mpi_dpa1_graph_empty_subdomain` to pass the processor grid
through to `_run_mpi_subprocess()` for the MPI run, using the same explicit
`processors 2 1 1` setup described in the docstring so the intended empty-rank
branch is always exercised.

---

Outside diff comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 568-585: The fallback identity mapping in
DeepPotPTExpt::buildGraphTensors is unsafe for folded single-rank graph
inference because it can let ghost indices slip into a graph built with n_node
based on nloc. Remove this silent fallback for the fold_to_local path and make
the code fail fast unless a real InputNlist.mapping is present; keep the
identity-only behavior restricted to cases where mapping is truly unused, such
as non-message-passing or zero-ghost paths. Use the existing symbols
buildGraphTensors, mapping_, and mapping_tensor to locate the branch and enforce
the check before constructing the tensor.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3f898962-8ea6-4852-9d6d-458ae52eb1ba

📥 Commits

Reviewing files that changed from the base of the PR and between b25fdfc and e2e07f1.

📒 Files selected for processing (4)

source/api_cc/include/commonPT.h
source/api_cc/src/DeepPotPTExpt.cc
source/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.cc
source/lmp/tests/test_lammps_dpa1_graph_pt2.py

coderabbitai · 2026-06-30T00:45:18Z

+try:
+    _ref = read_expected_ref(ref_file)["pbc"]
+    expected_e = float(np.sum(_ref["expected_e"]))
+    expected_f = _ref["expected_f"].reshape(6, 3)
+    # LAMMPS uses the opposite sign convention for virial vs DeepPot.
+    expected_v = -_ref["expected_v"].reshape(6, 9)
+except FileNotFoundError:
+    expected_e = expected_f = expected_v = None


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Skip the module when the generated graph artifacts are absent.

Lines 67-74 convert a missing .expected file into None, but the tests still index expected_f / expected_v and compare against expected_e, so this fails with a TypeError instead of skipping cleanly. setup_module() should gate on both generated artifacts before writing fixtures.

Suggested fix

def setup_module() -> None: if os.environ.get("ENABLE_PYTORCH", "1") != "1": pytest.skip( "Skip test because PyTorch support is not enabled.", ) + if not pb_file.exists() or not ref_file.exists(): + pytest.skip( + "Skip test because generated dpa1 graph .pt2 artifacts are unavailable.", + ) write_lmp_data(box, coord, type_OH, data_file)

Also applies to: 92-100

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py` around lines 67 - 74, The fallback in the test setup is turning missing graph artifacts into None, but the module still continues into comparisons and indexing in the test helpers, causing TypeError instead of skipping. Update setup_module() in test_lammps_dpa1_graph_pt2.py to check that both generated artifacts exist before writing fixtures, and if either is absent, skip the module/tests early; ensure the same guard covers the later pbc-related setup used by the test functions that read expected_e, expected_f, and expected_v.

coderabbitai · 2026-06-30T00:45:18Z

+def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None:
+    """Multi-rank with one rank owning zero local atoms (elongated box,
+    ``processors 2 1 1``, split at x = 15).  The extended-region graph path
+    must still produce correct forces/virial on the populated rank and a
+    zero contribution from the empty rank — compared against a same-archive
+    single-rank reference of the same fixture.
+    """
+    out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain)
+    out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain)


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Force the documented processor grid in the empty-subdomain test.

This case is supposed to exercise processors 2 1 1, but the test never passes that through to _run_mpi_subprocess(). If LAMMPS chooses a different 2-rank decomposition, rank 1 may not be empty and this stops covering the branch the docstring describes.

Suggested fix

- out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain) + out_mpi = _run_mpi_subprocess( + nprocs=2, + data_path=data_file_empty_subdomain, + processors="2 1 1", + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None:

"""Multi-rank with one rank owning zero local atoms (elongated box,

``processors 2 1 1``, split at x = 15). The extended-region graph path

must still produce correct forces/virial on the populated rank and a

zero contribution from the empty rank — compared against a same-archive

single-rank reference of the same fixture.

"""

out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain)

out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain)

def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None:

"""Multi-rank with one rank owning zero local atoms (elongated box,

``processors 2 1 1``, split at x = 15). The extended-region graph path

must still produce correct forces/virial on the populated rank and a

zero contribution from the empty rank — compared against a same-archive

single-rank reference of the same fixture.

"""

out_mpi = _run_mpi_subprocess(

nprocs=2,

data_path=data_file_empty_subdomain,

processors="2 1 1",

)

out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py` around lines 293 - 301, The empty-subdomain test is not actually enforcing the documented `processors 2 1 1` decomposition, so `_run_mpi_subprocess()` may pick a different 2-rank layout and miss the zero-atom rank case. Update `test_pair_deepmd_mpi_dpa1_graph_empty_subdomain` to pass the processor grid through to `_run_mpi_subprocess()` for the MPI run, using the same explicit `processors 2 1 1` setup described in the docstring so the intended empty-rank branch is always exercised.

Han Wang added 8 commits June 29, 2026 18:42

fix(pt_expt): B1.1 review — test forward_lower_graph_exportable (both…

ee8db1b

… do_atomic_virial branches), dedup key-translation, drop redundant detach

feat(pt_expt): graph .pt2 export branch + lower_input_kind metadata

1051a0d

fix(pt_expt): B1.3 review — persist static edge_capacity (E_max) in g…

148fa0e

…raph .pt2 metadata for the C++ hub (B2)

docs(pt_expt): B1 final-review minors — document nloc==1 unravel-skip…

47fb700

… + legacy-gate assumption in the graph compile path

dosubot Bot added bug enhancement new feature labels Jun 29, 2026

github-actions Bot added the Python label Jun 29, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

b046874

for more information, see https://pre-commit.ci

github-advanced-security AI found potential problems Jun 29, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

Han Wang added 7 commits June 30, 2026 01:21

docs(pt_expt): B2.0 review — _strip_shape_assertions now documents th…

40487c4

…e graph (dynamic-E) caller + edge-axis safety

docs(infer): B2 final-review — correct gen_dpa1 graph-reference docst…

af92be1

…ring (nlist .pt2, 1e-5)

github-actions Bot added the C++ label Jun 29, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

b25fdfc

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

wanghan-iapcm changed the title ~~feat(pt_expt): export the graph lower to .pt2 + compiled training on the graph lower (NeighborGraph PR-B1)~~ feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference (NeighborGraph PR-B1+B2) Jun 29, 2026

Han Wang added 2 commits June 30, 2026 08:36

fix(api_cc): guard graph remap single-rank-only + atomic-overload gte…

10d82a6

…st (B2.5 follow-ups)

feat(api_cc): non-MP multi-rank graph path (extended region + reverse…

7d37319

…-comm; no with-comm)

test(lammps): dpa1 graph .pt2 single + multi-rank (mpirun -n 2, local)

92c35a6

github-actions Bot added the LAMMPS label Jun 30, 2026

wanghan-iapcm changed the title ~~feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference (NeighborGraph PR-B1+B2)~~ feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference single & multi-rank (NeighborGraph PR-B) Jun 30, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

e2e07f1

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed Jun 30, 2026

View reviewed changes

		{0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf)
		{0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda)

Uh oh!

Conversation

wanghan-iapcm commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

NeighborGraph PR-B — graph .pt2 export, compiled training, and C++ inference (single & multi-rank)

B1 — graph .pt2 export + compiled training (Python)

B2 — C++ graph ingestion (dynamic edge axis, single-rank)

B3 — multi-rank C++ / LAMMPS (non-MP)

Tests / known limitations

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

wanghan-iapcm commented Jun 30, 2026

B3: non-MP multi-rank C++ inference (single-rank ≡ multi-rank)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wanghan-iapcm commented Jun 29, 2026 •

edited

Loading

NeighborGraph PR-B — graph `.pt2` export, compiled training, and C++ inference (single & multi-rank)

B1 — graph `.pt2` export + compiled training (Python)

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

codecov Bot commented Jun 29, 2026 •

edited

Loading