feat(pt_expt): graph-native dpa1 — .pt2 export + compiled training + C++ inference single & multi-rank (NeighborGraph PR-B)#5604
Conversation
… graph .pt2 export - Add make_model.forward_common_lower_graph_exportable: make_fx trace of forward_common_lower_graph with edge_vec as the autograd leaf; symbolic tracing enabled via two data-dependent shape fixes. - Add EnergyModel.forward_lower_graph_exportable: wraps the above with a second make_fx pass that translates internal keys to public names (atom_energy, energy, force, virial, atom_virial). - Fix edge_transform_output.py: replace int(n_node.sum()) with next(iter(fit_ret.values())).shape[0] (static shape attr, trace-safe); thread node_capacity through edge_energy_deriv -> edge_force_virial so that segment_sum sizes are static under make_fx symbolic mode. - Add edge_energy_deriv node_capacity param (None = eager fallback). - Test: source/tests/pt_expt/model/test_graph_export.py — TDD RED->GREEN, verifies traced module reproduces eager energy_redu (rtol=1e-10).
… do_atomic_virial branches), dedup key-translation, drop redundant detach
…r_graph Adds test_graph_static_capacity.py (4 tests): shape (2,64), real-edge count matches dynamic, real prefix identity + masked tail, overflow raises. No source change needed -- builder.py already threads layout.edge_capacity into pad_and_guard_edges.
…raph .pt2 metadata for the C++ hub (B2)
…pbc, 1e-10) Add a graph-form .pt2 DeepEval dispatch and parity test (NeighborGraph PR-B Phase B1.4). DeepEval: route lower_input_kind=="graph" archives through a new _eval_model_graph that builds a carry-all NeighborGraph (padded to the static edge_capacity baked in metadata), feeds the positional schema (atype, n_node, edge_index, edge_vec, edge_mask, fparam, aparam, charge_spin) to the AOTI forward, and reshapes the LOCAL public outputs by category (no communicate_extended_output, since the graph path is ghost-free). The dense (nlist) path is untouched. Fix dynamic-shape generalization of the graph .pt2 export: under symbolic make_fx/torch.export, int() on a SymInt SPECIALIZES the axis, baking the trace-time sample size (N=14, nf=2) into the autograd-derived force/energy/ virial outputs. Drop int() on node_capacity (edge_force_virial) and on n_node.shape[0] (fit_output_to_model_output_graph), and derive nf-1 in frame_id_from_n_node as a runtime 0-d tensor instead of xp.asarray(shape-1). The exported graph now generalizes across arbitrary N and nf. Test: dpa1(attn_layer=0) energy model exported with lower_kind="graph", evaluated through the pt_expt DeepPot and compared to the eager dense (sel-capped, neighbor_graph_method="legacy") reference at rtol=atol=1e-10 for pbc and nopbc. The fixture is a sparse 8-atom cluster and asserts non-binding sel (max neighbors < 30) so the carry-all/sel-capped neighbor sets coincide (anti-vacuity).
…); drop force_legacy_descriptor Retarget _CompiledModel to compile forward_common_lower_graph for graph-eligible descriptors (dpa1 attn_layer==0), gated by the same mixed_types()+uses_graph_lower() predicate the eager default-flip uses; se_e2_a/dpa2/dpa3 keep compiling the dense forward_lower. _trace_and_compile_graph builds a synthetic NeighborGraph with prime-distinct nf/N/E axes (no make_fx duck-shape merge) and edge_vec as the autograd leaf; _forward_graph builds the carry-all graph eagerly and unravels flat (N,*) node outputs to (nf,nloc,*). cpp.simdlen=0 for the graph compile avoids an inductor CPU scatter-vectorizer crash on the per-frame virial atomic_add. Also fixes an eager autograd bug in dpa1 call_graph: xp.asarray(type_embedding, device=dev) DETACHES under torch, so the type-embedding weights never trained in the graph path (grad None despite a real finite-diff dependency). make_fx traced through it, so compiled != eager and the optimizer diverged after step 0. Use type_embedding directly (mirrors the dense path); the tebd net now trains and eager==compiled to 1e-10 across the varying-natoms trajectory. Drops the force_legacy_descriptor workaround + uses_graph_lower monkeypatch.
… + legacy-gate assumption in the graph compile path
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughAdds a graph-native NeighborGraph lowering path across export, inference, training compilation, C++ runtime, and tests. The PR also updates dpmodel and neighbor-graph helpers to preserve gradients and symbolic shapes during graph tracing. ChangesGraph-form lower-forward export and runtime
Sequence Diagram(s)sequenceDiagram
participant deserialize_to_file
participant forward_lower_graph_exportable
participant _eval_model
participant _eval_model_graph
participant exported_module
deserialize_to_file->>forward_lower_graph_exportable: trace graph lower
deserialize_to_file->>deserialize_to_file: record lower_input_kind
_eval_model->>_eval_model: branch on lower_input_kind
_eval_model->>_eval_model_graph: graph archive path
_eval_model_graph->>exported_module: forward(atype, n_node, edge_index, edge_vec, edge_mask, ...)
exported_module-->>_eval_model_graph: graph outputs
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| DeepPot, | ||
| ) | ||
| from deepmd.pt_expt.utils.env import ( | ||
| DEVICE, |
| if task_buf_order: | ||
| try: | ||
| _fitting = model.get_fitting_net() | ||
| except AttributeError: |
| pass | ||
| try: | ||
| _atomic_model = model.atomic_model | ||
| except AttributeError: |
| _dim_fp = model.get_dim_fparam() | ||
| if _dim_fp > 1: | ||
| _forbidden.add(_dim_fp) | ||
| except Exception: |
| _dim_ap = model.get_dim_aparam() | ||
| if _dim_ap > 1: | ||
| _forbidden.add(_dim_ap) | ||
| except Exception: |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
deepmd/pt_expt/utils/serialization.py (1)
750-789: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject unknown
lower_kindvalues instead of falling back to nlist.A typo such as
lower_kind="grpah"currently reaches the dense path and recordslower_input_kind="nlist", producing the wrong artifact without any signal.Proposed fix
def deserialize_to_file( @@ """ + if lower_kind not in {"nlist", "graph"}: + raise ValueError( + f"Unknown lower_kind {lower_kind!r}; expected 'nlist' or 'graph'." + ) if model_file.endswith(".pt2"):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/utils/serialization.py` around lines 750 - 789, The deserialize entrypoint in serialization.py currently accepts any lower_kind value and silently falls back to the nlist path, which can produce the wrong artifact for typos. Update deserialize_to_file and the lower-path selection logic in _deserialize_to_file_pt2 / _deserialize_to_file_pte to validate lower_kind explicitly against the supported values ("nlist" and "graph") and raise a clear error for anything else instead of defaulting to nlist, so the recorded lower_input_kind always matches the requested schema.source/tests/pt_expt/test_training.py (1)
1355-1374: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winAssert that the DPA1 no-attn case really takes the graph lower.
These changes now document graph-lower routing, but the test still only proves eager/compiled parity. If
_CompiledModelsilently falls back to the dense lower and the sampled systems stay non-binding, this case still passes, so the dispatch change introduced by this PR is untested. Please add an explicit assertion for the DPA1 no-attn branch that the compiled model is graph-eligible / using the graph lower, while keeping the dense expectation for the other descriptors.Also applies to: 1391-1392, 1466-1479
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/pt_expt/test_training.py` around lines 1355 - 1374, The current test only checks eager/compiled parity in _check_varying_natoms, so the DPA1 no-attn routing to the graph lower is not explicitly verified. Add an assertion in the DPA1 branch that the compiled model is graph-eligible and actually uses forward_common_lower_graph, while preserving the dense forward_lower expectation for the non-eligible descriptors. Extend the same explicit dispatch check in the related test sections that cover the same descriptor routing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 968-970: The graph-compiled forward path is dropping the
do_atomic_virial request, so atom_virial cannot be returned from compiled
models. Update forward and _forward_graph in training.py to thread
do_atomic_virial through the graph-eligible dispatch, and make
_trace_and_compile_graph cache/trace separately based on that flag so the
compiled lower path matches forward_common_lower_graph behavior.
- Around line 1283-1293: The reshaping logic in the result loop currently relies
on the `N != nframes` heuristic, which fails for `nloc == 1` and leaves
node-level outputs flat. Update the `training.py` postprocessing in the
`result.items()` loop to use explicit key categories from the graph/lower-level
public keys instead of shape inference, and reshape known node-level outputs
like `atom_energy` and `force` to `(nf, 1, *)` even when `N == nframes`.
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 447-448: The dynamic-shape spec in _build_graph_dynamic_shapes
still only marks the frame axis for aparam, so its atom axis remains fixed to
the sample nloc. Update the aparam entry in the shape mapping to include both
the frame axis and the atom axis as dynamic, matching the intended (nf, nloc,
nda) behavior. Keep the change localized to the graph shape spec alongside the
existing fparam and aparam handling so exported graphs no longer specialize to a
single atom count.
---
Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 750-789: The deserialize entrypoint in serialization.py currently
accepts any lower_kind value and silently falls back to the nlist path, which
can produce the wrong artifact for typos. Update deserialize_to_file and the
lower-path selection logic in _deserialize_to_file_pt2 /
_deserialize_to_file_pte to validate lower_kind explicitly against the supported
values ("nlist" and "graph") and raise a clear error for anything else instead
of defaulting to nlist, so the recorded lower_input_kind always matches the
requested schema.
In `@source/tests/pt_expt/test_training.py`:
- Around line 1355-1374: The current test only checks eager/compiled parity in
_check_varying_natoms, so the DPA1 no-attn routing to the graph lower is not
explicitly verified. Add an assertion in the DPA1 branch that the compiled model
is graph-eligible and actually uses forward_common_lower_graph, while preserving
the dense forward_lower expectation for the non-eligible descriptors. Extend the
same explicit dispatch check in the related test sections that cover the same
descriptor routing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 58e2933c-1a05-4d30-bdfa-0cec147b8f6a
📒 Files selected for processing (14)
deepmd/dpmodel/descriptor/dpa1.pydeepmd/dpmodel/utils/neighbor_graph/derivatives.pydeepmd/dpmodel/utils/neighbor_graph/graph.pydeepmd/pt_expt/infer/deep_eval.pydeepmd/pt_expt/model/edge_transform_output.pydeepmd/pt_expt/model/ener_model.pydeepmd/pt_expt/model/make_model.pydeepmd/pt_expt/train/training.pydeepmd/pt_expt/utils/serialization.pysource/tests/common/dpmodel/test_graph_static_capacity.pysource/tests/pt_expt/infer/test_graph_deepeval.pysource/tests/pt_expt/model/test_graph_export.pysource/tests/pt_expt/test_training.pysource/tests/pt_expt/utils/test_graph_pt2_metadata.py
| return self._forward_graph( | ||
| coord, atype, box, fparam, aparam, charge_spin, nframes, nloc, rcut | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Preserve do_atomic_virial in the graph compiled path.
forward(..., do_atomic_virial=True) is dropped when graph-eligible models dispatch to _forward_graph, and _trace_and_compile_graph() always traces do_atomic_virial=False. That makes the compiled graph path unable to return atom_virial even though forward_common_lower_graph supports it.
Consider threading this flag through _forward_graph() and compiling/cache-keying the graph lower by the requested atomic-virial mode.
Also applies to: 743-759, 1223-1230
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/pt_expt/train/training.py` around lines 968 - 970, The graph-compiled
forward path is dropping the do_atomic_virial request, so atom_virial cannot be
returned from compiled models. Update forward and _forward_graph in training.py
to thread do_atomic_virial through the graph-eligible dispatch, and make
_trace_and_compile_graph cache/trace separately based on that flag so the
compiled lower path matches forward_common_lower_graph behavior.
| for key, val in result.items(): | ||
| # ``N != nframes`` distinguishes node-level keys (lead dim N) from | ||
| # frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1, | ||
| # N == nframes, so node-level keys are NOT unravelled and stay | ||
| # (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms | ||
| # trainer (nloc >> 1); a single-atom-per-frame system would need an | ||
| # explicit per-key category check instead of the shape heuristic. | ||
| if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes: | ||
| out[key] = val.reshape(nframes, nloc, *val.shape[1:]) | ||
| else: | ||
| out[key] = val |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use key categories instead of the N != nframes heuristic.
For nloc == 1, node-level outputs like atom_energy and force remain flat (nf, *) instead of (nf, 1, *), breaking the forward output contract for valid single-atom systems. The graph lower already emits public keys, so reshape known node-level keys explicitly.
Proposed fix
+ node_level_keys = {"atom_energy", "force", "atom_virial", "mask"}
for key, val in result.items():
- # ``N != nframes`` distinguishes node-level keys (lead dim N) from
- # frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1,
- # N == nframes, so node-level keys are NOT unravelled and stay
- # (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms
- # trainer (nloc >> 1); a single-atom-per-frame system would need an
- # explicit per-key category check instead of the shape heuristic.
- if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes:
+ if (
+ key in node_level_keys
+ and val is not None
+ and val.shape[:1] == torch.Size([N])
+ ):
out[key] = val.reshape(nframes, nloc, *val.shape[1:])
else:
out[key] = val📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for key, val in result.items(): | |
| # ``N != nframes`` distinguishes node-level keys (lead dim N) from | |
| # frame-level keys (lead dim nf) by shape. DEGENERATE: when nloc==1, | |
| # N == nframes, so node-level keys are NOT unravelled and stay | |
| # (nf, *) instead of (nf, 1, *). Harmless for the varying-natoms | |
| # trainer (nloc >> 1); a single-atom-per-frame system would need an | |
| # explicit per-key category check instead of the shape heuristic. | |
| if val is not None and val.shape[:1] == torch.Size([N]) and N != nframes: | |
| out[key] = val.reshape(nframes, nloc, *val.shape[1:]) | |
| else: | |
| out[key] = val | |
| node_level_keys = {"atom_energy", "force", "atom_virial", "mask"} | |
| for key, val in result.items(): | |
| if ( | |
| key in node_level_keys | |
| and val is not None | |
| and val.shape[:1] == torch.Size([N]) | |
| ): | |
| out[key] = val.reshape(nframes, nloc, *val.shape[1:]) | |
| else: | |
| out[key] = val |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/pt_expt/train/training.py` around lines 1283 - 1293, The reshaping
logic in the result loop currently relies on the `N != nframes` heuristic, which
fails for `nloc == 1` and leaves node-level outputs flat. Update the
`training.py` postprocessing in the `result.items()` loop to use explicit key
categories from the graph/lower-level public keys instead of shape inference,
and reshape known node-level outputs like `atom_energy` and `force` to `(nf, 1,
*)` even when `N == nframes`.
| {0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf) | ||
| {0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
set -euo pipefail
# Locate the target file and nearby dynamic-shape/export logic.
git ls-files deepmd/pt_expt/utils/serialization.py
ast-grep outline deepmd/pt_expt/utils/serialization.py --view expanded
# Inspect the relevant region around the cited lines.
sed -n '400,480p' deepmd/pt_expt/utils/serialization.py
# Find graph-export and dynamic-shape helpers.
rg -n "_build_graph_dynamic_shapes|deserialize_to_file|lower_kind|aparam|nloc_dim|nframes_dim" deepmd/pt_expt/utils/serialization.py deepmd/pt_expt -SRepository: deepmodeling/deepmd-kit
Length of output: 38920
🏁 Script executed:
set -euo pipefail
# Inspect tests/examples that cover graph export or aparam-specific behavior.
rg -n "dim_aparam|aparam|torch.export|dynamic_shapes|graph export|deserialize_to_file" deepmd -S --glob '!**/dist/**' --glob '!**/build/**'Repository: deepmodeling/deepmd-kit
Length of output: 50380
🏁 Script executed:
set -euo pipefail
# Read the graph-export branch and the graph sample-input builder with line numbers.
sed -n '313,420p' deepmd/pt_expt/utils/serialization.py
printf '\n---\n'
sed -n '850,960p' deepmd/pt_expt/utils/serialization.pyRepository: deepmodeling/deepmd-kit
Length of output: 7322
🏁 Script executed:
set -euo pipefail
# Pull just the aparam-related graph-export docs/comments from the inference side for contract context.
sed -n '1230,1260p' deepmd/pt_expt/infer/deep_eval.pyRepository: deepmodeling/deepmd-kit
Length of output: 1284
Make graph aparam’s atom axis dynamic
_build_graph_dynamic_shapes still leaves aparam.shape[1] static, so graph exports specialize it to the sample nloc=7. That breaks dim_aparam > 0 exports when runtime atom counts differ. Update the graph dynamic-shape spec for aparam to include the atom axis.
Proposed fix
nframes_dim = torch.export.Dim("nframes", min=1)
n_node_total_dim = torch.export.Dim("n_node_total", min=1)
+ nloc_dim = torch.export.Dim("nloc", min=1)
return (
{0: n_node_total_dim}, # atype: (N,)
{0: nframes_dim}, # n_node: (nf,)
@@
- {0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda)
+ {0: nframes_dim, 1: nloc_dim} if aparam is not None else None, # aparam📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf) | |
| {0: nframes_dim} if aparam is not None else None, # aparam: (nf, nloc, nda) | |
| nloc_dim = torch.export.Dim("nloc", min=1) | |
| {0: nframes_dim} if fparam is not None else None, # fparam: (nf, ndf) | |
| {0: nframes_dim, 1: nloc_dim} if aparam is not None else None, # aparam |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepmd/pt_expt/utils/serialization.py` around lines 447 - 448, The
dynamic-shape spec in _build_graph_dynamic_shapes still only marks the frame
axis for aparam, so its atom axis remains fixed to the sample nloc. Update the
aparam entry in the shape mapping to include both the frame axis and the atom
axis as dynamic, matching the intended (nf, nloc, nda) behavior. Keep the change
localized to the graph shape spec alongside the existing fparam and aparam
handling so exported graphs no longer specialize to a single atom count.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5604 +/- ##
==========================================
- Coverage 82.37% 73.34% -9.04%
==========================================
Files 902 880 -22
Lines 101527 96521 -5006
Branches 4056 3390 -666
==========================================
- Hits 83630 70790 -12840
- Misses 16432 24938 +8506
+ Partials 1465 793 -672 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
… static edge_capacity
The graph-form .pt2 export now marks the edge axis dynamic
(Dim("nedge", min=2)) instead of baking a static E_max=ceil(1.25*nloc*nnei)
capacity. The AOTI artifact accepts any system size with no capacity ceiling.
- _build_graph_dynamic_shapes: edge_index dim1 / edge_vec dim0 / edge_mask dim0
are now dynamic; mirrors the dense Dim("nnei", min=...) precedent.
- _trace_and_export graph branch: drop the metadata["edge_capacity"] write;
apply _strip_shape_assertions to neutralise the SIGFPE-prone deferred
shape guards on the dynamic E axis (same handling the spin dense path uses).
- deep_eval._eval_model_graph: build the carry-all graph at its tight edge
count (no edge_capacity padding).
- test_graph_deepeval: eval TWO different-size systems (8- and 20-atom,
56 vs 380 real edges) through the SAME exported artifact; both match eager
dense dpa1 at 1e-10 pbc+nopbc, both non-binding. The 20-atom system (380 > 263)
would have overflowed the B1 static artifact -> the RED.
- test_graph_pt2_metadata: graph metadata no longer carries edge_capacity.
…e graph (dynamic-E) caller + edge-axis safety
…) generator Section B of gen_dpa1.py produces deeppot_dpa1_graph.pt2 (lower_kind="graph", do_atomic_virial=True) and the accompanying deeppot_dpa1_graph.expected sidecar. Reference values come from an independent nlist .pt2 eval (NOT the graph .pt2) so the C++ gtest (B2.5) validates the graph AOTI path against a known-good reference. Sanity check: graph .pt2 vs nlist ref force diff 1.3e-18 (machine precision; sel=30 >> actual neighbors, so both paths see identical neighbor sets). Forces non-degenerate: max |F| ~5.3e-4 (PBC). Config: type_map=[O,H], sel=30, rcut=6.0, attn_layer=0, neuron=[2,4,8], axis_neuron=4, fitting neuron=[5,5,5], resnet_dt=True, seed=1 — mirrors DPA1_CONFIG in source/tests/pt_expt/utils/test_graph_pt2_metadata.py.
B2.2: read lower_input_kind="graph" -> lower_input_is_graph_. B2.3: run_model_graph with NeighborGraph AOTI input order (atype, n_node, edge_index, edge_vec, edge_mask, [fparam], [aparam], [charge_spin]); no coord / edge_scatter_index. B2.4: GraphTensorPack + buildGraphTensors (delegates to createEdgeTensors+compactEdgeTensors for the rcut filter, dynamic edge count and 2 masked dummy edges; drops edge_index_ext, adds n_node=[nloc], node types from atype_ext[0:nloc]) + compute_inner & standalone dispatch branches. Multi-rank graph fails fast (PR-B3).
… path Local std::vector<int64_t> mapping was declared in compute_inner and populated only inside if(ago==0). The graph branch called buildGraphTensors with this local vector on every step, causing an OOB heap read on ago>0 (mapping.size()==0). Fix: promote mapping to a member mapping_ (parallel to mapping_tensor) so it persists across steps. Edge-path (createEdgeTensors) and dense-path (mapping_tensor) are unaffected in behavior; only the vector source changes from local to member.
…xtraction
Add test_deeppot_dpa1_graph_ptexpt.cc — the first runtime exercise of the
NeighborGraph .pt2 C++ ingestion (B2.2-4). Four cases x {double,float}:
build-nlist parity vs the committed .expected, a second 12-atom system through
the same model (dynamic edge axis), the LAMMPS InputNlist+ago=0/ago=1 path
(compute_inner + cached mapping_), and a tiny no-edge system (nedge_min=2 guard).
The first run surfaced a B2.2-4 bug: the graph forward emits LOCAL flat-N PUBLIC
keys (atom_energy/energy/force/virial/atom_virial) but compute()'s output
extraction read the dense INTERNAL keys (energy_redu/energy_derv_r/...), so
output_map["energy_redu"].view() threw on an undefined tensor. The graph branch
had only ever been compiled, never run.
Fix: remap_graph_outputs_to_dense_keys() in commonPT.h, called after
extract_outputs in both compute overloads (gated on lower_input_is_graph_).
Rewrites the public keys into the dense internal-key layout; per-atom
force/atom_virial are local (nloc) and zero-padded up to nall so the existing
fold-back is a no-op on ghost rows (local rows already carry the folded ghost
contributions). Dense/edge paths untouched.
gen_dpa1.py now persists the dense nlist-ref .pt2 (deeppot_dpa1_graph_nlist_ref.pt2)
as a live graph-vs-dense oracle for the dynamic-edge cases.
…ring (nlist .pt2, 1e-5)
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
source/api_cc/src/DeepPotPTExpt.cc (1)
508-539: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winRequire
lmp_list.mappingfor graph artifacts with ghosts.The graph branch uses
mapping_inbuildGraphTensorsto fold ghost neighbors onto local owners, but the current guard only requires an atom map forhas_message_passing_models. A DPA1 graph artifact can therefore run PBC LAMMPS withoutatom_modify map yes, fall back to identity mapping, and feed invalid/wrong folded graph indices.Proposed fix
if (lower_input_is_graph_ && multi_rank) { throw deepmd::deepmd_exception( "Multi-rank graph (NeighborGraph) .pt2 inference is not yet " "supported (PR-B3). Run single-rank, or use a dense/edge .pt2 for " "multi-rank LAMMPS."); } + if (lower_input_is_graph_ && nghost > 0 && !atom_map_present) { + throw deepmd::deepmd_exception( + "Graph (NeighborGraph) .pt2 inference with ghost atoms requires " + "`atom_modify map yes` so ghost neighbors can be folded onto their " + "local owners. Re-run with an atom map enabled, or use a dense .pt2."); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 508 - 539, The graph-artifact path in DeepPotPTExpt::buildGraphTensors is using mapping_ to fold ghost neighbors, but the current validation only enforces atom-map presence for has_message_passing_ models. Add a guard for lower_input_is_graph_ with ghosts so graph artifacts also require lmp_list.mapping (or atom_modify map yes) before compute(), and fail fast when it is missing instead of falling back to identity mapping.deepmd/pt_expt/utils/serialization.py (1)
793-799: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject unknown
lower_kindvalues before dispatch.A typo currently falls through to the dense nlist export while recording nlist metadata, producing a valid-looking artifact with the wrong ABI.
Proposed fix
""" + if lower_kind not in {"nlist", "graph"}: + raise ValueError( + f"Unsupported lower_kind={lower_kind!r}; expected 'nlist' or 'graph'." + ) if model_file.endswith(".pt2"): _deserialize_to_file_pt2(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/pt_expt/utils/serialization.py` around lines 793 - 799, The dispatch in serialization.py currently lets unknown lower_kind values fall through to the dense nlist export path, which can produce a misleading artifact with incorrect ABI metadata. Add validation before the model_file.endswith(".pt2") branch so only the expected lower_kind values are accepted, and raise an error for anything else. Keep the check close to the dispatch logic in the deserialization/export flow that calls _deserialize_to_file_pt2 and _deserialize_to_file_pte.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 793-799: The dispatch in serialization.py currently lets unknown
lower_kind values fall through to the dense nlist export path, which can produce
a misleading artifact with incorrect ABI metadata. Add validation before the
model_file.endswith(".pt2") branch so only the expected lower_kind values are
accepted, and raise an error for anything else. Keep the check close to the
dispatch logic in the deserialization/export flow that calls
_deserialize_to_file_pt2 and _deserialize_to_file_pte.
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 508-539: The graph-artifact path in
DeepPotPTExpt::buildGraphTensors is using mapping_ to fold ghost neighbors, but
the current validation only enforces atom-map presence for has_message_passing_
models. Add a guard for lower_input_is_graph_ with ghosts so graph artifacts
also require lmp_list.mapping (or atom_modify map yes) before compute(), and
fail fast when it is missing instead of falling back to identity mapping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c4b21156-2b33-4392-b6b0-4819a492ea77
📒 Files selected for processing (9)
deepmd/pt_expt/infer/deep_eval.pydeepmd/pt_expt/utils/serialization.pysource/api_cc/include/DeepPotPTExpt.hsource/api_cc/include/commonPT.hsource/api_cc/src/DeepPotPTExpt.ccsource/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.ccsource/tests/infer/gen_dpa1.pysource/tests/pt_expt/infer/test_graph_deepeval.pysource/tests/pt_expt/utils/test_graph_pt2_metadata.py
🚧 Files skipped from review as they are similar to previous changes (2)
- deepmd/pt_expt/infer/deep_eval.py
- source/tests/pt_expt/infer/test_graph_deepeval.py
…st (B2.5 follow-ups)
…-comm; no with-comm)
B3: non-MP multi-rank C++ inference (single-rank ≡ multi-rank)Adds the multi-rank graph inference path for dpa1 (commits Key design (dpa1 is non-message-passing): multi-rank reuses the same single-rank graph Validation (local, multi-CPU — no GPU needed for correctness): Known limits: pt_expt-only; dpa1 (non-MP) only — DPA2/3 multi-rank (forward halo / message passing) is a follow-on; CUDA multi-rank unvalidated locally. |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
source/api_cc/src/DeepPotPTExpt.cc (1)
568-585: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRequire a real atom map before folded single-rank graph inference.
buildGraphTensors(..., fold_to_local=true)still needsmapping_to collapse ghost neighbors onto owner indices. Falling back tomapping_[ii] = iihere means a single-rank graph run with ghosts but withoutInputNlist.mappingcan feed ghost indices into a graph whosen_nodeis onlynloc, so this becomes a silent wrong-answer path instead of a fail-fast.Suggested fix
} else { + if (lower_input_is_graph_ && !multi_rank && nghost > 0) { + throw deepmd::deepmd_exception( + "Single-rank NeighborGraph .pt2 inference with ghosts requires " + "`atom_modify map yes` in the LAMMPS input (so " + "InputNlist.mapping is populated)."); + } // Identity fallback. The fail-fast above guarantees we only // reach this branch when one of these is true: // - The model is non-message-passing (mapping is unused).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 568 - 585, The fallback identity mapping in DeepPotPTExpt::buildGraphTensors is unsafe for folded single-rank graph inference because it can let ghost indices slip into a graph built with n_node based on nloc. Remove this silent fallback for the fold_to_local path and make the code fail fast unless a real InputNlist.mapping is present; keep the identity-only behavior restricted to cases where mapping is truly unused, such as non-message-passing or zero-ghost paths. Use the existing symbols buildGraphTensors, mapping_, and mapping_tensor to locate the branch and enforce the check before constructing the tensor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py`:
- Around line 67-74: The fallback in the test setup is turning missing graph
artifacts into None, but the module still continues into comparisons and
indexing in the test helpers, causing TypeError instead of skipping. Update
setup_module() in test_lammps_dpa1_graph_pt2.py to check that both generated
artifacts exist before writing fixtures, and if either is absent, skip the
module/tests early; ensure the same guard covers the later pbc-related setup
used by the test functions that read expected_e, expected_f, and expected_v.
- Around line 293-301: The empty-subdomain test is not actually enforcing the
documented `processors 2 1 1` decomposition, so `_run_mpi_subprocess()` may pick
a different 2-rank layout and miss the zero-atom rank case. Update
`test_pair_deepmd_mpi_dpa1_graph_empty_subdomain` to pass the processor grid
through to `_run_mpi_subprocess()` for the MPI run, using the same explicit
`processors 2 1 1` setup described in the docstring so the intended empty-rank
branch is always exercised.
---
Outside diff comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 568-585: The fallback identity mapping in
DeepPotPTExpt::buildGraphTensors is unsafe for folded single-rank graph
inference because it can let ghost indices slip into a graph built with n_node
based on nloc. Remove this silent fallback for the fold_to_local path and make
the code fail fast unless a real InputNlist.mapping is present; keep the
identity-only behavior restricted to cases where mapping is truly unused, such
as non-message-passing or zero-ghost paths. Use the existing symbols
buildGraphTensors, mapping_, and mapping_tensor to locate the branch and enforce
the check before constructing the tensor.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3f898962-8ea6-4852-9d6d-458ae52eb1ba
📒 Files selected for processing (4)
source/api_cc/include/commonPT.hsource/api_cc/src/DeepPotPTExpt.ccsource/api_cc/tests/test_deeppot_dpa1_graph_ptexpt.ccsource/lmp/tests/test_lammps_dpa1_graph_pt2.py
| try: | ||
| _ref = read_expected_ref(ref_file)["pbc"] | ||
| expected_e = float(np.sum(_ref["expected_e"])) | ||
| expected_f = _ref["expected_f"].reshape(6, 3) | ||
| # LAMMPS uses the opposite sign convention for virial vs DeepPot. | ||
| expected_v = -_ref["expected_v"].reshape(6, 9) | ||
| except FileNotFoundError: | ||
| expected_e = expected_f = expected_v = None |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Skip the module when the generated graph artifacts are absent.
Lines 67-74 convert a missing .expected file into None, but the tests still index expected_f / expected_v and compare against expected_e, so this fails with a TypeError instead of skipping cleanly. setup_module() should gate on both generated artifacts before writing fixtures.
Suggested fix
def setup_module() -> None:
if os.environ.get("ENABLE_PYTORCH", "1") != "1":
pytest.skip(
"Skip test because PyTorch support is not enabled.",
)
+ if not pb_file.exists() or not ref_file.exists():
+ pytest.skip(
+ "Skip test because generated dpa1 graph .pt2 artifacts are unavailable.",
+ )
write_lmp_data(box, coord, type_OH, data_file)Also applies to: 92-100
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py` around lines 67 - 74, The
fallback in the test setup is turning missing graph artifacts into None, but the
module still continues into comparisons and indexing in the test helpers,
causing TypeError instead of skipping. Update setup_module() in
test_lammps_dpa1_graph_pt2.py to check that both generated artifacts exist
before writing fixtures, and if either is absent, skip the module/tests early;
ensure the same guard covers the later pbc-related setup used by the test
functions that read expected_e, expected_f, and expected_v.
| def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None: | ||
| """Multi-rank with one rank owning zero local atoms (elongated box, | ||
| ``processors 2 1 1``, split at x = 15). The extended-region graph path | ||
| must still produce correct forces/virial on the populated rank and a | ||
| zero contribution from the empty rank — compared against a same-archive | ||
| single-rank reference of the same fixture. | ||
| """ | ||
| out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain) | ||
| out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Force the documented processor grid in the empty-subdomain test.
This case is supposed to exercise processors 2 1 1, but the test never passes that through to _run_mpi_subprocess(). If LAMMPS chooses a different 2-rank decomposition, rank 1 may not be empty and this stops covering the branch the docstring describes.
Suggested fix
- out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain)
+ out_mpi = _run_mpi_subprocess(
+ nprocs=2,
+ data_path=data_file_empty_subdomain,
+ processors="2 1 1",
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None: | |
| """Multi-rank with one rank owning zero local atoms (elongated box, | |
| ``processors 2 1 1``, split at x = 15). The extended-region graph path | |
| must still produce correct forces/virial on the populated rank and a | |
| zero contribution from the empty rank — compared against a same-archive | |
| single-rank reference of the same fixture. | |
| """ | |
| out_mpi = _run_mpi_subprocess(nprocs=2, data_path=data_file_empty_subdomain) | |
| out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain) | |
| def test_pair_deepmd_mpi_dpa1_graph_empty_subdomain() -> None: | |
| """Multi-rank with one rank owning zero local atoms (elongated box, | |
| ``processors 2 1 1``, split at x = 15). The extended-region graph path | |
| must still produce correct forces/virial on the populated rank and a | |
| zero contribution from the empty rank — compared against a same-archive | |
| single-rank reference of the same fixture. | |
| """ | |
| out_mpi = _run_mpi_subprocess( | |
| nprocs=2, | |
| data_path=data_file_empty_subdomain, | |
| processors="2 1 1", | |
| ) | |
| out_ref = _run_mpi_subprocess(nprocs=1, data_path=data_file_empty_subdomain) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@source/lmp/tests/test_lammps_dpa1_graph_pt2.py` around lines 293 - 301, The
empty-subdomain test is not actually enforcing the documented `processors 2 1 1`
decomposition, so `_run_mpi_subprocess()` may pick a different 2-rank layout and
miss the zero-atom rank case. Update
`test_pair_deepmd_mpi_dpa1_graph_empty_subdomain` to pass the processor grid
through to `_run_mpi_subprocess()` for the MPI run, using the same explicit
`processors 2 1 1` setup described in the docstring so the intended empty-rank
branch is always exercised.
NeighborGraph PR-B — graph
.pt2export, compiled training, and C++ inference (single & multi-rank)This PR spans the full PR-B: B1 (Python: graph
.pt2export + compiled training on the graph lower), B2 (C++ single-rank inference of the graph.pt2, dynamic edge axis), and B3 (C++/LAMMPS multi-rank). Built on the merged PR-A (#5583). Scope: dpa1,attn_layer=0, pt_expt.B1 — graph
.pt2export + compiled training (Python)forward_common_lower_graph_exportabletrace target;serialization.pygraph export branch (lower_kind="graph",lower_input_kindmetadata);_eval_model_graphDeepEval dispatch (parity vs eager dpa1 1e-10 pbc+nopbc).force_legacy_descriptordeleted. Root cause was a real dpa1call_graphautograd detach bug (xp.asarray(tebd, device=)drops the tebd-net gradient under torch); fixed.B2 — C++ graph ingestion (dynamic edge axis, single-rank)
.pt2uses a dynamic edge axis (Dim("nedge", min=2)) — one artifact evals any system size (proven across 56- and 380-edge systems at 1e-10), no C++ capacity ceiling.DeepPotPTExpt:lower_input_is_graph_+run_model_graph(NeighborGraph ABI:atype, n_node, edge_index, edge_vec, edge_mask, …) +buildGraphTensors(mirrors the refactor(pt/dpa4): use edge schema for dpa4 and ignore sel #5562 edge path; node types fromatype_ext);remap_graph_outputs_to_dense_keys(single-rank).ago>0, tiny system, atomic-overload). The review process caught two bugs that would otherwise have shipped: anago>0heap-OOB (by inspection) and a public-vs-internal output-key mismatch (at runtime).B3 — multi-rank C++ / LAMMPS (non-MP)
border_op/with-comm artifact (that is a message-passing concern, deferred to PR-G). Multi-rank reuses the same single-rank graph.pt2, fed an extended-region graph (buildGraphTensors(fold_to_local=false),N=nall, ghost node types fromatype_extincl. halo), with owned energy =sum(atom_energy[0:nloc])and the extended force folded to owners through the existing denseselect_mapreverse-comm. The fail-fast forgraph && multi_rank && has_message_passingis retained.test_lammps_dpa1_graph_pt2.py— single-rank vs reference,mpirun -n 2≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an empty-subdomain (nloc=0) corner. Single-rank gtests stay 10/10 (multi-rank is purely additive). Multi-rank matched single-rank on the first run.Tests / known limitations
nonzero, eager-only), PR-D attention, PR-E angles, PR-F jax graph force, PR-G dpa2/3 message-passing (forward halo + with-comm). CUDA multi-rank unvalidated locally. Carried code-cleanup follow-ups: a ~60-line DRY duplication intraining.py; the multi-rank atomic output branch has no direct gtest (covered indirectly by the mpirun per-atom-virial assertion, since a single-process gtest can't setnprocs>1).Summary by CodeRabbit
New Features
Bug Fixes
Tests