feat(tf2): add eager TensorFlow array backend#5598
Conversation
for more information, see https://pre-commit.ci
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a vendored TensorFlow Array API, a TF2 eager backend with conversion and model wrappers, SavedModel export/load paths, JAX compatibility shims, and TF2 consistency test coverage. Changesndtensorflow Array API vendor package
TF2 backend, model stack, shims, and tests
Sequence Diagram(s)sequenceDiagram
participant Client
participant deserialize_to_file
participant SavedModel as tf.Module SavedModel
participant DeepEval
participant TF2SavedModelWrapper
Client->>deserialize_to_file: model dict
deserialize_to_file->>SavedModel: build tf.function entry points + metadata getters
deserialize_to_file->>SavedModel: tf.saved_model.save
Client->>DeepEval: load .savedmodel
DeepEval->>TF2SavedModelWrapper: load SavedModel and read metadata
Client->>DeepEval: eval(coords, atype, cells)
DeepEval->>TF2SavedModelWrapper: __call__(coord, atype, box, fparam, aparam)
TF2SavedModelWrapper-->>DeepEval: dict[str, np.ndarray]
DeepEval-->>Client: dict[str, np.ndarray]
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (22)
deepmd/_vendors/ndtensorflow/fft.py-360-375 (1)
360-375: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winSort
__all__for Ruff.The supplied Ruff result flags
RUF022here; sorting this list or applying the auto-fix avoids a CI failure. As per coding guidelines,**/*.py: Install linter and runruff check .before committing changes or the CI will fail.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/fft.py` around lines 360 - 375, The __all__ list in fft.py is not sorted, triggering Ruff rule RUF022 and a CI failure. Update the module-level __all__ definition by ordering the exported names alphabetically, keeping the symbols like fft, ifft, fftn, and fftshift in sorted order so Ruff’s auto-fix would pass cleanly.Sources: Coding guidelines, Linters/SAST tools
deepmd/_vendors/ndtensorflow/linalg.py-317-319 (1)
317-319: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRespect the requested matrix axes in
norm.
matrix_normalways reduces the last two dimensions, sonorm(x, axis=(0, 1))or any non-last axis pair currently computes over the wrong axes. Move the requested axes to the matrix positions before callingmatrix_norm, and preservekeepdimsin the original axis locations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/linalg.py` around lines 317 - 319, The norm handling in linalg.norm ignores the requested matrix axes because matrix_norm always uses the last two dimensions. Update the tuple-axis branch in norm to first move the requested axis pair into the matrix positions before calling matrix_norm, then restore the reduced dimensions so keepdims is preserved in the original axis locations. Use the norm and matrix_norm symbols to make the axis handling work for any 2D axis pair, not just the last two axes.deepmd/_vendors/ndtensorflow/_namespace.py-37-37 (1)
37-37: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winResolve the remaining Ruff failures in this namespace.
The builtin-shadowing names are part of the Array API surface, so prefer targeted
per-file-ignores/noqaforA001/A002; theRUF021precedence andRUF022__all__findings can be auto-fixed or parenthesized/sorted directly. As per coding guidelines,**/*.py: Install linter and runruff check .before committing changes or the CI will fail.Also applies to: 698-698, 995-995, 1266-1266, 1300-1300, 1359-1360, 1385-1385, 1395-1395, 1405-1405, 1415-1415, 1454-1454, 1668-1682, 2075-2239
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/_namespace.py` at line 37, Resolve the remaining Ruff failures in the namespace module by handling the Array API builtin-shadowing names with targeted per-file ignores or inline noqa for A001/A002 in the namespace constants, and fix the RUF021 precedence and RUF022 __all__ issues directly by parenthesizing the affected expressions and sorting/auto-fixing the __all__ entries. Use the existing namespace symbols like bool, int, float, and __all__ to locate the remaining violations, then rerun ruff check . to confirm the file is clean.Sources: Coding guidelines, Linters/SAST tools
deepmd/_vendors/ndtensorflow/linalg.py-212-230 (1)
212-230: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winSupport batched vector RHS in
solve.For
x1shaped(..., n, n)andx2shaped(..., n), this falls into the matrix-RHS branch and broadcastsx2as if its last two dimensions were(m, k). Detectx2_.shape.rank == x1_.shape.rank - 1, expand a trailing singleton RHS dimension fortf.linalg.solve, then squeeze it back.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/linalg.py` around lines 212 - 230, The solve function currently mishandles batched vector right-hand sides by treating x2 shaped (..., n) as a matrix RHS. Update solve in linalg.py to detect when x2_.shape.rank is exactly x1_.shape.rank - 1, expand x2_ with a trailing singleton dimension before calling tf.linalg.solve, and squeeze that dimension from the result afterward. Keep the existing scalar/vector and matrix-RHS handling in solve consistent with _promote_two, _shape_tuple, and the final _wrap behavior.deepmd/_vendors/ndtensorflow/_namespace.py-1538-1540 (1)
1538-1540: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winBuild
include_initialshapes dynamically.
out.shape.as_list()can containNoneundertf.function, andtf.zeros([None, ...])will fail. Usetf.shape(out)and update the selected axis dynamically.Proposed fix
- shape = list(out.shape.as_list()) - shape[axis] = 1 - out = tf.concat([tf.zeros(shape, dtype=out.dtype), out], axis=axis) + shape = tf.tensor_scatter_nd_update(tf.shape(out), [[axis]], [1]) + out = tf.concat([tf.zeros(shape, dtype=out.dtype), out], axis=axis) @@ - shape = list(out.shape.as_list()) - shape[axis] = 1 - out = tf.concat([tf.ones(shape, dtype=out.dtype), out], axis=axis) + shape = tf.tensor_scatter_nd_update(tf.shape(out), [[axis]], [1]) + out = tf.concat([tf.ones(shape, dtype=out.dtype), out], axis=axis)Also applies to: 1564-1566
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/_namespace.py` around lines 1538 - 1540, The include_initial shape handling currently relies on out.shape.as_list(), which can break under tf.function when dimensions are None. Update the logic in the include_initial branch around the tf.concat call to build the shape dynamically from tf.shape(out), replacing only the selected axis at runtime before creating the zero tensor. Apply the same fix in both affected include_initial blocks so the concatenation works with dynamic shapes.deepmd/_vendors/ndtensorflow/_array.py-257-264 (1)
257-264: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy liftHandle prefix-shaped boolean assignment.
__getitem__accepts boolean masks over leading dimensions, but thistf.wherepath only works when the mask is scalar or fully broadcast-compatible with the whole tensor. A row mask like shape(n,)against(n, m)will fail instead of assigning selected rows.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/_array.py` around lines 257 - 264, `Array.__setitem__` currently handles boolean tensor keys with a plain `tf.where`, but that only works for fully broadcast-compatible masks and fails for prefix-shaped masks like a row selector over leading dimensions. Update the boolean-assignment branch in `__setitem__` (the `tf.Tensor`/`tf.bool` path) to detect when the mask matches only the leading dimensions and expand/broadcast it across the remaining dimensions before calling `tf.where`, so masked row assignments work correctly while preserving the existing scalar/full-shape behavior.deepmd/_vendors/ndtensorflow/linalg.py-165-165 (1)
165-165: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winClean up the Ruff violations.
ordis the public linalg API spelling, so use targetedA002ignores if keeping it; the tuple-concatenation and__all__findings can be auto-fixed. As per coding guidelines,**/*.py: Install linter and runruff check .before committing changes or the CI will fail.Also applies to: 194-194, 218-218, 242-242, 256-256, 313-313, 322-354
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/linalg.py` at line 165, Clean up the Ruff violations in the linalg module by keeping the public API parameter name ord but adding targeted A002 ignores where needed, rather than renaming the API. Auto-fix the tuple-concatenation and __all__ issues in the affected definitions, including the blocks around the linalg helpers and export list. Verify the changes in the affected signatures and module-level exports, then run ruff check . before committing to catch any remaining violations.Sources: Coding guidelines, Linters/SAST tools
deepmd/_vendors/ndtensorflow/_array.py-222-222 (1)
222-222: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winFix the Ruff violations before CI.
These supplied Ruff findings are straightforward: add
strict=False, use tuple unpacking, and parenthesize the mixedor/andcondition.Proposed fix
- key_dim in (x_dim, 0) for x_dim, key_dim in zip(self.shape, key.shape) + key_dim in (x_dim, 0) + for x_dim, key_dim in zip(self.shape, key.shape, strict=False) @@ - shape = (0,) + self.shape[key.shape.rank :] + shape = (0, *self.shape[key.shape.rank :]) @@ - isinstance(item, int) - or isinstance(item, tf.Tensor) - and _is_integer_index_tensor(item) + isinstance(item, int) + or (isinstance(item, tf.Tensor) and _is_integer_index_tensor(item))As per coding guidelines,
**/*.py: Install linter and runruff check .before committing changes or the CI will fail.Also applies to: 232-232, 389-390
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/_array.py` at line 222, The Ruff findings in the array logic need to be cleaned up in the affected expressions in _array.py, including the key_dim/x_dim comparison and the other reported spots. Add strict=False where the numpy/tensor conversion or shape-check call requires it, switch the relevant loops/comprehensions to tuple unpacking, and parenthesize the mixed or/and condition so the intent is explicit. Use the surrounding symbols in the array handling code to locate each occurrence and make the same style-safe update at all reported matches.Sources: Coding guidelines, Linters/SAST tools
deepmd/_vendors/ndtensorflow/__init__.py-20-25 (1)
20-25: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winSuppress or rewrite the namespace re-export for Ruff.
xpimports depend on this initializer exposing the full namespace, but the current star import and computed__all__tripF403/PLE0605. Add targeted suppressions or re-export via an explicit namespace module update. As per coding guidelines,**/*.py: Install linter and runruff check .before committing changes or the CI will fail.Minimal suppression option
-from ._namespace import * +from ._namespace import * # noqa: F403 - re-export Array API namespace @@ -__all__ = sorted( +__all__ = sorted( # noqa: PLE0605 - computed from vendored namespace __all__🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/__init__.py` around lines 20 - 25, The namespace re-export in the ndtensorflow initializer is triggering Ruff F403/PLE0605 because it uses a star import plus a computed __all__; update the module around the __init__ import and __all__ definition to either add the appropriate targeted lint suppressions or replace the pattern with an explicit namespace re-export approach via the namespace module, while keeping xp imports working. Make sure the fix preserves the full public surface exposed by _namespace_all and then run ruff check . before committing.Sources: Coding guidelines, Linters/SAST tools
deepmd/tf2/infer/deep_eval.py-314-323 (1)
314-323: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDerive
nframesafter normalizingcoords.
eval()accepts flat single-frame coordinates, but_eval_model()uses the raw first dimension asnframes. For shape(natoms * 3,), this tiles types/params tonatoms * 3frames while coordinates reshape to one frame.Proposed fix
- nframes = coords.shape[0] if len(atom_types.shape) == 1: natoms = len(atom_types) - atom_types = np.tile(atom_types, nframes).reshape(nframes, -1) else: natoms = len(atom_types[0]) + coords = np.reshape(np.array(coords), [-1, natoms * 3]) + nframes = coords.shape[0] + if len(atom_types.shape) == 1: + atom_types = np.tile(atom_types, nframes).reshape(nframes, -1) coord_input = coords.reshape([-1, natoms, 3])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/infer/deep_eval.py` around lines 314 - 323, Derive nframes after normalizing coords in _eval_model so flat single-frame inputs are handled correctly. The current logic in deep_eval.py uses coords.shape[0] before reshaping, which misinterprets a 1D coords array as many frames and over-tiles atom_types. Update the flow around nframes, atom_types, and coord_input so the frame count is based on the reshaped coordinate tensor (and stays consistent with cells), using the existing _eval_model and eval() entry points to locate the fix.deepmd/tf2/utils/serialization.py-32-35 (1)
32-35: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAttach the deserialized model to the SavedModel root. The exported functions close over
model, buttf.saved_model.saveneeds it reachable fromtf_modelso its variables are serialized cleanly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/utils/serialization.py` around lines 32 - 35, The SavedModel export in the serialization flow is missing a direct reference from the root object to the deserialized model. In the function that builds tf_model after BaseModel.deserialize, attach model to tf_model before calling tf.saved_model.save so the variables remain reachable and serialize correctly; keep the existing model_def_script export logic unchanged.deepmd/tf2/common.py-149-161 (1)
149-161: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winMake lazy registration block concurrent readers until imports finish.
Line 152 returns immediately when another thread is still populating
_DPMODEL_TO_TF2. In that window,try_convert_module()can miss the intended converter and either leave the dpmodel object unwrapped or auto-wrap the wrongNativeOPpath. Use a lock/event so non-owner threads wait for registration completion.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/common.py` around lines 149 - 161, The lazy registration path in _ensure_registrations currently lets concurrent callers return while imports are still in progress, so try_convert_module can observe an incomplete _DPMODEL_TO_TF2 mapping. Update the registration flow in _ensure_registrations (and any caller like try_convert_module that depends on it) to use a synchronization primitive such as a lock or event so only one thread performs the imports and all other threads block until _REGISTRATIONS_READY is true. Keep the existing _REGISTRATIONS_IN_PROGRESS/_REGISTRATIONS_READY state, but make non-owner threads wait for completion instead of returning early.deepmd/tf2/region.py-96-104 (1)
96-104: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winUse an absolute cell volume for face distances.
tf.linalg.det(cell)is signed, but a face distance must be non-negative. A negative determinant makesto_face_distance()return negative values, which can undercount or eliminate periodic ghost images whenextend_coord_with_ghosts()computesceil(rcut / to_face).Proposed fix
def b_to_face_distance(cell: tnp.ndarray) -> tnp.ndarray: - volume = tf.linalg.det(cell) + volume = tf.abs(tf.linalg.det(cell))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/region.py` around lines 96 - 104, The b_to_face_distance helper is using the signed determinant from tf.linalg.det(cell), which can produce negative face distances for cells with negative orientation. Update b_to_face_distance to use the absolute cell volume before dividing by the cross-product norms, and keep the returned values non-negative so to_face_distance() and extend_coord_with_ghosts() cannot undercount ghost images.deepmd/tf2/format_nlist.py-42-68 (1)
42-68: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winApply the cutoff mask on pad and pass-through paths too.
rcutis only enforced whenn_nsel > nsel. If the incoming list is already short enough or exactlynsel, out-of-cutoff neighbors remain in the returned list.Proposed fix
elif n_nsel > nsel: # make a copy before revise m_real_nei = nlist >= 0 ret = tnp.where(m_real_nei, nlist, 0) @@ else: # n_nsel == nsel: ret = nlist + + m_real_nei = ret >= 0 + safe_ret = tnp.where(m_real_nei, ret, 0) + coord0 = extended_coord[:, :n_nloc, :] + index = tf.reshape(safe_ret, [n_nf, n_nloc * nsel, 1]) + index = tnp.repeat(index, 3, axis=2) + coord1 = tnp.take_along_axis(extended_coord, index, axis=1) + coord1 = tf.reshape(coord1, [n_nf, n_nloc, nsel, 3]) + rr2 = tnp.sum(tnp.square(coord0[:, :, None, :] - coord1), axis=-1) + ret = tnp.where(tnp.logical_or(~m_real_nei, rr2 > rcut * rcut), -1, ret)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/format_nlist.py` around lines 42 - 68, Apply the cutoff filtering in format_nlist for all branches, not just when n_nsel > nsel: after building ret in the pad path (n_nsel < nsel) and the pass-through path (n_nsel == nsel), mask any neighbors with distance beyond rcut to -1. Reuse the existing rr2/rcut logic in format_nlist, and keep the same behavior for padding while ensuring all returned neighbor lists are cutoff-enforced.deepmd/tf2/nlist.py-72-118 (1)
72-118: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDo not globally truncate before per-type neighbor selection.
When
distinguish_types=True, taking only the nearestsum(sel)neighbors beforenlist_distinguish_types()can drop valid neighbors of later types. For example, withsel=[1, 1], two closer type-0 atoms can exclude the closest type-1 atom even if it is withinrcut.Proposed direction
- rr, nlist = tf.cond(nsel <= nnei, truncate, pad) - nlist = tnp.where( - tnp.logical_or((rr > rcut), is_vir[:, :nloc, None]), - tnp.full_like(nlist, -1), - nlist, - ) - if distinguish_types: - return nlist_distinguish_types(nlist, atype, sel) - else: - return nlist + min_type_candidates = max(sel) + + def pad_type_candidates() -> tuple[tnp.ndarray, tnp.ndarray]: + return ( + tnp.concatenate( + [ + rr, + tnp.ones( + [batch_size, nloc, min_type_candidates - nnei], + dtype=rr.dtype, + ) + + rcut, + ], + axis=-1, + ), + tnp.concatenate( + [ + nlist, + tnp.full( + [batch_size, nloc, min_type_candidates - nnei], + -1, + dtype=nlist.dtype, + ), + ], + axis=-1, + ), + ) + + rr_t, nlist_t = tf.cond( + min_type_candidates <= nnei, + lambda: (rr, nlist), + pad_type_candidates, + ) + nlist_t = tnp.where( + tnp.logical_or((rr_t > rcut), is_vir[:, :nloc, None]), + tnp.full_like(nlist_t, -1), + nlist_t, + ) + return nlist_distinguish_types(nlist_t, atype, sel) + + rr, nlist = tf.cond(nsel <= nnei, truncate, pad) + nlist = tnp.where( + tnp.logical_or((rr > rcut), is_vir[:, :nloc, None]), + tnp.full_like(nlist, -1), + nlist, + ) + return nlist🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/nlist.py` around lines 72 - 118, The neighbor list logic in nlist should not truncate to sum(sel) before per-type selection when distinguish_types is enabled, because that can discard valid neighbors needed by later types. Update the flow around rr/nlist and nlist_distinguish_types so type-aware filtering happens before any per-type limit is applied, or otherwise ensure each type’s quota is selected independently within rcut. Keep the existing truncate/pad behavior only for the non-type-distinguishing path.deepmd/tf2/descriptor/se_t_tebd.py-20-22 (1)
20-22: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRegister the TF2 tebd descriptor entrypoint.
DescrptSeTTebdis another unregistered TF2 wrapper. The descriptor factory still has no name-based mapping for this backend class, so config/deserialization paths for the tebd descriptor will not resolve to the TF2 implementation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/descriptor/se_t_tebd.py` around lines 20 - 22, The TF2 tebd descriptor wrapper is defined but not actually registered in the descriptor factory, so name-based lookup still won’t resolve to the TF2 implementation. Update the TF2 registration path around DescrptSeTTebd and the `@tf2_module` decorator so this class is exported/registered under the tebd descriptor entrypoint the same way the other TF2 descriptor wrappers are.deepmd/tf2/descriptor/se_t.py-4-13 (1)
4-13: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRegister
DescrptSeTwithBaseDescriptor.Unlike
DescrptSeR, this TF2 wrapper is never added to the descriptor factory. Any path that resolves"se_t"by descriptor type will miss the TF2 class and fall back to the wrong implementation or fail to construct it.Proposed fix
from ..common import ( tf2_module, ) from ..utils import exclude_mask as _tf2_exclude_mask # noqa: F401 from ..utils import network as _tf2_network # noqa: F401 +from .base_descriptor import ( + BaseDescriptor, +) +@BaseDescriptor.register("se_t") `@tf2_module` class DescrptSeT(DescrptSeTDP): pass🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/descriptor/se_t.py` around lines 4 - 13, `DescrptSeT` is defined in the TF2 wrapper but never registered with `BaseDescriptor`, so descriptor lookup for "se_t" can miss the TF2 implementation. Update the `DescrptSeT` declaration in the `DescrptSeT` wrapper to include the same factory registration pattern used by `DescrptSeR`, ensuring the `BaseDescriptor` mapping resolves `"se_t"` to this class instead of falling back to another implementation.deepmd/tf2/descriptor/se_atten_v2.py-4-19 (1)
4-19: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRegister the
se_atten_v2descriptor type.
deepmd/dpmodel/descriptor/se_atten_v2.pypersists this descriptor as"se_atten_v2", anddeepmd/tf2/model/model.pylooks descriptors up throughBaseDescriptor.get_class_by_type(descriptor_type). This wrapper never registers that type, so loading any TF2 model/config carryingtype: "se_atten_v2"will fail at construction time.🔧 Proposed fix
from ..common import ( register_dpmodel_mapping, ) +from .base_descriptor import ( + BaseDescriptor, +) from .dpa1 import ( DescrptDPA1, ) @@ +@BaseDescriptor.register("se_atten_v2") class DescrptSeAttenV2(DescrptDPA1, DescrptSeAttenV2DP): pass🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/descriptor/se_atten_v2.py` around lines 4 - 19, The TF2 wrapper for DescrptSeAttenV2 is missing descriptor type registration, so BaseDescriptor.get_class_by_type cannot resolve "se_atten_v2" when models are loaded. Update the se_atten_v2 module to register DescrptSeAttenV2 under the exact "se_atten_v2" type using the existing registration flow used by other descriptors, alongside the current dpmodel mapping, so tf2/model/model.py can construct models with this descriptor.deepmd/tf2/model/__init__.py-2-29 (1)
2-29: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winExport
DPZBLModelfrom the model package as well.
deepmd.tf2.modelcurrently exposes the atomic-model helperDPZBLLinearEnergyAtomicModel, but not the actualDPZBLModelclass that this layer introduces. That leaves the package’s public model API incomplete for ZBL consumers.Suggested change
from deepmd.tf2.atomic_model.linear_atomic_model import ( DPZBLLinearEnergyAtomicModel, ) +from .dp_zbl_model import ( + DPZBLModel, +) from .dipole_model import ( DipoleModel, ) @@ __all__ = [ "DOSModel", "DPZBLLinearEnergyAtomicModel", + "DPZBLModel", "DipoleModel", "EnergyModel", "PolarModel", "PropertyModel", ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/model/__init__.py` around lines 2 - 29, The model package export list is missing the new DPZBLModel class, so update the deepmd.tf2.model package init to import DPZBLModel alongside the existing model classes and add it to __all__ while keeping DPZBLLinearEnergyAtomicModel exported; use the symbols in __init__.py to place the new public API entry in the same section as the other model exports.deepmd/tf2/descriptor/repformers.py-51-57 (1)
51-57: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftPatch the inherited Python shape assert before exposing
RepformerLayertotf.function.This TF2 wrapper currently brings the dpmodel repformer implementation into graph mode unchanged, and the pipeline is already failing on that path with
OperatorNotAllowedInGraphErrorfromassert list(atype_embd.shape) == [nf, nloc, self.g1_dim]. Please override that check here or patch the shared implementation to use TensorFlow-safe validation (tf.debugging.assert_*or staticTensorShapechecks) before merge.Possible direction
`@tf2_module` class RepformerLayer(RepformerLayerDP): _tf2_data_list_attrs: ClassVar[set[str]] = { "g1_residual", "g2_residual", "h2_residual", } + + # Override the parent method that currently does: + # assert list(atype_embd.shape) == [nf, nloc, self.g1_dim] + # and replace it with TF-safe shape validation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/descriptor/repformers.py` around lines 51 - 57, RepformerLayer still inherits a Python list-based shape assert from RepformerLayerDP that breaks under tf.function with OperatorNotAllowedInGraphError. Update RepformerLayer or the shared repformer implementation used by its graph path to replace the assert list(atype_embd.shape) == [nf, nloc, self.g1_dim] check with TensorFlow-safe validation such as tf.debugging.assert_* or static TensorShape inspection, and keep the fix localized around the RepformerLayer/RepformerLayerDP shape-check code path.Source: Pipeline failures
deepmd/tf2/model/model.py-68-72 (1)
68-72: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winDo not destructively edit the caller’s ZBL config.
Lines 69-72 mutate the original
dataandpop()both"type"keys. Reusing the same config dict for a secondget_model(data)call will fail, unlikeget_standard_model()which already protects the input withdeepcopy().Proposed fix
def get_zbl_model(data: dict) -> DPZBLModel: + data = deepcopy(data) data["descriptor"]["ntypes"] = len(data["type_map"]) descriptor_type = data["descriptor"].pop("type")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/model/model.py` around lines 68 - 72, get_zbl_model mutates the caller’s config by writing to data["descriptor"]["ntypes"] and popping both "type" fields, so repeated get_model(data) calls break. Update get_zbl_model to work on a copied config (like get_standard_model does with deepcopy) and read/remove fields from that copy, keeping the original data dict unchanged. Use get_zbl_model and get_standard_model as the key symbols when making the fix.deepmd/tf2/model/base_model.py-59-85 (1)
59-85: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDifferentiate the same reduction you expose in
*_redu.Line 84 always differentiates
sum(grad_atomic_ret[kk]), even when Lines 60-67 expose an intensive mean or masked mean. That makes*_derv_r/*_derv_c*scaled incorrectly for intensive outputs.Proposed fix
kk_redu = get_reduce_name(kk) - if vdef.intensive: - mask = atomic_ret["mask"] if "mask" in atomic_ret else None - if mask is not None: - model_predict[kk_redu] = jnp.sum(vv, axis=atom_axis) / jnp.sum( - mask, axis=-1, keepdims=True - ) - else: - model_predict[kk_redu] = jnp.mean(vv, axis=atom_axis) - else: - model_predict[kk_redu] = jnp.sum(vv, axis=atom_axis) + def _reduce_output(value, mask): + if vdef.intensive: + if mask is not None: + return jnp.sum(value, axis=atom_axis) / jnp.sum( + mask, axis=-1, keepdims=True + ) + return jnp.mean(value, axis=atom_axis) + return jnp.sum(value, axis=atom_axis) + + mask = atomic_ret["mask"] if "mask" in atomic_ret else None + model_predict[kk_redu] = _reduce_output(vv, mask) @@ - reduced_output = jnp.sum(grad_atomic_ret[kk], axis=atom_axis) + grad_mask = ( + grad_atomic_ret["mask"] if "mask" in grad_atomic_ret else None + ) + reduced_output = _reduce_output(grad_atomic_ret[kk], grad_mask)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/model/base_model.py` around lines 59 - 85, The derivative path in base_model.py is using a plain sum for reduced_output inside the vdef.r_differentiable block, which does not match the reduction used for model_predict[kk_redu] when vdef.intensive is true. Update the gradient computation in the same branch that calls self.atomic_model.forward_common_atomic and get_deriv_name(kk) so reduced_output applies the same intensive or masked-mean reduction as the exposed *_redu value, ensuring *_derv_r and *_derv_c remain consistent with the forward output.
🧹 Nitpick comments (3)
deepmd/_vendors/ndtensorflow/_namespace.py (1)
2005-2036: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftAvoid quadratic memory in
unique.
_uniquematerializes ann x nequality matrix, so large atom/type arrays can OOM. Prefertf.unique/tf.unique_with_countsand handle NaN policy separately.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/_vendors/ndtensorflow/_namespace.py` around lines 2005 - 2036, The _unique helper currently builds an n x n equality matrix, which can blow up memory for large inputs. Update _unique in _namespace.py to avoid pairwise materialization by using TensorFlow’s tf.unique / tf.unique_with_counts for values, indices, inverse, and counts where possible, and preserve the current NaN handling separately in the same function. Keep the existing return contract of _unique unchanged while replacing the quadratic equality logic.source/tests/consistent/test_tf2_backend.py (1)
131-183: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winAdd a SavedModel export/load smoke test.
This only covers eager execution and the innertf.functionpath; add a tempdir round-trip throughtf.saved_model.save(...)and reload to exercise the actual export path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/consistent/test_tf2_backend.py` around lines 131 - 183, The current `test_tf_function_call_from_lower` only exercises eager execution and an inner `tf.function`, so it misses the SavedModel export/load path. Extend this test in `test_tf2_backend.py` to save the model to a temporary directory with `tf.saved_model.save(...)` and then reload it, using the existing `get_tf2_model`, `model_call_from_call_lower`, and `call_lower` flow to validate the restored model behaves the same as the original. Keep the existing assertions/outputs so the round-trip covers the full export/import smoke test.Sources: Coding guidelines, Pipeline failures
deepmd/tf2/transform_output.py (1)
54-54: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRemove the dead
mldimsassignment.
mldimsis never read, and CodeQL already reports it as unused.Proposed fix
- mldims = tf.shape(mapping) vldims = get_leading_dims(vv, vdef)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepmd/tf2/transform_output.py` at line 54, The mapping shape capture in transform_output is dead code because mldims is never used. Remove the unused mldims assignment from the function that computes the mapping shape, keeping the surrounding logic in transform_output unchanged and ensuring no leftover references to mldims remain.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Major comments:
In `@deepmd/_vendors/ndtensorflow/__init__.py`:
- Around line 20-25: The namespace re-export in the ndtensorflow initializer is
triggering Ruff F403/PLE0605 because it uses a star import plus a computed
__all__; update the module around the __init__ import and __all__ definition to
either add the appropriate targeted lint suppressions or replace the pattern
with an explicit namespace re-export approach via the namespace module, while
keeping xp imports working. Make sure the fix preserves the full public surface
exposed by _namespace_all and then run ruff check . before committing.
In `@deepmd/_vendors/ndtensorflow/_array.py`:
- Around line 257-264: `Array.__setitem__` currently handles boolean tensor keys
with a plain `tf.where`, but that only works for fully broadcast-compatible
masks and fails for prefix-shaped masks like a row selector over leading
dimensions. Update the boolean-assignment branch in `__setitem__` (the
`tf.Tensor`/`tf.bool` path) to detect when the mask matches only the leading
dimensions and expand/broadcast it across the remaining dimensions before
calling `tf.where`, so masked row assignments work correctly while preserving
the existing scalar/full-shape behavior.
- Line 222: The Ruff findings in the array logic need to be cleaned up in the
affected expressions in _array.py, including the key_dim/x_dim comparison and
the other reported spots. Add strict=False where the numpy/tensor conversion or
shape-check call requires it, switch the relevant loops/comprehensions to tuple
unpacking, and parenthesize the mixed or/and condition so the intent is
explicit. Use the surrounding symbols in the array handling code to locate each
occurrence and make the same style-safe update at all reported matches.
In `@deepmd/_vendors/ndtensorflow/_namespace.py`:
- Line 37: Resolve the remaining Ruff failures in the namespace module by
handling the Array API builtin-shadowing names with targeted per-file ignores or
inline noqa for A001/A002 in the namespace constants, and fix the RUF021
precedence and RUF022 __all__ issues directly by parenthesizing the affected
expressions and sorting/auto-fixing the __all__ entries. Use the existing
namespace symbols like bool, int, float, and __all__ to locate the remaining
violations, then rerun ruff check . to confirm the file is clean.
- Around line 1538-1540: The include_initial shape handling currently relies on
out.shape.as_list(), which can break under tf.function when dimensions are None.
Update the logic in the include_initial branch around the tf.concat call to
build the shape dynamically from tf.shape(out), replacing only the selected axis
at runtime before creating the zero tensor. Apply the same fix in both affected
include_initial blocks so the concatenation works with dynamic shapes.
In `@deepmd/_vendors/ndtensorflow/fft.py`:
- Around line 360-375: The __all__ list in fft.py is not sorted, triggering Ruff
rule RUF022 and a CI failure. Update the module-level __all__ definition by
ordering the exported names alphabetically, keeping the symbols like fft, ifft,
fftn, and fftshift in sorted order so Ruff’s auto-fix would pass cleanly.
In `@deepmd/_vendors/ndtensorflow/linalg.py`:
- Around line 317-319: The norm handling in linalg.norm ignores the requested
matrix axes because matrix_norm always uses the last two dimensions. Update the
tuple-axis branch in norm to first move the requested axis pair into the matrix
positions before calling matrix_norm, then restore the reduced dimensions so
keepdims is preserved in the original axis locations. Use the norm and
matrix_norm symbols to make the axis handling work for any 2D axis pair, not
just the last two axes.
- Around line 212-230: The solve function currently mishandles batched vector
right-hand sides by treating x2 shaped (..., n) as a matrix RHS. Update solve in
linalg.py to detect when x2_.shape.rank is exactly x1_.shape.rank - 1, expand
x2_ with a trailing singleton dimension before calling tf.linalg.solve, and
squeeze that dimension from the result afterward. Keep the existing
scalar/vector and matrix-RHS handling in solve consistent with _promote_two,
_shape_tuple, and the final _wrap behavior.
- Line 165: Clean up the Ruff violations in the linalg module by keeping the
public API parameter name ord but adding targeted A002 ignores where needed,
rather than renaming the API. Auto-fix the tuple-concatenation and __all__
issues in the affected definitions, including the blocks around the linalg
helpers and export list. Verify the changes in the affected signatures and
module-level exports, then run ruff check . before committing to catch any
remaining violations.
In `@deepmd/tf2/common.py`:
- Around line 149-161: The lazy registration path in _ensure_registrations
currently lets concurrent callers return while imports are still in progress, so
try_convert_module can observe an incomplete _DPMODEL_TO_TF2 mapping. Update the
registration flow in _ensure_registrations (and any caller like
try_convert_module that depends on it) to use a synchronization primitive such
as a lock or event so only one thread performs the imports and all other threads
block until _REGISTRATIONS_READY is true. Keep the existing
_REGISTRATIONS_IN_PROGRESS/_REGISTRATIONS_READY state, but make non-owner
threads wait for completion instead of returning early.
In `@deepmd/tf2/descriptor/repformers.py`:
- Around line 51-57: RepformerLayer still inherits a Python list-based shape
assert from RepformerLayerDP that breaks under tf.function with
OperatorNotAllowedInGraphError. Update RepformerLayer or the shared repformer
implementation used by its graph path to replace the assert
list(atype_embd.shape) == [nf, nloc, self.g1_dim] check with TensorFlow-safe
validation such as tf.debugging.assert_* or static TensorShape inspection, and
keep the fix localized around the RepformerLayer/RepformerLayerDP shape-check
code path.
In `@deepmd/tf2/descriptor/se_atten_v2.py`:
- Around line 4-19: The TF2 wrapper for DescrptSeAttenV2 is missing descriptor
type registration, so BaseDescriptor.get_class_by_type cannot resolve
"se_atten_v2" when models are loaded. Update the se_atten_v2 module to register
DescrptSeAttenV2 under the exact "se_atten_v2" type using the existing
registration flow used by other descriptors, alongside the current dpmodel
mapping, so tf2/model/model.py can construct models with this descriptor.
In `@deepmd/tf2/descriptor/se_t_tebd.py`:
- Around line 20-22: The TF2 tebd descriptor wrapper is defined but not actually
registered in the descriptor factory, so name-based lookup still won’t resolve
to the TF2 implementation. Update the TF2 registration path around
DescrptSeTTebd and the `@tf2_module` decorator so this class is
exported/registered under the tebd descriptor entrypoint the same way the other
TF2 descriptor wrappers are.
In `@deepmd/tf2/descriptor/se_t.py`:
- Around line 4-13: `DescrptSeT` is defined in the TF2 wrapper but never
registered with `BaseDescriptor`, so descriptor lookup for "se_t" can miss the
TF2 implementation. Update the `DescrptSeT` declaration in the `DescrptSeT`
wrapper to include the same factory registration pattern used by `DescrptSeR`,
ensuring the `BaseDescriptor` mapping resolves `"se_t"` to this class instead of
falling back to another implementation.
In `@deepmd/tf2/format_nlist.py`:
- Around line 42-68: Apply the cutoff filtering in format_nlist for all
branches, not just when n_nsel > nsel: after building ret in the pad path
(n_nsel < nsel) and the pass-through path (n_nsel == nsel), mask any neighbors
with distance beyond rcut to -1. Reuse the existing rr2/rcut logic in
format_nlist, and keep the same behavior for padding while ensuring all returned
neighbor lists are cutoff-enforced.
In `@deepmd/tf2/infer/deep_eval.py`:
- Around line 314-323: Derive nframes after normalizing coords in _eval_model so
flat single-frame inputs are handled correctly. The current logic in
deep_eval.py uses coords.shape[0] before reshaping, which misinterprets a 1D
coords array as many frames and over-tiles atom_types. Update the flow around
nframes, atom_types, and coord_input so the frame count is based on the reshaped
coordinate tensor (and stays consistent with cells), using the existing
_eval_model and eval() entry points to locate the fix.
In `@deepmd/tf2/model/__init__.py`:
- Around line 2-29: The model package export list is missing the new DPZBLModel
class, so update the deepmd.tf2.model package init to import DPZBLModel
alongside the existing model classes and add it to __all__ while keeping
DPZBLLinearEnergyAtomicModel exported; use the symbols in __init__.py to place
the new public API entry in the same section as the other model exports.
In `@deepmd/tf2/model/base_model.py`:
- Around line 59-85: The derivative path in base_model.py is using a plain sum
for reduced_output inside the vdef.r_differentiable block, which does not match
the reduction used for model_predict[kk_redu] when vdef.intensive is true.
Update the gradient computation in the same branch that calls
self.atomic_model.forward_common_atomic and get_deriv_name(kk) so reduced_output
applies the same intensive or masked-mean reduction as the exposed *_redu value,
ensuring *_derv_r and *_derv_c remain consistent with the forward output.
In `@deepmd/tf2/model/model.py`:
- Around line 68-72: get_zbl_model mutates the caller’s config by writing to
data["descriptor"]["ntypes"] and popping both "type" fields, so repeated
get_model(data) calls break. Update get_zbl_model to work on a copied config
(like get_standard_model does with deepcopy) and read/remove fields from that
copy, keeping the original data dict unchanged. Use get_zbl_model and
get_standard_model as the key symbols when making the fix.
In `@deepmd/tf2/nlist.py`:
- Around line 72-118: The neighbor list logic in nlist should not truncate to
sum(sel) before per-type selection when distinguish_types is enabled, because
that can discard valid neighbors needed by later types. Update the flow around
rr/nlist and nlist_distinguish_types so type-aware filtering happens before any
per-type limit is applied, or otherwise ensure each type’s quota is selected
independently within rcut. Keep the existing truncate/pad behavior only for the
non-type-distinguishing path.
In `@deepmd/tf2/region.py`:
- Around line 96-104: The b_to_face_distance helper is using the signed
determinant from tf.linalg.det(cell), which can produce negative face distances
for cells with negative orientation. Update b_to_face_distance to use the
absolute cell volume before dividing by the cross-product norms, and keep the
returned values non-negative so to_face_distance() and
extend_coord_with_ghosts() cannot undercount ghost images.
In `@deepmd/tf2/utils/serialization.py`:
- Around line 32-35: The SavedModel export in the serialization flow is missing
a direct reference from the root object to the deserialized model. In the
function that builds tf_model after BaseModel.deserialize, attach model to
tf_model before calling tf.saved_model.save so the variables remain reachable
and serialize correctly; keep the existing model_def_script export logic
unchanged.
---
Nitpick comments:
In `@deepmd/_vendors/ndtensorflow/_namespace.py`:
- Around line 2005-2036: The _unique helper currently builds an n x n equality
matrix, which can blow up memory for large inputs. Update _unique in
_namespace.py to avoid pairwise materialization by using TensorFlow’s tf.unique
/ tf.unique_with_counts for values, indices, inverse, and counts where possible,
and preserve the current NaN handling separately in the same function. Keep the
existing return contract of _unique unchanged while replacing the quadratic
equality logic.
In `@deepmd/tf2/transform_output.py`:
- Line 54: The mapping shape capture in transform_output is dead code because
mldims is never used. Remove the unused mldims assignment from the function that
computes the mapping shape, keeping the surrounding logic in transform_output
unchanged and ensuring no leftover references to mldims remain.
In `@source/tests/consistent/test_tf2_backend.py`:
- Around line 131-183: The current `test_tf_function_call_from_lower` only
exercises eager execution and an inner `tf.function`, so it misses the
SavedModel export/load path. Extend this test in `test_tf2_backend.py` to save
the model to a temporary directory with `tf.saved_model.save(...)` and then
reload it, using the existing `get_tf2_model`, `model_call_from_call_lower`, and
`call_lower` flow to validate the restored model behaves the same as the
original. Keep the existing assertions/outputs so the round-trip covers the full
export/import smoke test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 60dcc087-f41b-4b63-9f90-257c6b839a47
📒 Files selected for processing (70)
deepmd/_vendors/__init__.pydeepmd/_vendors/ndtensorflow/__init__.pydeepmd/_vendors/ndtensorflow/_array.pydeepmd/_vendors/ndtensorflow/_info.pydeepmd/_vendors/ndtensorflow/_namespace.pydeepmd/_vendors/ndtensorflow/fft.pydeepmd/_vendors/ndtensorflow/linalg.pydeepmd/_vendors/ndtensorflow/py.typeddeepmd/backend/tf2.pydeepmd/jax/jax2tf/__init__.pydeepmd/jax/jax2tf/format_nlist.pydeepmd/jax/jax2tf/make_model.pydeepmd/jax/jax2tf/nlist.pydeepmd/jax/jax2tf/region.pydeepmd/jax/jax2tf/serialization.pydeepmd/jax/jax2tf/transform_output.pydeepmd/jax/utils/serialization.pydeepmd/tf2/__init__.pydeepmd/tf2/atomic_model/__init__.pydeepmd/tf2/atomic_model/base_atomic_model.pydeepmd/tf2/atomic_model/dipole_atomic_model.pydeepmd/tf2/atomic_model/dos_atomic_model.pydeepmd/tf2/atomic_model/dp_atomic_model.pydeepmd/tf2/atomic_model/energy_atomic_model.pydeepmd/tf2/atomic_model/linear_atomic_model.pydeepmd/tf2/atomic_model/pairtab_atomic_model.pydeepmd/tf2/atomic_model/polar_atomic_model.pydeepmd/tf2/atomic_model/property_atomic_model.pydeepmd/tf2/common.pydeepmd/tf2/descriptor/__init__.pydeepmd/tf2/descriptor/base_descriptor.pydeepmd/tf2/descriptor/dpa1.pydeepmd/tf2/descriptor/dpa2.pydeepmd/tf2/descriptor/dpa3.pydeepmd/tf2/descriptor/hybrid.pydeepmd/tf2/descriptor/repflows.pydeepmd/tf2/descriptor/repformers.pydeepmd/tf2/descriptor/se_atten_v2.pydeepmd/tf2/descriptor/se_e2_a.pydeepmd/tf2/descriptor/se_e2_r.pydeepmd/tf2/descriptor/se_t.pydeepmd/tf2/descriptor/se_t_tebd.pydeepmd/tf2/env.pydeepmd/tf2/fitting/__init__.pydeepmd/tf2/fitting/base_fitting.pydeepmd/tf2/fitting/fitting.pydeepmd/tf2/format_nlist.pydeepmd/tf2/infer/__init__.pydeepmd/tf2/infer/deep_eval.pydeepmd/tf2/make_model.pydeepmd/tf2/model/__init__.pydeepmd/tf2/model/base_model.pydeepmd/tf2/model/dipole_model.pydeepmd/tf2/model/dos_model.pydeepmd/tf2/model/dp_model.pydeepmd/tf2/model/dp_zbl_model.pydeepmd/tf2/model/ener_model.pydeepmd/tf2/model/model.pydeepmd/tf2/model/polar_model.pydeepmd/tf2/model/property_model.pydeepmd/tf2/nlist.pydeepmd/tf2/region.pydeepmd/tf2/transform_output.pydeepmd/tf2/utils/__init__.pydeepmd/tf2/utils/exclude_mask.pydeepmd/tf2/utils/network.pydeepmd/tf2/utils/serialization.pydeepmd/tf2/utils/type_embed.pypyproject.tomlsource/tests/consistent/test_tf2_backend.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5598 +/- ##
==========================================
- Coverage 82.35% 81.88% -0.47%
==========================================
Files 896 953 +57
Lines 100923 104642 +3719
Branches 4058 4058
==========================================
+ Hits 83113 85689 +2576
- Misses 16348 17490 +1142
- Partials 1462 1463 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/tests/consistent/model/test_ener.py`:
- Around line 327-334: The TF2 comparison path in test_ener is skipping
atomic-virial outputs even though the TF2 eval helpers request
do_atomic_virial=True, so update the TF2 branches that return SKIP_FLAG to
compare the actual atom_virial and extended_virial results instead, or remove
the atomic-virial request from the TF2 eval helpers until support is complete.
Use the TF2 handling in the relevant test helper return paths to locate the
change and keep the consistency test aligned with the backend behavior.
In `@source/tests/consistent/test_tf2_backend.py`:
- Around line 17-56: The TF2 subprocess wrapper in
test_tf2_consistent_backend_subprocess only checks for TensorFlow’s presence, so
it can pass even when the tf2 backend is unavailable and all inner tests get
skipped. Update the guard to also verify the tf2 backend itself using
Backend.get_backend("tf2") and is_available() under the same DEEPMD_TEST_TF2
environment, or assert after subprocess.run that at least one tf2 test actually
executed. Use the existing test_tf2_consistent_backend_subprocess and
Backend.get_backend("tf2") symbols to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 84474bc4-2aca-4173-9ccf-6f85bd3848fc
📒 Files selected for processing (4)
source/tests/consistent/common.pysource/tests/consistent/model/common.pysource/tests/consistent/model/test_ener.pysource/tests/consistent/test_tf2_backend.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepmd/_vendors/ndtensorflow/_array.py`:
- Around line 456-459: The leading-dimension comparison in the mask validation
logic needs an explicit strict zip to satisfy Ruff B905. Update the zip used in
the tensor/key shape check inside the array-matching code so it iterates over
tensor_shape sliced to rank and uses strict=True, keeping the existing dimension
mismatch test intact. Use the existing mask-shape validation block in _array.py
to locate the fix.
In `@source/tests/consistent/fitting/test_ener.py`:
- Around line 523-526: Mirror the TF2 unsupported-case skip logic from TestEner
in TestEnerStat so it does not run the same invalid EnerFittingTF2
parameterizations when TF2 is installed. Update the TestEnerStat setup around
skip_tf2, tf_class, and tf2_class to also skip TF2 for bfloat16, default
fparams, use_aparam_as_mask, and non-empty atom_ener, matching the existing
TestEner conditions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 99eb1b7d-25a4-4e72-b162-388aa5c4c0a1
📒 Files selected for processing (18)
deepmd/_vendors/ndtensorflow/_array.pydeepmd/_vendors/ndtensorflow/_namespace.pysource/tests/consistent/descriptor/test_dpa1.pysource/tests/consistent/descriptor/test_dpa2.pysource/tests/consistent/descriptor/test_dpa3.pysource/tests/consistent/descriptor/test_hybrid.pysource/tests/consistent/descriptor/test_se_atten_v2.pysource/tests/consistent/descriptor/test_se_e2_a.pysource/tests/consistent/descriptor/test_se_r.pysource/tests/consistent/descriptor/test_se_t.pysource/tests/consistent/descriptor/test_se_t_tebd.pysource/tests/consistent/fitting/test_dipole.pysource/tests/consistent/fitting/test_dos.pysource/tests/consistent/fitting/test_ener.pysource/tests/consistent/fitting/test_polar.pysource/tests/consistent/fitting/test_property.pysource/tests/consistent/test_array_api.pysource/tests/consistent/test_tf2_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/_vendors/ndtensorflow/_namespace.py
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
njzjz-bot
left a comment
There was a problem hiding this comment.
Quick review on the latest head (14368bb): thanks for splitting the .savedmodel / .savedmodeltf suffix handling and adding the dedicated C++ coverage for both export paths. The Python TF2 consistency matrix and backend detection coverage look much stronger now.
One merge blocker remains from CI: Test C++ (false, true, true, false) is failing in the LAMMPS DPA JAX SavedModel tests. The failure is consistently an out-of-range TensorFlow gather with a -1 neighbor index, e.g.
source/lmp/tests/test_lammps_dpa_jax.py::test_pair_deepmd
TensorFlow C API Error: indices[0,49,4] = [0, -1, 4] does not index into param shape [1,6,5]
The same pattern appears in the virial/model-deviation variants. This suggests the SavedModel lower-call path still lets padded/invalid neighbor entries reach a GatherNd after the new TF2/JAX2TF sharing and .savedmodeltf split. Before merge, I would fix this by preserving the previous invalid-neighbor masking/sentinel handling around the gather path, or by ensuring the exported lower functions sanitize -1 neighbor indices before any TensorFlow gather while keeping the mask applied to the resulting values.
Once that C++/LAMMPS job is green, I do not see another obvious blocker from the latest diff/check summary.
Reviewed by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
Some pytest-split groups can have no tests selected after the TF2 eager path and -k filters, which makes pytest exit with code 5. Treat that code as success for this CI shard while preserving other pytest failures.\n\nAuthored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
for more information, see https://pre-commit.ci
73de44b
## Summary - vendor ndtensorflow under deepmd/_vendors and add a TensorFlow eager Array API backend under deepmd/tf2 - implement tf2 model wrappers, tf.function SavedModel export/deep-eval glue, and backend registration for .savedmodel - replace jax/jax2tf TensorFlow helper implementations with tf2 compatibility exports - add focused tf2 consistency tests for eager and tf.function model paths ## Tests - ruff check . - ruff format . - pytest source/tests/consistent/test_tf2_backend.py -q - manual SavedModel export/load smoke test Note: local pre-commit hook attempted to fetch astral-sh/ruff-pre-commit and timed out on GitHub port 443; the commit was created with --no-verify after running the checks above manually. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added a TensorFlow 2 eager backend with SavedModel inference plus TF2 model/descriptor/fitting wrappers (energy/dipole/polar/DOS/property). * Introduced TF2 neighbor-list construction, region/PBC geometry utilities, and output/ghost-atom aggregation. * Expanded the TensorFlow-backed Array API with namespace, FFT, and linear algebra utilities. * **Bug Fixes** * Improved the non-eager TensorFlow compatibility error message. * **Refactor** * Converted multiple JAX2TF modules into TF2 compatibility shims that re-export TF2 implementations. * **Tests** * Added TF2 consistency coverage, including subprocess-based checks. * **Chores** * Added license headers/docs and updated lint configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jinzhe Zeng <njzjz@qq.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: njzjz-bot (driven by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5))[bot] <48687836+njzjz-bot@users.noreply.github.com>
Summary
Tests
Note: local pre-commit hook attempted to fetch astral-sh/ruff-pre-commit and timed out on GitHub port 443; the commit was created with --no-verify after running the checks above manually.
Summary by CodeRabbit