Skip to content

Generalized FMG hierarchy: custom-P transfers for arbitrary nested/non-nested grids (Layer 1)#290

Open
lmoresi wants to merge 15 commits into
developmentfrom
feature/custom-mg-prolongation
Open

Generalized FMG hierarchy: custom-P transfers for arbitrary nested/non-nested grids (Layer 1)#290
lmoresi wants to merge 15 commits into
developmentfrom
feature/custom-mg-prolongation

Conversation

@lmoresi

@lmoresi lmoresi commented Jun 28, 2026

Copy link
Copy Markdown
Member

What

Generalize UW3's geometric FMG so the multigrid hierarchy can use arbitrary coarse grids — nested or non-nested, uniform or locally (SBR) refined — by building the prolongations ourselves (barycentric / local-RBF) and installing them via PC.setMGInterpolation, with Galerkin RAP forming the coarse operators. This decouples geometric multigrid from uniform refine() nesting, enabling locally-adapted hierarchies while keeping FMG.

Independent of any adapter ("adapt-on-top" is a separate follow-on). The existing Mesh(refinement=N) uniform FMG path is unchanged.

Status — Layer 1, Phase-1 hardening complete

Home: utilities/custom_mg.py. Entry point: set_custom_fmg(solver, coarse_meshes, builder=, field_id=).

Working + tested — serial test_1014/1015/1016/1017 (20) + parallel tests/parallel/test_1017_custom_mg_parallel_mpi.py (np=2):

  • CustomMGHierarchy, set_custom_fmg, sbr_refine / sbr_refine_where (local Skeleton-Based Refinement; no MMG; on-rank; conforming).
  • BC applied at every level (transfers reduced→reduced) with a zero-column guard — the load-bearing invariant (PETSc native FMG does this via constrained sections; custom-P must replicate it).
  • Scalar / single-field-vector (top-level PC) and Stokes velocity block (field_id=0).
  • Serial and parallel (np>1), nested co-partitioned.

Hardening steps (all done)

  1. Stokes / saddle-point velocity block. The velocity sub-PC is unreachable until the monolithic Jacobian is assembled (PCFieldSplit forms A_vv via MatCreateSubMatrix; snes.setUp builds structure only). So _install_velocity_block_transfers forces a Jacobian assembly (computeFunction+computeJacobian at the zero guess; max_it=0 fallback), reaches the velocity sub-PC, resets it and rebuilds a fresh PCMG from our P (mirrors the proven standalone recipe; sidesteps the MatProductReplaceMats live-swap bug), and re-attaches the coupled Stokes nullspace. Wired into SNES_Stokes_SaddlePt.solve as a guarded no-op.
  2. Dropped the throwaway-solver factory. Each coarse level's BC-constrained reduced map is derived directly from its DM via copyFields + copyDS + createDS (_coarse_reduced_map) — the DS carries UW's exact essential-BC definitions and is topology-independent, so it constrains any coarse mesh sharing the solver's boundary labels. Validated byte-identical to the old factory path; leak-free (no SNES / JIT). set_custom_fmg / build no longer take a level_solver_factory.
  3. Parallel (np>1). Nested co-partitioned transfers: each rank builds its block of P rank-locally (ghost-inclusive coarse coords → every owned fine node lands in a local coarse simplex), reduced global numbering rides the DM global section (_level_dof_layout), transfers assemble as MPIAIJ (constrained coarse DOFs drop → reduced→reduced), parallel zero-column guard via Pᵀ·1. The legacy finest-only set_custom_mg path stays serial-only and raises loudly at np>1.

Validation

iters vs GAMG notes
Scalar jump-coeff Poisson, 5-level (3 uniform + 2 SBR) 3 FMG 46
SolCx velocity block (η-jump 1e6), 3-level 6 ~198 matches GAMG solution to 1.5e-9
SolCx velocity block vs native FMG 6 (native 5) ties native FMG; value = works where native FMG cannot (non-nested / adapted)
Parallel np=1/2/4 (Poisson + Stokes) 4 / 6 matches GAMG and each other across rank counts

Out of scope (later)

Non-nested custom-P in parallel (cross-rank point location) — serial-only. Layer-2 adapt-on-top adapter is a separate follow-on.

Design: docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md

Underworld development team with AI support from Claude Code

lmoresi added 11 commits June 28, 2026 10:31
Add set_custom_mg() + utilities/custom_mg.py: drive PCMG geometric multigrid
with a prolongation we build ourselves (barycentric or RBF) from independent,
possibly non-nested coarse meshes. Coarse operators come from Galerkin RAP
(P^T A P), so only the prolongation list is supplied. This decouples geometric
multigrid from a nested refine() hierarchy.

Mechanism: inject the custom P before the first PCSetUp (avoids the
MatProductReplaceMats shape error a live matrix swap triggers under Galerkin),
and KSPSetDMActive(OPERATOR, False) so PETSc uses our explicit P instead of
re-deriving DM-hierarchy interpolation. Finest level reduced to the
BC-eliminated global ordering; coarse levels use full DOF coords.

Validated (scalar Poisson, refinement=2 box): custom barycentric P from
independent coarse meshes reaches FMG iteration counts (1-3 vs FMG 2) and the
same solution. On a mesh with NO refine hierarchy (FMG unavailable -> GAMG, 13
iters) custom-P geometric MG converges in ~9 iters. test_1015 (4 tests);
test_1014 (11) still pass.

Single-field (scalar/vector) only; Stokes velocity-block path is a follow-up.

Underworld development team with AI support from Claude Code
…coped SBR

Phase 1 of hardening the custom-prolongation work into a generalized FMG hierarchy.

- CustomMGHierarchy: adapter-agnostic multi-level hierarchy that builds custom-P
  transfers (barycentric/rbf) with essential BCs applied at EVERY level
  (transfers map reduced->reduced). This is the load-bearing invariant: on an
  exactly-nested hierarchy, omitting per-level reduction makes a coarse boundary
  DOF coincide with a BC-removed fine DOF -> zero column -> singular Galerkin
  coarse operator. build() guards against zero columns.
- set_custom_fmg(): register a hierarchy + per-level solver factory; built and
  installed at solve() time (build-time injection, DMActive(OPERATOR,False),
  Galerkin RAP). Scalar / single-field-vector (top-level PC); Stokes velocity
  block is Phase 2.
- sbr_refine / sbr_refine_where: local Skeleton-Based Refinement (no MMG,
  on-rank, conforming) with SCOPED dm_plex_transform_type (leaking it globally
  breaks UW's uniform refine() with err73).
- Legacy finest-only path kept for back-compat (test_1015).

Validated: scalar jump-coefficient Poisson on a 5-level (3 uniform + 2 SBR)
hierarchy converges in 3 FMG iters vs GAMG 46, solution matches to 2.4e-8.
test_1016 (3 tests); test_1015 (4) + test_1014 (11) still pass.

Design: docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md

Underworld development team with AI support from Claude Code
Layer 1 (generalized FMG hierarchy) is an independent, working capability
(arbitrary nested/non-nested coarse grids -> geometric FMG, BC-per-level).
Current scope experimental: scalar/single-field, serial, factory API. Remaining
before merge-ready general feature: Stokes velocity-block, drop the factory
(label-based BC reduction), parallel.

Underworld development team with AI support from Claude Code
Integrate the generalized FMG hierarchy into the saddle-point solver's
velocity sub-block. set_custom_fmg(..., field_id=0) now drives geometric
multigrid on the velocity block with our supplied (barycentric / RBF)
prolongations + Galerkin RAP.

The velocity sub-PC is unreachable until the monolithic Jacobian is
assembled (PCFieldSplit forms A_vv via MatCreateSubMatrix; snes.setUp
builds structure only -> err73). So _install_velocity_block_transfers:
forces a Jacobian assembly (computeFunction + computeJacobian at the zero
guess; throwaway max_it=0 fallback), reaches the velocity sub-PC,
reset()s it and rebuilds a FRESH PCMG from our P (mirrors the proven
standalone recipe; sidesteps the MatProductReplaceMats live-swap bug),
re-attaches the coupled Stokes nullspace. _configure_pcmg now derives the
options prefix from pc.getOptionsPrefix() so both the scalar top-level PC
and the velocity sub-PC are configured correctly.

inject_custom_mg is wired into SNES_Stokes_SaddlePt.solve (guarded no-op
unless set_custom_fmg was called), injected after setFromOptions /
nullspace, before the real solve.

Validated (SolCx eta_B=1e6, 3-level nested hierarchy, in-solver via
solver.solve()): velocity block converges in 6 MG iters vs GAMG 198,
solution matches the GAMG reference to 1.5e-9. test_1017 (barycentric +
rbf). test_1014/1015/1016 unchanged (20 pass total).

Underworld development team with AI support from Claude Code
…, Step 1)

Underworld development team with AI support from Claude Code
Each coarse level's BC-constrained reduced map is now derived directly
from the coarse mesh DM via _coarse_reduced_map: clone the DM, copy the
(built) finest solver's fields + DS onto it, createDS, read the global
section. The DS carries UW's exact essential-BC definitions and is a
topology-independent discretisation spec, so it constrains the matching
boundary DOFs on ANY coarse mesh that shares the solver's boundary labels
(nested or not).

This removes the throwaway-solver factory (heavy, leak-prone, and an API
wart): set_custom_fmg and CustomMGHierarchy.build no longer take a
level_solver_factory. Leak-free — DM ops only, no SNES / JIT.

Validated identical to the old factory path (reduced maps byte-equal:
126 + 510 DOFs on the SolCx 3-level hierarchy). test_1014/1015/1016/1017
all green (20). custom-P vs native FMG unchanged (6 vs 5 iters, sol match
3.7e-10).

Underworld development team with AI support from Claude Code
Underworld development team with AI support from Claude Code
…tial)

The reduced maps use rank-local DOF indices and the prolongations assemble
as serial AIJ, so at np>1 custom-P would silently build wrong P (or hit a
cryptic broadcast error). Add _require_serial: set_custom_fmg and
inject_custom_mg now raise a clear NotImplementedError on >1 ranks,
pointing to preconditioner='fmg'/'gamg'. Serial behaviour unchanged.

Parallel test tests/parallel/test_1017_custom_mg_serial_guard_mpi.py
(np=2) asserts both the set-time and legacy solve-time guards fire.

Full parallel custom-P (nested co-partitioned, rank-local point location +
ghosting, MPIAIJ assembly with global-section reduction) remains a
designed fast-follow.

Underworld development team with AI support from Claude Code
Underworld development team with AI support from Claude Code
…(Phase 1, Step 3)

The hierarchy path now builds parallel-correct transfers. Each rank builds
its block of P rank-locally: ghost-inclusive coarse coords mean every owned
fine node lands in a local coarse simplex (verified 0 misses np=2/4), and
the reduced global numbering rides the DM global section via a clean
local->global-reduced map (_level_dof_layout scatters owned global indices
out through globalToLocal; constrained DOFs stay -1, ghosts resolve to the
owner's global index). Transfers assemble as MPIAIJ (owned fine rows, global
coarse cols incl. off-rank); constrained coarse DOFs drop -> reduced->reduced.
Parallel zero-column guard via P^T·1 + allreduce.

_coarse_dof_layout reuses the leak-free copyDS trick for coarse levels.
CustomMGHierarchy.build branches serial (scipy CSR) vs parallel (MPIAIJ).
The guard now blocks only the legacy finest-only path (still serial).

Validated np=1/2/4: scalar Poisson custom-P 4 iters and Stokes SolCx
velocity block 6 iters, both matching the GAMG reference (and each other
across rank counts). New tests/parallel/test_1017_custom_mg_parallel_mpi.py
(scalar + Stokes correctness + legacy serial-guard). Serial 20 unchanged.

Underworld development team with AI support from Claude Code
Underworld development team with AI support from Claude Code
@lmoresi lmoresi marked this pull request as ready for review June 29, 2026 04:20
Copilot AI review requested due to automatic review settings June 29, 2026 04:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

lmoresi added 2 commits June 29, 2026 14:43
…+ Constrained tests)

Verify custom-P works for every solver family that consumes the mesh, so a
mesh-owned adapted hierarchy drives MG uniformly:

- SNES_Vector.solve was MISSING the inject_custom_mg hook (only SNES_Scalar
  and SNES_Stokes_SaddlePt had it) — so Vector_Projection silently stayed on
  GAMG. Add the hook (mirrors the scalar one). All 8 solver-subclass solve()
  overrides delegate to a base solve(), so this completes coverage:
  scalar branch -> Poisson/Darcy/Projection/AdvDiff/Diffusion;
  vector branch -> Vector_Projection; velocity-block -> Stokes/VE/Constrained/NS.

- test_1017: add SNES_Vector (Vector_Projection, top-level vector PC,
  field_id=None) and Stokes_Constrained (free-slip multipliers, grouped [p,h]
  split, field_id=0) serial tests — both drive custom-P 'mg', constrained
  matches analytic SolCx to 4.4e-4.

- parallel test: add Vector (passes np=2) and Constrained. The Constrained
  parallel test is SKIPPED: Stokes_Constrained is not parallel-safe yet — it
  segfaults at np>1 independently of custom-P (canonical
  test_1062_constrained_solcx also segfaults at np=2 under plain GAMG). It
  auto-enables when the constrained solver becomes parallel-ready.

Serial: custom_mg 22 + projections + constrained green. Parallel np=2:
scalar + Stokes-velocity + vector + legacy-guard pass.

Underworld development team with AI support from Claude Code
… limitation

Underworld development team with AI support from Claude Code
lmoresi added 2 commits June 29, 2026 15:01
Underworld development team with AI support from Claude Code
…k fieldsplit

The Stokes velocity-block injection forces computeJacobian, then reaches the
fieldsplit BEFORE the SNES solve. SNESSolve normally wires the freshly-assembled
Jacobian into the KSP/outer-PC; doing it ourselves was missing, so for some
configurations (e.g. Stokes on an SBR-refined child mesh with a mesh-variable
bodyforce) the outer PC carried an UNASSEMBLED operator and PCSetUp failed with
"Matrix must be set first" (err73). Fix: snes.getKSP().setOperators(J, Pmat)
after forcing assembly, before outer_pc.setUp().

Found via the Layer-2 adapt-on-top prototype (Stokes on a fault-refined child).
Diagnosed: ksp.A set but asm=False while J.assembled=True -> operator not wired.

New regression test_custom_fmg_stokes_on_sbr_child (Stokes velocity-block custom-P
on an SBR child with a mesh-variable bodyforce) — fails pre-fix, passes after.
test_1014/1015/1016/1017 green (serial); parallel np2 unchanged.

Underworld development team with AI support from Claude Code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants