Generalized FMG hierarchy: custom-P transfers for arbitrary nested/non-nested grids (Layer 1) by lmoresi · Pull Request #290 · underworldcode/underworld3

lmoresi · 2026-06-28T20:37:24Z

What

Generalize UW3's geometric FMG so the multigrid hierarchy can use arbitrary coarse grids — nested or non-nested, uniform or locally (SBR) refined — by building the prolongations ourselves (barycentric / local-RBF) and installing them via PC.setMGInterpolation, with Galerkin RAP forming the coarse operators. This decouples geometric multigrid from uniform refine() nesting, enabling locally-adapted hierarchies while keeping FMG.

Independent of any adapter ("adapt-on-top" is a separate follow-on). The existing Mesh(refinement=N) uniform FMG path is unchanged.

Status — Layer 1, Phase-1 hardening complete

Home: utilities/custom_mg.py. Entry point: set_custom_fmg(solver, coarse_meshes, builder=, field_id=).

Working + tested — serial test_1014/1015/1016/1017 (20) + parallel tests/parallel/test_1017_custom_mg_parallel_mpi.py (np=2):

CustomMGHierarchy, set_custom_fmg, sbr_refine / sbr_refine_where (local Skeleton-Based Refinement; no MMG; on-rank; conforming).
BC applied at every level (transfers reduced→reduced) with a zero-column guard — the load-bearing invariant (PETSc native FMG does this via constrained sections; custom-P must replicate it).
Scalar / single-field-vector (top-level PC) and Stokes velocity block (field_id=0).
Serial and parallel (np>1), nested co-partitioned.

Hardening steps (all done)

Stokes / saddle-point velocity block. The velocity sub-PC is unreachable until the monolithic Jacobian is assembled (PCFieldSplit forms A_vv via MatCreateSubMatrix; snes.setUp builds structure only). So _install_velocity_block_transfers forces a Jacobian assembly (computeFunction+computeJacobian at the zero guess; max_it=0 fallback), reaches the velocity sub-PC, resets it and rebuilds a fresh PCMG from our P (mirrors the proven standalone recipe; sidesteps the MatProductReplaceMats live-swap bug), and re-attaches the coupled Stokes nullspace. Wired into SNES_Stokes_SaddlePt.solve as a guarded no-op.
Dropped the throwaway-solver factory. Each coarse level's BC-constrained reduced map is derived directly from its DM via copyFields + copyDS + createDS (_coarse_reduced_map) — the DS carries UW's exact essential-BC definitions and is topology-independent, so it constrains any coarse mesh sharing the solver's boundary labels. Validated byte-identical to the old factory path; leak-free (no SNES / JIT). set_custom_fmg / build no longer take a level_solver_factory.
Parallel (np>1). Nested co-partitioned transfers: each rank builds its block of P rank-locally (ghost-inclusive coarse coords → every owned fine node lands in a local coarse simplex), reduced global numbering rides the DM global section (_level_dof_layout), transfers assemble as MPIAIJ (constrained coarse DOFs drop → reduced→reduced), parallel zero-column guard via Pᵀ·1. The legacy finest-only set_custom_mg path stays serial-only and raises loudly at np>1.

Validation

	iters	vs GAMG	notes
Scalar jump-coeff Poisson, 5-level (3 uniform + 2 SBR)	3 FMG	46
SolCx velocity block (η-jump 1e6), 3-level	6	~198	matches GAMG solution to 1.5e-9
SolCx velocity block vs native FMG	6	(native 5)	ties native FMG; value = works where native FMG cannot (non-nested / adapted)
Parallel np=1/2/4 (Poisson + Stokes)	4 / 6		matches GAMG and each other across rank counts

Out of scope (later)

Non-nested custom-P in parallel (cross-rank point location) — serial-only. Layer-2 adapt-on-top adapter is a separate follow-on.

Design: docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md

Underworld development team with AI support from Claude Code

Add set_custom_mg() + utilities/custom_mg.py: drive PCMG geometric multigrid with a prolongation we build ourselves (barycentric or RBF) from independent, possibly non-nested coarse meshes. Coarse operators come from Galerkin RAP (P^T A P), so only the prolongation list is supplied. This decouples geometric multigrid from a nested refine() hierarchy. Mechanism: inject the custom P before the first PCSetUp (avoids the MatProductReplaceMats shape error a live matrix swap triggers under Galerkin), and KSPSetDMActive(OPERATOR, False) so PETSc uses our explicit P instead of re-deriving DM-hierarchy interpolation. Finest level reduced to the BC-eliminated global ordering; coarse levels use full DOF coords. Validated (scalar Poisson, refinement=2 box): custom barycentric P from independent coarse meshes reaches FMG iteration counts (1-3 vs FMG 2) and the same solution. On a mesh with NO refine hierarchy (FMG unavailable -> GAMG, 13 iters) custom-P geometric MG converges in ~9 iters. test_1015 (4 tests); test_1014 (11) still pass. Single-field (scalar/vector) only; Stokes velocity-block path is a follow-up. Underworld development team with AI support from Claude Code

…coped SBR Phase 1 of hardening the custom-prolongation work into a generalized FMG hierarchy. - CustomMGHierarchy: adapter-agnostic multi-level hierarchy that builds custom-P transfers (barycentric/rbf) with essential BCs applied at EVERY level (transfers map reduced->reduced). This is the load-bearing invariant: on an exactly-nested hierarchy, omitting per-level reduction makes a coarse boundary DOF coincide with a BC-removed fine DOF -> zero column -> singular Galerkin coarse operator. build() guards against zero columns. - set_custom_fmg(): register a hierarchy + per-level solver factory; built and installed at solve() time (build-time injection, DMActive(OPERATOR,False), Galerkin RAP). Scalar / single-field-vector (top-level PC); Stokes velocity block is Phase 2. - sbr_refine / sbr_refine_where: local Skeleton-Based Refinement (no MMG, on-rank, conforming) with SCOPED dm_plex_transform_type (leaking it globally breaks UW's uniform refine() with err73). - Legacy finest-only path kept for back-compat (test_1015). Validated: scalar jump-coefficient Poisson on a 5-level (3 uniform + 2 SBR) hierarchy converges in 3 FMG iters vs GAMG 46, solution matches to 2.4e-8. test_1016 (3 tests); test_1015 (4) + test_1014 (11) still pass. Design: docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md Underworld development team with AI support from Claude Code

Layer 1 (generalized FMG hierarchy) is an independent, working capability (arbitrary nested/non-nested coarse grids -> geometric FMG, BC-per-level). Current scope experimental: scalar/single-field, serial, factory API. Remaining before merge-ready general feature: Stokes velocity-block, drop the factory (label-based BC reduction), parallel. Underworld development team with AI support from Claude Code

Integrate the generalized FMG hierarchy into the saddle-point solver's velocity sub-block. set_custom_fmg(..., field_id=0) now drives geometric multigrid on the velocity block with our supplied (barycentric / RBF) prolongations + Galerkin RAP. The velocity sub-PC is unreachable until the monolithic Jacobian is assembled (PCFieldSplit forms A_vv via MatCreateSubMatrix; snes.setUp builds structure only -> err73). So _install_velocity_block_transfers: forces a Jacobian assembly (computeFunction + computeJacobian at the zero guess; throwaway max_it=0 fallback), reaches the velocity sub-PC, reset()s it and rebuilds a FRESH PCMG from our P (mirrors the proven standalone recipe; sidesteps the MatProductReplaceMats live-swap bug), re-attaches the coupled Stokes nullspace. _configure_pcmg now derives the options prefix from pc.getOptionsPrefix() so both the scalar top-level PC and the velocity sub-PC are configured correctly. inject_custom_mg is wired into SNES_Stokes_SaddlePt.solve (guarded no-op unless set_custom_fmg was called), injected after setFromOptions / nullspace, before the real solve. Validated (SolCx eta_B=1e6, 3-level nested hierarchy, in-solver via solver.solve()): velocity block converges in 6 MG iters vs GAMG 198, solution matches the GAMG reference to 1.5e-9. test_1017 (barycentric + rbf). test_1014/1015/1016 unchanged (20 pass total). Underworld development team with AI support from Claude Code

…, Step 1) Underworld development team with AI support from Claude Code

Each coarse level's BC-constrained reduced map is now derived directly from the coarse mesh DM via _coarse_reduced_map: clone the DM, copy the (built) finest solver's fields + DS onto it, createDS, read the global section. The DS carries UW's exact essential-BC definitions and is a topology-independent discretisation spec, so it constrains the matching boundary DOFs on ANY coarse mesh that shares the solver's boundary labels (nested or not). This removes the throwaway-solver factory (heavy, leak-prone, and an API wart): set_custom_fmg and CustomMGHierarchy.build no longer take a level_solver_factory. Leak-free — DM ops only, no SNES / JIT. Validated identical to the old factory path (reduced maps byte-equal: 126 + 510 DOFs on the SolCx 3-level hierarchy). test_1014/1015/1016/1017 all green (20). custom-P vs native FMG unchanged (6 vs 5 iters, sol match 3.7e-10). Underworld development team with AI support from Claude Code

Underworld development team with AI support from Claude Code

…tial) The reduced maps use rank-local DOF indices and the prolongations assemble as serial AIJ, so at np>1 custom-P would silently build wrong P (or hit a cryptic broadcast error). Add _require_serial: set_custom_fmg and inject_custom_mg now raise a clear NotImplementedError on >1 ranks, pointing to preconditioner='fmg'/'gamg'. Serial behaviour unchanged. Parallel test tests/parallel/test_1017_custom_mg_serial_guard_mpi.py (np=2) asserts both the set-time and legacy solve-time guards fire. Full parallel custom-P (nested co-partitioned, rank-local point location + ghosting, MPIAIJ assembly with global-section reduction) remains a designed fast-follow. Underworld development team with AI support from Claude Code

Underworld development team with AI support from Claude Code

…(Phase 1, Step 3) The hierarchy path now builds parallel-correct transfers. Each rank builds its block of P rank-locally: ghost-inclusive coarse coords mean every owned fine node lands in a local coarse simplex (verified 0 misses np=2/4), and the reduced global numbering rides the DM global section via a clean local->global-reduced map (_level_dof_layout scatters owned global indices out through globalToLocal; constrained DOFs stay -1, ghosts resolve to the owner's global index). Transfers assemble as MPIAIJ (owned fine rows, global coarse cols incl. off-rank); constrained coarse DOFs drop -> reduced->reduced. Parallel zero-column guard via P^T·1 + allreduce. _coarse_dof_layout reuses the leak-free copyDS trick for coarse levels. CustomMGHierarchy.build branches serial (scipy CSR) vs parallel (MPIAIJ). The guard now blocks only the legacy finest-only path (still serial). Validated np=1/2/4: scalar Poisson custom-P 4 iters and Stokes SolCx velocity block 6 iters, both matching the GAMG reference (and each other across rank counts). New tests/parallel/test_1017_custom_mg_parallel_mpi.py (scalar + Stokes correctness + legacy serial-guard). Serial 20 unchanged. Underworld development team with AI support from Claude Code

Underworld development team with AI support from Claude Code

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

…+ Constrained tests) Verify custom-P works for every solver family that consumes the mesh, so a mesh-owned adapted hierarchy drives MG uniformly: - SNES_Vector.solve was MISSING the inject_custom_mg hook (only SNES_Scalar and SNES_Stokes_SaddlePt had it) — so Vector_Projection silently stayed on GAMG. Add the hook (mirrors the scalar one). All 8 solver-subclass solve() overrides delegate to a base solve(), so this completes coverage: scalar branch -> Poisson/Darcy/Projection/AdvDiff/Diffusion; vector branch -> Vector_Projection; velocity-block -> Stokes/VE/Constrained/NS. - test_1017: add SNES_Vector (Vector_Projection, top-level vector PC, field_id=None) and Stokes_Constrained (free-slip multipliers, grouped [p,h] split, field_id=0) serial tests — both drive custom-P 'mg', constrained matches analytic SolCx to 4.4e-4. - parallel test: add Vector (passes np=2) and Constrained. The Constrained parallel test is SKIPPED: Stokes_Constrained is not parallel-safe yet — it segfaults at np>1 independently of custom-P (canonical test_1062_constrained_solcx also segfaults at np=2 under plain GAMG). It auto-enables when the constrained solver becomes parallel-ready. Serial: custom_mg 22 + projections + constrained green. Parallel np=2: scalar + Stokes-velocity + vector + legacy-guard pass. Underworld development team with AI support from Claude Code

… limitation Underworld development team with AI support from Claude Code

Underworld development team with AI support from Claude Code

…k fieldsplit The Stokes velocity-block injection forces computeJacobian, then reaches the fieldsplit BEFORE the SNES solve. SNESSolve normally wires the freshly-assembled Jacobian into the KSP/outer-PC; doing it ourselves was missing, so for some configurations (e.g. Stokes on an SBR-refined child mesh with a mesh-variable bodyforce) the outer PC carried an UNASSEMBLED operator and PCSetUp failed with "Matrix must be set first" (err73). Fix: snes.getKSP().setOperators(J, Pmat) after forcing assembly, before outer_pc.setUp(). Found via the Layer-2 adapt-on-top prototype (Stokes on a fault-refined child). Diagnosed: ksp.A set but asm=False while J.assembled=True -> operator not wired. New regression test_custom_fmg_stokes_on_sbr_child (Stokes velocity-block custom-P on an SBR child with a mesh-variable bodyforce) — fails pre-fix, passes after. test_1014/1015/1016/1017 green (serial); parallel np2 unchanged. Underworld development team with AI support from Claude Code

lmoresi added 11 commits June 28, 2026 10:31

docs(custom_mg): mark Stokes velocity-block integration done (Phase 1…

cb77289

…, Step 1) Underworld development team with AI support from Claude Code

docs(custom_mg): mark factory removal done (Phase 1, Step 2)

2e2c4f9

Underworld development team with AI support from Claude Code

docs(custom_mg): record np>1 serial guard (Phase 1, Step 3)

caff202

Underworld development team with AI support from Claude Code

docs(custom_mg): record parallel np>1 done (Phase 1, Step 3)

af8e734

Underworld development team with AI support from Claude Code

lmoresi marked this pull request as ready for review June 29, 2026 04:20

Copilot AI review requested due to automatic review settings June 29, 2026 04:20

Copilot AI reviewed Jun 29, 2026

lmoresi added 2 commits June 29, 2026 14:43

docs(custom_mg): solver-family coverage matrix + constrained-parallel…

32fec5a

… limitation Underworld development team with AI support from Claude Code

lmoresi mentioned this pull request Jun 29, 2026

Stokes_Constrained segfaults at np>1 in the interior-multiplier section reduction #291

Open

lmoresi added 2 commits June 29, 2026 15:01

docs(custom_mg): link constrained-parallel segfault to issue #291

d7a4ca1

Underworld development team with AI support from Claude Code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Generalized FMG hierarchy: custom-P transfers for arbitrary nested/non-nested grids (Layer 1)#290

Generalized FMG hierarchy: custom-P transfers for arbitrary nested/non-nested grids (Layer 1)#290
lmoresi wants to merge 15 commits into
developmentfrom
feature/custom-mg-prolongation

lmoresi commented Jun 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lmoresi commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Status — Layer 1, Phase-1 hardening complete

Hardening steps (all done)

Validation

Out of scope (later)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lmoresi commented Jun 28, 2026 •

edited

Loading