diff --git a/docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md b/docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md new file mode 100644 index 00000000..4380f1d0 --- /dev/null +++ b/docs/developer/design/GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md @@ -0,0 +1,199 @@ +# Generalized FMG hierarchy + adapt-on-top + +Status: design (hardening the custom-prolongation prototype). 2026-06-28. + +## Context + +UW3's geometric FMG (`Mesh(refinement=N)` → `dm_hierarchy`, `preconditioner="fmg"`) +requires a **uniform nested** hierarchy with PETSc's canonical nested interpolation. +That couples multigrid to uniform refinement and rules out local adaptation. + +A prototype established that we can drive geometric multigrid with a prolongation we +**build ourselves** (barycentric or local-RBF) and install via `PC.setMGInterpolation`, +with Galerkin RAP forming the coarse operators. Validated: +- FMG-equivalent convergence on non-nested *and* nested hierarchies (SolCx velocity + block, η-jump 1e6: 8 iters, vs GAMG 157 — geometric MG where FMG cannot otherwise exist); +- a 5-level hierarchy (3 uniform + 2 SBR-targeted at the jump) solves at 8 iters; +- local SBR refinement is a ~4-line stock-petsc4py call (no MMG): mark cells in a label, + `dm_plex_transform_type=refine_sbr`, `dm.adaptLabel("name")` — conforming, on-rank. + +**Load-bearing correctness lesson:** essential BCs must be applied at **every** level, +transfers mapping reduced→reduced. PETSc native FMG does this automatically (each level's +`PetscSection` carries the Dirichlet constraints); custom-P from raw coordinates must +replicate it. Exact nesting makes the omission fatal (coarse boundary-normal DOFs coincide +with BC-removed fine DOFs → zero columns → singular coarse operator). + +## Two-layer architecture + +### Layer 1 — generalized FMG hierarchy (adapter-agnostic) +**The existing `refinement=N` uniform hierarchy STANDS** and is the FMG base. Layer 1 +generalizes the *transfer machinery* so the hierarchy may contain levels that are **not** +uniform refinements, by carrying **supplied prolongations** instead of relying on canonical +nesting. + +Layer 1 knows nothing about *how* a level was produced. Its contract is a sequence of +levels, each `(dm, prolongation_to_next)`, plus: +- **BC-per-level reduction** — transfers map reduced→reduced (the invariant above), derived + from the owning solver's essential BCs via each level DM's global section. +- **Transfer builders** — pluggable: `nested` (PETSc canonical, for uniform levels), + `barycentric` (FE-exact, needs a triangulation), `rbf` (local kNN, scattered-point + robust, reuses the swarm-proxy RBF kernel). Default per level: `nested` where the level + is a genuine uniform refinement, else `barycentric`. +- **Installation** — set the transfers on the velocity-block (Stokes) or top (scalar/vector) + PCMG via `setMGInterpolation`, before first `PCSetUp`, with `KSPSetDMActive(OPERATOR,False)` + and `pc_mg_galerkin=both`. Re-attach nullspaces (pressure / rigid-body) after rebuild. + +Home: `src/underworld3/utilities/custom_mg.py` (grow the committed module). + +Generality requirement: Layer 1's interface must admit **any** future level source +(MMG/parmmg, edge/face-point injection, knockout-as-coarsening, independent meshes) and +**any** transfer builder. No SBR-specific assumptions in Layer 1. + +### Layer 2 — adapt-on-top (first concrete application) +On top of the uniform FMG base, add refined levels via the **existing adaptive-meshing +pattern**: choose an **adapter** + a **metric**, call **periodically**; field +transfer/redistribution handled by the existing remesh machinery +(`remesh_with_field_transfer`, `on_remesh`, `mesh.adapt`). + +This pass's adapter is **SBR refine-on-top**: +- metric → mark cells (e.g. |∇field| threshold, or distance-to-feature) → `adaptLabel` + with `refine_sbr` → a new finest level on top of the current finest; +- **non-load-balancing** (on-rank, no redistribution) — which **bounds the number of + extra levels** (imbalance grows with depth); the adapter exposes/limits this; +- registers `(new level DM, custom-P transfer)` into the Layer-1 hierarchy. + +Home: an adapter consistent with `adaptivity.py` / `mesh.adapt(metric, adapter=...)`, +calling into Layer 1. Future adapters (load-balancing MMG, etc.) plug in the same way. + +## Parallel strategy (parallel from the start) + +The supported parallel path is the **nested, co-partitioned** hierarchy: +- The uniform base is co-partitioned by construction (`dm.refine()` preserves partition); + SBR is on-rank → levels stay co-located (a fine cell's parent coarse cell is on the + same rank). So each rank builds its block of `P` from its **local** coarse cells — + point-location is rank-local; only a thin ghost layer at partition boundaries. +- **BC-per-level reduction is parallel-correct for free**: it rides the DM global section + (`localToGlobal` gives the reduced global ordering across ranks). +- `P` is assembled as an MPIAIJ matrix (fine local rows; coarse local + ghost columns). + +**Non-nested / independent-mesh custom-P** (cross-rank point-location) is **serial-only / +experimental** — it is not on the parallel-supported path. The production case +(targeted refinement = nested) needs only the rank-local path. + +Non-load-balancing SBR → bound the added refinement depth (configurable cap); document the +imbalance/level trade-off. + +## Correctness invariants (must hold, all levels) +1. BCs applied at every level; transfers reduced→reduced (no zero columns). +2. Each prolongation reproduces constants (row-sums = 1 — partition of unity). +3. `pc_mg_galerkin=both` (coarse operators = PᵀAP; UW installs no coarse residual/Jacobian). +4. Nullspaces (constant-pressure / rigid-body) re-attached after any rebuild. +5. No silent fallback: if a level lacks a valid transfer, error (don't degrade to GAMG silently). + +## Test matrix +- Hierarchy build: uniform-only; uniform + N SBR levels; nesting + label survival. +- Per-level reduction: zero-column count == 0 on all transfers (scalar + free-slip vector). +- Transfer: partition-of-unity; barycentric vs rbf; nested-exact vs general. +- Solve: FMG + V-cycle converge on nested and (serial) non-nested; match uniform-FMG iters; + beat GAMG; regression — existing FMG/GAMG paths unchanged. +- Parallel (np>1): co-located transfers, reduced sections, FMG convergence; bounded levels. + +## Phased roadmap +1. **Layer 1 core** — generalized hierarchy + auto BC-per-level reduction + transfer builders + (nested/barycentric/rbf) + install into PCMG; scalar + free-slip vector; serial-validated, + designed parallel-correct (rank-local construction). +2. **Stokes integration** — `set_custom_mg`/hierarchy on the velocity block; nullspace + re-attach; guards; SolCx end-to-end. +3. **Parallel validation** — np>1 tests; co-location/ghost handling; reduced-section + correctness; fix the known rank-local-accumulator pitfalls. +4. **Layer 2 adapter** — SBR refine-on-top following `mesh.adapt` (metric→mark→refine→register), + non-load-balancing, bounded depth, field transfer via existing remesh. +5. **Tests + design doc finalize + tier classification.** + +Later (not this pass; do not block): dynamic per-step reallocation loop; efficiency +(transfer caching, exact-nested combinatorial P, operator reuse); load-balancing adapters; +non-nested parallel transfers; knockout-as-coarsening as an alternative Layer-2 strategy. + +## Status (2026-06-29) + +**Layer 1 is an independent, working capability** — arbitrary coarse grids +(nested *or* non-nested, uniform *or* SBR-refined) → geometric FMG via custom-P +with BC-per-level reduction. It does not depend on Layer 2. + +Landed (committed, tested — `test_1014/1015/1016/1017` serial 20 + +`tests/parallel/test_1017_custom_mg_parallel_mpi.py` np2): +- `CustomMGHierarchy`, `set_custom_fmg`, `sbr_refine`/`sbr_refine_where`; +- automatic BC-per-level reduction + zero-column guard; +- Stokes velocity-block injection; leak-free per-level reduction (copyDS, no + factory); parallel (np>1) nested co-partitioned transfers; +- validated: scalar jump-coeff Poisson 5-level (3 uniform + 2 SBR) 3 FMG iters + vs GAMG 46; SolCx velocity block 6 iters vs GAMG ~198, np=1/2/4. + +**Current scope:** scalar / single-field-vector **and** Stokes velocity-block; +**serial and parallel** (nested co-partitioned). Non-nested custom-P is serial-only +(experimental). Hardening steps 1–3 are complete; what remains is test +tier-classification + undrafting PR #290. + +**Solver-family coverage** — the install keys off the PC topology, so one of two +branches covers every solver that consumes the mesh (all `solve()` overrides +delegate to a hooked base `solve()`; the `inject_custom_mg` hook lives in +`SNES_Scalar.solve`, `SNES_Vector.solve`, `SNES_Stokes_SaddlePt.solve`): + +| Solver family | Branch (`field_id`) | Serial | Parallel | +|---|---|---|---| +| `SNES_Scalar` (Poisson, Darcy, Projection, AdvDiff, Diffusion) | top-level (`None`) | ✓ | ✓ | +| `SNES_Vector` (Vector_Projection, displacement) | top-level (`None`) | ✓ | ✓ | +| `SNES_Stokes` / VE / NavierStokes | velocity block (`0`) | ✓ | ✓ | +| `SNES_Stokes_Constrained` (in-saddle multipliers) | velocity block (`0`) | ✓ | **skip** ¹ | + +¹ The constrained solver is **not parallel-safe today** — it segfaults at np>1 +*independently of custom-P*, in the interior-multiplier section reduction +(`_constrain_interior_multipliers_in_section`; **issue #291**; the canonical +`test_1062_constrained_solcx` also segfaults at np=2 under plain GAMG; workaround +`_reduce_interior_multiplier = False`). custom-P on the constrained velocity block +works in serial; the parallel test auto-enables once #291 is fixed. + +Hardening steps (all complete as of 2026-06-29): +1. ~~**Stokes / saddle-point** (velocity-block injection).~~ **DONE** (2026-06-29). + `set_custom_fmg(..., field_id=0)` drives custom-P geometric MG on the velocity + sub-block. The sub-PC is unreachable until the monolithic Jacobian is assembled, + so `_install_velocity_block_transfers` forces a Jacobian assembly + (`computeFunction`+`computeJacobian` at the zero guess; `max_it=0` fallback), + reaches the velocity sub-PC, `reset`s it and rebuilds a **fresh PCMG** from our P + (mechanism A — mirrors the proven standalone recipe and sidesteps the + `MatProductReplaceMats` live-swap bug; the Galerkin-off + `MatPtAP` path remains + the documented fallback), then re-attaches the coupled Stokes nullspace. Wired + into `SNES_Stokes_SaddlePt.solve` as a guarded no-op. Validated on SolCx + (η-jump 1e6, 3-level nested, in-solver): velocity block 6 MG iters vs GAMG 198, + solution matches GAMG to 1.5e-9 (`test_1017`). NOTE: free-slip velocity + rigid-body modes on `A_vv` + coarse ops are **not yet** handled (reusing + `_attach_stokes_nullspace` covers the SolCx pressure-nullspace case only) — a + Phase-2.5 follow-up. +2. ~~**Drop the factory**.~~ **DONE** (2026-06-29). Each coarse level's + BC-constrained reduced map is derived directly from its DM via + `_coarse_reduced_map`: clone the coarse DM, `copyFields` + `copyDS` from the + finest solver, `createDS`. The DS carries UW's exact essential-BC definitions + and is topology-independent, so it constrains any coarse mesh sharing the + solver's boundary labels — validated byte-identical to the old factory path, + leak-free (no SNES / JIT). `set_custom_fmg` / `build` no longer take a + `level_solver_factory`. +3. ~~**Parallel (np>1)**.~~ **DONE** (2026-06-29). The hierarchy path builds + parallel-correct transfers on the nested co-partitioned hierarchy: each rank + builds its block of `P` rank-locally (ghost-inclusive coarse coords → every + owned fine node lands in a local coarse simplex, verified 0 misses np=2/4), + the reduced global numbering rides the DM global section + (`_level_dof_layout` scatters owned global indices out via `globalToLocal` — + constrained DOFs `-1`, ghosts resolve to the owner's global), and transfers + assemble as MPIAIJ (owned fine rows, global coarse cols incl. off-rank; + constrained coarse DOFs drop → reduced→reduced). Parallel zero-column guard + via `Pᵀ·1` + allreduce. Validated np=1/2/4: scalar Poisson 4 iters + Stokes + SolCx velocity block 6 iters, matching the GAMG reference and each other + across rank counts (`tests/parallel/test_1017_custom_mg_parallel_mpi.py`). The + **legacy** finest-only path (`set_custom_mg` / `_reduce_to_global`) stays + serial-only and raises loudly at np>1. Non-nested custom-P in parallel remains + out of scope (cross-rank point location). + +## Explicit non-goals for this pass +- Knockout (shown to pay full-fine assembly; structural value only) — not pursued now. +- Non-nested custom-P in parallel — serial/experimental only. +- Dynamic time-stepping convection integration — later phase. diff --git a/src/underworld3/cython/petsc_generic_snes_solvers.pyx b/src/underworld3/cython/petsc_generic_snes_solvers.pyx index 296e162e..7c680a88 100644 --- a/src/underworld3/cython/petsc_generic_snes_solvers.pyx +++ b/src/underworld3/cython/petsc_generic_snes_solvers.pyx @@ -93,6 +93,45 @@ class SolverBaseClass(uw_object): # smoother / coarse-solver options with the framework FMG bundle. self._pc_user_override = False + # Custom multigrid prolongation hierarchy (see set_custom_mg / + # utilities.custom_mg). None => standard FMG/GAMG path, unchanged. + self._custom_mg = None + + def set_custom_mg(self, coarse_meshes, kind="barycentric", verbose=False): + r"""Drive geometric multigrid with a prolongation we build ourselves. + + Supplies a sequence of (possibly **non-nested**) coarse meshes from which + a barycentric or RBF prolongation ``P`` is assembled and installed into + the PCMG via ``PC.setMGInterpolation``; coarse operators are formed by + Galerkin RAP. This decouples geometric multigrid from a nested + ``refine()`` hierarchy — it works even when the solver mesh has no + refinement hierarchy at all. + + Parameters + ---------- + coarse_meshes : list of Mesh + Coarsest-first list of coarse meshes (the finest level is the + solver's own mesh). Need not be nested with the solver mesh. + kind : {"barycentric", "rbf"} + Prolongation builder. ``barycentric`` is FE-exact; ``rbf`` is a + polyharmonic RBF (Shepard-normalised). Default ``barycentric``. + verbose : bool + Print the per-level DOF counts at injection. + + Notes + ----- + Single-field (scalar/vector) solvers only for now; the Stokes + velocity-block path is a separate increment. See + :mod:`underworld3.utilities.custom_mg`. + """ + if kind not in ("barycentric", "rbf"): + raise ValueError("kind must be 'barycentric' or 'rbf'") + if not coarse_meshes: + raise ValueError("coarse_meshes must be a non-empty coarsest-first list") + self._custom_mg = {"coarse_meshes": list(coarse_meshes), + "kind": kind, "verbose": verbose} + self.is_setup = False + def add_update_callback(self, callback): r"""Register a callback fired at the start of every nonlinear (SNES) iteration. @@ -2662,6 +2701,13 @@ class SNES_Scalar(SolverBaseClass): # ``constant_nullspace`` was set. self._attach_constant_nullspace() + # Custom multigrid prolongation: inject our P hierarchy before the + # first PCSetUp (so the Galerkin coarse operators are built from it). + # No-op unless set_custom_mg() was called. + if self._custom_mg is not None: + from underworld3.utilities.custom_mg import inject_custom_mg + inject_custom_mg(self) + # solve self._snes_solve_with_retries(gvec, divergence_retries, verbose) @@ -3678,6 +3724,12 @@ class SNES_Vector(SolverBaseClass): # Update constants (e.g. changed material params) before solve self._update_constants() + # Custom geometric-MG prolongation on the (top-level vector) PC, if + # registered via set_custom_fmg. Mirrors the SNES_Scalar hook. + if self._custom_mg is not None: + from underworld3.utilities.custom_mg import inject_custom_mg + inject_custom_mg(self) + # solve self._snes_solve_with_retries(gvec, divergence_retries, verbose) @@ -7412,6 +7464,13 @@ class SNES_Stokes_SaddlePt(SolverBaseClass): self.petsc_options.setValue("snes_max_it", snes_max_it) self.snes.setFromOptions() self._attach_stokes_nullspace() + # Custom geometric-MG prolongation on the velocity block (if registered + # via set_custom_fmg). Injected here — after setFromOptions/nullspace, + # before the real solve — because the velocity sub-PC is only reachable + # once the monolithic Jacobian is assembled (see custom_mg). + if self._custom_mg is not None: + from underworld3.utilities.custom_mg import inject_custom_mg + inject_custom_mg(self) self._snes_solve_with_retries(gvec, divergence_retries, verbose) else: @@ -7422,6 +7481,13 @@ class SNES_Stokes_SaddlePt(SolverBaseClass): self.petsc_options.setValue("snes_max_it", snes_max_it) self.snes.setFromOptions() self._attach_stokes_nullspace() + # Custom geometric-MG prolongation on the velocity block (if registered + # via set_custom_fmg). Injected here — after setFromOptions/nullspace, + # before the real solve — because the velocity sub-PC is only reachable + # once the monolithic Jacobian is assembled (see custom_mg). + if self._custom_mg is not None: + from underworld3.utilities.custom_mg import inject_custom_mg + inject_custom_mg(self) self._snes_solve_with_retries(gvec, divergence_retries, verbose) # Project the rigid-body rotation gauge out of the converged solution. diff --git a/src/underworld3/utilities/custom_mg.py b/src/underworld3/utilities/custom_mg.py new file mode 100644 index 00000000..32023429 --- /dev/null +++ b/src/underworld3/utilities/custom_mg.py @@ -0,0 +1,649 @@ +""" +Custom geometric-multigrid prolongation injection. + +Build the multigrid prolongation ``P`` **ourselves** (barycentric or RBF) from a +sequence of independent — possibly *non-nested* — coarse meshes, and install it +into a solver's PETSc ``PCMG`` via ``PC.setMGInterpolation``. The coarse +operators are then formed by Galerkin RAP (``Pᵀ A P``), exactly as UW3's FMG +already does. + +Why +--- +UW3's geometric FMG requires a nested ``refine()`` hierarchy with PETSc's exact +nested interpolation. That couples multigrid to uniform refinement and rules out +local adaptation / a dynamic node budget. PETSc's ``PCMG`` does **not** require +its transfer operator to come from the DM hierarchy, however: any prolongation +supplied with ``setMGInterpolation`` is used, and ``pc_mg_galerkin=both`` builds +consistent coarse operators from it. The prolongation "evaluate the coarse FE +function at the fine DOF coordinates" is exactly what the nested path computes — +so we can build it ourselves by barycentric (FE-exact) or RBF interpolation for +*any* coarse/fine pair. This decouples *how the mesh refines* from *what transfer +multigrid uses*. + +Validated (scalar Poisson, refinement=2 box): custom barycentric ``P`` from +independent coarse meshes reaches FMG iteration counts (1–3 vs FMG 2). On a mesh +with **no** refine hierarchy (FMG unavailable → GAMG, 13 iters) custom-P +geometric MG converges in ~9 iters. + +Notes +----- +* The finest level is the solver's real (BC-eliminated) operator space; its + ``P`` rows are reduced to the global/MG ordering. Coarse levels use the full + DOF coordinates of each coarse mesh (no BC reduction needed — Galerkin RAP and + the finest reduction keep the coarse correction consistent). +* Injection happens at solver build time, *before* the first ``PCSetUp``: the + first Galerkin assembly is built from our ``P`` directly, avoiding PETSc's + ``MatProductReplaceMats`` shape error that a live matrix swap triggers. +* ``KSPSetDMActive(OPERATOR, False)`` stops PETSc re-deriving interpolation / + operators from the DM hierarchy, so our explicit ``P`` is used. +""" + +import numpy as np +from petsc4py import PETSc + +__all__ = ["barycentric_prolongation", "rbf_prolongation", "inject_custom_mg", + "CustomMGHierarchy", "set_custom_fmg", "sbr_refine", "sbr_refine_where"] + + +# --------------------------------------------------------------------------- # +# Prolongation builders (return scipy CSR matrices) +# --------------------------------------------------------------------------- # +def barycentric_prolongation(coarse_coords, fine_coords): + """FE-exact prolongation: each fine DOF interpolated by the coarse element + that contains it (barycentric weights). Partition of unity (row sums = 1); + fine points outside the coarse hull fall back to the nearest coarse DOF.""" + import scipy.sparse as sp + from scipy.spatial import Delaunay, cKDTree + + tri = Delaunay(coarse_coords) + simp = tri.find_simplex(fine_coords) + tree = cKDTree(coarse_coords) + dim = coarse_coords.shape[1] + rows, cols, vals = [], [], [] + for i in range(fine_coords.shape[0]): + s = simp[i] + if s < 0: # outside coarse hull → nearest coarse DOF + rows.append(i); cols.append(int(tree.query(fine_coords[i])[1])); vals.append(1.0) + continue + verts = tri.simplices[s] + T = tri.transform[s] + bary = T[:dim].dot(fine_coords[i] - T[dim]) + w = np.append(bary, 1.0 - bary.sum()) + for k in range(dim + 1): + rows.append(i); cols.append(int(verts[k])); vals.append(float(w[k])) + return sp.csr_matrix((vals, (rows, cols)), + shape=(fine_coords.shape[0], coarse_coords.shape[0])) + + +def rbf_prolongation(coarse_coords, fine_coords, smooth=0.0): + """RBF prolongation: polyharmonic (r² log r) kernel + affine polynomial tail + (reproduces linear fields), Shepard row-normalised to a partition of unity. + Works for arbitrary (non-nested) point sets; software-equivalent to the + barycentric builder as an MG transfer operator.""" + import scipy.sparse as sp + from scipy.spatial.distance import cdist + + def phi(r): + r = np.where(r == 0.0, 1e-30, r) + return r ** 2 * np.log(r) + + nc, dim = coarse_coords.shape + Pc = np.hstack([np.ones((nc, 1)), coarse_coords]) # affine tail + Acc = phi(cdist(coarse_coords, coarse_coords)) + smooth * np.eye(nc) + M = np.block([[Acc, Pc], [Pc.T, np.zeros((dim + 1, dim + 1))]]) + Minv = np.linalg.inv(M) + B = np.hstack([phi(cdist(fine_coords, coarse_coords)), + np.ones((fine_coords.shape[0], 1)), fine_coords]) + Praw = (B @ Minv)[:, :nc] + rs = Praw.sum(axis=1, keepdims=True) + rs[np.abs(rs) < 1e-12] = 1.0 + return sp.csr_matrix(Praw / rs) + + +_BUILDERS = {"barycentric": barycentric_prolongation, "rbf": rbf_prolongation} + + +def _require_serial(where): + """Custom-P transfers are serial-only (experimental). The reduced maps use + rank-local DOF indices and the prolongations assemble as serial AIJ; at np>1 + they would silently build wrong P / mis-numbered transfers. Parallel support + (nested co-partitioned, rank-local P + MPIAIJ, global-section reduction) is a + designed fast-follow — until then, fail loudly rather than wrong.""" + from underworld3 import mpi + if mpi.size > 1: + raise NotImplementedError( + f"custom_mg ({where}): custom-P geometric MG is serial-only " + f"(running on {mpi.size} ranks). Parallel (np>1) support is not yet " + f"implemented; use preconditioner='fmg' (nested) or 'gamg' in parallel.") + + +# --------------------------------------------------------------------------- # +# Local Skeleton-Based Refinement (no MMG; on-rank; conforming) +# --------------------------------------------------------------------------- # +_DM_ADAPT_REFINE = 1 # PETSc DM_ADAPT_REFINE + + +def _sbr_apply(dm, mark): + """Clone ``dm``, let ``mark(clone, adapt_label)`` flag cells, and SBR-refine. + Conforming (edge bisection + propagation, no hanging nodes), on-rank (no + redistribution), no external mesh libraries. Cell geometry / marking is done + on the CLONE (the un-cloned mesh DM raises err73 from computeCellGeometryFVM). + + IMPORTANT: ``dm_plex_transform_type`` is a *global* PETSc option; leaving it as + ``refine_sbr`` breaks UW3's uniform ``dm.refine()`` (FMG hierarchy build) with + PETSc error 73. It is set only for the duration of the call and restored.""" + opts = PETSc.Options() + had = opts.hasName("dm_plex_transform_type") + prev = opts.getString("dm_plex_transform_type") if had else None + opts.setValue("dm_plex_transform_type", "refine_sbr") + try: + d = dm.clone() + d.createLabel("adapt") + lab = d.getLabel("adapt") + lab.setDefaultValue(0) + mark(d, lab) + return d.adaptLabel("adapt") + finally: + if had: + opts.setValue("dm_plex_transform_type", prev) + else: + opts.delValue("dm_plex_transform_type") + + +def sbr_refine(dm, cells): + """SBR-refine an explicit list of ``cells``.""" + def mark(d, lab): + for c in cells: + lab.setValue(int(c), _DM_ADAPT_REFINE) + return _sbr_apply(dm, mark) + + +def sbr_refine_where(dm, predicate): + """SBR-refine cells whose centroid satisfies ``predicate(centroid)``.""" + def mark(d, lab): + cs, ce = d.getHeightStratum(0) + for c in range(cs, ce): + if predicate(d.computeCellGeometryFVM(c)[1]): + lab.setValue(c, _DM_ADAPT_REFINE) + return _sbr_apply(dm, mark) + + +# --------------------------------------------------------------------------- # +# Coordinate helpers +# --------------------------------------------------------------------------- # +def _to_petsc_aij(csr): + csr = csr.tocsr() + M = PETSc.Mat().createAIJ( + size=csr.shape, + csr=(csr.indptr.astype("int32"), csr.indices.astype("int32"), csr.data), + ) + M.assemble() + return M + + +def _reduce_to_global(dm, full_coords): + """Reduce full local DOF coordinates to the solver's global (BC-eliminated, + MG-ordered) layout by scattering each component through ``localToGlobal`` + (which drops constrained boundary DOFs and reorders to global indices).""" + lvec = dm.getLocalVec() + gvec = dm.getGlobalVec() + cdim = full_coords.shape[1] + if lvec.getLocalSize() != full_coords.shape[0]: + dm.restoreLocalVec(lvec); dm.restoreGlobalVec(gvec) + raise RuntimeError( + f"custom_mg: local DOF count {lvec.getLocalSize()} != coord count " + f"{full_coords.shape[0]} — degree/continuity mismatch") + out = np.zeros((gvec.getSize(), cdim)) + for c in range(cdim): + lvec.array[:] = full_coords[:, c] + dm.localToGlobal(lvec, gvec, addv=False) + out[:, c] = gvec.array + dm.restoreLocalVec(lvec) + dm.restoreGlobalVec(gvec) + return out + + +# --------------------------------------------------------------------------- # +# Layer 1 — generalized FMG hierarchy (BC-reduced, adapter-agnostic) +# +# INVARIANT (load-bearing): essential BCs are applied at EVERY level, so every +# transfer maps reduced->reduced. PETSc native FMG does this automatically via +# each level's constrained PetscSection; custom-P built from raw coordinates must +# replicate it. Omitting it is fatal on an EXACTLY-nested hierarchy: a coarse +# boundary DOF coincides with a BC-removed fine DOF -> zero column -> singular +# Galerkin coarse operator. (See docs/developer/design/ +# GENERALIZED_FMG_HIERARCHY_AND_ADAPT.md.) +# --------------------------------------------------------------------------- # +def _field_subdm(dm, field_id): + """Return the sub-DM for ``field_id`` (whole dm if single-field / None).""" + if field_id is None: + return dm + try: + if dm.getNumFields() <= 1: + return dm + except Exception: + return dm + _iset, sub = dm.createSubDM(field_id) + return sub + + +def _reduced_map(dm, field_id=None): + """Reduced->full DOF map for a field on a BUILT DM (serial). Index ``k`` of + the returned array is the local DOF that maps to global/MG index ``k`` — + i.e. the BC-eliminated, MG-ordered layout the operator lives on. + + NOTE: serial-correct (uses local indices). Parallel correctness — mapping to + *global* numbering across ranks — is a Phase-3 item (the supported parallel + path keeps levels co-partitioned so this stays rank-local).""" + sub = _field_subdm(dm, field_id) + lv = sub.getLocalVec() + n_full = lv.getLocalSize() + lv.array[:] = np.arange(n_full) + gv = sub.getGlobalVec() + sub.localToGlobal(lv, gv, addv=False) + r2f = gv.array.astype(int).copy() + sub.restoreLocalVec(lv) + sub.restoreGlobalVec(gv) + return r2f, n_full + + +def _coarse_reduced_map(solver, coarse_mesh, field_id=None): + """A COARSE level's BC-constrained reduced map, with NO throwaway solver. + + Clone the coarse mesh DM and copy the (built) finest ``solver``'s fields + DS + onto it. The DS carries UW's exact essential-BC definitions (the custom + essential-field boundaries), and is a topology-independent discretisation spec, + so ``createDS`` constrains the matching boundary DOFs on ANY coarse mesh that + carries the same boundary labels (nested or not). The resulting global section + gives the same reduced map a full same-discretisation solver would — validated + identical to the old ``level_solver_factory`` path. Leak-free: DM ops only, no + SNES / JIT. + + NOTE: serial, like ``_reduced_map`` (uses local indices); parallel correctness + is a Phase-3 item.""" + cdm = coarse_mesh.dm.clone() + solver.dm.copyFields(cdm) + solver.dm.copyDS(cdm) + cdm.createDS() + return _reduced_map(cdm, field_id) + + +def _reduced_transfer(coarse_coords, fine_coords, r2f_c, r2f_f, ncomp, builder): + """Build one prolongation reduced(coarse) -> reduced(fine): + node-level scalar P -> interleave ``ncomp`` components -> drop BC rows/cols.""" + import scipy.sparse as sp + Pn = builder(coarse_coords, fine_coords) # (n_f_nodes, n_c_nodes) + Pv = sp.kron(Pn, sp.eye(ncomp), format="csr") # interleaved full vector + Pr = Pv.tocsr()[r2f_f, :][:, r2f_c] # reduced -> reduced + return Pr + + +# --------------------------------------------------------------------------- # +# Parallel (np>1) — nested co-partitioned, rank-local P + MPIAIJ +# +# Supported path: the levels are co-partitioned (uniform refine() / on-rank SBR +# keep a fine cell's parent coarse cell on the same rank), so each rank builds +# its block of P from its LOCAL (ghost-inclusive) coarse coords — point-location +# is rank-local. The reduced global numbering rides the DM global section. +# --------------------------------------------------------------------------- # +def _level_dof_layout(dm, field_id=None): + """Parallel DOF layout for one level: ``(l2g, rstart, rend, n_full)``. + + ``l2g[i]`` is the GLOBAL reduced index of local DOF ``i`` — ghost-resolved to + the owner's global index, and ``-1`` for a BC-constrained DOF. Built by + scattering each owned global index out to the local (incl. ghost) layout via + ``globalToLocal`` (constrained local DOFs have no global source, so they keep + the pre-set ``-1``). ``[rstart, rend)`` is this rank's owned global range.""" + sub = _field_subdm(dm, field_id) + gv = sub.getGlobalVec() + lv = sub.getLocalVec() + rstart, rend = gv.getOwnershipRange() + gv.array[:] = np.arange(rstart, rend, dtype=float) + lv.set(-1.0) + sub.globalToLocal(gv, lv, addv=PETSc.InsertMode.INSERT_VALUES) + l2g = np.rint(lv.array).astype(np.int64).copy() + n_full = lv.getLocalSize() + sub.restoreGlobalVec(gv) + sub.restoreLocalVec(lv) + return l2g, rstart, rend, n_full + + +def _coarse_dof_layout(solver, coarse_mesh, field_id=None): + """Parallel coarse-level DOF layout, no throwaway solver — same copyDS trick + as :func:`_coarse_reduced_map` but returning the parallel ``l2g`` layout.""" + cdm = coarse_mesh.dm.clone() + solver.dm.copyFields(cdm) + solver.dm.copyDS(cdm) + cdm.createDS() + return _level_dof_layout(cdm, field_id) + + +def _build_parallel_transfer(cc, fc, lay_c, lay_f, ncomp, builder, comm): + """One reduced->reduced prolongation as an MPIAIJ matrix. + + Node-level barycentric/RBF weights are built rank-locally (coarse LOCAL coords + incl. ghosts -> every owned fine node lands in a local coarse simplex). Each + fine OWNED DOF becomes a global row; its coarse contributions map through the + coarse ``l2g`` to global columns (off-rank columns are fine for MPIAIJ). + Constrained coarse DOFs (``l2g == -1``) drop out -> reduced->reduced.""" + l2g_c, cstart, cend, _ = lay_c + l2g_f, fstart, fend, _ = lay_f + Pn = builder(cc, fc).tocsr() # (n_f_nodes, n_c_nodes), local + nloc_f = fend - fstart + nloc_c = cend - cstart + + P = PETSc.Mat().create(comm=comm) + P.setSizes(((nloc_f, None), (nloc_c, None))) + P.setType("aij") + P.setUp() + for i in range(Pn.shape[0]): # fine local node + js = Pn.indices[Pn.indptr[i]:Pn.indptr[i + 1]] + ws = Pn.data[Pn.indptr[i]:Pn.indptr[i + 1]] + for c in range(ncomp): + grow = int(l2g_f[i * ncomp + c]) + if grow < fstart or grow >= fend: # set OWNED rows only + continue + gcols, vals = [], [] + for jj, w in zip(js.tolist(), ws.tolist()): + gcol = int(l2g_c[jj * ncomp + c]) + if gcol >= 0: # skip constrained coarse DOFs + gcols.append(gcol) + vals.append(w) + if gcols: + P.setValues([grow], gcols, vals, addv=PETSc.InsertMode.INSERT_VALUES) + P.assemble() + return P + + +def _assert_no_zero_columns_parallel(P, comm): + """Parallel zero-column guard: a coarse DOF with no fine image -> singular + Galerkin coarse operator. Column sums via P^T·1 (weights are positive, so a + zero sum means an empty column).""" + ones_f = P.createVecLeft(); ones_f.set(1.0) + colsum = P.createVecRight() + P.multTranspose(ones_f, colsum) + nzero_local = int((colsum.array == 0.0).sum()) + nzero = comm.tompi4py().allreduce(nzero_local) + ones_f.destroy(); colsum.destroy() + if nzero: + raise RuntimeError( + f"parallel transfer has {nzero} zero columns (coarse DOFs with no fine " + f"image) — BC-per-level reduction failed; coarse operator would be singular.") + + +def _configure_pcmg(pc, Ps): + """Reconfigure ``pc`` as a fresh PCMG (FMG F-cycle) driven by the supplied + reduced->reduced prolongations ``Ps``, Galerkin RAP for coarse operators. + + Writes the MG bundle into the options DB under the PC's OWN options prefix + (``pc.getOptionsPrefix()`` — e.g. ``Solver_N_`` for the scalar top-level PC, + ``Solver_N_fieldsplit_velocity_`` for the Stokes velocity sub-PC) and removes + any gamg keys BEFORE ``setFromOptions`` — otherwise ``setFromOptions`` re-reads + a lingering ``pc_type=gamg`` and reverts ``setType("mg")``. ``setMGInterpolation`` + persists through ``setFromOptions``; the first ``PCSetUp`` builds the coarse + operators from our P (no ``MatProductReplaceMats`` shape bug, since the PCMG is + fresh and P's size is fixed).""" + nlev = len(Ps) + 1 + prefix = pc.getOptionsPrefix() or "" + opts = PETSc.Options() + opts.setValue(prefix + "pc_type", "mg") + opts.setValue(prefix + "pc_mg_type", "full") + opts.setValue(prefix + "pc_mg_galerkin", "both") + opts.setValue(prefix + "mg_levels_ksp_type", "richardson") + opts.setValue(prefix + "mg_levels_pc_type", "sor") + opts.setValue(prefix + "mg_coarse_pc_type", "redundant") + opts.setValue(prefix + "mg_coarse_redundant_pc_type", "lu") + for key in ("pc_gamg_type", "pc_gamg_repartition", "pc_gamg_agg_nsmooths"): + opts.delValue(prefix + key) + pc.setType("mg") + pc.setMGLevels(nlev) + pc.setMGType(PETSc.PC.MGType.FULL) + for l in range(1, nlev): + pc.setMGInterpolation(l, Ps[l - 1]) + pc.setFromOptions() + + +def _install_transfers(solver, Ps, verbose=False): + """Configure the managed PCMG block to use the supplied prolongations. + + Two paths, keyed by ``solver._pc_option_prefix``: + + * **scalar / single-field vector** (top-level PC, prefix ``""``): build-time + injection *before* the first ``PCSetUp`` — the first Galerkin assembly is + built from our P directly. + * **Stokes velocity block** (prefix ``"fieldsplit_velocity_"``): the velocity + sub-PC is unreachable until the monolithic Jacobian is assembled + (``PCFieldSplit`` forms ``A_vv`` via ``MatCreateSubMatrix``; ``snes.setUp`` + builds structure only -> err73). So force a Jacobian assembly, reach the + velocity sub-PC, ``reset`` it and rebuild a fresh PCMG from our P, then + re-attach the coupled Stokes nullspace. + + Either way ``KSPSetDMActive(OPERATOR, False)`` stops PETSc re-deriving the + interpolation from the DM hierarchy.""" + nlev = len(Ps) + 1 + pfx = solver._pc_option_prefix or "" + + if pfx == "": + ksp = solver.snes.getKSP() + solver.snes.setUp() + ksp.setDMActive(PETSc.KSP.DMActive.OPERATOR, False) + _configure_pcmg(ksp.getPC(), Ps) + if verbose: + from underworld3 import mpi + mpi.pprint(f"[{solver.name}] custom FMG installed: {nlev} levels, " + f"P sizes {[tuple(P.getSize()) for P in Ps]}") + return + + if pfx != "fieldsplit_velocity_": + raise NotImplementedError( + f"custom_mg install: unsupported PC prefix '{pfx}'.") + + _install_velocity_block_transfers(solver, Ps, verbose=verbose) + + +def _install_velocity_block_transfers(solver, Ps, verbose=False): + """Stokes velocity-block install (mechanism A: reset + fresh PCMG). + + Preconditions: ``solver._build`` + ``setFromOptions`` + ``_attach_stokes_nullspace`` + have run (the call site in ``SNES_Stokes_SaddlePt.solve`` guarantees this), so + the SNES / DM exist but the Jacobian VALUES are not yet assembled.""" + snes = solver.snes + snes.setUp() + + # 1. force monolithic Jacobian assembly so the fieldsplit can form A_vv + x0 = solver.dm.getGlobalVec() + x0.set(0.0) + J = snes.getJacobian()[0] + Pmat = snes.getJacobian()[1] + try: + f = J.createVecLeft() + snes.computeFunction(x0, f) + snes.computeJacobian(x0, J, Pmat) + except PETSc.Error: + # fallback: throwaway max_it=0 solve assembles + splits the operator + saved = (solver.petsc_options.getString("snes_max_it") + if solver.petsc_options.hasName("snes_max_it") else None) + solver.petsc_options["snes_max_it"] = 0 + snes.setFromOptions() + snes.solve(None, x0) + if saved is not None: + solver.petsc_options["snes_max_it"] = saved + snes.setFromOptions() + solver.dm.restoreGlobalVec(x0) + + # Wire the freshly-assembled Jacobian into the KSP/outer-PC. SNESSolve does + # this lazily, but we reach the fieldsplit BEFORE the solve — without it the + # outer PC can carry an unassembled operator and PCSetUp fails with + # "Matrix must be set first" (err73) for some configurations. + snes.getKSP().setOperators(J, Pmat) + + # 2. split -> reach the velocity sub-KSP / sub-PC (field 0) + outer_pc = snes.getKSP().getPC() + outer_pc.setUp() + vel_ksp = outer_pc.getFieldSplitSubKSP()[0] + vel_pc = vel_ksp.getPC() + sub = vel_pc.getOptionsPrefix() or "fieldsplit_velocity_" + A_vv, P_vv = vel_pc.getOperators() # capture before reset (reset drops them) + + # 3. fresh PCMG on the velocity sub-block from our Ps + vel_ksp.setDMActive(PETSc.KSP.DMActive.OPERATOR, False) + vel_pc.reset() + vel_pc.setOperators(A_vv, P_vv) + _configure_pcmg(vel_pc, Ps) + vel_pc.setUp() + + # 4. re-attach the coupled Stokes nullspace (operator state was touched) + solver._attach_stokes_nullspace() + + if verbose: + from underworld3 import mpi + mpi.pprint(f"[{solver.name}] custom FMG installed on velocity block: " + f"{len(Ps) + 1} levels, sub-prefix {sub!r}, " + f"P sizes {[tuple(P.getSize()) for P in Ps]}") + + +class CustomMGHierarchy: + """A generalized FMG hierarchy: a sequence of level meshes (coarsest..finest) + whose transfers are built by a pluggable builder, with BCs applied at every + level. Adapter-agnostic — it consumes meshes + a way to get each level's + BC-reduced DOF map; it does not know how the levels were produced. + + Parameters + ---------- + level_meshes : list of Mesh + Coarsest-first; the LAST entry must be the solver's own mesh. + builder : {"barycentric", "rbf"} + Per-level node prolongation builder. + field_id : int or None + Field index for multi-field solvers (e.g. 0 = velocity); None = single field. + """ + + def __init__(self, level_meshes, builder="barycentric", field_id=None): + if builder not in _BUILDERS: + raise ValueError("builder must be 'barycentric' or 'rbf'") + if len(level_meshes) < 2: + raise ValueError("need at least 2 levels (>=1 coarse + finest)") + self.level_meshes = list(level_meshes) + self.builder = _BUILDERS[builder] + self.builder_name = builder + self.field_id = field_id + self.transfers = None + + def build(self, solver): + """Build the BC-reduced prolongations. ``solver`` is the (built) finest + solver. Each COARSE level's BC-constrained reduced map is derived directly + from the coarse mesh DM by copying the finest solver's fields + DS onto it + (no throwaway solver); the finest level reads its map from ``solver.dm``. + + Serial: scipy reduced->reduced CSR. Parallel (np>1): rank-local node-level + weights assembled into MPIAIJ transfers with global-section reduced + numbering (nested co-partitioned path).""" + from underworld3 import mpi + var = solver.Unknowns.u + degree = var.degree + continuous = getattr(var, "continuous", True) + nlev = len(self.level_meshes) + parallel = mpi.size > 1 + + coords, maps, ncomp = [], [], [] + for k, mesh in enumerate(self.level_meshes): + c = np.asarray(mesh._get_coords_for_basis(degree, continuous)) + finest = (k == nlev - 1) + if parallel: + lay = (_level_dof_layout(solver.dm, self.field_id) if finest + else _coarse_dof_layout(solver, mesh, self.field_id)) + nfull = lay[3] + maps.append(lay) + else: + rmap, nfull = (_reduced_map(solver.dm, self.field_id) if finest + else _coarse_reduced_map(solver, mesh, self.field_id)) + maps.append(rmap) + nc = nfull // c.shape[0] + if nfull % c.shape[0] != 0: + raise RuntimeError( + f"level {k}: full DOFs {nfull} not divisible by nodes {c.shape[0]}") + coords.append(c); ncomp.append(nc) + + if len(set(ncomp)) != 1: + raise RuntimeError(f"inconsistent component counts across levels: {ncomp}") + nc = ncomp[0] + + Ps = [] + for l in range(1, nlev): + if parallel: + P = _build_parallel_transfer(coords[l - 1], coords[l], + maps[l - 1], maps[l], nc, + self.builder, solver.dm.comm) + _assert_no_zero_columns_parallel(P, solver.dm.comm) + Ps.append(P) + else: + Pr = _reduced_transfer(coords[l - 1], coords[l], maps[l - 1], + maps[l], nc, self.builder) + zc = int((np.asarray((Pr != 0).sum(axis=0)).ravel() == 0).sum()) + if zc: + raise RuntimeError( + f"transfer {l-1}->{l} has {zc} zero columns (coarse DOFs with " + f"no fine image) — BC-per-level reduction failed; coarse " + f"operator would be singular.") + Ps.append(_to_petsc_aij(Pr)) + self.transfers = Ps + return Ps + + def install(self, solver, verbose=False): + if self.transfers is None: + raise RuntimeError("call build() before install()") + _install_transfers(solver, self.transfers, verbose=verbose) + + +# --------------------------------------------------------------------------- # +# Entry points +# --------------------------------------------------------------------------- # +def set_custom_fmg(solver, coarse_meshes, *, builder="barycentric", + field_id=None, verbose=False): + """Generalized custom-P FMG with BC-per-level reduction (the correct path). + + Registers a :class:`CustomMGHierarchy` on the solver so that the next + ``solve()`` builds and installs it (build-time injection). The hierarchy is + ``[*coarse_meshes, solver.mesh]``; each coarse level's BC-constrained reduced + map is derived directly from its DM by copying the solver's fields + DS + (``_coarse_reduced_map``), so ``coarse_meshes`` need only carry the same + boundary labels as the solver's mesh. For a saddle-point (Stokes) solver pass + ``field_id=0`` to target the velocity sub-block.""" + solver._custom_mg = { + "mode": "hierarchy", + "hierarchy": CustomMGHierarchy(list(coarse_meshes) + [solver.mesh], + builder=builder, field_id=field_id), + "verbose": verbose, + } + solver.is_setup = False + + +def inject_custom_mg(solver): + """Build + install the custom-P FMG. Called from ``solve()`` (after ``_build``, + before the SNES solve) when ``solver._custom_mg`` is set. Dispatches: + - ``mode == "hierarchy"`` -> BC-per-level reduced path (correct, general); + - legacy dict ``{coarse_meshes, kind}`` -> finest-only reduction (kept for + back-compat; valid only when coarse levels are non-nested / unconstrained).""" + cfg = solver._custom_mg + + if isinstance(cfg, dict) and cfg.get("mode") == "hierarchy": + h = cfg["hierarchy"] + h.build(solver) # parallel-capable (nested co-partitioned) + h.install(solver, verbose=cfg.get("verbose", False)) + return + + # ---- legacy finest-only path (back-compat, serial only) ----------------- + _require_serial("legacy custom_mg (set_custom_mg)") + coarse_meshes = cfg["coarse_meshes"] + builder = _BUILDERS[cfg["kind"]] + var = solver.Unknowns.u + degree = var.degree + continuous = getattr(var, "continuous", True) + if solver._pc_option_prefix not in ("", None): + raise NotImplementedError("legacy custom_mg path is scalar/single-field only") + fine = _reduce_to_global(solver.dm, + solver.mesh._get_coords_for_basis(degree, continuous)) + levels = [m._get_coords_for_basis(degree, continuous) for m in coarse_meshes] + levels.append(fine) + Ps = [_to_petsc_aij(builder(levels[l - 1], levels[l])) for l in range(1, len(levels))] + _install_transfers(solver, Ps, verbose=cfg.get("verbose", False)) diff --git a/tests/parallel/test_1017_custom_mg_parallel_mpi.py b/tests/parallel/test_1017_custom_mg_parallel_mpi.py new file mode 100644 index 00000000..3a3e29a5 --- /dev/null +++ b/tests/parallel/test_1017_custom_mg_parallel_mpi.py @@ -0,0 +1,173 @@ +"""Parallel (np>1) correctness for custom-P geometric MG (custom_mg). + +The nested co-partitioned hierarchy path is parallel-capable: rank-local point +location (ghost-inclusive coarse coords) + global-section reduced numbering + +MPIAIJ transfers. These tests assert that, on >1 ranks, custom-P drives a +converging geometric-MG solve whose result matches a GAMG reference — for both a +scalar Poisson and the Stokes (SolCx) velocity block. The legacy finest-only +path remains serial-only and must still fail loudly. + +Run: + mpirun -n 2 python -m pytest --with-mpi tests/parallel/test_1017_custom_mg_parallel_mpi.py +""" +import numpy as np +import pytest +import underworld3 as uw +from underworld3.function import analytic as A +from underworld3.utilities import custom_mg + +pytestmark = [pytest.mark.mpi(min_size=2), pytest.mark.timeout(180)] + + +def _wrap(dm, m0, q=2): + return uw.discretisation.Mesh( + dm.clone(), simplex=True, + coordinate_system_type=m0.CoordinateSystem.coordinate_type, + qdegree=q, boundaries=m0.boundaries) + + +def _box(cellSize, refinement=None, q=2): + return uw.meshing.UnstructuredSimplexBox( + minCoords=(0., 0.), maxCoords=(1., 1.), + cellSize=cellSize, regular=True, refinement=refinement, qdegree=q) + + +def _rel_l2(a, b): + """Global relative L2 difference of two field-data arrays (rank-reduced).""" + da = np.asarray(a) - np.asarray(b) + num = uw.mpi.comm.allreduce(float(np.sum(da**2))) + den = uw.mpi.comm.allreduce(float(np.sum(np.asarray(b)**2))) + return (num**0.5) / (den**0.5 + 1e-30) + + +def _poisson(mesh): + p = uw.systems.Poisson(mesh) + p.constitutive_model = uw.constitutive_models.DiffusionModel + p.constitutive_model.Parameters.diffusivity = 1 + p.f = 0.0 + p.add_dirichlet_bc(0.0, "Bottom"); p.add_dirichlet_bc(1.0, "Top") + p.petsc_options["ksp_rtol"] = 1e-8; p.petsc_options["ksp_type"] = "cg" + return p + + +@pytest.mark.mpi(min_size=2) +def test_parallel_custom_fmg_scalar(): + """Scalar custom-P drives a converging geometric MG solve in parallel and + matches a GAMG solve of the same problem.""" + assert uw.mpi.size > 1 + m0 = _box(0.25); dm1 = m0.dm.refine(); dm2 = dm1.refine() + coarse = [_wrap(m0.dm, m0), _wrap(dm1, m0)]; fine = _wrap(dm2, m0) + + g = _poisson(_wrap(dm2, m0)); g.preconditioner = "gamg"; g.solve() + c = _poisson(fine) + custom_mg.set_custom_fmg(c, coarse, builder="barycentric") + c.solve() + + assert c.snes.getKSP().getPC().getType() == "mg" + assert c.snes.getConvergedReason() > 0 + assert _rel_l2(c.Unknowns.u.data, g.Unknowns.u.data) < 1e-4 + + +@pytest.mark.mpi(min_size=2) +def test_parallel_custom_fmg_stokes_velocity_block(): + """Custom-P on the SolCx velocity block converges in few MG iters in parallel + and matches the analytic velocity to the GAMG reference's accuracy.""" + assert uw.mpi.size > 1 + fine = _box(0.25, refinement=2, q=3) + coarse = [_wrap(d, fine, q=3) for d in fine.dm_hierarchy[:-1]] + + def solcx(mesh): + sol = A.SolCx(mesh, eta_A=1.0, eta_B=1e6, x_c=0.5, n=1) + s = uw.systems.Stokes(mesh) + s.constitutive_model = uw.constitutive_models.ViscousFlowModel + s.constitutive_model.Parameters.shear_viscosity_0 = sol.fn_viscosity + s.saddle_preconditioner = 1.0 / sol.fn_viscosity + s.bodyforce = sol.fn_bodyforce + s.add_dirichlet_bc((0.0, None), "Left"); s.add_dirichlet_bc((0.0, None), "Right") + s.add_dirichlet_bc((None, 0.0), "Bottom"); s.add_dirichlet_bc((None, 0.0), "Top") + s.petsc_use_pressure_nullspace = True; s.tolerance = 1e-8 + s.petsc_options["snes_type"] = "ksponly" + return s, sol + + sg, solg = solcx(fine); sg.preconditioner = "gamg"; sg.solve() + sc, sol = solcx(fine) + custom_mg.set_custom_fmg(sc, coarse, builder="barycentric", field_id=0) + sc.solve() + vksp = sc.snes.getKSP().getPC().getFieldSplitSubKSP()[0] + + assert sc.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert vksp.getPC().getMGLevels() == len(coarse) + 1 + assert vksp.getIterationNumber() <= 15 # geometric MG, not GAMG (~200) + assert sol.velocity_error(sc.u) < 2.0 * solg.velocity_error(sg.u) + 1e-6 + + +@pytest.mark.mpi(min_size=2) +def test_parallel_custom_fmg_vector(): + """SNES_Vector (Vector_Projection) custom-P drives a top-level vector PCMG in + parallel (field_id=None, ncomp=dim). Fine mesh has NO native hierarchy, so the + 'mg' PC can only come from our custom-P.""" + import sympy + assert uw.mpi.size > 1 + m0 = _box(0.25); dm2 = m0.dm.refine().refine() + coarse = [_wrap(m0.dm, m0), _wrap(m0.dm.refine(), m0)] + fine = _wrap(dm2, m0) + x, y = fine.X + v = uw.discretisation.MeshVariable("Ucpp", fine, fine.dim, degree=2) + proj = uw.systems.Vector_Projection(fine, v) + proj.uw_function = sympy.Matrix([[sympy.sin(sympy.pi * x) * sympy.cos(sympy.pi * y), + sympy.cos(sympy.pi * x) * sympy.sin(sympy.pi * y)]]) + proj.smoothing = 1.0e-3 + proj.add_dirichlet_bc((0.0, 0.0), "Bottom") + proj.add_dirichlet_bc((0.0, 0.0), "Top") + custom_mg.set_custom_fmg(proj, coarse, builder="barycentric") + proj.solve() + assert proj.snes.getKSP().getPC().getType() == "mg" + assert proj.snes.getConvergedReason() > 0 + + +@pytest.mark.skip(reason="Stokes_Constrained is not parallel-safe yet — it " + "segfaults at np>1 independently of custom-P, in the " + "interior-multiplier section reduction (issue #291; canonical " + "test_1062_constrained_solcx also segfaults at np=2, plain GAMG). " + "custom-P on the constrained velocity block works in SERIAL " + "(see test_1017_custom_mg_stokes.test_custom_fmg_stokes_constrained); " + "this auto-enables once #291 is fixed.") +@pytest.mark.mpi(min_size=2) +def test_parallel_custom_fmg_stokes_constrained(): + """Stokes_Constrained (free-slip multipliers, grouped [p,h] split) custom-P on + the velocity block in parallel — velocity must match the analytic SolCx.""" + import sympy + assert uw.mpi.size > 1 + m0 = _box(0.25, q=3); dm2 = m0.dm.refine().refine() + coarse = [_wrap(m0.dm, m0, q=3), _wrap(m0.dm.refine(), m0, q=3)] + fine = _wrap(dm2, m0, q=3) + sol = A.SolCx(fine, eta_A=1.0, eta_B=1.0e6, x_c=0.5, n=1) + s = uw.systems.Stokes_Constrained(fine) + s.constitutive_model = uw.constitutive_models.ViscousFlowModel + s.constitutive_model.Parameters.shear_viscosity_0 = sol.fn_viscosity + s.bodyforce = sol.fn_bodyforce + s.add_constraint_bc("Left", g=0.0, normal=sympy.Matrix([[-1.0, 0.0]])) + s.add_constraint_bc("Right", g=0.0, normal=sympy.Matrix([[1.0, 0.0]])) + s.add_constraint_bc("Bottom", g=0.0, normal=sympy.Matrix([[0.0, -1.0]])) + s.add_constraint_bc("Top", g=0.0, normal=sympy.Matrix([[0.0, 1.0]])) + s.petsc_use_pressure_nullspace = True + s.tolerance = 1.0e-9 + s.petsc_options["snes_type"] = "ksponly" + custom_mg.set_custom_fmg(s, coarse, builder="barycentric", field_id=0) + s.solve() + vksp = s.snes.getKSP().getPC().getFieldSplitSubKSP()[0] + assert s.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert sol.velocity_error(s.u) < 5.0e-3 + + +@pytest.mark.mpi(min_size=2) +def test_legacy_path_serial_guard(): + """The legacy finest-only set_custom_mg path is serial-only and must raise + loudly in parallel (it has no parallel reduced map).""" + assert uw.mpi.size > 1 + s = _poisson(_box(0.25, refinement=2)) + s.set_custom_mg([_box(0.285), _box(0.142)], kind="barycentric") + with pytest.raises(NotImplementedError): + s.solve() diff --git a/tests/test_1015_custom_mg_prolongation.py b/tests/test_1015_custom_mg_prolongation.py new file mode 100644 index 00000000..ea975df1 --- /dev/null +++ b/tests/test_1015_custom_mg_prolongation.py @@ -0,0 +1,77 @@ +""" +Custom multigrid prolongation hook (set_custom_mg / utilities.custom_mg). + +Geometric multigrid driven by a prolongation we build ourselves (barycentric or +RBF) from independent, non-nested coarse meshes — decoupling geometric MG from a +nested refine() hierarchy. See docs/developer/design and the study notes. +""" +import numpy as np +import pytest +import underworld3 as uw + +pytestmark = [pytest.mark.level_2, pytest.mark.tier_b] + + +def _poisson(mesh): + p = uw.systems.Poisson(mesh) + p.constitutive_model = uw.constitutive_models.DiffusionModel + p.constitutive_model.Parameters.diffusivity = 1 + p.f = 0.0 + p.add_dirichlet_bc(0.0, "Bottom") + p.add_dirichlet_bc(1.0, "Top") + return p + + +def _box(cellSize, refinement=None): + return uw.meshing.UnstructuredSimplexBox( + minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0), + cellSize=cellSize, refinement=refinement, qdegree=2) + + +def test_custom_mg_barycentric_matches_fmg(): + """Custom barycentric P (independent coarse meshes) must converge and agree + with the nested-FMG solution on the same fine problem.""" + fmg = _poisson(_box(0.25, refinement=2)) + fmg.solve() + assert fmg.snes.getConvergedReason() > 0 + + cust = _poisson(_box(0.25, refinement=2)) + cust.set_custom_mg([_box(0.285), _box(0.142)], kind="barycentric") + cust.solve() + assert cust.snes.getKSP().getPC().getType() == "mg" + assert cust.snes.getConvergedReason() > 0 + # same fine problem -> same solution (Galerkin RAP from our P) + rel = (np.linalg.norm(cust.Unknowns.u.data - fmg.Unknowns.u.data) + / (np.linalg.norm(fmg.Unknowns.u.data) + 1e-30)) + assert rel < 1e-4 + # competitive with nested FMG (not pathologically worse) + assert cust.snes.getKSP().getIterationNumber() <= fmg.snes.getKSP().getIterationNumber() + 5 + + +def test_custom_mg_rbf_drives_solver(): + """The RBF builder must also drive a converging geometric MG solve.""" + cust = _poisson(_box(0.25, refinement=2)) + cust.set_custom_mg([_box(0.285), _box(0.142)], kind="rbf") + cust.solve() + assert cust.snes.getKSP().getPC().getType() == "mg" + assert cust.snes.getConvergedReason() > 0 + + +def test_custom_mg_without_refine_hierarchy(): + """Key capability: geometric MG on a mesh with NO refine() hierarchy + (FMG unavailable -> normally GAMG). Custom P supplies the hierarchy.""" + mesh = _box(0.08) # no refinement -> single-level + assert len(mesh.dm_hierarchy) == 1 + cust = _poisson(mesh) + cust.set_custom_mg([_box(0.3), _box(0.16)], kind="barycentric") + cust.solve() + assert cust.snes.getKSP().getPC().getType() == "mg" + assert cust.snes.getConvergedReason() > 0 + + +def test_set_custom_mg_validation(): + p = _poisson(_box(0.5)) + with pytest.raises(ValueError): + p.set_custom_mg([_box(0.8)], kind="not-a-builder") + with pytest.raises(ValueError): + p.set_custom_mg([], kind="barycentric") diff --git a/tests/test_1016_custom_mg_hierarchy.py b/tests/test_1016_custom_mg_hierarchy.py new file mode 100644 index 00000000..402ee9c4 --- /dev/null +++ b/tests/test_1016_custom_mg_hierarchy.py @@ -0,0 +1,60 @@ +"""Layer-1 generalized FMG hierarchy: CustomMGHierarchy + set_custom_fmg with +BC-per-level reduction, over a uniform + SBR-targeted nested hierarchy.""" +import numpy as np, sympy, pytest, underworld3 as uw +from petsc4py import PETSc +from underworld3.utilities import custom_mg + +pytestmark = [pytest.mark.level_2, pytest.mark.tier_b] + +def _sbr(dm, band=0.2): + # scoped SBR (restores dm_plex_transform_type so uniform refine() still works) + return custom_mg.sbr_refine_where(dm, lambda cen: abs(cen[0]-0.5) < band) +def _wrap(dm,m0): + return uw.discretisation.Mesh(dm.clone(), simplex=True, + coordinate_system_type=m0.CoordinateSystem.coordinate_type, qdegree=3, boundaries=m0.boundaries) +def _poisson(mesh): + p=uw.systems.Poisson(mesh); p.constitutive_model=uw.constitutive_models.DiffusionModel + x,y=mesh.X + p.constitutive_model.Parameters.diffusivity=sympy.Piecewise((1.0,x<0.5),(1000.0,True)) + p.f=8*sympy.pi**2*sympy.sin(2*sympy.pi*x)*sympy.sin(2*sympy.pi*y) + for b in ("Bottom","Top","Left","Right"): p.add_dirichlet_bc(0.0,b) + p.petsc_options["ksp_rtol"]=1e-8; p.petsc_options["ksp_type"]="cg" + return p + +def _hierarchy(): + m0=uw.meshing.UnstructuredSimplexBox(minCoords=(0,0),maxCoords=(1,1),cellSize=0.5,regular=True,qdegree=3) + dm0=m0.dm; dm1=dm0.refine(); dm2=_sbr(dm1) # 2 uniform-ish + 1 SBR + return m0, [_wrap(dm0,m0), _wrap(dm1,m0)], _wrap(dm2,m0) + +def test_custom_fmg_hierarchy_converges(): + m0, coarse, fine = _hierarchy() + s=_poisson(fine) + custom_mg.set_custom_fmg(s, coarse, builder="barycentric") + s.solve() + assert s.snes.getKSP().getPC().getType()=="mg" + assert s.snes.getKSP().getPC().getMGLevels()==3 + assert s.snes.getConvergedReason()>0 + # matches a GAMG solve of the same fine problem + g=_poisson(_wrap(fine.dm,m0)); g.preconditioner="gamg"; g.solve() + rel=np.linalg.norm(s.Unknowns.u.data-g.Unknowns.u.data)/np.linalg.norm(g.Unknowns.u.data) + assert rel < 1e-4 + +def test_bc_per_level_reduction_no_zero_columns(): + """build() must produce transfers with no zero columns (BCs at every level).""" + m0, coarse, fine = _hierarchy() + s=_poisson(fine); s._build(False,False,None); s.snes.setUp() + h=custom_mg.CustomMGHierarchy(coarse+[fine], builder="barycentric") + Ps=h.build(s) # raises if any transfer has zero columns + assert len(Ps)==2 + for P in Ps: + Pc=P.getValuesCSR(); import scipy.sparse as sp + M=sp.csr_matrix(Pc[::-1], shape=P.getSize()) + assert int((np.asarray((M!=0).sum(axis=0)).ravel()==0).sum())==0 + +def test_rbf_builder_also_works(): + m0, coarse, fine = _hierarchy() + s=_poisson(fine) + custom_mg.set_custom_fmg(s, coarse, builder="rbf") + s.solve() + assert s.snes.getKSP().getPC().getType()=="mg" + assert s.snes.getConvergedReason()>0 diff --git a/tests/test_1017_custom_mg_stokes.py b/tests/test_1017_custom_mg_stokes.py new file mode 100644 index 00000000..bab16eb0 --- /dev/null +++ b/tests/test_1017_custom_mg_stokes.py @@ -0,0 +1,192 @@ +"""Layer-1 generalized FMG hierarchy across the solver families that consume a mesh. + +Custom-built prolongations (barycentric / RBF) drive geometric multigrid via +set_custom_fmg, validated here on each solver-PC topology so the same adapted-mesh +hierarchy works for every solver running on it: + - STOKES velocity sub-block (set_custom_fmg field_id=0) — SolCx eta_B=1e6; + - SNES_Vector (Vector_Projection) — top-level vector PC (field_id=None); + - Stokes_Constrained (free-slip via in-saddle multipliers) — velocity block under + the grouped [p, h] fieldsplit. + +The hard part (see custom_mg._install_velocity_block_transfers): the velocity +sub-PC is unreachable until the monolithic Jacobian is assembled, so the install +forces a Jacobian assembly, reaches the velocity sub-PC, and rebuilds a fresh PCMG +from our supplied prolongations. +""" +import numpy as np +import pytest +import underworld3 as uw +from underworld3.function import analytic as A +from underworld3.utilities import custom_mg + +pytestmark = [pytest.mark.level_2, pytest.mark.tier_b] + + +def _wrap(dm, m0): + return uw.discretisation.Mesh( + dm.clone(), simplex=True, + coordinate_system_type=m0.CoordinateSystem.coordinate_type, + qdegree=3, boundaries=m0.boundaries) + + +def _hierarchy(): + """cellSize 0.25 base, two uniform refinements -> 3 nested levels.""" + m0 = uw.meshing.UnstructuredSimplexBox( + minCoords=(0, 0), maxCoords=(1, 1), cellSize=0.25, regular=True, qdegree=3) + dm0 = m0.dm + dm1 = dm0.refine() + dm2 = dm1.refine() + meshes = [_wrap(dm0, m0), _wrap(dm1, m0), _wrap(dm2, m0)] + return meshes[:-1], meshes[-1] + + +def _walls(s): + s.add_dirichlet_bc((0.0, None), "Left") + s.add_dirichlet_bc((0.0, None), "Right") + s.add_dirichlet_bc((None, 0.0), "Bottom") + s.add_dirichlet_bc((None, 0.0), "Top") + s.petsc_use_pressure_nullspace = True + s.petsc_options["snes_type"] = "ksponly" + + +def _solcx(mesh, eta_B=1e6): + sol = A.SolCx(mesh, eta_A=1.0, eta_B=eta_B, x_c=0.5, n=1) + s = uw.systems.Stokes(mesh) + s.constitutive_model = uw.constitutive_models.ViscousFlowModel + s.constitutive_model.Parameters.shear_viscosity_0 = sol.fn_viscosity + s.saddle_preconditioner = 1.0 / sol.fn_viscosity + s.bodyforce = sol.fn_bodyforce + _walls(s) + s.tolerance = 1e-8 + return s, sol + + +def _vel_ksp(s): + return s.snes.getKSP().getPC().getFieldSplitSubKSP()[0] + + +def test_custom_fmg_velocity_block_barycentric(): + """Custom barycentric P drives geometric MG on the SolCx velocity block, + converging in few iterations and matching a GAMG reference solution.""" + coarse, fine = _hierarchy() + + # GAMG reference + sg, solg = _solcx(fine) + sg.preconditioner = "gamg" + sg.solve() + verr_g = solg.velocity_error(sg.u) + iters_g = _vel_ksp(sg).getIterationNumber() + + # custom-P geometric MG on the velocity block + s, sol = _solcx(fine) + custom_mg.set_custom_fmg(s, coarse, builder="barycentric", field_id=0) + s.solve() + vksp = _vel_ksp(s) + + assert s.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert vksp.getPC().getMGLevels() == len(coarse) + 1 + # geometric MG crushes GAMG on the eta-jump velocity block + assert vksp.getIterationNumber() <= 15 + assert vksp.getIterationNumber() < iters_g + # correct solution: matches analytic to GAMG's accuracy, and matches GAMG itself + verr = sol.velocity_error(s.u) + assert verr < 2.0 * verr_g + 1e-6 + rel = np.linalg.norm(s.u.data - sg.u.data) / (np.linalg.norm(sg.u.data) + 1e-30) + assert rel < 1e-4 + + +def test_custom_fmg_velocity_block_rbf(): + """The RBF builder must also drive a converging velocity-block MG solve.""" + coarse, fine = _hierarchy() + s, sol = _solcx(fine) + custom_mg.set_custom_fmg(s, coarse, builder="rbf", field_id=0) + s.solve() + vksp = _vel_ksp(s) + assert s.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert vksp.getPC().getMGLevels() == len(coarse) + 1 + assert vksp.getIterationNumber() <= 25 + assert sol.velocity_error(s.u) < 1e-2 + + +def test_custom_fmg_vector_solver(): + """SNES_Vector (Vector_Projection): custom-P on the top-level vector PC + (field_id=None) — the same scalar/single-field-vector path, ncomp=dim.""" + import sympy + coarse, fine = _hierarchy() + x, y = fine.X + v = uw.discretisation.MeshVariable("Ucp", fine, fine.dim, degree=2) + proj = uw.systems.Vector_Projection(fine, v) + proj.uw_function = sympy.Matrix([[sympy.sin(sympy.pi * x) * sympy.cos(sympy.pi * y), + sympy.cos(sympy.pi * x) * sympy.sin(sympy.pi * y)]]) + proj.smoothing = 1.0e-3 + proj.add_dirichlet_bc((0.0, 0.0), "Bottom") + proj.add_dirichlet_bc((0.0, 0.0), "Top") + custom_mg.set_custom_fmg(proj, coarse, builder="barycentric") # field_id=None + proj.solve() + assert proj.snes.getKSP().getPC().getType() == "mg" + assert proj.snes.getKSP().getPC().getMGLevels() == len(coarse) + 1 + assert proj.snes.getConvergedReason() > 0 + + +def test_custom_fmg_stokes_constrained(): + """Stokes_Constrained (free-slip via in-saddle multipliers): custom-P on the + velocity block under the grouped [p, h] fieldsplit. Velocity must still match + the analytic SolCx free-slip solution.""" + import sympy + coarse, fine = _hierarchy() + sol = A.SolCx(fine, eta_A=1.0, eta_B=1.0e6, x_c=0.5, n=1) + s = uw.systems.Stokes_Constrained(fine) + s.constitutive_model = uw.constitutive_models.ViscousFlowModel + s.constitutive_model.Parameters.shear_viscosity_0 = sol.fn_viscosity + s.bodyforce = sol.fn_bodyforce + s.add_constraint_bc("Left", g=0.0, normal=sympy.Matrix([[-1.0, 0.0]])) + s.add_constraint_bc("Right", g=0.0, normal=sympy.Matrix([[1.0, 0.0]])) + s.add_constraint_bc("Bottom", g=0.0, normal=sympy.Matrix([[0.0, -1.0]])) + s.add_constraint_bc("Top", g=0.0, normal=sympy.Matrix([[0.0, 1.0]])) + s.petsc_use_pressure_nullspace = True + s.tolerance = 1.0e-9 + s.petsc_options["snes_type"] = "ksponly" + custom_mg.set_custom_fmg(s, coarse, builder="barycentric", field_id=0) + s.solve() + vksp = _vel_ksp(s) + assert s.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert vksp.getPC().getMGLevels() == len(coarse) + 1 + assert vksp.getIterationNumber() <= 15 + assert sol.velocity_error(s.u) < 5.0e-3 + + +def test_custom_fmg_stokes_on_sbr_child(): + """Stokes velocity-block custom-P on an SBR-refined CHILD mesh whose bodyforce + depends on a mesh variable (the Layer-2 adapt-on-top scenario). Regression for + the operator-not-wired bug: forcing the Jacobian before the fieldsplit left the + KSP carrying an unassembled operator -> PCSetUp 'Matrix must be set first' + (err73). Fixed by wiring the assembled Jacobian into the KSP before reaching + the velocity sub-PC.""" + import sympy + base = uw.meshing.UnstructuredSimplexBox( + minCoords=(0, 0), maxCoords=(1, 1), cellSize=0.25, regular=True, + refinement=2, qdegree=3) + child = _wrap(custom_mg.sbr_refine_where( + base.dm_hierarchy[-1], lambda c: abs(c[0] - 0.5) < 0.15), base) + coarse = [_wrap(d, base) for d in base.dm_hierarchy] # static base levels + + Tp = uw.discretisation.MeshVariable("Tp", child, 1, degree=2) # proxy field + Tp.data[:, 0] = 1.0 + s = uw.systems.Stokes(child) + s.constitutive_model = uw.constitutive_models.ViscousFlowModel + s.constitutive_model.Parameters.shear_viscosity_0 = 1.0 + s.saddle_preconditioner = 1.0 + s.bodyforce = sympy.Matrix([[0.0], [1.0e4 * Tp.sym[0]]]) # mesh-variable bodyforce + s.add_dirichlet_bc((0.0, 0.0), "Bottom"); s.add_dirichlet_bc((0.0, 0.0), "Top") + s.add_dirichlet_bc((0.0, None), "Left"); s.add_dirichlet_bc((0.0, None), "Right") + s.petsc_use_pressure_nullspace = True + s.petsc_options["snes_type"] = "ksponly" + custom_mg.set_custom_fmg(s, coarse, builder="barycentric", field_id=0) + s.solve() + vksp = _vel_ksp(s) + assert s.snes.getConvergedReason() > 0 + assert vksp.getPC().getType() == "mg" + assert vksp.getPC().getMGLevels() == len(coarse) + 1