GPU (CUDA) port of the CppNCorr 2D Digital Image Correlation (DIC) engine.
CppNCorr runs the DIC hot path — inverse-compositional Gauss-Newton (IC-GN) subset
matching with a ZNCC criterion and biquintic-B-spline subpixel interpolation — one
pixel at a time, one subset at a time in scalar CPU loops. cuNCorr reformulates that
work as batched array/matrix operations across all subsets at once and runs it on
NVIDIA GPUs.
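For readers unfamiliar with the matching criterion, the following is a minimal CPU sketch of ZNCC between two subsets stored as flat arrays of grey values. The function name `zncc` and the zero-variance convention are assumptions made for illustration; this is not the code used by CppNCorr or cuNCorr.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Zero-normalized cross-correlation of two equally sized subsets.
// Returns a value in [-1, 1]. As a hypothetical convention, returns 0
// when the sizes differ, the subsets are empty, or either has zero variance.
double zncc(const std::vector<double>& f, const std::vector<double>& g) {
    const std::size_t n = f.size();
    if (n == 0 || g.size() != n) return 0.0;

    // Subset means.
    double mean_f = 0.0, mean_g = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_f += f[i];
        mean_g += g[i];
    }
    mean_f /= static_cast<double>(n);
    mean_g /= static_cast<double>(n);

    // Cross term and the two sums of squared deviations.
    double cross = 0.0, var_f = 0.0, var_g = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double df = f[i] - mean_f;
        const double dg = g[i] - mean_g;
        cross += df * dg;
        var_f += df * df;
        var_g += dg * dg;
    }

    const double denom = std::sqrt(var_f * var_g);
    return denom > 0.0 ? cross / denom : 0.0;
}
```

Because both subsets are mean-centred and normalized, the score is unchanged by a positive gain and an offset in brightness. In a batched formulation, each subset's reductions are independent of every other subset's, which is what makes the work amenable to parallel execution.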
cuNCorr is a standalone project. It honors the exact public contract of
ncorr/session.h (ImageBuffer → DICResult, SessionConfig) so it can act as a
drop-in DIC backend, and it validates every result against CppNCorr as the CPU parity
oracle. CppNCorr is never modified.
cuNCorr is a complete, working DIC engine. The CPU path runs anywhere and doubles as the parity reference; the CUDA path is selected automatically when a GPU is present and reuses the exact same numeric cores, so GPU output matches CPU output by construction.
- ✅ M0 — CMake CUNCORR_BACKEND toggle, cuncorr/session.h contract, backend HAL, smoke test.
- ✅ M1 — interpolation core: bilinear + bicubic-Keys, value + analytic gradients, shared host/device core, batched sampling (analytic parity tests).
- ✅ M2/M3 — IC-GN subset solver (ZNSSD, inverse-compositional, 6×6 Hessian) as a shared __host__ __device__ core; CPU + batched CUDA. Recovers synthetic deformation to ~1e-4 px.
- ✅ M4 — coarse ZNCC seed + reliability-guided propagation (CPU) / batch coarse+refine (GPU).
- ✅ M5 — LS strain, multi-frame session, self-contained proxyncorr_gpu CLI (stb_image IO, JSON + strain CSV output) — validated end to end.
- ✅ M6 — CUDA + CPU Dockerfiles, SLURM GPU script, Apptainer def, GitHub CI.
- ⏳ M1b — quintic B-spline (recursive prefilter): the one deferred stage, pending the CppNCorr oracle link (no closed-form self-check). Engine default is bicubic-Keys.
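To illustrate what "value + analytic gradients" means for the bicubic-Keys interpolant, here is a one-dimensional sketch. It assumes the common Keys parameter a = -0.5 and clamps indices at the borders; both choices, along with the names `keys_weight`, `keys_weight_deriv`, and `sample_keys_1d`, are hypothetical and not taken from the cuNCorr sources. A 2D version would typically apply the same weights separably along x and y.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Assumed Keys parameter (a = -0.5 gives the Catmull-Rom-style kernel).
constexpr double kKeysA = -0.5;

// Keys cubic convolution kernel W(x), nonzero only for |x| < 2.
double keys_weight(double x) {
    const double u = std::fabs(x);
    if (u <= 1.0) return ((kKeysA + 2.0) * u - (kKeysA + 3.0)) * u * u + 1.0;
    if (u < 2.0)  return kKeysA * (((u - 5.0) * u + 8.0) * u - 4.0);
    return 0.0;
}

// Analytic derivative dW/dx of the kernel above.
double keys_weight_deriv(double x) {
    const double u = std::fabs(x);
    const double sign = (x < 0.0) ? -1.0 : 1.0;
    if (u <= 1.0) return sign * (3.0 * (kKeysA + 2.0) * u - 2.0 * (kKeysA + 3.0)) * u;
    if (u < 2.0)  return sign * kKeysA * ((3.0 * u - 10.0) * u + 8.0);
    return 0.0;
}

struct Sample {
    double value;     // interpolated signal at x
    double gradient;  // d(value)/dx, from the analytic kernel derivative
};

// Samples a non-empty 1D signal at real-valued position x using the four
// nearest taps. Out-of-range taps are clamped to the border (an assumption).
Sample sample_keys_1d(const std::vector<double>& s, double x) {
    const int n = static_cast<int>(s.size());
    const int i = static_cast<int>(std::floor(x));
    Sample out{0.0, 0.0};
    for (int k = -1; k <= 2; ++k) {
        const int j = std::max(0, std::min(i + k, n - 1));
        const double d = x - static_cast<double>(i + k);
        out.value += s[j] * keys_weight(d);
        out.gradient += s[j] * keys_weight_deriv(d);
    }
    return out;
}
```

With a = -0.5 the kernel reproduces linear signals exactly away from the borders, so sampling s[j] = 2j + 1 at an interior point returns the exact line value and a gradient of 2. A check of this kind is one way an interpolation core can be tested against a closed-form answer.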
5 CPU tests pass (smoke, interp, icgn, engine, cli); a cuda_parity test gates the
GPU path on a GPU node. See docs/cluster.md to run on a server/cluster.
cmake -S . -B build -DCUNCORR_BACKEND=CPU
cmake --build build -j
ctest --test-dir build --output-on-failure

cmake -S . -B build -DCUNCORR_BACKEND=CUDA
cmake --build build -j

CppNCorr is vendored as a git submodule at Tools/CppNCorr and built directly by cuNCorr's
CMake, so the project never depends on a CppNCorr checkout outside this repo.
git submodule update --init --recursive # fetch Tools/CppNCorr
# Build the bundled engine -> lib/libncorr.a (needs OpenCV, FFTW, SuiteSparse, BLAS):
cmake -S . -B build -DCUNCORR_BUILD_CPPNCORR=ON
cmake --build build --target ncorr -j # produces lib/libncorr.a
# Or build the full parity oracle (implies the above):
cmake -S . -B build -DCUNCORR_BUILD_ORACLE=ON
cmake --build build -j
./build/test/dump_oracle <images_dir> <out.json>

Both options are off by default so the core CPU/GPU builds stay dependency-free. See docs/parity.md.