xdic-dev/cuNCorr

cuNCorr

GPU (CUDA) port of the CppNCorr 2D Digital Image Correlation (DIC) engine.

CppNCorr runs the DIC hot path — inverse-compositional Gauss-Newton (IC-GN) subset matching with a ZNCC criterion and biquintic-B-spline subpixel interpolation — one pixel at a time, one subset at a time in scalar CPU loops. cuNCorr reformulates that work as batched array/matrix operations across all subsets at once and runs it on NVIDIA GPUs.
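As a rough illustration of the ZNCC criterion mentioned above, the correlation between a reference subset and a candidate subset can be sketched in plain C++ as below. This is not taken from the CppNCorr or cuNCorr sources: the names `zncc` and `znssd_from_zncc` and the flat-vector subset layout are assumptions made for the example. The second helper uses the standard identity ZNSSD = 2(1 - ZNCC), which is why the two criteria rank candidate matches identically.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Zero-normalized cross-correlation of two equally sized subsets stored as
// flat arrays of grey levels. The result lies in [-1, 1]; 1 means a perfect
// match up to an affine change of brightness (gain and offset).
// Hypothetical helper for illustration only.
double zncc(const std::vector<double>& f, const std::vector<double>& g) {
    assert(!f.empty() && f.size() == g.size());
    const std::size_t n = f.size();

    // Subset means.
    double mean_f = 0.0, mean_g = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_f += f[i];
        mean_g += g[i];
    }
    mean_f /= static_cast<double>(n);
    mean_g /= static_cast<double>(n);

    // Zero-mean sums of squares and the cross term.
    double sff = 0.0, sgg = 0.0, sfg = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double df = f[i] - mean_f;
        const double dg = g[i] - mean_g;
        sff += df * df;
        sgg += dg * dg;
        sfg += df * dg;
    }

    // A flat (zero-variance) subset carries no texture; report no correlation.
    const double denom = std::sqrt(sff * sgg);
    return denom > 0.0 ? sfg / denom : 0.0;
}

// ZNSSD expressed through ZNCC: 0 for a perfect match, 4 for a perfect
// anti-match.
double znssd_from_zncc(double c) {
    return 2.0 * (1.0 - c);
}
```

A batched formulation would evaluate the same sums for every subset at once, for example one GPU thread block per subset, rather than calling a scalar routine like this in a loop.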

cuNCorr is a standalone project. It honors the exact public contract of ncorr/session.h (ImageBufferDICResult, SessionConfig) so it can act as a drop-in DIC backend, and it validates every result against CppNCorr as the CPU parity oracle. CppNCorr is never modified.

Status

A complete, working DIC engine. The CPU path runs anywhere (it's also the parity reference); the CUDA path is auto-used when a GPU is present and reuses the exact same numeric cores, so GPU output matches CPU by construction.
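One common way to let a CPU path and a CUDA path share a numeric core is a qualifier macro that expands to `__host__ __device__` under nvcc and to nothing under a plain C++ compiler. The sketch below shows that pattern with a bilinear sampler. The macro name `HD_CORE` and the functions `bilinear_core` and `sample_batch_cpu` are hypothetical and not taken from the cuNCorr sources, which may organise this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical qualifier macro: nvcc defines __CUDACC__, so the core is
// compiled for host and device there; other compilers see a plain function.
#if defined(__CUDACC__)
#define HD_CORE __host__ __device__
#else
#define HD_CORE
#endif

// Shared numeric core: bilinear sample of a row-major w x h image.
// The caller guarantees 0 <= x <= w-1, 0 <= y <= h-1, and w, h >= 2.
HD_CORE inline float bilinear_core(const float* img, int w, int h,
                                   float x, float y) {
    int x0 = static_cast<int>(x);
    int y0 = static_cast<int>(y);
    if (x0 > w - 2) x0 = w - 2;  // keep the 2x2 stencil inside the image
    if (y0 > h - 2) y0 = h - 2;
    const float tx = x - static_cast<float>(x0);
    const float ty = y - static_cast<float>(y0);
    const float* p = img + y0 * w + x0;
    const float top = p[0] + tx * (p[1] - p[0]);
    const float bot = p[w] + tx * (p[w + 1] - p[w]);
    return top + ty * (bot - top);
}

// CPU backend: a loop over all sample points. A CUDA backend would instead
// launch one thread per point, each calling the same bilinear_core.
std::vector<float> sample_batch_cpu(const std::vector<float>& img, int w, int h,
                                    const std::vector<float>& xs,
                                    const std::vector<float>& ys) {
    assert(w >= 2 && h >= 2);
    assert(img.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    assert(xs.size() == ys.size());
    std::vector<float> out(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        out[i] = bilinear_core(img.data(), w, h, xs[i], ys[i]);
    }
    return out;
}
```

Sharing the arithmetic removes one source of divergence between backends, although compiler choices such as fused multiply-add contraction can still produce small floating-point differences, which is what a parity test is for.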

  • ✅ M0 — CMake CUNCORR_BACKEND toggle, cuncorr/session.h contract, backend HAL, smoke test.
  • ✅ M1 — interpolation core: bilinear + bicubic-Keys, value + analytic gradients, shared host/device core, batched sampling (analytic parity tests).
  • ✅ M2/M3 — IC-GN subset solver (ZNSSD, inverse-compositional, 6×6 Hessian) as a shared __host__ __device__ core; CPU + batched CUDA. Recovers synthetic deformation to ~1e-4 px.
  • ✅ M4 — coarse ZNCC seed + reliability-guided propagation (CPU) / batch coarse+refine (GPU).
  • ✅ M5 — LS strain, multi-frame session, self-contained proxyncorr_gpu CLI (stb_image IO, JSON + strain CSV output) — validated end to end.
  • ✅ M6 — CUDA + CPU Dockerfiles, SLURM GPU script, Apptainer def, GitHub CI.
  • ⏳ M1b — quintic B-spline (recursive prefilter): the one deferred stage, pending the CppNCorr oracle link (no closed-form self-check). Engine default is bicubic-Keys.
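For readers unfamiliar with the bicubic-Keys interpolant named in M1, the following is a minimal one-dimensional sketch of the Keys cubic convolution kernel (with the usual parameter a = -0.5) together with its analytic derivative. The 2D bicubic case is the tensor product of two such 1D passes. All names here (`keys_w`, `keys_dw`, `keys_sample_1d`, `Sample1D`) are hypothetical and the code is not taken from the cuNCorr sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Keys cubic convolution kernel, a = -0.5. Support is |t| < 2.
double keys_w(double t) {
    t = std::fabs(t);
    if (t <= 1.0) return (1.5 * t - 2.5) * t * t + 1.0;
    if (t < 2.0)  return ((-0.5 * t + 2.5) * t - 4.0) * t + 2.0;
    return 0.0;
}

// Analytic derivative dw/dt of the kernel above (an odd function of t).
double keys_dw(double t) {
    const double sign = t < 0.0 ? -1.0 : 1.0;
    t = std::fabs(t);
    if (t <= 1.0) return sign * (4.5 * t - 5.0) * t;
    if (t < 2.0)  return sign * ((-1.5 * t + 5.0) * t - 4.0);
    return 0.0;
}

struct Sample1D {
    double value;  // interpolated value at x
    double grad;   // analytic derivative with respect to x
};

// Interpolate a uniformly sampled signal s at position x using the four
// neighbours floor(x)-1 .. floor(x)+2. Requires that stencil to be in range.
Sample1D keys_sample_1d(const std::vector<double>& s, double x) {
    const int i = static_cast<int>(std::floor(x));
    assert(i >= 1 && i + 2 < static_cast<int>(s.size()));
    Sample1D out{0.0, 0.0};
    for (int k = -1; k <= 2; ++k) {
        const double t = x - static_cast<double>(i + k);
        const double v = s[static_cast<std::size_t>(i + k)];
        out.value += v * keys_w(t);
        out.grad  += v * keys_dw(t);
    }
    return out;
}
```

Unlike a B-spline interpolant, this kernel needs no prefilter pass because it interpolates the samples directly. It also reproduces linear signals exactly, in both value and gradient, which makes simple analytic parity checks possible.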

Five CPU tests pass (smoke, interp, icgn, engine, cli); a separate cuda_parity test gates the GPU path and runs only on a GPU node. See docs/cluster.md for running on a server or cluster.

Build (macOS / Linux, CPU backend — no GPU needed)

cmake -S . -B build -DCUNCORR_BACKEND=CPU
cmake --build build -j
ctest --test-dir build --output-on-failure

Build (CUDA backend — requires CUDA toolkit + NVIDIA GPU)

cmake -S . -B build -DCUNCORR_BACKEND=CUDA
cmake --build build -j

Bundled CppNCorr (CPU reference / parity oracle)

CppNCorr is vendored as a git submodule at Tools/CppNCorr and built directly by cuNCorr's CMake — the project never depends on a CppNCorr checkout outside this repo.

git submodule update --init --recursive          # fetch Tools/CppNCorr
# Build the bundled engine -> lib/libncorr.a (needs OpenCV, FFTW, SuiteSparse, BLAS):
cmake -S . -B build -DCUNCORR_BUILD_CPPNCORR=ON
cmake --build build --target ncorr -j            # produces lib/libncorr.a

# Or build the full parity oracle (implies the above):
cmake -S . -B build -DCUNCORR_BUILD_ORACLE=ON
cmake --build build -j
./build/test/dump_oracle <images_dir> <out.json>

Both options are off by default so the core CPU/GPU builds stay dependency-free. See docs/parity.md.

About

GPU (CUDA) port of CppNCorr's 2D Digital Image Correlation engine — batched IC-GN, mixed precision, parity-validated against CppNCorr.