gpu-cpg

GPU critical path generation (CPG) experiments, including the current tensor-core (TC) powered PFXT prototype.

Build

cd gpu-cpg   # repository root
cmake --build build -j8 --target \
  cpg \
  tc-pfxt-gate5 \
  tc-pfxt-inprocess-timing \
  tc-pfxt-inprocess-exactness \
  tc_pfxt_candidates \
  tc_pfxt_inprocess
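The build command above assumes build/ has already been configured. Below is a minimal sketch of a helper that configures the build tree on first use and then builds the requested targets. The helper name build_targets and the CMAKE and BUILD_DIR variables are hypothetical. Any project-specific configure options are not documented here, so they are left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): configure the build
# directory once, then build the given targets.
#   CMAKE     - cmake executable to call (default: cmake; CMAKE=echo gives a dry run)
#   BUILD_DIR - build directory (default: build)
build_targets() {
  local cmake=${CMAKE:-cmake}
  local build_dir=${BUILD_DIR:-build}
  # A configured CMake build tree always contains CMakeCache.txt.
  if [ ! -f "$build_dir/CMakeCache.txt" ]; then
    "$cmake" -S . -B "$build_dir" || return
  fi
  "$cmake" --build "$build_dir" -j8 --target "$@"
}
```

Run it from the repository root, for example build_targets cpg tc-pfxt-gate5 tc_pfxt_candidates.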

Unit Tests

./build/unittests/tc_pfxt_candidates
./build/unittests/tc_pfxt_inprocess

The CUDA tests must be run outside the sandbox on the development machine.

Main Modes

The examples below use locally generated netcard density-crossover data (benchmarks/tc_pfxt_crossover/netcard_d20.txt). These benchmark and golden-cost files are large experiment artifacts that are not tracked in git. Generate or copy them into the paths shown before running the commands.
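Below is a minimal preflight sketch that fails early when one of these untracked artifacts has not been generated or copied yet. The require_artifacts helper is hypothetical and is not a script in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repository): report every
# listed artifact that is missing or empty, and return non-zero if any is.
require_artifacts() {
  local missing=0 f
  for f in "$@"; do
    # -s: the file exists and has a size greater than zero
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f (generate or copy it first)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run require_artifacts benchmarks/tc_pfxt_crossover/netcard_d20.txt || exit 1 before the timing runs.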

G-PathGen Baseline

./build/examples/tc-pfxt-inprocess-timing \
  --benchmark benchmarks/tc_pfxt_crossover/netcard_d20.txt \
  --k 1000000 \
  --mode gpg \
  --warmup 1 \
  --trials 3

TC PFXT, Spur-Source Grouped Candidate Generation

This is the current proposal configuration.

GPUCPG_ENABLE_TC_PFXT=1 \
GPUCPG_TC_PFXT_SINGLE_PASS=1 \
GPUCPG_TC_PFXT_SINGLE_WORK_CANDIDATE=1 \
GPUCPG_TC_PFXT_SOURCE_LOCAL_CANDIDATE=1 \
GPUCPG_TC_PFXT_COMPACT_STATIC_DEVS=1 \
GPUCPG_TC_PFXT_TILE_NATIVE_CANDIDATE=1 \
GPUCPG_TC_PFXT_COMPACT_SOURCE_GROUPS=1 \
GPUCPG_TC_PFXT_DISABLE_PHASE_PROFILE=1 \
./build/examples/tc-pfxt-inprocess-timing \
  --benchmark benchmarks/tc_pfxt_crossover/netcard_d20.txt \
  --k 1000000 \
  --mode tc \
  --warmup 1 \
  --trials 3
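The exactness command later in this README repeats the same seven TC PFXT flags. Only GPUCPG_TC_PFXT_DISABLE_PHASE_PROFILE is added for timing. One way to keep the two runs in sync is to hold the flags in bash arrays and pass them through env. The array names are hypothetical, and this is a convenience sketch rather than something the repository provides.

```bash
#!/usr/bin/env bash
# Hypothetical arrays (not part of the repository) holding the TC PFXT flags
# used by both the timing and the exactness commands in this README.
TC_PFXT_FLAGS=(
  GPUCPG_ENABLE_TC_PFXT=1
  GPUCPG_TC_PFXT_SINGLE_PASS=1
  GPUCPG_TC_PFXT_SINGLE_WORK_CANDIDATE=1
  GPUCPG_TC_PFXT_SOURCE_LOCAL_CANDIDATE=1
  GPUCPG_TC_PFXT_COMPACT_STATIC_DEVS=1
  GPUCPG_TC_PFXT_TILE_NATIVE_CANDIDATE=1
  GPUCPG_TC_PFXT_COMPACT_SOURCE_GROUPS=1
)
# Timing runs additionally disable phase profiling.
TC_PFXT_TIMING_FLAGS=("${TC_PFXT_FLAGS[@]}" GPUCPG_TC_PFXT_DISABLE_PHASE_PROFILE=1)
```

A timing run then starts with env "${TC_PFXT_TIMING_FLAGS[@]}" ./build/examples/tc-pfxt-inprocess-timing, followed by the same arguments as above. The exactness run uses "${TC_PFXT_FLAGS[@]}" instead.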

Exactness Against Golden Costs

Generate (or reuse) a golden cost file from the GPG baseline, then compare TC against it at one or more K prefixes:

./build/examples/tc-pfxt-gate5 \
  --benchmark benchmarks/tc_pfxt_crossover/netcard_d20.txt \
  --k 1000000 \
  --mode baseline \
  --out experiments/tc_pfxt_source_local_20260614/golden/netcard_d20_k1000000.golden.costs

GPUCPG_ENABLE_TC_PFXT=1 \
GPUCPG_TC_PFXT_SINGLE_PASS=1 \
GPUCPG_TC_PFXT_SINGLE_WORK_CANDIDATE=1 \
GPUCPG_TC_PFXT_SOURCE_LOCAL_CANDIDATE=1 \
GPUCPG_TC_PFXT_COMPACT_STATIC_DEVS=1 \
GPUCPG_TC_PFXT_TILE_NATIVE_CANDIDATE=1 \
GPUCPG_TC_PFXT_COMPACT_SOURCE_GROUPS=1 \
./build/examples/tc-pfxt-inprocess-exactness \
  --benchmark benchmarks/tc_pfxt_crossover/netcard_d20.txt \
  --baseline-file experiments/tc_pfxt_source_local_20260614/golden/netcard_d20_k1000000.golden.costs \
  --ks 1000

After this K = 1000 smoke test passes, run the full prefix check with --ks 1000,10000,50000,100000,1000000.
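The exactness driver does this comparison itself. To illustrate what a K-prefix check means, here is a standalone sketch that compares the first K costs of two cost files. It assumes one numeric cost per line, which is an assumption about the .golden.costs format and is not documented here. The prefix_match helper and its tolerance argument are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the first K costs of a golden cost file and a
# candidate cost file. Assumes one numeric cost per line (format assumption).
#   prefix_match GOLDEN CANDIDATE K [TOLERANCE]
# TOLERANCE defaults to 0, i.e. costs must be numerically equal.
# Returns 0 on match, 1 on the first mismatch, 2 if a file has fewer than K costs.
prefix_match() {
  local golden=$1 candidate=$2 k=$3 tol=${4:-0} f
  for f in "$golden" "$candidate"; do
    if [ "$(head -n "$k" "$f" | wc -l)" -lt "$k" ]; then
      echo "K=$k: $f has fewer than $k costs" >&2
      return 2
    fi
  done
  # Pair line i of both files and stop at the first difference above tol.
  paste <(head -n "$k" "$golden") <(head -n "$k" "$candidate") |
    awk -v k="$k" -v tol="$tol" '
      {
        d = $1 - $2
        if (d < 0) d = -d
        if (d > tol) {
          printf "K=%d: mismatch at line %d (%s vs %s)\n", k, NR, $1, $2
          bad = 1
          exit
        }
      }
      END {
        if (!bad) printf "K=%d: match\n", k
        exit bad
      }'
}
```

For example, for k in 1000 10000 50000 100000 1000000; do prefix_match golden.costs tc.costs "$k" || break; done, where both file names are placeholders.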

The in-process drivers load the graph once and call CpGen::reset() between runs. They do not reset the CUDA device by default. Use --reset-device only for debugging a single short run.

Current Results

The latest spur-source grouped TC comparison is documented in doc/tc-pfxt-optimization-readme.md.

Density  K      GPG (ms)  TC (ms)  TC/GPG
d10      1M     186.9     323.3    1.73x
d20      1M     264.1     266.7    1.01x
d30      200K   62.7      68.7     1.09x
d40      200K   64.1      104.3    1.63x
d50      100K   59.6      60.2     1.01x

TC/GPG is TC time divided by GPG time, so values above 1.00x mean TC is slower.
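Below is a small awk sketch that recomputes the TC/GPG column (TC ms divided by GPG ms) from the millisecond columns. The tc_gpg_ratio helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "density K gpg_ms tc_ms" rows on stdin and print
# density, K and the TC/GPG time ratio (above 1.00x means TC is slower).
tc_gpg_ratio() {
  awk '{ printf "%s\t%s\t%.2fx\n", $1, $2, $4 / $3 }'
}

# Rows copied from the results table.
tc_gpg_ratio <<'EOF'
d10 1M 186.9 323.3
d20 1M 264.1 266.7
d30 200K 62.7 68.7
d40 200K 64.1 104.3
d50 100K 59.6 60.2
EOF
```

Recomputing from the rounded millisecond values reproduces the listed ratios except at d30. There, 68.7 / 62.7 rounds to 1.10x instead of 1.09x. A difference of that size is within the rounding of the displayed timings.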

TC is not yet faster than GPG. It is at parity (1.01x) at d20 and d50 and up to 1.73x slower elsewhere. Its current value is architectural: deviation discovery has been reformulated as a tensor-core-friendly operation while preserving exact path ordering. The next step is to reduce candidate materialization overhead so that more of PFXT is TC-shaped.

To rerun the current-best TC side of this table:

scripts/run_tc_pfxt_current_best_retime.sh
