CPUOracle is an extended dataset generation suite designed for CPU benchmarking and hardware/software Design Space Exploration (DSE). It enables users to flexibly customize datasets and concurrency (thread/process counts) for comprehensive performance evaluation.
The suite consists of two main components:
- SPEC CPU2017 Dataset Generators: Tools to generate custom and extended datasets for all 43 workloads in the SPEC CPU2017 benchmark suite.
- MicroBench: 10 highly customizable microbenchmarks (implemented in C and Java) with built-in dataset generators.
⚠️ IMPORTANT DISCLAIMER: This repository ONLY contains dataset generation tools. It DOES NOT include the source code or binaries for the SPEC CPU2017 benchmark suite. To use the SPEC generators, you must acquire the official SPEC CPU2017 suite from SPEC's official website.
MicroBench is a collection of 10 microbenchmarks that support flexible customization of datasets and thread counts.
- `*.cpp` / `*.java`: Source code for the microbenchmarks.
- `run.sh`: Reference script for compiling and running the C workloads.
- `generate_scripts.sh`: Batch generator for the C workloads (customizes datasets and thread counts).
- `generate_java_scripts.sh`: Batch generator for the Java workloads.
- `generated_scripts/`: Default output directory for the generated run scripts.
- `batch_run.sh`: Bash script that executes all microbenchmarks sequentially.
- `fast_parse_data.py`: Python script that parses and aggregates the execution results.
Usage: Modify parameters inside generate_scripts.sh and generate_java_scripts.sh to generate customized execution scripts, then run them.
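The exact parameter names inside `generate_scripts.sh` and `generate_java_scripts.sh` are not listed in this README, so the following is only a hedged Python sketch of how one might enumerate a dataset-size × thread-count sweep before transcribing it into those scripts. `sweep_configs`, the example values, and the run-name format are hypothetical.

```python
from itertools import product

# Hypothetical sweep values; replace with what the generator scripts actually accept.
DATASET_SIZES = [1_000, 10_000, 100_000]
THREAD_COUNTS = [1, 2, 4, 8]


def sweep_configs(benchmark, sizes, threads):
    """Return one (run_name, size, threads) tuple per point of the sweep grid.

    Dataset size is the outer loop and thread count the inner loop.
    """
    configs = []
    for size, nthreads in product(sizes, threads):
        run_name = f"{benchmark}_n{size}_t{nthreads}"  # hypothetical naming scheme
        configs.append((run_name, size, nthreads))
    return configs


if __name__ == "__main__":
    for name, size, nthreads in sweep_configs("bench", DATASET_SIZES, THREAD_COUNTS):
        print(name, size, nthreads)
```

Keeping the grid in one place makes it easy to check how many runs a sweep implies (here 3 × 4 = 12) before generating scripts.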
The SPEC CPU2017 Dataset Generators cover all 43 SPEC CPU2017 workloads (Int/FP, Rate/Speed). Each directory corresponds to a specific workload. Official workload details can be found in the SPEC CPU2017 documentation.
Below is the configuration guide for customizing the datasets for each workload group:
- 500.perlbench_r / 600.perlbench_s:
  - Run `generate-n.py`. Modify the number of messages to generate via runtime arguments.
- 502.gcc_r / 602.gcc_s:
  - Run `generate-n.py`. Any `*.c` file that compiles successfully can be used as input, and the compilation flags can be modified.
- 503.bwaves_r / 603.bwaves_s:
  - Run `generate-n.py`. Modify the grid size parameters (`nx`, `ny`, `nz`) in the `*.in` file.
- 505.mcf_r / 605.mcf_s:
  - Run `505_generate_dataset.py`. Inputs: `timetabled_trip` and `deadhead_trip`.
  - Constraint: `deadhead_trip` must be `<= timetabled_trip * (timetabled_trip - 1) / 2`.
  - Run `generate-n.py` to use the `custom_inp*.in` files as custom datasets.
- 507.cactuBSSN_r / 607.cactuBSSN_s:
  - Run `generate-n.py`. Modify the 3D grid size (`PUGH::local_nsize`) and the number of iterations (`Cactus::cctk_itlast`) in the `*.par` file.
- 508.namd_r:
  - Run `generate-n.py`. Modify the `iterations` parameter.
- 510.parest_r:
  - Run `generate-n.py`. Modify `Number of experiments` and `Maximal number of iterations` in the `*.prm` file.
- 511.povray_r:
  - Run `generate-n.py`. Modify the `Width` and `Height` parameters in the `*.ini` file.
- 519.lbm_r / 619.lbm_s:
  - Run `generate-n.py`. Modify the dataset size via runtime arguments (changing the number of time steps is recommended).
- 520.omnetpp_r / 620.omnetpp_s:
  - Run `generate-n.py`. Modify the `sim-time-limit` parameter in the `*.ini` file.
- 521.wrf_r / 621.wrf_s:
  - Run `generate-n.py`. Modify `run_days`, `run_hours`, `run_minutes`, and `run_seconds` in `*.input` (note: the time range is currently limited to at most 1 day).
- 523.xalancbmk_r / 623.xalancbmk_s:
  - Run `523_generate_dataset.py` (input: the number of child nodes under the root node).
  - Run `generate-n.py` to select the `dataset_*.xml` and `dataset_*.xsl` files.
- 525.x264_r / 625.x264_s:
  - Run `generate-n.py`. Modify the dataset size via runtime arguments (changing `seek` and `frames` is recommended).
- 526.blender_r:
  - Run `generate-n.py`. Modify the dataset size via runtime arguments (changing `simulation` and `emulation` is recommended).
- 527.cam4_r / 627.cam4_s:
  - Run `generate-n.py`. Modify `nhtfrq` in `atm_in` and `stop_n` in `drv_in`.
- 628.pop2_s:
  - Modify `stop_n` and `restart_n` in `drv_in.in`, and `dt_count` in `pop2_in`.
- 531.deepsjeng_r / 631.deepsjeng_s:
  - Run `531_generate_dataset.py`. Inputs: the number of positions to generate and the minimum/maximum number of pieces.
  - Run `generate-n.py` to select the generated `*.txt` file.
- 538.imagick_r / 638.imagick_s:
  - Run `538_generate_dataset.py`. Input: image resolution (width × height).
  - Run `generate-n.py` to select the `*.tga` files.
- 541.leela_r / 641.leela_s:
  - Run `541_generate_dataset.py`. Inputs: number of games, board size, and minimum/maximum number of moves per game.
  - Run `generate-n.py` to select the `*.sgf` files.
- 544.nab_r / 644.nab_s:
  - Includes custom molecule models sourced from the RCSB Protein Data Bank (PDB).
  - Run `generate-n.py`. Select the models and random seeds via runtime arguments.
- 548.exchange2_r / 648.exchange2_s:
  - Run `generate-n.py`. Modify the dataset size via runtime arguments.
- 549.fotonik3d_r / 649.fotonik3d_s:
  - Run `generate-n.py`. Modify the `N_x`, `N_y`, `N_z`, `N_t`, and `OBC` parameters in the `yee.dat` file.
- 554.roms_r / 654.roms_s:
  - Run `generate-n.py`. Modify `Lm`, `Mm`, `N`, and `NTIMES` in the `*.in.x` files.
  - ⚠️ CRITICAL WARNING FOR 654.roms_s: Ensure that `NtileI * NtileJ` is a multiple of the number of threads. For stability, `Lm` should be a multiple of `NtileI`, and `Mm` should be a multiple of `NtileJ`. Incorrect settings will cause runtime errors.
- 557.xz_r / 657.xz_s:
  - Run `generate-n.py`. Modify the compression level and cache size of the dataset via runtime arguments.
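For 505.mcf_r / 605.mcf_s, the `deadhead_trip` bound is `timetabled_trip * (timetabled_trip - 1) / 2`, i.e. the number of unordered pairs C(n, 2) of `timetabled_trip` items. A minimal pre-flight check, assuming both inputs are non-negative integers; `max_deadhead_trips` and `check_mcf_inputs` are hypothetical helpers, not part of `505_generate_dataset.py`:

```python
def max_deadhead_trips(timetabled_trip: int) -> int:
    """Upper bound on deadhead_trip: timetabled_trip * (timetabled_trip - 1) / 2."""
    # n * (n - 1) is a product of consecutive integers, hence even, so // is exact.
    return timetabled_trip * (timetabled_trip - 1) // 2


def check_mcf_inputs(timetabled_trip: int, deadhead_trip: int) -> None:
    """Raise ValueError if the inputs violate the documented mcf constraint."""
    if timetabled_trip < 0 or deadhead_trip < 0:
        raise ValueError("trip counts must be non-negative")
    limit = max_deadhead_trips(timetabled_trip)
    if deadhead_trip > limit:
        raise ValueError(
            f"deadhead_trip={deadhead_trip} exceeds the limit {limit} "
            f"for timetabled_trip={timetabled_trip}"
        )


if __name__ == "__main__":
    check_mcf_inputs(1000, 400_000)  # passes: the limit for 1000 trips is 499500
```

Running this before the generator avoids discovering an invalid combination only after a long generation step.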
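Several workloads above are configured by editing `key = value` lines, for example `PUGH::local_nsize` and `Cactus::cctk_itlast` in the 507.cactuBSSN `*.par` file, or `Width` and `Height` in the 511.povray `*.ini` file. A minimal regex-based sketch for scripting such edits, assuming one assignment per line with a non-empty value; `set_param` is hypothetical, matches keys case-sensitively, and replaces everything after `=` on that line (including any trailing comment):

```python
import re


def set_param(text: str, key: str, value) -> str:
    """Set `key = value` on every line whose key matches exactly.

    Preserves indentation and spacing around '='. Raises KeyError if no
    line assigns `key`.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)\S.*$", re.MULTILINE
    )
    # A function replacement avoids backreference surprises with numeric values.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), text)
    if count == 0:
        raise KeyError(key)
    return new_text


if __name__ == "__main__":
    par = "Cactus::cctk_itlast = 10\nPUGH::local_nsize = 40\n"
    print(set_param(par, "PUGH::local_nsize", 64))
```

The same helper works for both formats because each stores one assignment per line; files with a different layout would need their own parser.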
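For 521.wrf_r / 621.wrf_s, the four run-length fields combine into one simulated duration. Assuming the 1-day limit applies to that combined total and that the fields are non-negative integers, a hedged checker could look like this; `total_run_seconds` and `check_wrf_run_length` are hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_run_seconds(run_days=0, run_hours=0, run_minutes=0, run_seconds=0):
    """Combine the four WRF run-length fields into one duration in seconds."""
    if any(v < 0 for v in (run_days, run_hours, run_minutes, run_seconds)):
        raise ValueError("run-length fields must be non-negative")
    return ((run_days * 24 + run_hours) * 60 + run_minutes) * 60 + run_seconds


def check_wrf_run_length(**fields):
    """Return the total in seconds, or raise ValueError if it exceeds 1 day."""
    total = total_run_seconds(**fields)
    if total > SECONDS_PER_DAY:
        raise ValueError(
            f"run length {total} s exceeds the 1-day limit ({SECONDS_PER_DAY} s)"
        )
    return total


if __name__ == "__main__":
    print(check_wrf_run_length(run_hours=6, run_minutes=30))
```

Checking the total rather than each field catches cases such as `run_hours = 25`, where every individual field looks reasonable.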
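For 523.xalancbmk_r / 623.xalancbmk_s, dataset size is controlled by the number of child nodes under the root node. The sketch below is not `523_generate_dataset.py`; it only illustrates the idea with the standard-library `xml.etree.ElementTree`, and the element names (`catalog`, `item`), the `id` attribute, and the random text content are invented for illustration. A matching `dataset_*.xsl` stylesheet would still be needed.

```python
import random
import xml.etree.ElementTree as ET


def make_xml_dataset(num_children: int, seed: int = 0) -> str:
    """Return an XML document whose root has `num_children` child elements."""
    rng = random.Random(seed)  # fixed seed -> reproducible dataset
    root = ET.Element("catalog")  # hypothetical root element name
    for i in range(num_children):
        # Keyword arguments to SubElement become XML attributes.
        item = ET.SubElement(root, "item", id=str(i))
        item.text = str(rng.randint(0, 10**6))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_xml_dataset(3))
```

Because the document size grows linearly with `num_children`, sweeping that single number gives a simple knob for scaling the transformation workload.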
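For 538.imagick_r / 638.imagick_s, the generator takes a resolution and produces `*.tga` images. As an illustration only (not the repository's `538_generate_dataset.py`), this writes an uncompressed 24-bit true-color TGA image filled with seeded random noise. The noise content is our assumption and may not exercise ImageMagick the way the official inputs do; `make_tga` is hypothetical and `Random.randbytes` requires Python 3.9+.

```python
import random
import struct


def make_tga(width: int, height: int, seed: int = 0) -> bytes:
    """Return an uncompressed 24-bit true-color TGA image with random pixels."""
    if not (1 <= width <= 0xFFFF and 1 <= height <= 0xFFFF):
        raise ValueError("TGA width/height must fit in an unsigned 16-bit field")
    header = struct.pack(
        "<BBBHHBHHHHBB",  # little-endian, 18 bytes, no padding
        0,         # ID length: no image ID field
        0,         # color map type: none
        2,         # image type 2: uncompressed true-color
        0, 0, 0,   # color map spec (first index, length, entry size): unused
        0, 0,      # x and y origin
        width,
        height,
        24,        # bits per pixel, stored as B, G, R
        0,         # descriptor: no alpha bits, bottom-left origin
    )
    rng = random.Random(seed)  # seeded -> reproducible image
    pixels = rng.randbytes(width * height * 3)
    return header + pixels
```

For example, `open("img_1920x1080.tga", "wb").write(make_tga(1920, 1080))` produces an 18-byte header plus 1920 × 1080 × 3 bytes of pixel data.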
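For 541.leela_r / 641.leela_s, the generator's inputs (number of games, board size, min/max moves per game) map naturally onto SGF game records. A hedged sketch, not the repository's `541_generate_dataset.py`: it emits SGF FF[4] records with alternating Black/White moves on distinct points, using lowercase `a`–`z` coordinates (so boards up to 26). Capture, suicide and ko rules are not simulated, so the records are synthetic rather than real games. `make_sgf_game` and `make_sgf_games` are hypothetical.

```python
import random
import string


def make_sgf_game(board_size: int, num_moves: int, rng: random.Random) -> str:
    """Return one SGF record with `num_moves` alternating B/W moves."""
    if not 2 <= board_size <= 26:
        raise ValueError("board_size must be in [2, 26] for a-z coordinates")
    if not 0 <= num_moves <= board_size * board_size:
        raise ValueError("num_moves must be between 0 and the number of points")
    letters = string.ascii_lowercase[:board_size]
    points = [col + row for col in letters for row in letters]
    nodes = [f";GM[1]FF[4]SZ[{board_size}]"]  # root node: Go, FF[4], board size
    # Distinct points only; captures are ignored, so no point is ever reused.
    for i, point in enumerate(rng.sample(points, num_moves)):
        color = "B" if i % 2 == 0 else "W"
        nodes.append(f";{color}[{point}]")
    return "(" + "".join(nodes) + ")"


def make_sgf_games(num_games, board_size, min_moves, max_moves, seed=0):
    """Generate `num_games` records with move counts in [min_moves, max_moves]."""
    rng = random.Random(seed)  # one seeded RNG -> reproducible game set
    return [
        make_sgf_game(board_size, rng.randint(min_moves, max_moves), rng)
        for _ in range(num_games)
    ]
```

Each returned string can be written to its own `*.sgf` file; the move-count range is the main lever on how much work each game represents.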
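For 654.roms_s, the tiling rules in the warning can be checked before launching a run. A minimal sketch; `roms_tiling_problems` is a hypothetical helper, and `nthreads` stands for the thread count you plan to use. It reports the divisibility requirement and the two stability recommendations together.

```python
def roms_tiling_problems(lm, mm, ntile_i, ntile_j, nthreads):
    """Return violations of the 654.roms_s tiling rules; an empty list means OK."""
    if min(lm, mm, ntile_i, ntile_j, nthreads) <= 0:
        return ["Lm, Mm, NtileI, NtileJ and the thread count must be positive"]
    problems = []
    tiles = ntile_i * ntile_j
    # Requirement: the total tile count must be a multiple of the thread count.
    if tiles % nthreads != 0:
        problems.append(f"NtileI*NtileJ = {tiles} is not a multiple of {nthreads} threads")
    # Stability recommendations: each grid dimension divisible by its tile count.
    if lm % ntile_i != 0:
        problems.append(f"Lm = {lm} is not a multiple of NtileI = {ntile_i}")
    if mm % ntile_j != 0:
        problems.append(f"Mm = {mm} is not a multiple of NtileJ = {ntile_j}")
    return problems


if __name__ == "__main__":
    print(roms_tiling_problems(512, 256, 4, 8, 8))
```

For example, `roms_tiling_problems(512, 256, 4, 8, 8)` returns `[]`, since 32 tiles divide evenly among 8 threads, 512 is a multiple of 4, and 256 is a multiple of 8.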
Due to storage limitations, this repository provides only the generation scripts. The complete dataset is available for download at: https://resources.benchcouncil.org/datasets/CPU_Oracle.tar.gz
If you use CPUOracle in your research, please refer to our experiments and cite our paper:
- arXiv Link: https://arxiv.org/abs/2605.26643
```bibtex
@article{CPUOracle,
  title={Inference of Component Effect on System Performance},
  author={Wang, Chenxi and Wang, Lei and Gao, Wanling and Fan, Fanda and Kang, Guoxin and Li, Hongxiao and Su, Yuchen and Zhan, Jianfeng},
  journal={arXiv preprint arXiv:2605.26643v2},
  year={2026}
}
```