Optimize ACC port of atm_advance_scalars_work:4857 by abishekg7 · Pull Request #1476 · MPAS-Dev/MPAS-Model

abishekg7 · 2026-06-24T21:22:57Z

This PR introduces optimizations for the OpenACC port of atm_advance_scalars_work:4857 and is intended to serve as a template for other optimization PRs.

The table below lists timings for a real, global 30 km experiment on an A100 GPU with nvhpc, and on Derecho CPUs with several compiler suites.

For the nvhpc GPU runs, -gpu=math_uniform is introduced as a build flag to ensure that the optimized versions produce bit-identical results, and we report the kernel times obtained with NV_ACC_TIME=1. The GPU runs use one Derecho GPU node, with one A100 driven by one MPI task.

The numbers reported for the CPU runs with the gnu and intel compilers come from the newly added timers local to this region and are averaged across three runs. Each CPU run uses a single Derecho CPU node, fully subscribed with 128 MPI tasks.

Version	GPU kernel time (ms)	CPU timer - nvhpc	CPU timer - gnu	CPU timer - intel24	CPU timer - intel25
base	36	0.0251	0.0246	0.0245	0.02338
1: swap the seq and vector collapse(2) loops	26	0.02695	0.03141	0.02768	0.03124
2: version 1, plus fuse all scalar_tend_column loops	24	0.02694	0.03058	0.0273	0.03034
3: version 2, plus rearrange loops so scalar_tend_column can be replaced by a scalar local to each gang	18	0.02669	0.03136	0.03269	0.03094
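
The relative effect of each version can be read off the table with a few divisions. The sketch below is only a convenience for doing that arithmetic: the numbers are copied from the table, while the variable and function names are made up for this example. The PR does not state the unit of the CPU timers, so only ratios to the base version are computed.

```python
# Timings copied from the table above. GPU kernel times are in ms; the unit
# of the CPU timers is not stated in the PR, so only ratios are formed.
VERSIONS = ["base", "1", "2", "3"]
GPU_MS = [36, 26, 24, 18]
CPU_TIMERS = {
    "nvhpc":   [0.0251, 0.02695, 0.02694, 0.02669],
    "gnu":     [0.0246, 0.03141, 0.03058, 0.03136],
    "intel24": [0.0245, 0.02768, 0.0273, 0.03269],
    "intel25": [0.02338, 0.03124, 0.03034, 0.03094],
}


def gpu_speedups(times=GPU_MS):
    """Speedup of each version relative to the first (base) entry."""
    return [times[0] / t for t in times]


def cpu_ratios(timers=CPU_TIMERS):
    """Timer of each version divided by the base timer, per compiler."""
    return {name: [t / vals[0] for t in vals] for name, vals in timers.items()}


if __name__ == "__main__":
    for version, speedup in zip(VERSIONS, gpu_speedups()):
        print(f"GPU, version {version}: {speedup:.2f}x faster than base")
    for name, ratios in cpu_ratios().items():
        formatted = ", ".join(f"{r:.2f}" for r in ratios)
        print(f"CPU timer / base timer, {name}: {formatted}")
```

On these numbers, version 3 halves the GPU kernel time (36 ms to 18 ms), while the CPU timer for version 3 is higher than the base timer for every compiler listed.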

This work was completed in part at the NCAR/NLR/NOAA Open Hackathon, part of the Open Hackathons program. The authors would like to acknowledge OpenACC-Standard.org for their support.

The last change (OPT 3) also rearranges the loops so that wdtn is computed before scalar_tend_column, rho_zz_new_inv, and scalar_new. The computations of the latter three can then be fused.
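
The loop restructuring behind steps 1 to 3 can be modelled outside of OpenACC. The sketch below is a plain-Python model, not the MPAS Fortran code: the function names, the `flux` layout (one list of per-level values per edge), and the update formula are hypothetical stand-ins chosen for illustration. It shows only the loop-order idea. With the edge loop outermost, partial sums for every level have to be held in a column array. With the level loop outermost, a single scalar accumulator per level is enough, and the dependent computations can sit in the same loop body.

```python
def advance_with_column_array(flux, scalar_old, rho_new, dt):
    """Baseline shape: edge loop outside, level loop inside, separate passes."""
    n_levels = len(scalar_old)
    tend_column = [0.0] * n_levels           # per-cell temporary array
    for edge_flux in flux:                    # sequential loop over edges
        for k in range(n_levels):             # loop over vertical levels
            tend_column[k] += edge_flux[k]
    rho_inv = [1.0 / r for r in rho_new]      # second pass over levels
    return [                                  # third pass over levels
        (scalar_old[k] + dt * tend_column[k]) * rho_inv[k]
        for k in range(n_levels)
    ]


def advance_with_scalar(flux, scalar_old, rho_new, dt):
    """Restructured shape: level loop outside, edge loop inside, one fused body."""
    n_levels = len(scalar_old)
    scalar_new = [0.0] * n_levels
    for k in range(n_levels):                 # loop over vertical levels
        tend = 0.0                            # scalar replaces tend_column(k)
        for edge_flux in flux:                # sequential loop over edges
            tend += edge_flux[k]
        rho_inv = 1.0 / rho_new[k]            # fused into the same iteration
        scalar_new[k] = (scalar_old[k] + dt * tend) * rho_inv
    return scalar_new


if __name__ == "__main__":
    # Made-up inputs: 3 edges, 4 levels.
    flux = [[0.1 * (e + 1) + 0.01 * k for k in range(4)] for e in range(3)]
    scalar_old = [1.0, 2.0, 3.0, 4.0]
    rho_new = [1.1, 1.2, 1.3, 1.4]
    a = advance_with_column_array(flux, scalar_old, rho_new, 0.5)
    b = advance_with_scalar(flux, scalar_old, rho_new, 0.5)
    print("identical results:", a == b)
```

In this model both variants add the edge contributions for a given level in the same order, so the floating-point results compare equal. That mirrors, in miniature, the bit-identical check described above for the GPU builds.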

abishekg7 added 5 commits June 18, 2026 15:49

Adding timer atm_advance_scalars_4857

a1409f6

Fix timer

7686fa2

OPT 1 - Swap the seq and vector collapse(2) loops

8c1d882

OPT 2 fuse all scalar_tend_column loops

d77f2f0

OPT 3 - Replace scalar_tend_column by scalar local to each gang

6588495


abishekg7 wants to merge 5 commits into MPAS-Dev:develop from abishekg7:atmosphere/opt_adv_scalars_4857
