FlowR2A is a multimodal driving planner that learns the reward-conditioned action distribution p(a | r) with flow matching. Instead of treating simulation rewards as discriminative targets, FlowR2A treats them as a condition, unifying the dense supervision of scoring-based methods with the generative proposal modeling of anchor-based methods. At inference, generation is steered toward high-reward trajectories via classifier-free guidance.
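The guidance step described above can be illustrated with a minimal sketch. It is not the FlowR2A implementation: it assumes a velocity model that can be queried with or without a reward condition, blends the two predictions with a guidance scale, and integrates with plain Euler steps. The names `guided_velocity`, `sample`, `toy_velocity`, and `guidance_scale` are hypothetical, and `toy_velocity` is a hand-written stand-in for a trained network.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical model signature: velocity(x, t, reward); reward=None means "unconditional".
VelocityFn = Callable[[Vector, float, Optional[float]], Vector]


def guided_velocity(v_fn: VelocityFn, x: Vector, t: float,
                    reward: float, guidance_scale: float) -> Vector:
    """Classifier-free guidance: v_uncond + scale * (v_cond - v_uncond)."""
    v_uncond = v_fn(x, t, None)
    v_cond = v_fn(x, t, reward)
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def sample(v_fn: VelocityFn, x0: Vector, reward: float,
           guidance_scale: float, num_steps: int = 20) -> Vector:
    """Integrate dx/dt = guided velocity from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t never reaches zero
        v = guided_velocity(v_fn, x, t, reward, guidance_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, reward: Optional[float]) -> Vector:
    """Toy stand-in for a trained network (assumption, for illustration only).

    It points along a straight path toward a target: 0.0 when unconditional,
    and the reward value itself when conditioned.
    """
    target = 0.0 if reward is None else reward
    return [(target - xi) / (1.0 - t) for xi in x]


if __name__ == "__main__":
    start = [0.3, -0.5]  # stands in for a noise sample
    for scale in (0.0, 1.0, 2.0):
        print(scale, sample(toy_velocity, start, reward=1.0, guidance_scale=scale))
```

In this toy setup a scale of 0 recovers the unconditional sample, 1 recovers the purely conditional one, and values above 1 extrapolate further in the direction the reward condition suggests. How the real model encodes the reward and which scale works well are not specified here.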
- Overview
- Table of Contents
- Qualitative Results on NAVSIM
- Getting Started
- To-Do
- Acknowledgement
- Citation
- License
Each row is one planner; trajectory proposals are colored by PDM score, from red (0) to green (1). FlowR2A (bottom row) produces proposals that are both diverse and consistently high-scoring.
More qualitative comparisons are available on the project page.
- Data and Environment Preparation — install dependencies and download the data/models needed for evaluation.
- Evaluation — download the pre-trained checkpoints and run evaluation on the test set.
- Inference and evaluation code
- Training code
- Reward simulation / caching pipeline
FlowR2A builds on two outstanding open-source projects: NAVSIM and DiffusionDrive.
If you find FlowR2A useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{flowr2a2026,
title = {FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning},
author = {Li, Xirui and Liu, Zhe and Ye, Xiaoqing and Han, Wenhua and Pan, Yifeng and Han, Junyu and Zhao, Hengshuang},
journal = {arXiv preprint arXiv:2606.24231},
eprint = {2606.24231},
archivePrefix = {arXiv},
year = {2026}
}

MIT License; see LICENSE.


