FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

Overview

FlowR2A is a multimodal driving planner that learns the reward-conditioned action distribution p(a | r) with flow matching. Instead of treating simulation rewards as discriminative targets, FlowR2A treats them as a condition, unifying the dense supervision of scoring-based methods with the generative proposal modeling of anchor-based methods. At inference, generation is steered toward high-reward trajectories via classifier-free guidance.
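As a rough illustration of how classifier-free guidance can steer a flow-matching sampler toward a target reward, here is a minimal, self-contained sketch. It is not the FlowR2A implementation: `toy_velocity` is a hypothetical stand-in for a learned velocity network, the scalar `reward` and the `guidance_scale` parameter are assumptions, and the guidance rule shown is the common form `v_uncond + w * (v_cond - v_uncond)`, which may differ from what the paper uses.

```python
import random


def toy_velocity(x, t, reward):
    """Hypothetical stand-in for a learned velocity network v(x, t, r).

    reward=None plays the role of the unconditional branch (condition dropped).
    The toy field pushes every coordinate toward `reward` (or toward 0 when
    unconditional) along a straight-line probability path.
    """
    target = 0.0 if reward is None else reward
    return [(target - xi) / max(1.0 - t, 1e-3) for xi in x]


def guided_velocity(velocity_fn, x, t, reward, guidance_scale):
    """Blend conditional and unconditional velocities (classifier-free guidance)."""
    v_uncond = velocity_fn(x, t, None)
    v_cond = velocity_fn(x, t, reward)
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def sample(velocity_fn, dim, reward, guidance_scale, num_steps=20, seed=0):
    """Integrate dx/dt = v(x, t) from noise (t=0) to a sample (t=1) with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = guided_velocity(velocity_fn, x, t, reward, guidance_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Condition on the maximum reward (1.0) with a neutral guidance scale.
    print(sample(toy_velocity, dim=4, reward=1.0, guidance_scale=1.0))
```

In this toy setup a guidance scale of 1 recovers the purely conditional field, 0 recovers the unconditional one, and values above 1 extrapolate beyond the conditional prediction. In a real planner the state `x` would be a trajectory tensor and the velocity network would also take scene features as input.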

Training pipeline of FlowR2A.

Qualitative Results on NAVSIM

Each row is one planner; trajectory proposals are colored by PDM score, from red (0) to green (1). FlowR2A (bottom row) produces proposals that are both diverse and consistently high-scoring.

More qualitative comparisons are available on the project page.

Getting Started

To-Do

  • Inference and evaluation code
  • Training code
  • Reward simulation / caching pipeline

Acknowledgement

FlowR2A builds on two outstanding open-source projects: NAVSIM and DiffusionDrive.

Citation

If you find FlowR2A useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{flowr2a2026,
  title         = {FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning},
  author        = {Li, Xirui and Liu, Zhe and Ye, Xiaoqing and Han, Wenhua and Pan, Yifeng and Han, Junyu and Zhao, Hengshuang},
  journal       = {arXiv preprint arXiv:2606.24231},
  eprint        = {2606.24231},
  archivePrefix = {arXiv},
  year          = {2026}
}

License

MIT — see LICENSE.
