Question
Hi, I'm reproducing the VLN-N1 instruction pipeline (keyframes → sub-clips → LLaVA-OneVision → Qwen3-72B rewrite/summarize) from your tech report. Could you share the exact prompts and the number of images per sub-clip for LLaVA and Qwen, plus how you define "turn left" and "turn right" in the prompt? I'm getting left/right mismatches against our trajectory. Thanks!