🌐 Project Page • Arxiv • 💻 Code

ViLBench • ViLReward-73K • ViLPRM-3B
- PRM Training Data Generation Pipeline 🏭
- Dataset Card 📚
- Training ViLPRM 💡
- Generate Responses and Best-of-N Selection
- Contributors 🙌
We provide an example of collecting vision-language PRM training data from A-OKVQA in this README.md.
| Name | Data Type | # Original Samples | # Process Reward Samples |
|---|---|---|---|
| A-OKVQA | General | 20,44 | 9,241 |
| GeoQA170K | Math | 8,063 | 31,406 |
| CLEVR-Math | Math | 957 | 1,425 |
| ScienceQA | General | 2,769 | 5,659 |
| Total | Math & General | 16,926 | 73,560 |
- Download `image.zip` and `vilreward_73k_train.json` from here and put them into the `data` folder under the `prm_training` folder.
- Start training:

```bash
bash scripts/train_vilprm.sh
```
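If the training script cannot find the data, a quick layout check can help. This is only a hedged sketch: the exact paths and the assumption that `vilreward_73k_train.json` is a flat JSON list are ours, not the repo's documented format.

```bash
# Hypothetical sanity check of prm_training/data before training.
# The folder layout and the flat JSON-list assumption are ours, not the repo's spec.
cd prm_training/data
unzip -n image.zip              # extract images if that has not been done yet
ls -lh vilreward_73k_train.json # the training annotations must be present
# Count the training records; the result should reflect the ~73.5K process-reward
# annotations in the table above (depending on how steps are grouped per record).
python -c "import json; print(len(json.load(open('vilreward_73k_train.json'))))"
```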
We first generate 16 candidate responses per question and then select the best one using different reward models.
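Conceptually, the selection step keeps, for each question, the candidate the reward model scores highest. Below is a minimal sketch of that argmax with `jq`, assuming the scored candidates are stored as JSON lines with `question_id` and `prm_score` fields (the file name and field names are our assumptions, not the repo's actual schema); in the repo, generation and selection are handled end to end by the scripts below.

```bash
# Hypothetical best-of-N pick: group the scored candidates by question and keep
# the highest-scoring candidate in each group. File and field names are assumed.
jq -s 'group_by(.question_id) | map(max_by(.prm_score))' scored_candidates.jsonl > bon_selected.json
```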
MODEL="vilprm"
# "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "llava_ov_7b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"
TEST_DATA="mathvista"
# "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"
DATA_SPLIT="testmini"
# "testmini", "test", "val", "train", "testmini", "test", "val", "train"
bash scripts/generate_responses.sh $MODEL $TEST_DATA $DATA_SPLIT
```

Then run best-of-N selection with the chosen reward model:

```bash
PRM_MODEL="vilprm"
# "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "llava_ov_7b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"
TEST_DATA="mathvista"
# "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"
DATA_SPLIT="testmini"
# "testmini", "test", "val", "train", "testmini", "test", "val", "train"
bash scripts/bon_select_responses.sh $PRM_MODEL $TEST_DATA $DATA_SPLIT
```

This repo provides:

- Benchmark and training data
- Code for generating training data for ViLPRM
- Code for training ViLPRM
- Code for generating responses and BoN selection
- Bash scripts for evaluating the BoN selection performance
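For the BoN evaluation scripts listed above, the bottom-line metric is simply the accuracy of the selected responses. A hedged sketch of that computation, assuming the selection output is JSON lines with `prediction` and `answer` fields (both names are our assumptions, not the repo's actual output format):

```bash
# Hypothetical BoN accuracy over the selected responses; file and field names
# are assumptions, not the repo's actual output format.
correct=$(jq -s 'map(select(.prediction == .answer)) | length' bon_selected.jsonl)
total=$(jq -s 'length' bon_selected.jsonl)
awk -v c="$correct" -v t="$total" 'BEGIN { printf "BoN accuracy: %.2f%%\n", 100 * c / t }'
```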
Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie
If you find our data useful, please consider citing our work and starring the repo! We are VLAA from UC Santa Cruz.
```bibtex
@inproceedings{tu-vilbench-2025,
    title = "{V}i{LB}ench: A Suite for Vision-Language Process Reward Modeling",
    author = "Tu, Haoqin and
      Feng, Weitao and
      Chen, Hardy and
      Liu, Hui and
      Tang, Xianfeng and
      Xie, Cihang",
    booktitle = "EMNLP",
    year = "2025",
}
```