🌐 Project Page • Arxiv • 💻 Code

ViLBench • ViLReward-73K • ViLPRM-3B
- PRM Training Data Generation Pipeline 🏭
- Dataset Card 📚
- Training ViLPRM 💡
- Generate Responses and Best-of-N Selection
- Contributors 🙌
We provide an example of collecting vision-language PRM training data from A-OKVQA in this README.md.
| Name | Data Type | # Original Samples | # Process Reward Samples |
|---|---|---|---|
| A-OKVQA | General | 20,44 | 9,241 |
| GeoQA170K | Math | 8,063 | 31,406 |
| CLEVR-Math | Math | 957 | 1,425 |
| ScienceQA | General | 2,769 | 5,659 |
| Total | Math & General | 16,926 | 73,560 |
- Download `image.zip` and `vilreward_73k_train.json` from here and put them into the `data` folder under the `prm_training` folder.
- Start training:

```bash
bash scripts/train_vilprm.sh
```
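If the training script cannot find the data, a quick layout check can help. This is only a hedged sketch: the exact paths and the assumption that `vilreward_73k_train.json` is a flat JSON list are ours, not the repo's documented format.

```bash
# Hypothetical sanity check of prm_training/data before training.
# The folder layout and the flat JSON-list assumption are ours, not the repo's spec.
cd prm_training/data
unzip -n image.zip              # extract images if that has not been done yet
ls -lh vilreward_73k_train.json # the training annotations must be present
# Count the training records; the result should reflect the ~73.5K process-reward
# annotations in the table above (depending on how steps are grouped per record).
python -c "import json; print(len(json.load(open('vilreward_73k_train.json'))))"
```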
We first generate 16 candidate responses per question and then select the best one using different reward models.
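Conceptually, the selection step keeps, for each question, the candidate the reward model scores highest. Below is a minimal sketch of that argmax with `jq`, assuming the scored candidates are stored as JSON lines with `question_id` and `prm_score` fields (the file name and field names are our assumptions, not the repo's actual schema); in the repo, generation and selection are handled end to end by the scripts below.

```bash
# Hypothetical best-of-N pick: group the scored candidates by question and keep
# the highest-scoring candidate in each group. File and field names are assumed.
jq -s 'group_by(.question_id) | map(max_by(.prm_score))' scored_candidates.jsonl > bon_selected.json
```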
MODEL="vilprm"
# "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "llava_ov_7b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"
TEST_DATA="mathvista"
# "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"
DATA_SPLIT="testmini"
# "testmini", "test", "val", "train", "testmini", "test", "val", "train"
bash scripts/generate_responses.sh $MODEL $TEST_DATA $DATA_SPLIT
```

Then run best-of-N selection with the chosen reward model:

```bash
PRM_MODEL="vilprm"
# "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "llava_ov_7b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"
TEST_DATA="mathvista"
# "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"
DATA_SPLIT="testmini"
# "testmini", "test", "val", "train", "testmini", "test", "val", "train"
bash scripts/bon_select_responses.sh $PRM_MODEL $TEST_DATA $DATA_SPLIT
```

This repo provides:

- Benchmark and training data
- Code for generating training data for ViLPRM
- Code for training ViLPRM
- Code for generating responses and BoN selection
- Bash scripts for evaluating the BoN selection performance
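For the BoN evaluation scripts listed above, the bottom-line metric is simply the accuracy of the selected responses. A hedged sketch of that computation, assuming the selection output is JSON lines with `prediction` and `answer` fields (both names are our assumptions, not the repo's actual output format):

```bash
# Hypothetical BoN accuracy over the selected responses; file and field names
# are assumptions, not the repo's actual output format.
correct=$(jq -s 'map(select(.prediction == .answer)) | length' bon_selected.jsonl)
total=$(jq -s 'length' bon_selected.jsonl)
awk -v c="$correct" -v t="$total" 'BEGIN { printf "BoN accuracy: %.2f%%\n", 100 * c / t }'
```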
Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie
If you find our data useful, please consider citing our work and starring the repo! We are VLAA from UC Santa Cruz.
```bibtex
@inproceedings{tu-vilbench-2025,
    title = "{V}i{LB}ench: A Suite for Vision-Language Process Reward Modeling",
    author = "Tu, Haoqin and
      Feng, Weitao and
      Chen, Hardy and
      Liu, Hui and
      Tang, Xianfeng and
      Xie, Cihang",
    booktitle = "EMNLP",
    year = "2025",
}
```