# ViLBench: A Suite for Vision-Language Process Reward Modeling (EMNLP 2025)

🌐 Project Page • 📄 arXiv • 💻 Code

🤗 ViLBench • 🤗 ViLReward-73K • 🤗 ViLPRM-3B


## PRM Training Data

We provide an example of collecting vision-language PRM training data from A-OKVQA in this README.md.
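The collection recipe itself lives in the linked README; as background, a common way to obtain per-step process rewards without human annotation is Monte Carlo step labeling (as in Math-Shepherd): from each solution prefix, sample several completions and use the fraction that reach the correct final answer as that step's soft reward. The sketch below is a generic illustration of that idea, not the repo's actual pipeline; all function and field names are hypothetical.

```python
# Hypothetical Monte Carlo step-labeling sketch; names are illustrative,
# not the repo's actual API.

def label_steps(steps, rollout_correct):
    """For each solution prefix, the soft reward is the fraction of
    sampled completions from that prefix that reach the right answer."""
    records = []
    for i, step in enumerate(steps):
        outcomes = rollout_correct[i]           # e.g. [True, False, True, True]
        reward = sum(outcomes) / len(outcomes)  # Monte Carlo estimate
        records.append({
            "prefix": steps[: i + 1],  # solution so far, including this step
            "step": step,
            "reward": reward,
        })
    return records

# Toy example: a 3-step visual QA solution with 4 rollouts per prefix.
steps = ["Identify the vehicle in the image.",
         "Count the visible wheels.",
         "Answer: 4."]
rollouts = [[True, True, False, True],
            [True, True, True, True],
            [True, True, True, True]]
data = label_steps(steps, rollouts)
```

Here the first step gets reward 3/4 = 0.75 because one of its four rollouts failed, while the later steps get 1.0.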

## Dataset Card

| Name | Data Type | # Original | # Process Reward Size |
| --- | --- | --- | --- |
| A-OKVQA | General | 20,44 | 9,241 |
| GeoQA170K | Math | 8,063 | 31,406 |
| CLEVR-Math | Math | 957 | 1,425 |
| ScienceQA | General | 2,769 | 5,659 |
| **Total** | Math & General | 16,926 | 73,560 |

## Training ViLPRM

1. Download `image.zip` and `vilreward_73k_train.json` from here, and place them in the `data` folder under the `prm_training` folder.
2. Run `bash scripts/train_vilprm.sh`.
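For orientation, a PRM training record typically pairs an image and question with a step-wise solution where each step carries a reward label. The schema below is an assumption for illustration only; check the actual `vilreward_73k_train.json` after downloading, as its field names may differ.

```python
import json

# Hypothetical shape of one training entry; every field name here is an
# assumption, not the file's verified schema.
example = {
    "image": "images/example_000123.jpg",          # hypothetical path
    "question": "How many wheels does the vehicle have?",
    "steps": [
        {"text": "The image shows a car.", "label": 1},
        {"text": "Cars have four wheels.", "label": 1},
        {"text": "Answer: 4.", "label": 1},
    ],
}
line = json.dumps(example)  # one serialized record
```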

## Generate Responses and BoN Selection

We first generate 16 candidate responses per question, then use different reward models to select the most promising one.

### Generate Responses

```shell
MODEL="vilprm"
# Alternatives: "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"

TEST_DATA="mathvista"
# Alternatives: "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"

DATA_SPLIT="testmini"
# Alternatives: "testmini", "test", "val", "train"

bash scripts/generate_responses.sh $MODEL $TEST_DATA $DATA_SPLIT
```
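Conceptually, the generation step samples N diverse candidates by decoding with a nonzero temperature. The sketch below illustrates that pattern with a stub in place of the real model call; `generate`, its parameters, and the defaults are all stand-ins for whatever `scripts/generate_responses.sh` actually wraps.

```python
import random

def generate(question, temperature, rng):
    """Stub standing in for a real VLM call; a real implementation would
    decode from the model with the given sampling temperature."""
    return f"candidate answer (noise {rng.random():.6f})"

def sample_candidates(question, n=16, temperature=0.7, seed=0):
    """Sample n candidate responses; temperature > 0 keeps them diverse."""
    rng = random.Random(seed)
    return [generate(question, temperature, rng) for _ in range(n)]

candidates = sample_candidates("What is shown in the image?")
```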

### Best-of-N Selection

```shell
PRM_MODEL="vilprm"
# Alternatives: "math_shepherd", "skywork", "qwen25_vl_7b", "llama32_11b", "llava_ov_7b",
# "molmo_7b", "none", "gpt-4o", "internvl25_8b", "gemini-1.5-pro", "internvl25_26b", "qwen255_vl_7b",
# "gemini-2.0-flash-exp", "internvl25_4b", "qwen255_vl_3b", "ursa", "internlm",
# "ixc_7b", "llavac_7b", "visualprm"

TEST_DATA="mathvista"
# Alternatives: "mathvista", "mathverse", "mmstar", "mmmu_pro", "realworldqa", "vilbench"

DATA_SPLIT="testmini"
# Alternatives: "testmini", "test", "val", "train"

bash scripts/bon_select_responses.sh $PRM_MODEL $TEST_DATA $DATA_SPLIT
```
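The selection step can be sketched as follows: a process reward model scores every step of every candidate, the step scores are aggregated into one score per candidate, and the argmax is kept. Taking the minimum step score is one common aggregation (the weakest step decides); the repo's actual aggregation may differ, so treat this as an assumption.

```python
def select_best(candidates, step_scores):
    """Best-of-N selection sketch.

    candidates:  list of response strings
    step_scores: per-candidate list of per-step PRM scores in [0, 1]
    """
    # Aggregate each candidate's step scores; min() is one common choice.
    agg = [min(scores) for scores in step_scores]
    best = max(range(len(candidates)), key=lambda i: agg[i])
    return candidates[best], agg[best]

cands = ["A", "B", "C"]
scores = [[0.9, 0.2],   # strong start, weak finish -> aggregate 0.2
          [0.7, 0.8],   # consistently good         -> aggregate 0.7
          [0.6, 0.95]]  # weak start                -> aggregate 0.6
best, score = select_best(cands, scores)
```

With min-aggregation, candidate "B" wins despite never having the single highest step score, because it has no weak step.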

## Plan

- Benchmark and training data
- Code for generating training data for ViLPRM
- Code for training ViLPRM
- Code for generating responses and BoN selection
- Bash scripts for evaluating the BoN selection performance

## Contributors

Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie

If you find our data useful, please consider citing our work and starring the repo! We are VLAA from UC Santa Cruz.

```bibtex
@inproceedings{tu-vilbench-2025,
    title = "{V}i{LB}ench: A Suite for Vision-Language Process Reward Modeling",
    author = "Tu, Haoqin and
      Feng, Weitao and
      Chen, Hardy and
      Liu, Hui and
      Tang, Xianfeng and
      Xie, Cihang",
    booktitle = "EMNLP",
    year = "2025",
}
```
