VisionReward-Video

Introduction

We present VisionReward, a general strategy for aligning visual generation models, both image and video generation, with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction. Here, we present the VisionReward-Video model.
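The scoring idea described above, judgment questions whose answers are linearly weighted and summed, can be illustrated with a minimal Python sketch. This is not the released implementation: the question names, the weights, the 1/0 encoding of yes/no answers, and the helper functions `weighted_checklist_score` and `prefer` are all hypothetical and chosen only to show the arithmetic.

```python
from typing import Mapping


def weighted_checklist_score(
    answers: Mapping[str, bool],
    weights: Mapping[str, float],
) -> float:
    """Sum the weights of all judgment questions answered 'yes'.

    Assumption: a 'yes' counts as 1 and a 'no' as 0. The real model may
    encode answers differently.
    """
    missing = set(weights) - set(answers)
    if missing:
        raise KeyError(f"no answer for questions: {sorted(missing)}")
    return sum(w for q, w in weights.items() if answers[q])


def prefer(
    answers_a: Mapping[str, bool],
    answers_b: Mapping[str, bool],
    weights: Mapping[str, float],
) -> str:
    """Return 'a', 'b', or 'tie' depending on which sample scores higher."""
    score_a = weighted_checklist_score(answers_a, weights)
    score_b = weighted_checklist_score(answers_b, weights)
    if score_a == score_b:
        return "tie"
    return "a" if score_a > score_b else "b"


# Hypothetical questions and weights, for illustration only.
weights = {
    "motion_is_smooth": 0.5,
    "subject_is_stable": 0.25,
    "has_flicker": -0.75,  # negative weight: a 'yes' lowers the score
}
video_a = {"motion_is_smooth": True, "subject_is_stable": True, "has_flicker": False}
video_b = {"motion_is_smooth": True, "subject_is_stable": False, "has_flicker": True}

print(weighted_checklist_score(video_a, weights))  # 0.75
print(weighted_checklist_score(video_b, weights))  # -0.25
print(prefer(video_a, video_b, weights))           # a
```

Because each question contributes a single signed term, the final score can be traced back to the individual judgments that raised or lowered it, which is what makes a score of this form interpretable.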

Using this model

You can quickly install the Python package dependencies and run model inference by following the instructions in our GitHub repository.

Model details

Format: Safetensors
Model size: 12.5B params
Tensor type: BF16