CodeZzz committed
Commit 90b5e1f · 1 Parent(s): 9ef6b0f
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ckpts/* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ license: other
+ license_name: cogvlm2
+ license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENSE
+
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - chat
+ - cogvlm2
+
+ inference: false
+ ---
+ # VisionReward-Image
+
+ ## Introduction
+ We present VisionReward, a general strategy for aligning visual generation models, both image and video, with human preferences through a fine-grained and multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance in video preference prediction.
+ Here, we present the VisionReward-Image model.
+
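The scoring rule described above can be sketched as follows (our notation, not taken from the paper): if $q_i \in \{0, 1\}$ denotes the binary answer to the $i$-th judgment question and $w_i$ its learned weight, then

$$\mathrm{score} = \sum_i w_i \, q_i .$$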
+ ## Merging and Extracting Checkpoint Files
+ Use the following commands to merge the split files into a single `.tar` file and then extract it into the current directory:
+
+ ```sh
+ cat ckpts/split_part_* > ckpts/visionreward_image.tar
+ tar -xvf ckpts/visionreward_image.tar
+ ```
+
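Optionally, you can verify each split file before merging against the SHA-256 digest recorded in its Git LFS pointer in this commit, e.g. for `ckpts/split_part_aa`:

```sh
# Optional integrity check: the expected digest is the LFS oid recorded
# for split_part_aa in this commit.
echo "9a7a1f3f4998763891d5847f15e9356ee388892824b6931fb57aff9edd172f8b  ckpts/split_part_aa" | sha256sum -c -
```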
+ ## Using this model
+ You can quickly install the Python package dependencies and run model inference by following our [GitHub repository](https://github.com/THUDM/VisionReward).
+ > This model uses fp32 precision parameters and requires the sat (SwissArmyTransformer) library for invocation. For the bf16 version of the model, see [https://huggingface.co/THUDM/VisionReward-Image-bf16](https://huggingface.co/THUDM/VisionReward-Image-bf16)
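As a minimal setup sketch (the `requirements.txt` file name is an assumption; see the repository README for the authoritative install and inference steps):

```sh
# Sketch: fetch the inference code and install its dependencies.
git clone https://github.com/THUDM/VisionReward
cd VisionReward
# Assumed dependency file name; check the repository for the actual one.
pip install -r requirements.txt
```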
ckpts/split_part_aa ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a7a1f3f4998763891d5847f15e9356ee388892824b6931fb57aff9edd172f8b
+ size 5221908480
ckpts/split_part_ab ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad33f733572698294b3a661b8221bcd0a90f3c92c4345876a26e623d7e42a73b
+ size 5221908480
ckpts/split_part_ac ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1870cfec4bae857dc658f498c5fd54c5fad2ba994912f2be4b5f182b92f5e07
+ size 5221908480
ckpts/split_part_ad ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bde4f729bf0de1cd7519c9b5ce249aaa3a7ddf128722348ee81d710d00fb79a
+ size 5221908480
ckpts/split_part_ae ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a9e7ae16eb4558d349c839d3fa4156eb816dc83a0abe5feb9689c6282479b39
+ size 5221908480
ckpts/split_part_af ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81dae2cb5af154d578039411843092a605c05df250f4181869de39975fb353e3
+ size 5221908480
ckpts/split_part_ag ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4b3c26ed2cbbf28c56bf007701f7a18f42a0fad89ecaba3c9219ab2dc0bcd63
+ size 5221908480
ckpts/split_part_ah ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:467f56744f37e49d8b861d98e31c51d6e1ff668a9bede7c62396459d73d4bbdc
+ size 2453288960
latest ADDED
@@ -0,0 +1 @@
+ 1
model_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+     "model_class": "VisualChatModel",
+     "tokenizer_type": "Meta-Llama-3-8B-Instruct",
+     "num_layers": 32,
+     "hidden_size": 4096,
+     "num_attention_heads": 32,
+     "vocab_size": 128256,
+     "layernorm_order": "pre",
+     "model_parallel_size": 1,
+     "max_sequence_length": 8192,
+     "use_bias": false,
+     "inner_hidden_size": 14336,
+     "num_multi_query_heads": 8,
+     "image_length": 2304,
+     "image_size": 1344,
+     "eva_args": {
+         "model_class": "EVA2CLIPModel",
+         "num_layers": 63,
+         "hidden_size": 1792,
+         "num_attention_heads": 16,
+         "vocab_size": 1,
+         "layernorm_order": "post",
+         "model_parallel_size": 1,
+         "max_sequence_length": 257,
+         "inner_hidden_size": 15360,
+         "use_final_layernorm": false,
+         "layernorm_epsilon": 1e-06,
+         "row_parallel_linear_final_bias": false,
+         "image_size": [
+             1344,
+             1344
+         ],
+         "pre_len": 1,
+         "post_len": 0,
+         "in_channels": 3,
+         "patch_size": 14
+     },
+     "bos_token_id": 128000,
+     "eos_token_id": 128001,
+     "pad_token_id": null
+ }