boxin-wbx committed
Commit a6d6fdc · verified · 1 Parent(s): 8cf28e4

Update README.md

Files changed (1): README.md (+21 -3)
README.md CHANGED
@@ -24,8 +24,13 @@ library_name: transformers
  ## Description
  This family of models performs vision-language and text-only tasks including optical character recognition, multimodal reasoning, localization, common sense reasoning, world knowledge utilization, and coding.
 
+ This model is ready for non-commercial use.
+
  ## License/Terms of Use
- [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) <br>
+
+ Governing Terms: Deed - [Attribution-NonCommercial 4.0 International - Creative Commons](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+ Additional Information: [LICENSE · Qwen/Qwen2-72B-Instruct at main](https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE) for Qwen2-72B-Instruct and [The MIT License – Open Source Initiative](https://opensource.org/license/mit) for InternViT-6B-448px-V1-2.
 
  # Model Details
 
@@ -62,7 +67,20 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 
  ## Model Architectures
 
- **Network Architecture:** Decoder-Only Transformer
+ **Network Architecture:** Decoder-Only Transformer
+
+ **Text-only LLM backbone:** [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)
+
+ **Vision encoder:** [InternViT-6B](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)
+
+ ### Robustness
+
+ The model trained on this dataset cannot regenerate its training data:
+
+ 1. The model has no image generation capability since its output is only text. Hence it cannot regenerate any image it would have seen during training.
+
+ 2. The model cannot regenerate training text data: during training, the model takes text and images as inputs, and the model output (text) is conditioned on both inputs. During inference, without training images as input, the model would not be able to reproduce any part of the training text data.
 
  ### Input
  **Input Type(s):** Text, Image <br>
 
@@ -179,7 +197,7 @@ Wenliang Dai* ([email protected]), Nayeon Lee* ([email protected]), Boxin Wang* (
 
  ## Ethical Considerations
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
 
  Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
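The card's Input section lists text and images as inputs, with text-only output. As a minimal, hypothetical sketch of how such a multimodal request might be assembled (the `build_prompt` helper and the `<image>` placeholder convention are assumptions for illustration, not taken from this model card — consult the model's own documentation for the real prompt format):

```python
# Hypothetical sketch: assemble a text+image prompt for a decoder-only
# vision-language model. The "<image>" placeholder convention is an
# assumption, not this model's documented format.

def build_prompt(question: str, num_images: int = 1) -> str:
    """Prefix the text question with one placeholder per input image."""
    placeholders = "\n".join("<image>" for _ in range(num_images))
    return f"{placeholders}\n{question}" if num_images else question

prompt = build_prompt("What text appears in this scanned receipt?", num_images=1)
print(prompt.splitlines()[0])  # the image placeholder comes first
```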