czczup committed on
Commit 1bfc2ba · verified · 1 Parent(s): 102b9d2

Update README.md

Files changed (1):
  1. README.md +2 -6
README.md CHANGED
````diff
@@ -115,7 +115,7 @@ Limitations: Although we have made efforts to ensure the safety of the model dur
 
 ## Quick Start
 
-We provide an example code to run InternVL2-26B using `transformers`.
+We provide an example code to run `InternVL2-26B` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
 
````
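For quick reference, the sentence touched by this hunk points at a standard `transformers` remote-code load. A minimal sketch, assuming the checkpoint is published as `OpenGVLab/InternVL2-26B` on the Hub and a bf16-capable GPU is available:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-26B'  # assumed Hub path for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # assumption: bf16 weights fit on one GPU
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()  # matches the `.eval()` line kept in the next hunk
```

`trust_remote_code=True` is needed because the modeling code ships with the checkpoint rather than with `transformers` itself; the blockquote's `transformers>=4.37.2` pin keeps that remote code compatible with the installed library.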
 
````diff
@@ -150,10 +150,6 @@ model = AutoModel.from_pretrained(
     trust_remote_code=True).eval()
 ```
 
-#### BNB 4-bit Quantization
-
-> **⚠️ Warning:** Due to significant quantization errors with BNB 4-bit quantization on InternViT-6B, the model may produce nonsensical outputs and fail to understand images. Therefore, please avoid using BNB 4-bit quantization.
-
 #### Multiple GPUs
 
 The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors.
````
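The multi-GPU note kept by this hunk corresponds to building an explicit `device_map` by hand. Below is a sketch of the technique rather than the repository's exact helper; the layer count of 48 and module names such as `language_model.model.layers.N` are assumptions about the 26B checkpoint's layout:

```python
import math
import torch

def split_model(num_layers=48):
    """Spread LLM layers over all visible GPUs, but pin the vision tower,
    projector, embeddings, final norm/output head, and the last LLM layer
    to GPU 0 so the LLM's first and last layers share a device.

    num_layers=48 is an assumed value for InternVL2-26B.
    """
    world_size = torch.cuda.device_count()
    device_map = {}
    # GPU 0 also hosts the ViT, so budget it as half a GPU for LLM layers.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    counts = [per_gpu] * world_size
    counts[0] = math.ceil(per_gpu * 0.5)
    layer = 0
    for gpu, n in enumerate(counts):
        for _ in range(n):
            if layer < num_layers:
                device_map[f'language_model.model.layers.{layer}'] = gpu
                layer += 1
    # Everything that touches the LLM's input or output stays on GPU 0.
    for name in ('vision_model', 'mlp1',
                 'language_model.model.tok_embeddings',
                 'language_model.model.norm',
                 'language_model.output',
                 f'language_model.model.layers.{num_layers - 1}'):
        device_map[name] = 0
    return device_map
```

Passing `device_map=split_model()` to `AutoModel.from_pretrained` then realizes the layout the paragraph describes.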
````diff
@@ -413,7 +409,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_con
                                num_patches_list=num_patches_list, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 
-question = 'Describe this video in detail. Don\'t repeat.'
+question = 'Describe this video in detail.'
 response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                                num_patches_list=num_patches_list, history=history, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
````
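Context for the changed prompt: in the video example, `pixel_values` concatenates the tiles of every sampled frame and `num_patches_list` records how many tiles each frame contributed, which is how `model.chat` keeps frames separate across turns. A hedged sketch of the first turn (the `load_video` helper, its signature, and the sample file are assumptions based on the surrounding README code):

```python
# Assumed helper from the README's video section: samples num_segments frames
# and returns stacked tiles plus the per-frame tile counts.
pixel_values, num_patches_list = load_video('./examples/red-panda.mp4',
                                            num_segments=8, max_num=1)
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# One <image> placeholder per frame, matching len(num_patches_list).
video_prefix = ''.join(f'Frame{i + 1}: <image>\n' for i in range(len(num_patches_list)))
question = video_prefix + 'What is the red panda doing?'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=None, return_history=True)
```

The diffed lines then reuse `history` for the follow-up turn; the commit simply trims the "Don't repeat." instruction from that follow-up question.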
 