czczup committed on
Commit 1bfc2ba · verified · 1 Parent(s): 102b9d2

Update README.md

Files changed (1):
  1. README.md +2 -6
README.md CHANGED
````diff
@@ -115,7 +115,7 @@ Limitations: Although we have made efforts to ensure the safety of the model dur
 
 ## Quick Start
 
-We provide an example code to run InternVL2-26B using `transformers`.
+We provide an example code to run `InternVL2-26B` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
 
````
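For quick reference, the sentence touched by this hunk points at a standard `transformers` remote-code load. A minimal sketch, assuming the checkpoint is published as `OpenGVLab/InternVL2-26B` on the Hub and a bf16-capable GPU is available:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-26B'  # assumed Hub path for this checkpoint
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # assumption: bf16 weights fit on one GPU
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()  # matches the `.eval()` line kept in the next hunk
```

`trust_remote_code=True` is needed because the modeling code ships with the checkpoint rather than with `transformers` itself; the blockquote's `transformers>=4.37.2` pin keeps that remote code compatible with the installed library.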
 
````diff
@@ -150,10 +150,6 @@ model = AutoModel.from_pretrained(
     trust_remote_code=True).eval()
 ```
 
-#### BNB 4-bit Quantization
-
-> **⚠️ Warning:** Due to significant quantization errors with BNB 4-bit quantization on InternViT-6B, the model may produce nonsensical outputs and fail to understand images. Therefore, please avoid using BNB 4-bit quantization.
-
 #### Multiple GPUs
 
 The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors.
````
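The multi-GPU note kept by this hunk corresponds to building an explicit `device_map` by hand. Below is a sketch of the technique rather than the repository's exact helper; the layer count of 48 and module names such as `language_model.model.layers.N` are assumptions about the 26B checkpoint's layout:

```python
import math
import torch

def split_model(num_layers=48):
    """Spread LLM layers over all visible GPUs, but pin the vision tower,
    projector, embeddings, final norm/output head, and the last LLM layer
    to GPU 0 so the LLM's first and last layers share a device.

    num_layers=48 is an assumed value for InternVL2-26B.
    """
    world_size = torch.cuda.device_count()
    device_map = {}
    # GPU 0 also hosts the ViT, so budget it as half a GPU for LLM layers.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    counts = [per_gpu] * world_size
    counts[0] = math.ceil(per_gpu * 0.5)
    layer = 0
    for gpu, n in enumerate(counts):
        for _ in range(n):
            if layer < num_layers:
                device_map[f'language_model.model.layers.{layer}'] = gpu
                layer += 1
    # Everything that touches the LLM's input or output stays on GPU 0.
    for name in ('vision_model', 'mlp1',
                 'language_model.model.tok_embeddings',
                 'language_model.model.norm',
                 'language_model.output',
                 f'language_model.model.layers.{num_layers - 1}'):
        device_map[name] = 0
    return device_map
```

Passing `device_map=split_model()` to `AutoModel.from_pretrained` then realizes the layout the paragraph describes.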
````diff
@@ -413,7 +409,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_con
                                num_patches_list=num_patches_list, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 
-question = 'Describe this video in detail. Don\'t repeat.'
+question = 'Describe this video in detail.'
 response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                                num_patches_list=num_patches_list, history=history, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
````
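Context for the changed prompt: in the video example, `pixel_values` concatenates the tiles of every sampled frame and `num_patches_list` records how many tiles each frame contributed, which is how `model.chat` keeps frames separate across turns. A hedged sketch of the first turn (the `load_video` helper, its signature, and the sample file are assumptions based on the surrounding README code):

```python
# Assumed helper from the README's video section: samples num_segments frames
# and returns stacked tiles plus the per-frame tile counts.
pixel_values, num_patches_list = load_video('./examples/red-panda.mp4',
                                            num_segments=8, max_num=1)
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# One <image> placeholder per frame, matching len(num_patches_list).
video_prefix = ''.join(f'Frame{i + 1}: <image>\n' for i in range(len(num_patches_list)))
question = video_prefix + 'What is the red panda doing?'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=None, return_history=True)
```

The diffed lines then reuse `history` for the follow-up turn; the commit simply trims the "Don't repeat." instruction from that follow-up question.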
 