czczup committed on
Commit ea891f5 · verified · 1 Parent(s): 9237d25

Update README.md

  1. README.md +5 -7
README.md CHANGED
@@ -83,7 +83,7 @@ The training pipeline for a single model in InternVL 2.5 is structured across th
 
  We introduce a progressive scaling strategy to align the vision encoder with LLMs efficiently. This approach trains with smaller LLMs first (e.g., 20B) to optimize foundational visual capabilities and cross-modal alignment before transferring the vision encoder to larger LLMs (e.g., 72B) without retraining. This reuse skips intermediate stages for larger models.
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AVb_PSxhJq1z2eUFNYoqQ.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/UoNUyS7ctN5pBxNv9KnzH.png)
 
  Compared to Qwen2-VL's 1.4 trillion tokens, InternVL2.5-78B uses only 120 billion tokens—less than one-tenth. This strategy minimizes redundancy, maximizes pre-trained component reuse, and enables efficient training for complex vision-language tasks.
 
@@ -166,7 +166,7 @@ As shown in the following figure, from InternVL 1.5 to 2.0 and then to 2.5, the
 
  ### Video Understanding
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/uD5aYt2wNYL94Xn8MOVih.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/tcwH-i1qc8H16En-7AZ5M.png)
 
  ## Evaluation on Language Capability
 
@@ -543,10 +543,10 @@ Many repositories now support fine-tuning of the InternVL series models, includi
 
  ### LMDeploy
 
- LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by the MMRazor and MMDeploy teams.
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs & VLMs.
 
  ```sh
- pip install lmdeploy>=0.5.3
+ pip install lmdeploy>=0.6.4
  ```
 
  LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference pipeline.
@@ -570,8 +570,6 @@ If `ImportError` occurs while executing this case, please install the required d
 
  When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the context window typically needs to be increased.
 
- question = 'Describe this video in detail.'
-
  ```python
  from lmdeploy import pipeline, TurbomindEngineConfig
  from lmdeploy.vl import load_image
@@ -635,7 +633,7 @@ print(sess.response.text)
  LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:
 
  ```shell
- lmdeploy serve api_server OpenGVLab/InternVL2_5-78B --backend turbomind --server-port 23333 --tp 4
+ lmdeploy serve api_server OpenGVLab/InternVL2_5-78B --server-port 23333 --tp 4
  ```
 
  To use the OpenAI-style interface, you need to install OpenAI: