mfarre HF staff committed on
Commit f730b73 · verified · 1 Parent(s): a3992d5

Update README.md

Files changed (1)
  1. README.md +11 -3
README.md CHANGED
@@ -4,6 +4,16 @@ license: apache-2.0
4
  datasets:
5
  - HuggingFaceM4/the_cauldron
6
  - HuggingFaceM4/Docmatix
7
  pipeline_tag: video-text-to-text
8
  language:
9
  - en
@@ -17,9 +27,7 @@ base_model:
17
 
18
  # SmolVLM2-500M-Video
19
 
20
- SmolVLM2-500M-Video is a model optimized for video that accepts video, arbitrary sequences of image and text inputs to produce text outputs. It can answer questions about media files, compare images, describe visual content, or transcribe text.
21
- Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. It can run inference on a video with 1.8GB of GPU RAM.
22
-
23
  ## Model Summary
24
 
25
  - **Developed by:** Hugging Face 🤗
 
4
  datasets:
5
  - HuggingFaceM4/the_cauldron
6
  - HuggingFaceM4/Docmatix
7
+ - lmms-lab/LLaVA-OneVision-Data
8
+ - lmms-lab/M4-Instruct-Data
9
+ - HuggingFaceFV/finevideo
10
+ - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
11
+ - lmms-lab/LLaVA-Video-178K
12
+ - orrzohar/Video-STaR
13
+ - Mutonix/Vript
14
+ - TIGER-Lab/VISTA-400K
15
+ - Enxin/MovieChat-1K_train
16
+ - ShareGPT4Video/ShareGPT4Video
17
  pipeline_tag: video-text-to-text
18
  language:
19
  - en
 
27
 
28
  # SmolVLM2-500M-Video
29
 
30
+ SmolVLM2-500M-Video is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 1.8GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.
31
  ## Model Summary
32
 
33
  - **Developed by:** Hugging Face 🤗