Update README.md
README.md CHANGED
@@ -8978,32 +8978,15 @@ Based on dunzhang/stella_en_1.5B_v5 and google/siglip-so400m-patch14-384.
 
 It can encode both text and images.
 
-**Codes:** https://github.com/NLPJCL/RAG-Retrieval (will release the training code of stella and jasper in a few weeks)
+**Report:** https://arxiv.org/abs/2412.19048
+
+**Codes:** https://github.com/NLPJCL/RAG-Retrieval
 
 **Data:** https://huggingface.co/datasets/infgrad/jasper_text_distill_dataset
 
 **Training logs:** https://api.wandb.ai/links/dunnzhang0/z8jqoqpb
 
-Here's a short introduction to the training method:
-
 The core idea of jasper and stella is distillation: **let the student model learn the teacher model's vectors.**
-The training of jasper has four stages:
-
-Stages 1 & 2: distill from teacher vectors. For jasper, the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5 (Stage 1 and Stage 2 freeze different parameters).
-
-Stage 3: MRL training. I made some modifications to MRL to enable training on unsupervised text.
-
-Stage 4: alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
-
-I use an AdaptiveAvgPool2d to adjust the number and dimension of the vision tokens; this method needs no additional parameters.
-
-**The point of distillation is to get better results from smaller models, or to serve as a form of pre-training, not to top the leaderboards.**
-I did in fact reach first place on MTEB (Chinese and English), but I will not release those two models; as I said before, that is meaningless, and they generalise poorly.
 
 ## Usage
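The distillation objective above ("let the student model learn the teacher model's vectors") is only stated in words. A minimal sketch, assuming a cosine term plus an in-batch similarity-matrix term; this loss is illustrative, not the released jasper recipe:

```
import torch
import torch.nn.functional as F

def distill_loss(student_vecs: torch.Tensor, teacher_vecs: torch.Tensor) -> torch.Tensor:
    # Both inputs are (batch, dim); the student is assumed to share the
    # teacher's output dimension (e.g. after a linear projection).
    s = F.normalize(student_vecs, dim=-1)
    t = F.normalize(teacher_vecs, dim=-1)
    # Term 1: pull each student vector toward its teacher vector.
    cosine_loss = (1.0 - (s * t).sum(dim=-1)).mean()
    # Term 2: match the in-batch pairwise similarity structure.
    sim_loss = F.mse_loss(s @ s.T, t @ t.T)
    return cosine_loss + sim_loss
```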
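Stage 3's "modifications to MRL to enable training on unsupervised text" are likewise not spelled out. One hedged reading: supervise nested prefixes of the student vector against the teacher's pairwise similarities, which needs no labels at all; the dimension list below is hypothetical:

```
import torch
import torch.nn.functional as F

MRL_DIMS = [256, 512, 1024]  # hypothetical nested (Matryoshka) dimensions

def mrl_distill_loss(student_vecs: torch.Tensor, teacher_vecs: torch.Tensor) -> torch.Tensor:
    # Teacher pairwise similarities do not depend on the student dimension,
    # so each truncated student prefix can be trained against them without
    # labels (unsupervised text suffices).
    t = F.normalize(teacher_vecs, dim=-1)
    target = t @ t.T
    total = torch.zeros((), device=student_vecs.device)
    for d in MRL_DIMS:
        s = F.normalize(student_vecs[:, :d], dim=-1)
        total = total + F.mse_loss(s @ s.T, target)
    return total / len(MRL_DIMS)
```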
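Stage 4's resizing trick is concrete enough to sketch directly: nn.AdaptiveAvgPool2d pools the token axis and the feature axis at once, with no trainable weights. siglip-so400m-patch14-384 yields 729 patch tokens of width 1152; the target shape here is an assumption:

```
import torch
import torch.nn as nn

# Pool vision tokens to a target (token count, dimension) with no new weights.
# The target shape (128, 1536) is assumed for illustration.
pool = nn.AdaptiveAvgPool2d((128, 1536))

vision_tokens = torch.randn(2, 729, 1152)   # (batch, n_tokens, d_vision)
resized = pool(vision_tokens.unsqueeze(1))  # -> (batch, 1, 128, 1536)
resized = resized.squeeze(1)                # -> (batch, 128, 1536)
```

Because adaptive pooling has no parameters, this alignment step adds nothing to the model size, which matches the "no additional parameters" claim above.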
@@ -9077,3 +9060,19 @@ script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py
 
 ## License
 **This model should not be used for any commercial purpose!**
+
+## Citation
+
+```
+@misc{zhang2025jasperstelladistillationsota,
+      title={Jasper and Stella: distillation of SOTA embedding models},
+      author={Dun Zhang and Jiacheng Li and Ziyang Zeng and Fulong Wang},
+      year={2025},
+      eprint={2412.19048},
+      archivePrefix={arXiv},
+      primaryClass={cs.IR},
+      url={https://arxiv.org/abs/2412.19048},
+}
+```