infgrad committed
Commit 6fae668 · verified · 1 Parent(s): c37f0e4

Update README.md

Files changed (1)
  1. README.md +18 -19
README.md CHANGED
@@ -8978,32 +8978,15 @@ Based on dunzhang/stella_en_1.5B_v5 and google/siglip-so400m-patch14-384.

 It can encode both text and images.

- Essay writing is more complicated than I thought, and we're working on it. This work was accomplished during my free time; please grant it a little time.
+ **Report:** https://arxiv.org/abs/2412.19048

- Below are some links:
-
- **Codes:** https://github.com/NLPJCL/RAG-Retrieval (will release the training codes of stella and jasper in a few weeks)
+ **Codes:** https://github.com/NLPJCL/RAG-Retrieval

 **Data:** https://huggingface.co/datasets/infgrad/jasper_text_distill_dataset

 **Training logs:** https://api.wandb.ai/links/dunnzhang0/z8jqoqpb

- Here's a short introduction to the training method:
-
 The core idea of jasper and stella is distillation: **Let student model learn teacher model's vectors.**
- The training process of jasper has 4 stages:
-
- Stage 1 & 2: Distill from teacher vectors. In the jasper model the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5 (Stage 1 and Stage 2 freeze different parameters).
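For illustration (the actual training code is released via the RAG-Retrieval repo linked above), a minimal sketch of this kind of vector distillation, assuming precomputed, frozen teacher vectors and a cosine-similarity loss; the exact objective used for jasper may differ:

```python
import torch
import torch.nn.functional as F

def vector_distill_loss(student_vecs: torch.Tensor,
                        teacher_vecs: torch.Tensor) -> torch.Tensor:
    """Push each student embedding toward the precomputed teacher embedding
    for the same text. Hypothetical loss, not jasper's exact recipe.

    student_vecs: (batch, dim), already projected to the teacher dimension
    teacher_vecs: (batch, dim), frozen outputs of the teacher model
    """
    s = F.normalize(student_vecs, dim=-1)
    t = F.normalize(teacher_vecs, dim=-1)
    # Mean of (1 - cosine similarity) over the batch
    return (1.0 - (s * t).sum(dim=-1)).mean()
```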
-
- Stage 3: MRL training. I made some modifications to MRL to enable training on unsupervised text.
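As a sketch of how an MRL-style objective can run on unsupervised text: one option (an assumption here, not necessarily the modification used for jasper) is to match the batch's pairwise-similarity matrix, which needs no labels, on several truncated prefixes of the student vector; the truncation dims are placeholders:

```python
import torch
import torch.nn.functional as F

def mrl_similarity_loss(student_vecs: torch.Tensor,
                        teacher_vecs: torch.Tensor,
                        dims=(256, 512, 1024)) -> torch.Tensor:
    """Train truncated prefixes of the student vector (Matryoshka-style) so
    shorter embeddings stay usable on their own. Matching pairwise similarities
    instead of raw vectors needs no labels and tolerates a dim mismatch with
    the teacher. Hypothetical variant, not jasper's confirmed objective.
    """
    t = F.normalize(teacher_vecs, dim=-1)
    target_sim = t @ t.T                      # teacher pairwise similarities
    losses = []
    for d in dims:                            # placeholder Matryoshka dims
        s = F.normalize(student_vecs[:, :d], dim=-1)
        losses.append(F.mse_loss(s @ s.T, target_sim))
    return torch.stack(losses).mean()
```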
-
- Stage 4: Alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
-
- I use an AdaptiveAvgPool2d to adjust the number and dimension of the vision tokens; this method needs no additional parameters.
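A minimal sketch of that parameter-free resize: treat the batch of vision tokens as 2-D maps of shape (num_tokens, dim) and let nn.AdaptiveAvgPool2d change both axes at once. The SigLIP token count and width, and the target sizes, are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Assumed shapes: siglip-so400m-patch14-384 emits 729 vision tokens of width
# 1152; the target token count and dimension below are hypothetical.
batch, num_tokens, vision_dim = 2, 729, 1152
target_tokens, target_dim = 196, 1024

vision_tokens = torch.randn(batch, num_tokens, vision_dim)

# Parameter-free: adaptive average pooling over the (tokens, dim) plane
# resizes the number of vision tokens and their dimension in one step.
pool = nn.AdaptiveAvgPool2d((target_tokens, target_dim))
resized = pool(vision_tokens)   # shape: (batch, target_tokens, target_dim)
print(resized.shape)            # torch.Size([2, 196, 1024])
```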
-
- **The meaning of distillation is to achieve better results with smaller models or as a way of pre-training, not to hit the top of the leaderboards.**
- Actually, I've reached first place on MTEB (Chinese and English), but I will not release those two models; as I said before, it's meaningless and generalises poorly.
-


 ## Usage
@@ -9077,3 +9060,19 @@ script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py

 ## License
 **This model should not be used for any commercial purpose!**
+
+ ## Citation
+
+ ```
+ @misc{zhang2025jasperstelladistillationsota,
+       title={Jasper and Stella: distillation of SOTA embedding models},
+       author={Dun Zhang and Jiacheng Li and Ziyang Zeng and Fulong Wang},
+       year={2025},
+       eprint={2412.19048},
+       archivePrefix={arXiv},
+       primaryClass={cs.IR},
+       url={https://arxiv.org/abs/2412.19048},
+ }
+ ```