Update README.md
README.md CHANGED
@@ -8978,32 +8978,15 @@ Based on dunzhang/stella_en_1.5B_v5 and google/siglip-so400m-patch14-384.
 
 It can encode both text and images.
 
-**Codes:** https://github.com/NLPJCL/RAG-Retrieval (will release the training code of stella and jasper in a few weeks)
+**Report:** https://arxiv.org/abs/2412.19048
+
+**Codes:** https://github.com/NLPJCL/RAG-Retrieval
 
 **Data:** https://huggingface.co/datasets/infgrad/jasper_text_distill_dataset
 
 **Training logs:** https://api.wandb.ai/links/dunnzhang0/z8jqoqpb
 
-Here's a short introduction to the training method:
-
 The core idea of jasper and stella is distillation: **let the student model learn the teacher model's vectors.**
-The training of jasper has four stages:
-
-Stages 1 & 2: distill from teacher vectors. For jasper, the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5 (Stage 1 and Stage 2 freeze different parameters).
-
-Stage 3: MRL training. I made some modifications to MRL to enable training on unsupervised text.
-
-Stage 4: alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
-
-I use an AdaptiveAvgPool2d to adjust the number and dimension of the vision tokens; this method needs no additional parameters.
-
-**The point of distillation is to get better results from smaller models, or to serve as a form of pre-training, not to top the leaderboards.**
-I did in fact reach first place on MTEB (Chinese and English), but I will not release those two models; as I said before, that is meaningless, and they generalise poorly.
 
 ## Usage
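The distillation objective above ("let the student model learn the teacher model's vectors") is only stated in words. A minimal sketch, assuming a cosine term plus an in-batch similarity-matrix term; this loss is illustrative, not the released jasper recipe:

```
import torch
import torch.nn.functional as F

def distill_loss(student_vecs: torch.Tensor, teacher_vecs: torch.Tensor) -> torch.Tensor:
    # Both inputs are (batch, dim); the student is assumed to share the
    # teacher's output dimension (e.g. after a linear projection).
    s = F.normalize(student_vecs, dim=-1)
    t = F.normalize(teacher_vecs, dim=-1)
    # Term 1: pull each student vector toward its teacher vector.
    cosine_loss = (1.0 - (s * t).sum(dim=-1)).mean()
    # Term 2: match the in-batch pairwise similarity structure.
    sim_loss = F.mse_loss(s @ s.T, t @ t.T)
    return cosine_loss + sim_loss
```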
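Stage 3's "modifications to MRL to enable training on unsupervised text" are likewise not spelled out. One hedged reading: supervise nested prefixes of the student vector against the teacher's pairwise similarities, which needs no labels at all; the dimension list below is hypothetical:

```
import torch
import torch.nn.functional as F

MRL_DIMS = [256, 512, 1024]  # hypothetical nested (Matryoshka) dimensions

def mrl_distill_loss(student_vecs: torch.Tensor, teacher_vecs: torch.Tensor) -> torch.Tensor:
    # Teacher pairwise similarities do not depend on the student dimension,
    # so each truncated student prefix can be trained against them without
    # labels (unsupervised text suffices).
    t = F.normalize(teacher_vecs, dim=-1)
    target = t @ t.T
    total = torch.zeros((), device=student_vecs.device)
    for d in MRL_DIMS:
        s = F.normalize(student_vecs[:, :d], dim=-1)
        total = total + F.mse_loss(s @ s.T, target)
    return total / len(MRL_DIMS)
```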
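Stage 4's resizing trick is concrete enough to sketch directly: nn.AdaptiveAvgPool2d pools the token axis and the feature axis at once, with no trainable weights. siglip-so400m-patch14-384 yields 729 patch tokens of width 1152; the target shape here is an assumption:

```
import torch
import torch.nn as nn

# Pool vision tokens to a target (token count, dimension) with no new weights.
# The target shape (128, 1536) is assumed for illustration.
pool = nn.AdaptiveAvgPool2d((128, 1536))

vision_tokens = torch.randn(2, 729, 1152)   # (batch, n_tokens, d_vision)
resized = pool(vision_tokens.unsqueeze(1))  # -> (batch, 1, 128, 1536)
resized = resized.squeeze(1)                # -> (batch, 128, 1536)
```

Because adaptive pooling has no parameters, this alignment step adds nothing to the model size, which matches the "no additional parameters" claim above.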
@@ -9077,3 +9060,19 @@ script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py
 
 ## License
 **This model should not be used for any commercial purpose!**
+
+## Citation
+
+```
+@misc{zhang2025jasperstelladistillationsota,
+      title={Jasper and Stella: distillation of SOTA embedding models},
+      author={Dun Zhang and Jiacheng Li and Ziyang Zeng and Fulong Wang},
+      year={2025},
+      eprint={2412.19048},
+      archivePrefix={arXiv},
+      primaryClass={cs.IR},
+      url={https://arxiv.org/abs/2412.19048},
+}
+```