citation
README.md
CHANGED
@@ -11989,7 +11989,7 @@ MiniCPM-Embedding-Light structurally adopts bidirectional attention and Weighted Mean Pooling [
 - Outstanding cross-lingual retrieval capabilities between Chinese and English.
 - Long-text support (up to 8192 tokens).
 - Dense vectors and token-level sparse vectors.
-- Variable dense vector dimensions (Matryoshka representation).
+- Variable dense vector dimensions (Matryoshka representation [2]).
 
 MiniCPM-Embedding-Light incorporates bidirectional attention and Weighted Mean Pooling [1] in its architecture. The model underwent multi-stage training using approximately 260 million training examples, including open-source, synthetic, and proprietary data.
 
@@ -12000,6 +12000,7 @@ We also invite you to explore the UltraRAG series:
 - Domain Adaptive RAG Framework: [UltraRAG](https://github.com/openbmb/UltraRAG)
 
 [1] Muennighoff, N. (2022). Sgpt: Gpt sentence embeddings for semantic search. arXiv preprint arXiv:2202.08904.
+[2] Kusupati, Aditya, et al. "Matryoshka representation learning." Advances in Neural Information Processing Systems 35 (2022): 30233-30249.
 
 ## 模型信息 Model Information
 
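The README text in this diff names two techniques, Weighted Mean Pooling [1] and Matryoshka representation [2]. As a rough illustration only, the sketch below shows one way each could look in plain Python. It assumes position-proportional weights as described in the SGPT paper, and it assumes a Matryoshka embedding is shortened by keeping its leading dimensions and re-normalizing. The README does not confirm either detail for this model, and the function names `weighted_mean_pool` and `truncate_matryoshka` are hypothetical.

```python
import math


def weighted_mean_pool(token_vectors, attention_mask):
    """Pool token vectors into one vector, weighting later positions more.

    Assumption: the weight of the token at 1-based position i is i, and
    padded positions (mask value 0) get weight 0, following SGPT [1].
    """
    dim = len(token_vectors[0])
    pooled = [0.0] * dim
    total_weight = 0.0
    for position, (vector, keep) in enumerate(
        zip(token_vectors, attention_mask), start=1
    ):
        weight = position * keep
        total_weight += weight
        for d in range(dim):
            pooled[d] += weight * vector[d]
    if total_weight == 0:
        raise ValueError("attention_mask masks out every token")
    return [value / total_weight for value in pooled]


def truncate_matryoshka(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumption: shorter Matryoshka vectors are prefixes of the full vector.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    # Three token vectors; the third is padding and is ignored by the mask.
    tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
    print(weighted_mean_pool(tokens, [1, 1, 0]))
    print(truncate_matryoshka([3.0, 4.0, 12.0], 2))
```

Cosine similarity between two truncated vectors is then a plain dot product, because both have unit length. Which truncated dimensions the model actually supports is not stated in this diff.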