aari1995 committed on
Commit
8a06dec
·
verified ·
1 Parent(s): 67b2fb6

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -288,10 +288,11 @@ The successor of German_Semantic_STS_V2 is here!
 
  ## Major updates and USPs:
 
- - **Sequence length:** 8192, (16 times more than V2 and other models) => thanks to the ALiBi implementation of Jina-Team!
+ - **Sequence length:** 8192, (16 times more than V2 and other models) -> thanks to the ALiBi implementation of Jina-Team!
  - **Matryoshka Embeddings:** The model is trained for embedding sizes from 1024 down to 64, allowing you to store much smaller embeddings with little quality loss.
  - **License:** Apache 2.0
- - **German only:** This model is German-only, causing the model to learn more efficient and deal better with shorter queries.
+ - **German only:** This model is German-only, causing the model to learn more efficient thanks to its tokenizer, deal better with shorter queries and generally be more nuanced.
+ - **Updated knowledge and quality data:** The backbone of this model is gbert-large by deepset. With Stage-2 pretraining on German fineweb by occiglot (newest only), up-to-date knowledge is ensured.
  - **Flexibility:** Trained with flexible sequence-length and embedding truncation, flexibility is a core feature of the model, while improving on V2-performance.
 
  ## Usage:
@@ -300,7 +301,7 @@ The successor of German_Semantic_STS_V2 is here!
  from sentence_transformers import SentenceTransformer
 
 
- matryoshka_dim = 1024 # How big your embeddings should be, choose from: 64, 128, 256, 512, 1024
+ matryoshka_dim = 1024 # How big your embeddings should be, choose from: 64, 128, 256, 512, 768, 1024
  model = SentenceTransformer("aari1995/German_Semantic_V3", trust_remote_code=True, truncate_dim=matryoshka_dim)
 
  # model.truncate_dim = 64 # truncation dimensions can also be changed after loading
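
Note on the `matryoshka_dim` / `truncate_dim` lines: the sketch below is not part of the README or this commit. It illustrates, under stated assumptions, what choosing a smaller Matryoshka dimension amounts to: keeping the leading values of a full embedding, rescaling to unit length, and storing proportionally fewer bytes. The helpers `truncate_and_normalize` and `storage_bytes` are hypothetical names, the vectors are random stand-ins rather than real model output, and 4 bytes per value assumes float32 storage.

```python
import math
import random


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length.

    Hypothetical helper for illustration; it assumes Matryoshka-style
    truncation means keeping the leading dimensions.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero vector
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for `num_vectors` embeddings (float32 assumed by default)."""
    return num_vectors * dim * bytes_per_value


# Random stand-in for a 1024-dimensional embedding (not real model output).
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(1024)]

# One truncated copy per dimension listed in the updated comment.
sizes = [64, 128, 256, 512, 768, 1024]
truncated = {dim: truncate_and_normalize(full, dim) for dim in sizes}

# Storage ratio between the largest and smallest setting.
ratio = storage_bytes(1_000_000, 1024) // storage_bytes(1_000_000, 64)
print(ratio)  # 16
```

Whether quality holds up at a given size is a property of how the model was trained; the sketch only shows the mechanics and the storage arithmetic.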