Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663
Multilingual Datasets for everyone
export_static_quantized_openvino_model method to quantize a model.
prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.
SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉
from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
mine_hard_negatives docs: https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives
args.push_to_hub=True and args.hub_model_id to upload your model checkpoints to Hugging Face while training. It also uploads your emissions (if codecarbon is installed) and your Tensorboard logs (if tensorboard is installed)
model.similarity(embeddings1, embeddings2) and you'll get your similarity scores immediately. Model authors can specify their desired similarity score, so you don't have to worry about it anymore!
truncate_dim option to the Sentence Transformer constructor. This also allows truncation when using HuggingFaceEmbeddings from LlamaIndex or LangChain.
truncate_dim in evaluators to get the performance after truncation. (Hint: it's surprisingly good, even for models not trained with MatryoshkaLoss, and it can speed up e.g. clustering, retrieval, etc.)
trust_remote_code to load models with custom modelling code.
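To make the "query: " / "document: " prefixes from the prompts note concrete, here is a minimal pure-Python sketch of what prepending a per-column prompt to training text amounts to. The helper add_prompts, the column names "query" and "answer", and the example row are all hypothetical; this is not the library's implementation, only an illustration of the transformation.

```python
def add_prompts(rows, prompts):
    """Return new rows with each column's prompt prepended to its text.

    Columns without an entry in `prompts` are left unchanged.
    """
    return [
        {column: prompts.get(column, "") + text for column, text in row.items()}
        for row in rows
    ]


# Hypothetical column names and example data, for illustration only.
prompts = {"query": "query: ", "answer": "document: "}
train_rows = [
    {
        "query": "What is ONNX?",
        "answer": "ONNX is an open format for machine learning models.",
    },
]

prompted_rows = add_prompts(train_rows, prompts)
print(prompted_rows[0]["query"])
print(prompted_rows[0]["answer"])
```

The original rows are not modified, so the same data can still be used without prompts for comparison.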
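As a rough picture of what a pairwise similarity call such as model.similarity(embeddings1, embeddings2) hands back, the sketch below computes a cosine similarity matrix in plain Python. It assumes cosine similarity as the score; the note above says model authors can choose the score, so a given model may use something else. The function name and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity_matrix(embeddings1, embeddings2):
    """Return a len(embeddings1) x len(embeddings2) matrix of cosine scores."""

    def norm(vector):
        return math.sqrt(sum(x * x for x in vector))

    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
            for v in embeddings2
        ]
        for u in embeddings1
    ]


# Toy 2-dimensional "embeddings", purely illustrative.
embeddings1 = [[1.0, 0.0], [0.0, 2.0]]
embeddings2 = [[3.0, 0.0], [1.0, 1.0]]

scores = cosine_similarity_matrix(embeddings1, embeddings2)
for row in scores:
    print([round(score, 4) for score in row])
```

Row i, column j holds the score between the i-th vector of the first list and the j-th vector of the second.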
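The truncate_dim notes can be illustrated without loading a model: truncation here is taken to mean keeping only the first truncate_dim values of each embedding before comparing them. The sketch below makes that assumption explicit and compares a cosine score before and after truncation. The vectors are invented, and the helpers truncate and cosine are hypothetical names, not library functions.

```python
import math


def truncate(embedding, truncate_dim):
    """Keep only the first `truncate_dim` values of an embedding."""
    return embedding[:truncate_dim]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented 4-dimensional embeddings, for illustration only.
full_a = [0.9, 0.1, 0.3, -0.2]
full_b = [0.8, 0.2, -0.4, 0.1]

full_score = cosine(full_a, full_b)
truncated_score = cosine(truncate(full_a, 2), truncate(full_b, 2))

print("full:", full_score)
print("truncated to 2 dims:", truncated_score)
```

Shorter vectors mean less memory and fewer multiplications per comparison, which is where the speed-up for clustering and retrieval comes from; how much quality is retained depends on the model.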