Tom Aarsen

tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

liked a model about 6 hours ago
HuggingFaceTB/SmolVLM-256M-Instruct
posted an update about 7 hours ago
I just released Sentence Transformers v3.4.0, featuring a memory leak fix, compatibility between the powerful Cached... losses and the Matryoshka loss modifier, and a bunch of fixes & small features.

🪆 Matryoshka & Cached loss compatibility
It is now possible to combine the powerful Cached... losses (which use in-batch negatives & a caching mechanism to allow for endless batch size & negatives) with the Matryoshka loss modifier, which modifies a base loss such that it is trained not only on the maximum dimensionality (e.g. 1024 dimensions), but also on many lower dimensions (e.g. 768, 512, 256, 128, 64, 32). After training, these models' embeddings can be truncated for faster retrieval, etc.

🎞️ Resolve memory leak when Model and Trainer are reinitialized
Due to a circular dependency between Trainer -> Model -> ModelCardData -> Trainer, deleting both the trainer & model still didn't free up the memory. This led to a memory leak in scripts where you repeatedly do so.

➕ New Features
Many new small features, e.g. multi-GPU support for 'mine_hard_negatives', a 'margin' parameter to TripletEvaluator, and Matthews Correlation Coefficient in the BinaryClassificationEvaluator.

🐛 Bug Fixes
Also a bunch of fixes, for example that subsequent batches were not sorted when using the "no_duplicates" batch sampler. See the release notes for more details.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.4.0

Big thanks to all community members who assisted in this release. 10 folks with their first contribution this time around!
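To make the "embeddings can be truncated" point concrete, here is a minimal sketch in plain Python. It does not use Sentence Transformers itself. The helper names (`truncate_and_normalize`, `cosine_similarity`) and the toy 8-dimensional vectors are made up for illustration and are not taken from the release. The sketch assumes truncation means keeping the leading dimensions and rescaling to unit length before comparing.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real model embeddings.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.03]
doc_a = [0.8, 0.2, 0.25, 0.1, 0.3, 0.1, 0.0, 0.05]
doc_b = [0.1, 0.9, 0.0, 0.4, 0.02, 0.3, 0.2, 0.1]

# Compare the query against both documents at several truncated sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    a = truncate_and_normalize(doc_a, dim)
    b = truncate_and_normalize(doc_b, dim)
    print(dim, round(cosine_similarity(q, a), 3), round(cosine_similarity(q, b), 3))
```

Smaller vectors mean less storage and cheaper similarity computations. Whether the ranking stays useful at a given size depends on the model having been trained with those dimensions in mind, which is what the Matryoshka loss modifier described above is for.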

Organizations

Hugging Face, Sentence Transformers, Sentence Transformers - Cross-Encoders, Hugging Face Internal Testing Organization, SetFit, Massive Text Embedding Benchmark, Hugging Face Fellows, Nomic AI, Open-Source AI Meetup, Hugging Face OSS Metrics, Blog-explorers, Sentence Transformers Testing, mLLM multilingual, Social Post Explorers, Answer.AI, gg-tt, Distillation Hugs, Hugging Face Discord Community, Bert ... but new

tomaarsen's activity

New activity in minishlab/potion-base-8M 2 days ago

Update path to ""

1
#4 opened 2 days ago by
tomaarsen
New activity in kenoc/mxbai-abat-matryoshka 7 days ago
New activity in cnmoro/static-retrieval-distilbert-ptbr 7 days ago

Performance

1
#1 opened 7 days ago by
tomaarsen
New activity in mteb/leaderboard 8 days ago
New activity in huggingface/documentation-images 8 days ago
New activity in huggingface/documentation-images 10 days ago

Add more optimized images

#420 opened 10 days ago by
tomaarsen