Tom Aarsen

tomaarsen

https://linkedin.com/in/tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

new activity about 2 hours ago

tomaarsen/natural-questions-hard-negatives: Using hard negatives VS query, pos pair to train embedding models

upvoted a paper about 2 hours ago

Granite Embedding Models

new activity about 2 hours ago

ibm-granite/granite-embedding-278m-multilingual: List both Sentence Transformers and Transformers as compatible libraries

tomaarsen's activity

upvoted a paper about 2 hours ago

Granite Embedding Models

Paper • 2502.20204 • Published about 20 hours ago • 2

upvoted a collection about 4 hours ago

rank1

rank1 is the first test-time compute reasoning model in IR • 15 items • Updated about 21 hours ago • 3

upvoted a paper about 4 hours ago

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published 1 day ago • 5

upvoted a paper 7 days ago

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published 9 days ago • 31

upvoted 2 collections 9 days ago

ModernGLiClass

GLiClass with ModernBERT backbone • 2 items • Updated 9 days ago • 6

The Ultimate Collection of Code Classifiers

🔥 15 classifiers, 124M parameters, one per programming language— for assessing the educational value of GitHub code • 15 items • Updated 8 days ago • 10

upvoted an article 10 days ago

Article

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

11 days ago • 89

upvoted 2 articles 15 days ago

Article

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

17 days ago • 49

Article

1 Billion Classifications

16 days ago • 39

upvoted a collection 16 days ago

Nomic Embed v2

Multilingual Embedding Models • 4 items • Updated 13 days ago • 12

upvoted an article 17 days ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

By … and 1 other • 17 days ago • 25

upvoted an article 18 days ago

Article

Open R1: Update #2

By … and 6 others • 18 days ago • 191

upvoted a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 24 days ago • 195

upvoted 2 collections 22 days ago

GTE ModernBERT

GTE Models Based on ModernBERT • 2 items • Updated Jan 21 • 15

GTE models

General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 21 items • Updated Jan 21 • 23

upvoted an article 23 days ago

Article

Open-source DeepResearch – Freeing our search agents

25 days ago • 1.11k

upvoted an article 24 days ago

Article

Agentic RAG Stack (1/5) - Index and retrieve documents for vector search using Sentence Transformers and DuckDB

Jan 27 • 18

upvoted an article 28 days ago

Article

Mixture of Experts Explained

Dec 11, 2023 • 415

upvoted 2 articles 29 days ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

29 days ago • 33

Article

State of open video generation models in Diffusers

Jan 27 • 48