Anton Lozhkov

anton-l

AI & ML interests

Generative Models, Distributed Training, Photo and Video Enhancement

Recent Activity

liked a dataset 1 day ago

HuggingFaceTB/dclm-edu

updated a dataset 1 day ago

HuggingFaceTB/dclm-edu

anton-l's activity

upvoted a collection 22 days ago

OpenR1-Math

Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 22 days ago • 7

upvoted an article 26 days ago

Article

Open R1: Update #2

By [author not captured] and 6 others • 26 days ago • 197

upvoted a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 199

upvoted a collection 2 months ago

📐 FineMath

FineMath datasets and ablation models • 14 items • Updated 16 days ago • 19

upvoted a paper 6 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126

upvoted 2 articles 8 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16, 2024

• 329

Article

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

Jun 24, 2024

• 34

upvoted a paper 9 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 93

upvoted a collection 9 months ago

📀 Dataset comparison models

1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 37

upvoted a paper about 1 year ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 138

upvoted 2 papers over 1 year ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Paper • 2306.01116 • Published Jun 1, 2023 • 34