👀 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released the Ovis2 model family along with the Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is SOTA for its size
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning and long-context video understanding
💬 LLMs (a lot of math models!)
> Open-R1 team released OpenR1-Math-220k, a large-scale math reasoning dataset, along with OpenR1-Qwen-7B, a Qwen2.5-Math-7B-Instruct fine-tune trained on the dataset
> Nomic AI released a new Nomic Embed multilingual retrieval model, a MoE with 500M params (305M active), outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune trained with distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on math
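If you want to poke at the math drop, the dataset streams straight from the Hub; a minimal sketch with 🤗 datasets (the id open-r1/OpenR1-Math-220k is from the release above, the exact column names are an assumption):

```python
# Minimal sketch: peek at OpenR1-Math-220k without downloading all 220k rows.
# The dataset id comes from the release above; field names may differ slightly.
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-220k", split="train", streaming=True)
for i, sample in enumerate(ds):
    print(sorted(sample.keys()))  # inspect the problem/solution-style fields
    if i >= 2:
        break
```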
🗣️ Audio
> Zonos-v0.1 is a new family of text-to-speech models, shipped together with speaker embeddings for voice cloning
🖼️ Vision and Image Generation
> We have ported Apple's DepthPro to transformers for your convenience!
> Illustrious-XL-v1.0 is a new illustration generation model
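With the port merged, depth estimation reduces to a pipeline call; a minimal sketch, assuming a recent transformers release and the converted checkpoint at apple/DepthPro-hf:

```python
# Minimal sketch: monocular depth estimation with the newly ported DepthPro.
# Assumes the converted checkpoint "apple/DepthPro-hf" and a recent transformers.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="apple/DepthPro-hf")
result = depth(Image.open("photo.jpg"))
result["depth"].save("depth.png")  # PIL image of the predicted depth map
```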
smolagents can see 🔥 we just shipped vision support to smolagents 🤗 agentic computers FTW
you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically etc) with a few LoC of change! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤗
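Roughly, that looks like this; a minimal sketch assuming a vision-capable model id (Qwen/Qwen2-VL-7B-Instruct here) and smolagents' images= argument on run:

```python
# Minimal sketch: passing images to a smolagents agent at run time.
# Swap TransformersModel for e.g. OpenAIServerModel to use a hosted provider instead.
from PIL import Image
from smolagents import CodeAgent, TransformersModel

model = TransformersModel(model_id="Qwen/Qwen2-VL-7B-Instruct")
agent = CodeAgent(tools=[], model=model)

doc = Image.open("document.png")
answer = agent.run("What is the total on this invoice?", images=[doc])
print(answer)
```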
Multimodal 💬
- We have released SmolVLM, the tiniest VLMs, which come in 256M and 500M, with their retrieval models ColSmol for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLLaMA3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark
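SmolVLM is small enough to try on modest hardware; a minimal sketch, assuming the HuggingFaceTB/SmolVLM-256M-Instruct checkpoint:

```python
# Minimal sketch: image Q&A with the 256M SmolVLM.
# Assumes the "HuggingFaceTB/SmolVLM-256M-Instruct" checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

ckpt = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForVision2Seq.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[Image.open("photo.jpg")], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```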
LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 671B reasoning models by DeepSeek, plus six distilled dense models, on par with o1, with an MIT license! 🤯
- Qwen2.5-Math-PRM: new math process reward models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new families of models along with their datasets (SFT and reward ones too!)
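The distilled R1 models are plain dense checkpoints, so they run with vanilla transformers; a minimal sketch assuming the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B id (you'll want a GPU and accelerate installed for a 7B model):

```python
# Minimal sketch: chatting with a distilled DeepSeek-R1 model.
# The model emits its chain of thought between <think>...</think> before the answer.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "What is 17 * 23?"}]
print(chat(messages, max_new_tokens=512)[0]["generated_text"][-1]["content"])
```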
Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images
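Since Flex.1-alpha follows the Flux architecture, it should load through diffusers' Flux pipeline; a minimal sketch, assuming the ostris/Flex.1-alpha checkpoint is diffusers-compatible:

```python
# Minimal sketch: text-to-image with Flex.1-alpha via diffusers.
# Assumes "ostris/Flex.1-alpha" loads as a Flux-style pipeline.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("ostris/Flex.1-alpha", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps fit the ~8B model on smaller GPUs

image = pipe(
    "an illustration of a fox reading a newspaper",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("fox.png")
```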