Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

published a dataset about 12 hours ago: ucsahin/TR-Visual-Docs
liked a model about 14 hours ago: microsoft/Phi-4-multimodal-instruct

Organizations

None yet

ucsahin's activity

New activity in ucsahin/TR-Visual-Docs about 12 hours ago
reacted to merve's post with 🚀 8 days ago
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, and 28B, each at 224 and 448 resolution 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co./blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
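A minimal sketch (not from the post itself) of how one might try one of the mix checkpoints locally with transformers; the exact checkpoint ID, prompt, and image URL below are assumptions chosen for illustration:

```python
# Minimal, untested sketch: open-ended prompting with a PaliGemma 2 Mix checkpoint.
# The checkpoint ID and image URL are placeholders/assumptions for illustration.
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-10b-mix-448"  # assumed checkpoint from the mix collection
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Any RGB image works; here a placeholder document image URL.
image = Image.open(requests.get("https://example.com/invoice.png", stream=True).raw)
prompt = "answer en What is the total amount on this invoice?"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
# Strip the prompt tokens and decode only the newly generated answer.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```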
reacted to davanstrien's post with 👍 29 days ago
reacted to merve's post with 🔥 about 1 month ago
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM, the tiniest VLMs yet, in 256M and 500M sizes, along with the ColSmol retrieval models for multimodal RAG 💗
- UI-TARS is a new family of models by ByteDance to unlock agentic GUI control 🤯, in 2B, 7B, and 72B
- Alibaba DAMO lab released VideoLLaMA3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1 and released under an MIT license! 🤯
- Qwen2.5-Math-PRM: new math process reward models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new families of models along with their datasets (SFT and reward data too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images
reacted to kadirnar's post with 🚀🔥 about 1 month ago
I created my own AI image and video from scratch using the fal.ai platform 💫

Workflow: Flux LoRA Training + Upscale + Kling AI (1.6)
reacted to fdaudens's post with 🚀 about 2 months ago
🔥 The AI agent hype is real! This blog post dives deep into everything you need to know before deploying them, from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.

📊 Key insight: a clear table breaking down the 5 levels of AI agents, from simple processors to fully autonomous systems. An essential framework for understanding where your agent stands on the autonomy spectrum.

⚖️ A deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment.

🎯 6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution

Read the blog post: https://huggingface.co./blog/ethics-soc-7. Brilliant work by @meg, @evijit, @sasha, and @giadap.