neuralink

AI & ML interests

nanotron @ hf

Recent Activity

new activity 1 day ago

nanotron/ultrascale-playbook:Make hash section working

upvoted an article 8 days ago

Open-source DeepResearch – Freeing our search agents

liked a Space 8 days ago

m-ric/open_Deep-Research

neuralink's activity

New activity in nanotron/ultrascale-playbook 1 day ago

Make hash section working

#89 opened 1 day ago by mishig

upvoted an article 8 days ago

Article

Open-source DeepResearch – Freeing our search agents

25 days ago • 1.11k

liked a Space 8 days ago

592

Open Deep-Research

🏆

OpenAI's Deep Research, but open

New activity in nanotron/ultrascale-playbook 8 days ago

More ressources

#73 opened 9 days ago by eliebak

liked a Space 8 days ago

1.79k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

New activity in nanotron/ultrascale-playbook 10 days ago

xrsrke/link_nanotron_fp8_appexdix

#21 opened 11 days ago by neuralink

xrsrke/fix_width_height_for_fp8_graph

#46 opened 10 days ago by neuralink

updated a Space 10 days ago

1.79k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

New activity in nanotron/ultrascale-playbook 10 days ago

xrsrke/add_interactive_fp8_loss_curve

#43 opened 10 days ago by neuralink

upvoted an article 23 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28 • 782

upvoted an article 24 days ago

Article

Open-R1: Update #1

and 7 others • 27 days ago • 289

upvoted a paper about 1 month ago

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Paper • 2409.15241 • Published Sep 23, 2024 • 1

upvoted a paper about 2 months ago

Scaling Laws for Floating Point Quantization Training

Paper • 2501.02423 • Published Jan 5 • 26

liked 2 Spaces 2 months ago

Scaling With Vocab Demo

📊

Predict optimal vocabulary size based on model parameters

Harm Space

⚡

liked a model 3 months ago

tencent/Tencent-Hunyuan-Large

Text Generation • Updated Jan 19 • 240 • 568

upvoted a paper 3 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256

reacted to ArthurZ's post with 🔥 3 months ago

Post

3371

Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

liked a model 5 months ago

meta-llama/Llama-3.2-11B-Vision

Image-Text-to-Text • Updated Sep 27, 2024 • 95.7k • 473

updated a model 5 months ago

nanotron/temp_for_pr_review

Updated Sep 24, 2024