
Edd

Erland


AI & ML interests

None yet

Recent Activity

liked a Space 5 days ago

nanotron/ultrascale-playbook

liked a model 7 days ago

CohereForAI/aya-expanse-8b

updated a dataset 9 days ago

Erland/alpaca-cleaned-1000


Organizations

None yet

Erland's activity

upvoted a collection 26 days ago

Mistral-Small-24B-2501 (All Versions)

A collection of Mistral's new Small 2501 models including GGUF, 4-bit and more! • 9 items • Updated 1 day ago • 5

upvoted a collection about 1 month ago

DeepSeek R1 (All Versions)

DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 1 day ago • 202

upvoted 2 papers about 2 months ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 258

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 91

upvoted a collection about 2 months ago

Phi-4 (All Versions)

Microsoft's new Phi-4 models including mini & multimodal in all formats. Includes GGUF, 4-bit bnb and original versions. Includes Unsloth's bug fixes. • 7 items • Updated 1 day ago • 42

upvoted a paper about 2 months ago

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 50

upvoted 8 collections 3 months ago

4bit Instruct Models

18 items • Updated 1 day ago • 28

Qwen 2.5

32 items • Updated 1 day ago • 8

Load 4bit models 4x faster

Native bitsandbytes 4-bit pre-quantized models • 25 items • Updated 1 day ago • 55

Qwen 2.5 Coder

Complete collection of the code-specific Qwen2.5 model series in bnb 4-bit, 16-bit and GGUF formats. • 35 items • Updated 1 day ago • 26

Llama 3.2 Vision

Meta's Llama 3.2 vision models (11B and 90B). Includes 4-bit bnb and original versions. • 8 items • Updated 1 day ago • 7

Llama 3.2

Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 1 day ago • 54

Vision/multimodal Models

Collection of the most popular vision models, including Llama 3.2, LLaVA, Qwen2 VL, Pixtral, PaliGemma and more! • 25 items • Updated 1 day ago • 6

Unsloth 4-bit Dynamic Quants

Unsloth's Dynamic 4-bit Quants selectively skip quantizing certain parameters, greatly improving accuracy while using less than 10% more VRAM than BnB 4-bit • 22 items • Updated 1 day ago • 53

upvoted a collection 4 months ago

LLM Reasoning Papers

Papers to improve reasoning capabilities of LLMs • 20 items • Updated Jan 15 • 118

upvoted 2 papers 6 months ago

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published Sep 11, 2024 • 20

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 89

upvoted an article 7 months ago

Article

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14, 2024 • 59

upvoted an article 9 months ago

Article

Indexify: Bringing HuggingFace Models to Real-Time Pipelines for Production Applications


May 31, 2024 • 7

upvoted a collection 9 months ago

Blackhole

A "black hole" of high-quality dialogue datasets spanning many fields and languages, intended to make training LLMs with SFT and DPO easier. • 32 items • Updated Aug 18, 2024 • 6