Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Paper • 2410.23320 • Published Oct 30 • 7
Llama3-8B-1.58 Collection A trio of models fine-tuned from Llama3-8B-Instruct with the BitNet (1.58-bit) architecture. • 3 items • Updated Sep 14 • 12
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated Nov 2 • 160
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers (see the sketch after this list): https://huggingface.co./datasets?other=sentence-transformers • 67 items • Updated Jul 3 • 87
Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 59
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated 12 days ago • 327
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace. • 68 items • Updated Feb 13 • 14
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated 12 days ago • 47
Switch-Transformers release Collection This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use between 8 and 256 experts. • 9 items • Updated 12 days ago • 15
zephyr story Collection Sources mentioned in the hf.co/thomwolf tweet: x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15
Distil-Whisper Models Collection The first version of the Distil-Whisper models released with the Distil-Whisper paper. • 4 items • Updated Mar 21 • 36
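To make the "work out of the box with Sentence Transformers" claim in the Embedding Model Datasets entry above concrete, here is a minimal sketch of loading one of the listed datasets and training on it. The dataset id ("sentence-transformers/all-nli"), its "triplet" config, and the base model ("all-MiniLM-L6-v2") are illustrative choices, not prescribed by the collection; any dataset from it is meant to plug in the same way.

```python
# Minimal sketch: a dataset from the collection feeds a Sentence Transformers
# loss directly, with no custom preprocessing or collation code.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Columns are anchor / positive / negative -- exactly what the loss below expects.
# Dataset id and config are assumptions for illustration.
train_ds = load_dataset("sentence-transformers/all-nli", "triplet", split="train")

model = SentenceTransformer("all-MiniLM-L6-v2")
loss = losses.MultipleNegativesRankingLoss(model)

# No column renaming needed: this is the "out of the box" part.
trainer = SentenceTransformerTrainer(model=model, train_dataset=train_ds, loss=loss)
trainer.train()
```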