MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Paper • 2410.20771 • Published 13 days ago • 2
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 17 days ago • 453
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
Refusal in Language Models Is Mediated by a Single Direction Paper • 2406.11717 • Published Jun 17 • 2
Larimar: Large Language Models with Episodic Memory Control Paper • 2403.11901 • Published Mar 18 • 32
Article Recommendation to Revisit the Diffuser Default LoRA Parameters By alvdansen • Jun 21 • 11
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs Paper • 2406.10209 • Published Jun 14 • 8
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding Paper • 2406.05540 • Published Jun 8 • 3
Article An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct By leonardlin • Jun 11 • 47
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 156
abliterated-v3 Collection Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3 • 97
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages Paper • 2404.16816 • Published Apr 25 • 3