Collections
Discover the best community collections!
Collections including paper arxiv:2402.00838
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 61 -
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper • 2402.00159 • Published • 59 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 80
-
Attention Is All You Need
Paper • 1706.03762 • Published • 44 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14
-
google/flan-t5-large
Text2Text Generation • Updated • 1.14M • • 601 -
deepseek-ai/deepseek-coder-6.7b-instruct
Text Generation • Updated • 48.8k • 350 -
Object Recognition as Next Token Prediction
Paper • 2312.02142 • Published • 11 -
colbert-ir/dspy-Oct11-T5-Large-MH-3k-v1
Text2Text Generation • Updated • 7 • 1
-
Holistic Evaluation of Text-To-Image Models
Paper • 2311.04287 • Published • 11 -
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Paper • 2311.07463 • Published • 13 -
Trusted Source Alignment in Large Language Models
Paper • 2311.06697 • Published • 10 -
DiLoCo: Distributed Low-Communication Training of Language Models
Paper • 2311.08105 • Published • 14
-
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 87 -
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 65 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 39 -
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96