- Atla Selene Mini: A General Purpose Evaluation Model
  Paper • 2501.17195 • Published • 29
- DeepSeek-V3 Technical Report
  Paper • 2412.19437 • Published • 46
- Optimizing Large Language Model Training Using FP4 Quantization
  Paper • 2501.17116 • Published • 28
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 85
Felix Tuma
floom
AI & ML interests
NLP
Recent Activity
updated a collection 4 days ago
ShowAndTell-2025-01-30
upvoted a paper 4 days ago
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
updated a collection 4 days ago
ShowAndTell-2025-01-30
Organizations
None yet
Collections
29
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 17
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 89
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 40
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 9
Models
None public yet
Datasets
None public yet