deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Text Generation • Updated 6 days ago • 844k • 813
sentence-transformers/all-mpnet-base-v2 Sentence Similarity • Updated Nov 5, 2024 • 33.7M • 983
deepseek-ai/deepseek-coder-1.3b-instruct Text Generation • Updated Mar 7, 2024 • 59.9k • 111
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 6