Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing



Organizations

None yet

zhisheng01's activity

upvoted 2 papers 3 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 3 days ago • 61

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 9 days ago • 56

upvoted a paper 9 days ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published 10 days ago • 76

liked a dataset 15 days ago

baijs/AudioSetCaps

Preview • Updated Nov 27, 2024 • 255 • 18

liked 2 models 18 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Text Generation • Updated 5 days ago • 1.26M • 954

deepseek-ai/DeepSeek-R1

Text Generation • Updated 5 days ago • 4.63M • 10.5k

upvoted a paper 19 days ago

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Paper • 2502.05176 • Published 21 days ago • 30

upvoted a paper 22 days ago

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published 22 days ago • 23

liked a dataset 22 days ago

CAiRE/ASCEND

Viewer • Updated Jul 16, 2024 • 12.3k • 1.2k • 33

upvoted an article 23 days ago

Recipe: Preparing Multilingual Speech Datasets for TTS Training

Article • and 1 other • Nov 4, 2024 • 18

upvoted a paper about 2 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10, 2025 • 47

liked a model about 2 months ago

deepseek-ai/DeepSeek-V3

Text Generation • Updated 5 days ago • 3.29M • 3.57k

liked 2 models 3 months ago

nyrahealth/CrisperWhisper

Automatic Speech Recognition • Updated Dec 19, 2024 • 23.7k • 235

kyutai/mimi

Feature Extraction • Updated Sep 18, 2024 • 181k • 106

liked a dataset 4 months ago

walkerhyf/NCSSD

Updated Nov 12, 2024 • 85 • 20

upvoted a paper 4 months ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 92

liked a model 5 months ago

SWivid/F5-TTS

Text-to-Speech • Updated Nov 8, 2024 • 875k • 922

updated a dataset 5 months ago

zhisheng01/SpatialAudio

Preview • Updated Oct 12, 2024 • 94 • 3

upvoted 2 papers 5 months ago

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 43

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Paper • 2410.04364 • Published Oct 6, 2024 • 28