
Steven Zheng

Steveeeeeeen


AI & ML interests

speech & audio

Recent Activity

liked a model about 4 hours ago

microsoft/Phi-4-multimodal-instruct

updated a dataset about 11 hours ago

Steveeeeeeen/whisper-leaderboard-evals

upvoted an article 1 day ago

SigLIP 2: A better multilingual vision language encoder



Steveeeeeeen's activity

upvoted an article 1 day ago

Article

SigLIP 2: A better multilingual vision language encoder

8 days ago

• 113

upvoted an article 2 days ago

Article

Deploying Speech-to-Speech on Hugging Face

Oct 22, 2024

• 38

upvoted 2 collections 2 days ago

OWLS: Scaling Laws for Speech Recognition and Translation

🦉 A suite of Whisper-style models from 250M to 18B parameters. Trained on up to 360K hours of data. • 6 items • Updated 3 days ago • 3

Open Whisper-style Speech Models (OWSM)

Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 15 items • Updated 22 days ago • 5

upvoted a paper 3 days ago

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 9 days ago • 56

upvoted a paper 8 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 9 days ago • 150

upvoted an article 8 days ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Jan 23

• 143

upvoted a paper 8 days ago

Presumed Cultural Identity: How Names Shape LLM Responses

Paper • 2502.11995 • Published 11 days ago • 10

upvoted an article 10 days ago

Article

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

11 days ago

• 89

upvoted a collection 13 days ago

Feb 14 Releases 💌

23 items • Updated 14 days ago • 7

upvoted 3 articles 15 days ago

Article

1 Billion Classifications

16 days ago

• 39

Article

Efficient Controllable Generation for SDXL with T2I-Adapters

Sep 8, 2023

• 7

Article

Introduction to the Open Leaderboard for Japanese LLMs

Nov 20, 2024

• 35

upvoted an article 16 days ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

By

and 1 other •

17 days ago

• 25

upvoted 3 articles 18 days ago

Article

Open R1: Update #2

By

and 6 others •

18 days ago

• 191

Article

G2P Shrinks Speech Models


23 days ago

• 27

Article

The Open Arabic LLM Leaderboard 2

19 days ago

• 27

upvoted a collection 21 days ago

Feb 7 Releases 🧣

33 items • Updated 21 days ago • 5

upvoted an article 21 days ago

Article

The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...


Jan 20

• 62

upvoted an article 22 days ago

Article

How biased is Whisper ? Evaluating Whisper Models for Robustness to Diverse English Accents


30 days ago

• 16