KW

kevineen

AI & ML interests

None yet

Recent Activity

liked a dataset about 12 hours ago

leonardPKU/clevr_cogen_a_train

upvoted an article 2 days ago

State of open video generation models in Diffusers

liked a model 2 days ago

Emanon14/LoRA

Organizations

kevineen's activity

upvoted an article 2 days ago

Article

State of open video generation models in Diffusers

9 days ago • 28

upvoted 2 articles 7 days ago

Article

Welcome to Inference Providers on the Hub 🔥

8 days ago • 232

Article

FineWeb2-C: Help Build Better Language Models in Your Language

Dec 23, 2024 • 18

upvoted 2 collections 12 days ago

Eagle 2

Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 9 items • Updated 12 days ago • 29

INTELLECT-MATH

6 items • Updated 13 days ago • 1

upvoted a paper 13 days ago

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published 14 days ago • 48

upvoted a paper 18 days ago

CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

Paper • 2501.09433 • Published 19 days ago • 17

upvoted 3 papers 19 days ago

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Paper • 2501.09756 • Published 19 days ago • 19

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Paper • 2501.08994 • Published 20 days ago • 15

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Paper • 2501.00574 • Published Dec 31, 2024 • 5

upvoted a collection 26 days ago

TACO Models

This collection contains the best-performing TACO models based on LLaMA-3/Qwen2 and SigLIP/CLIP. • 3 items • Updated Dec 20, 2024 • 8

upvoted 2 papers 27 days ago

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published 28 days ago • 84

TransPixar: Advancing Text-to-Video Generation with Transparency

Paper • 2501.03006 • Published 29 days ago • 23

upvoted a paper 29 days ago

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published Jan 3 • 42

upvoted 3 papers about 1 month ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 59

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

1.58-bit FLUX

Paper • 2412.18653 • Published Dec 24, 2024 • 74

upvoted a collection about 1 month ago

YuLan-Mini

A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. • 6 items • Updated about 8 hours ago • 13

upvoted 2 papers about 1 month ago

Large Motion Video Autoencoding with Cross-modal Video VAE

Paper • 2412.17805 • Published Dec 23, 2024 • 24

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 60