Zhengyuan Yang's picture

4 17 4

Zhengyuan Yang

zyang39

·

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

authored a paper 21 days ago

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

authored a paper about 2 months ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

View all activity

Organizations

zyang39's activity

upvoted a paper 11 days ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 12 days ago • 282

upvoted 2 papers about 2 months ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published Dec 12, 2024 • 11

Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

Paper • 2412.03704 • Published Dec 4, 2024 • 6

upvoted 2 papers 3 months ago

GenXD: Generating Any 3D and 4D Scenes

Paper • 2411.02319 • Published Nov 4, 2024 • 20

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Paper • 2410.23277 • Published Oct 30, 2024 • 9

upvoted a paper 6 months ago

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1, 2024 • 13

upvoted a paper 9 months ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 17

upvoted 2 papers about 1 year ago

Interfacing Foundation Models' Embeddings

Paper • 2312.07532 • Published Dec 12, 2023 • 11

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 13

upvoted 8 papers over 1 year ago

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023 • 20

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 14

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Paper • 2310.11441 • Published Oct 17, 2023 • 26

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

Paper • 2310.07749 • Published Oct 11, 2023 • 5

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Paper • 2310.08541 • Published Oct 12, 2023 • 17

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

Paper • 2308.02490 • Published Aug 4, 2023 • 17

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Paper • 2307.00040 • Published Jun 30, 2023 • 25