
Tony Zhao

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal Agent, Generative AI



tianchez's activity

authored 11 papers 3 days ago

SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

Paper • 2009.13013 • Published Sep 28, 2020

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Paper • 2308.13177 • Published Aug 25, 2023

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Paper • 2403.06892 • Published Mar 11, 2024 • 1

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Paper • 2407.04923 • Published Jul 6, 2024 • 1

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 2

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Paper • 2411.16044 • Published Nov 25, 2024 • 1

OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network

Paper • 2209.05946 • Published Sep 10, 2022 • 1

VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations

Paper • 2207.00221 • Published Jul 1, 2022 • 1

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Paper • 2312.15043 • Published Dec 22, 2023 • 1

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Paper • 2306.11300 • Published Jun 20, 2023 • 1

GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent

Paper • 2412.18426 • Published Dec 24, 2024

upvoted a paper 4 days ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 5 days ago • 214

updated a collection 4 days ago

Multimodal Research

Collection

9 items • Updated 4 days ago • 1

upvoted a paper 4 days ago

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 2

updated a Space 4 days ago

README

updated a collection 15 days ago

Multimodal Research

Collection

9 items • Updated 4 days ago • 1

upvoted a paper 15 days ago

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Paper • 2306.11300 • Published Jun 20, 2023 • 1

liked a Space 16 days ago

Open Agent Leaderboard

upvoted a paper about 1 month ago

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Paper • 2403.06892 • Published Mar 11, 2024 • 1

updated a collection about 1 month ago

Multimodal Research

Collection

9 items • Updated 4 days ago • 1