Rufy992's Collections
Articoli PHD
2.5 Years in Class: A Multimodal Textbook for Vision-Language
Pretraining
Paper
•
2501.00958
•
Published
•
91
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
45
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent
Diffusion Models
Paper
•
2501.01423
•
Published
•
34
REDUCIO! Generating 1024×1024 Video within 16 Seconds using
Extremely Compressed Motion Latents
Paper
•
2411.13552
•
Published
Generative Modeling with Explicit Memory
Paper
•
2412.08781
•
Published
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers
Up
Paper
•
2412.16112
•
Published
•
21
TinyFusion: Diffusion Transformers Learned Shallow
Paper
•
2412.01199
•
Published
•
14
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
Paper
•
2412.12391
•
Published
ASGDiffusion: Parallel High-Resolution Generation with Asynchronous
Structure Guidance
Paper
•
2412.06163
•
Published
On the Surprising Effectiveness of Attention Transfer for Vision
Transformers
Paper
•
2411.09702
•
Published
•
1
Four-Plane Factorized Video Autoencoders
Paper
•
2412.04452
•
Published
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Paper
•
2412.10958
•
Published
Nested Diffusion Models Using Hierarchical Latent Priors
Paper
•
2412.05984
•
Published
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
Paper
•
2411.06786
•
Published
FlexDiT: Dynamic Token Density Control for Diffusion Transformer
Paper
•
2412.06028
•
Published
Paper
•
2412.08905
•
Published
•
101
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
•
2411.15124
•
Published
•
58
Training and Evaluating Language Models with Template-based Data
Generation
Paper
•
2411.18104
•
Published
•
3
Paper
•
2411.05281
•
Published
•
1
ALMA: Alignment with Minimal Annotation
Paper
•
2412.04305
•
Published
Training Data for Large Language Model
Paper
•
2411.07715
•
Published
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading
Comprehension Text
Paper
•
2410.21479
•
Published
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper
•
2402.14289
•
Published
•
19
TinyLlama: An Open-Source Small Language Model
Paper
•
2401.02385
•
Published
•
90
TinyLLM: Learning a Small Student from Multiple Large Language Models
Paper
•
2402.04616
•
Published
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Paper
•
2410.07062
•
Published
•
3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Paper
•
2408.15881
•
Published
•
21
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper
•
2404.04167
•
Published
•
12
Rethinking Optimization and Architecture for Tiny Language Models
Paper
•
2402.02791
•
Published
•
12
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper
•
2312.16862
•
Published
•
30
ProgCo: Program Helps Self-Correction of Large Language Models
Paper
•
2501.01264
•
Published
•
24
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong
Prompt Optimizers
Paper
•
2412.09722
•
Published
•
5
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
•
2412.09078
•
Published
AlphaVerus: Bootstrapping Formally Verified Code Generation through
Self-Improving Translation and Treefinement
Paper
•
2412.06176
•
Published
MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models
with a Monte Carlo Nash Equilibrium Self-Refine Tree
Paper
•
2411.15645
•
Published
PerfCodeGen: Improving Performance of LLM Generated Code with Execution
Feedback
Paper
•
2412.03578
•
Published
Enhancing LLM Reasoning via Critique Models with Test-Time and
Training-Time Supervision
Paper
•
2411.16579
•
Published
•
2
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for
Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
•
2412.13663
•
Published
•
121
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Framework
Paper
•
2308.08155
•
Published
•
3
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
•
2501.01904
•
Published
•
27
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Paper
•
2501.01957
•
Published
•
32
SDPO: Segment-Level Direct Preference Optimization for Social Agents
Paper
•
2501.01821
•
Published
•
17
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning
for Image and Video Generation
Paper
•
2412.21059
•
Published
•
17
Graph Generative Pre-trained Transformer
Paper
•
2501.01073
•
Published
•
15
LUSIFER: Language Universal Space Integration for Enhanced Multilingual
Embeddings with Large Language Models
Paper
•
2501.00874
•
Published
•
10
BoxingGym: Benchmarking Progress in Automated Experimental Design and
Model Discovery
Paper
•
2501.01540
•
Published
•
5
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity,
Bias and Propensity for Hallucinations
Paper
•
2404.09785
•
Published
Gemma 2: Improving Open Language Models at a Practical Size
Paper
•
2408.00118
•
Published
•
76
Dispider: Enabling Video LLMs with Active Real-Time Interaction via
Disentangled Perception, Decision, and Reaction
Paper
•
2501.03218
•
Published
•
28
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
•
2501.03226
•
Published
•
33
Test-time Computing: from System-1 Thinking to System-2 Thinking
Paper
•
2501.02497
•
Published
•
31
Personalized Graph-Based Retrieval for Large Language Models
Paper
•
2501.02157
•
Published
•
24
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large
Language Models
Paper
•
2501.01830
•
Published
•
14
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models
in Multi-Hop Tool Use
Paper
•
2501.02506
•
Published
•
9
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Paper
•
2403.04801
•
Published
Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco
vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison
Paper
•
2310.10190
•
Published
MiniCPM: Unveiling the Potential of Small Language Models with Scalable
Training Strategies
Paper
•
2404.06395
•
Published
•
22
LLM Teacher-Student Framework for Text Classification With No Manually
Annotated Data: A Case Study in IPTC News Topic Classification
Paper
•
2411.19638
•
Published
•
6
Performance-Guided LLM Knowledge Distillation for Efficient Text
Classification at Scale
Paper
•
2411.05045
•
Published
Selecting Between BERT and GPT for Text Classification in Political
Science Research
Paper
•
2411.05050
•
Published
Improving Bilingual Capabilities of Language Models to Support Diverse
Linguistic Practices in Education
Paper
•
2411.04308
•
Published
CoCoP: Enhancing Text Classification with LLM through Code Completion
Prompt
Paper
•
2411.08979
•
Published
Introducing Super RAGs in Mistral 8x7B-v1
Paper
•
2404.08940
•
Published
•
2
OpenDevin: An Open Platform for AI Software Developers as Generalist
Agents
Paper
•
2407.16741
•
Published
•
69
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper
•
2501.05441
•
Published
•
45
On Computational Limits and Provably Efficient Criteria of Visual
Autoregressive Models: A Fine-Grained Complexity Analysis
Paper
•
2501.04377
•
Published
•
6
Are VLMs Ready for Autonomous Driving? An Empirical Study from the
Reliability, Data, and Metric Perspectives
Paper
•
2501.04003
•
Published
•
16
Entropy-Guided Attention for Private LLMs
Paper
•
2501.03489
•
Published
•
9
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep
Thinking
Paper
•
2501.04519
•
Published
•
176
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta
Chain-of-Thought
Paper
•
2501.04682
•
Published
•
64
Agent Laboratory: Using LLM Agents as Research Assistants
Paper
•
2501.04227
•
Published
•
62
URSA: Understanding and Verifying Chain-of-thought Reasoning in
Multimodal Mathematics
Paper
•
2501.04686
•
Published
•
42
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
44
LLM4SR: A Survey on Large Language Models for Scientific Research
Paper
•
2501.04306
•
Published
•
26
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning
and Reflection
Paper
•
2501.04575
•
Published
•
20
GeAR: Generation Augmented Retrieval
Paper
•
2501.02772
•
Published
•
16
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper
•
2501.04652
•
Published
•
7
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich
Paradigm for Direct Preference Optimization
Paper
•
2501.03271
•
Published
•
8