Collections
Discover the best community collections!
Collections including paper arxiv:2408.00714
-
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Paper • 2407.10960 • Published • 10 -
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Paper • 2407.14482 • Published • 24 -
EVLM: An Efficient Vision-Language Model for Visual Understanding
Paper • 2407.14177 • Published • 42 -
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Paper • 2407.15017 • Published • 33
-
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
Paper • 2407.06027 • Published • 8 -
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper • 2407.09025 • Published • 122 -
Toto: Time Series Optimized Transformer for Observability
Paper • 2407.07874 • Published • 29 -
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Paper • 2407.09413 • Published • 9
-
Depth Anything V2
Paper • 2406.09414 • Published • 91 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 50 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 34 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 103
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 4
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 20 -
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 20 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 9 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 77
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 123 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 49 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 12 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 64
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 26
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 6 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2