mohammedbriman's Collections: To read... eventually
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • arXiv:2403.09611 • 124 upvotes
Evolutionary Optimization of Model Merging Recipes • arXiv:2403.13187 • 50 upvotes
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model • arXiv:2402.03766 • 12 upvotes
LLM Agent Operating System • arXiv:2403.16971 • 65 upvotes
Jamba: A Hybrid Transformer-Mamba Language Model • arXiv:2403.19887 • 104 upvotes
ReALM: Reference Resolution As Language Modeling • arXiv:2403.20329 • 21 upvotes
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models • arXiv:2403.20331 • 14 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • 103 upvotes
TransformerFAM: Feedback attention is working memory • arXiv:2404.09173 • 43 upvotes
FABLES: Evaluating faithfulness and content selection in book-length summarization • arXiv:2404.01261 • 3 upvotes
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints • arXiv:2305.13245 • 5 upvotes
Video as the New Language for Real-World Decision Making • arXiv:2402.17139 • 18 upvotes
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • arXiv:2404.14619 • 124 upvotes
Pegasus-v1 Technical Report • arXiv:2404.14687 • 30 upvotes
Transformers Can Represent n-gram Language Models • arXiv:2404.14994 • 18 upvotes
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores • arXiv:2311.05908 • 12 upvotes
Question Aware Vision Transformer for Multimodal Reasoning • arXiv:2402.05472 • 8 upvotes
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases • arXiv:2404.13207
Make Your LLM Fully Utilize the Context • arXiv:2404.16811 • 52 upvotes
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • arXiv:2405.00732 • 118 upvotes
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • arXiv:2405.01535 • 116 upvotes
You Only Cache Once: Decoder-Decoder Architectures for Language Models • arXiv:2405.05254 • 9 upvotes
SUTRA: Scalable Multilingual Language Model Architecture • arXiv:2405.06694 • 37 upvotes
What matters when building vision-language models? • arXiv:2405.02246 • 98 upvotes
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model • arXiv:2405.09215 • 18 upvotes
LoRA Learns Less and Forgets Less • arXiv:2405.09673 • 87 upvotes
Many-Shot In-Context Learning in Multimodal Foundation Models • arXiv:2405.09798 • 26 upvotes
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning • arXiv:2405.12130 • 45 upvotes
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention • arXiv:2405.12981 • 28 upvotes
Your Transformer is Secretly Linear • arXiv:2405.12250 • 150 upvotes
Aya 23: Open Weight Releases to Further Multilingual Progress • arXiv:2405.15032 • 26 upvotes
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation • arXiv:2404.19752 • 22 upvotes
An Introduction to Vision-Language Modeling • arXiv:2405.17247 • 85 upvotes
Dense Connector for MLLMs • arXiv:2405.13800 • 21 upvotes
Jina CLIP: Your CLIP Model Is Also Your Text Retriever • arXiv:2405.20204 • 32 upvotes
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts • arXiv:2405.19893 • 29 upvotes
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images • arXiv:2403.11703 • 16 upvotes
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality • arXiv:2405.21060 • 63 upvotes
xLSTM: Extended Long Short-Term Memory • arXiv:2405.04517 • 11 upvotes
To Believe or Not to Believe Your LLM • arXiv:2406.02543 • 31 upvotes
σ-GPTs: A New Approach to Autoregressive Models • arXiv:2404.09562 • 4 upvotes
Scalable MatMul-free Language Modeling • arXiv:2406.02528 • 10 upvotes
Transformers meet Neural Algorithmic Reasoners • arXiv:2406.09308 • 43 upvotes
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling • arXiv:2406.07522 • 37 upvotes
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus • arXiv:2406.08707 • 15 upvotes
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs • arXiv:2406.15319 • 61 upvotes
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks • arXiv:2311.06242 • 84 upvotes
Preference Tuning For Toxicity Mitigation Generalizes Across Languages • arXiv:2406.16235 • 11 upvotes
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale • arXiv:2406.17557 • 86 upvotes
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models • arXiv:2406.16838 • 2 upvotes
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output • arXiv:2407.03320 • 92 upvotes
Learning to (Learn at Test Time): RNNs with Expressive Hidden States • arXiv:2407.04620 • 27 upvotes
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages • arXiv:2407.05975 • 34 upvotes
Lost in the Middle: How Language Models Use Long Contexts • arXiv:2307.03172 • 36 upvotes
Unveiling Encoder-Free Vision-Language Models • arXiv:2406.11832 • 49 upvotes
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation • arXiv:2407.07093 • 1 upvote
Paper • arXiv:2407.10671 • 155 upvotes
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism • arXiv:2407.10457 • 22 upvotes
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models • arXiv:2407.12772 • 33 upvotes
Qwen2-Audio Technical Report • arXiv:2407.10759 • 55 upvotes
On the Limitations of Compute Thresholds as a Governance Strategy • arXiv:2407.05694 • 2 upvotes
SAM 2: Segment Anything in Images and Videos • arXiv:2408.00714 • 107 upvotes
Gemma 2: Improving Open Language Models at a Practical Size • arXiv:2408.00118 • 73 upvotes
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • arXiv:2312.15166 • 56 upvotes
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 • arXiv:2408.00874 • 41 upvotes
Transformer Explainer: Interactive Learning of Text-Generative Models • arXiv:2408.04619 • 154 upvotes
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation • arXiv:2408.02545 • 33 upvotes
To Code, or Not To Code? Exploring Impact of Code in Pre-training • arXiv:2408.10914 • 40 upvotes
LLM Pruning and Distillation in Practice: The Minitron Approach • arXiv:2408.11796 • 53 upvotes
Controllable Text Generation for Large Language Models: A Survey • arXiv:2408.12599 • 62 upvotes
Building and better understanding vision-language models: insights and future directions • arXiv:2408.12637 • 116 upvotes
The Mamba in the Llama: Distilling and Accelerating Hybrid Models • arXiv:2408.15237 • 36 upvotes
Graph Retrieval-Augmented Generation: A Survey • arXiv:2408.08921 • 4 upvotes
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline • arXiv:2408.15079 • 52 upvotes
Writing in the Margins: Better Inference Pattern for Long Context Retrieval • arXiv:2408.14906 • 138 upvotes
Law of Vision Representation in MLLMs • arXiv:2408.16357 • 92 upvotes
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling • arXiv:2408.16532 • 46 upvotes
OLMoE: Open Mixture-of-Experts Language Models • arXiv:2409.02060 • 77 upvotes
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark • arXiv:2409.02813 • 28 upvotes
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct • arXiv:2409.05840 • 45 upvotes
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery • arXiv:2409.05591 • 28 upvotes
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture • arXiv:2409.02889 • 54 upvotes
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval • arXiv:2409.10516 • 37 upvotes
NVLM: Open Frontier-Class Multimodal LLMs • arXiv:2409.11402 • 71 upvotes
One missing piece in Vision and Language: A Survey on Comics Understanding • arXiv:2409.09502 • 23 upvotes
A Controlled Study on Long Context Extension and Generalization in LLMs • arXiv:2409.12181 • 43 upvotes
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • arXiv:2409.12191 • 73 upvotes
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • arXiv:2409.17146 • 99 upvotes
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices • arXiv:2408.09169 • 1 upvote
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely • arXiv:2409.14924 • 1 upvote
LLaVA-Critic: Learning to Evaluate Multimodal Models • arXiv:2410.02712 • 34 upvotes
Emu3: Next-Token Prediction is All You Need • arXiv:2409.18869 • 89 upvotes
Paper • arXiv:2410.01201 • 46 upvotes
Aria: An Open Multimodal Native Mixture-of-Experts Model • arXiv:2410.05993 • 107 upvotes
Paper • arXiv:2410.07073 • 59 upvotes
Paper • arXiv:2410.05258 • 165 upvotes
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents • arXiv:2410.03450 • 32 upvotes
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models • arXiv:2410.05229 • 17 upvotes
Harnessing Webpage UIs for Text-Rich Visual Understanding • arXiv:2410.13824 • 29 upvotes
nGPT: Normalized Transformer with Representation Learning on the Hypersphere • arXiv:2410.01131 • 8 upvotes
HelpSteer2-Preference: Complementing Ratings with Preferences • arXiv:2410.01257 • 19 upvotes
Dense Transformer Networks • arXiv:1705.08881
A Survey of Small Language Models • arXiv:2410.20011 • 36 upvotes
Paper • arXiv:2410.21276 • 76 upvotes
Transformers are Multi-State RNNs • arXiv:2401.06104 • 35 upvotes
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention • arXiv:2006.16236 • 3 upvotes
FlexAttention for Efficient High-Resolution Vision-Language Models • arXiv:2407.20228 • 1 upvote
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding • arXiv:2408.11049 • 11 upvotes
An Empirical Model of Large-Batch Training • arXiv:1812.06162 • 3 upvotes
Measuring short-form factuality in large language models • arXiv:2411.04368 • 1 upvote