AI Engineering
A collection of arXiv papers cited in Chip Huyen's book AI Engineering, organized by chapter and ordered by when each paper appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published
Note: Start of Introduction to Building AI Applications with Foundation Models

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 17

On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Paper • 2303.10130 • Published • 4

CoEdIT: Text Editing by Task-Specific Instruction Tuning
Paper • 2305.09857 • Published • 7

Generative Agents: Interactive Simulacra of Human Behavior
Paper • 2304.03442 • Published • 12

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Paper • 2305.14233 • Published • 6

Measuring Massive Multitask Language Understanding
Paper • 2009.03300 • Published • 3
Note: End of Introduction to Building AI Applications with Foundation Models

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 11
Note: Start of Understanding Foundation Models

Textbooks Are All You Need
Paper • 2306.11644 • Published • 142

ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
Paper • 2304.05613 • Published • 1

Attention Is All You Need
Paper • 1706.03762 • Published • 55

Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3

Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 244

Deep Learning using Rectified Linear Units (ReLU)
Paper • 1803.08375 • Published

Gaussian Error Linear Units (GELUs)
Paper • 1606.08415 • Published

The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 114

Generative Adversarial Networks
Paper • 1406.2661 • Published • 4

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
Paper • 2110.13985 • Published

Efficiently Modeling Long Sequences with Structured State Spaces
Paper • 2111.00396 • Published • 3

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Paper • 2212.14052 • Published • 1

Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 108

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 6

Flamingo: a Visual Language Model for Few-Shot Learning
Paper • 2204.14198 • Published • 15

LLaMA: Open and Efficient Foundation Language Models
Paper • 2302.13971 • Published • 14

PaLM: Scaling Language Modeling with Pathways
Paper • 2204.02311 • Published • 2

Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 13

Discovering Language Model Behaviors with Model-Written Evaluations
Paper • 2212.09251 • Published • 1

Inverse Scaling: When Bigger Isn't Better
Paper • 2306.09479 • Published • 9

Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 10

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Paper • 2401.00448 • Published • 29

Emergent Abilities of Large Language Models
Paper • 2206.07682 • Published • 3

Model Dementia: Generated Data Makes Models Forget
Paper • 2305.17493 • Published • 4

Consent in Crisis: The Rapid Decline of the AI Data Commons
Paper • 2407.14933 • Published • 12

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 53

OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Paper • 2304.07327 • Published • 6

Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Paper • 2112.11446 • Published • 1

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Paper • 2403.04132 • Published • 38

Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 17

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 57

Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4

Detecting Hallucinated Content in Conditional Neural Sequence Generation
Paper • 2011.02593 • Published

Shaking the foundations: delusions in sequence models for interaction and control
Paper • 2110.10819 • Published

How Language Model Hallucinations Can Snowball
Paper • 2305.13534 • Published • 3
Note: End of Understanding Foundation Models
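
As a companion to the transformer papers in this chapter (Attention Is All You Need, 1706.03762), here is a minimal, illustrative sketch of single-head scaled dot-product attention in pure Python. It is not from the book; it only restates the standard formula softmax(QK^T / sqrt(d_k)) V on small lists of vectors.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q, K, V are lists of vectors (lists of floats).
    Returns one output vector per query: softmax(q . k / sqrt(d_k)) weighted sum of V.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))]
        )
    return outputs
```

With identical keys the weights are uniform, so each query returns the mean of the value vectors, which is a quick sanity check for the implementation.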

Sharp seasonal threshold property for cooperative population dynamics with concave nonlinearities
Paper • 1804.07641 • Published
Note: Start of Evaluation Methodology

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Paper • 1905.00537 • Published • 2

Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Paper • 2104.08773 • Published

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Paper • 2406.01574 • Published • 46

A Survey on Evaluation of Large Language Models
Paper • 2307.03109 • Published • 42

Re-evaluating Evaluation
Paper • 1806.02643 • Published

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Paper • 1709.00103 • Published • 1

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Paper • 2206.11249 • Published

Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8

MTEB: Massive Text Embedding Benchmark
Paper • 2210.07316 • Published • 6

Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation
Paper • 1607.07326 • Published

Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 11

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
Paper • 2212.05171 • Published

ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 5

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 33

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Paper • 2404.04475 • Published

Style Over Substance: Evaluation Biases for Large Language Models
Paper • 2307.03025 • Published

How Do Data Science Workers Communicate Intermediate Results?
Paper • 2210.03305 • Published

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8

Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
Paper • 2310.08118 • Published • 1

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Paper • 2311.06720 • Published • 9

BLEURT: Learning Robust Metrics for Text Generation
Paper • 2004.04696 • Published

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 54

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Paper • 2306.05087 • Published • 6

JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 34

A General Language Assistant as a Laboratory for Alignment
Paper • 2112.00861 • Published • 2

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
Paper • 2311.17295 • Published
Note: End of Evaluation Methodology
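
Several papers in this chapter (Chatbot Arena, 2403.04132; Elo Uncovered, 2311.17295) rank models from pairwise human preferences using Elo-style ratings. As a quick illustration of the core update rule only (not the papers' full methodology), a minimal sketch with the conventional 400-point logistic scale and a K-factor:

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update after a pairwise comparison between models A and B.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    Returns the new (rating_a, rating_b).
    """
    # Expected score of A under the logistic model with a 400-point scale.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta
```

For two equally rated models, a win moves the winner up by k/2 and the loser down by the same amount; the update is zero-sum by construction.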

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Paper • 2304.06364 • Published • 2
Note: Start of Evaluate AI Systems

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Paper • 2402.01781 • Published • 2

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods
Paper • 2203.05227 • Published

ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Paper • 2303.15621 • Published

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Paper • 2303.08896 • Published • 4

Long-form factuality in large language models
Paper • 2403.18802 • Published • 25

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Paper • 2312.06674 • Published • 8

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Paper • 2305.08283 • Published

The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation
Paper • 2301.01768 • Published

From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection
Paper • 2102.12162 • Published

Instruction-Following Evaluation for Large Language Models
Paper • 2311.07911 • Published • 20

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Paper • 2309.11998 • Published • 25

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
Paper • 2310.00746 • Published • 1

CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
Paper • 2401.01275 • Published • 1

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Paper • 1803.05457 • Published • 2

HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 4

TruthfulQA: Measuring How Models Mimic Human Falsehoods
Paper • 2109.07958 • Published • 1

WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Paper • 1907.10641 • Published • 1

Measuring Mathematical Problem Solving With the MATH Dataset
Paper • 2103.03874 • Published • 5

What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Paper • 2009.13081 • Published

The NarrativeQA Reading Comprehension Challenge
Paper • 1712.07040 • Published

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Paper • 1809.02789 • Published

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Paper • 2311.12022 • Published • 31

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning
Paper • 2310.16049 • Published • 4

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Paper • 2206.04615 • Published • 5

How is ChatGPT's behavior changing over time?
Paper • 2307.09009 • Published • 24

Holistic Evaluation of Language Models
Paper • 2211.09110 • Published

Pretraining on the Test Set Is All You Need
Paper • 2309.08632 • Published • 2
Note: End of Evaluate AI Systems
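
Many of the benchmarks in this chapter (ARC, HellaSwag, MMLU-style sets) are scored by extracting a multiple-choice letter from free-text model output and computing exact-match accuracy. A minimal, illustrative scoring harness; the regex-based answer extraction is an assumption for the sketch, not any benchmark's official scorer:

```python
import re

def extract_choice(output):
    """Pull the first standalone A-D letter out of a model's free-text answer."""
    m = re.search(r"\b([ABCD])\b", output)
    return m.group(1) if m else None

def accuracy(outputs, gold):
    """Exact-match accuracy between extracted choices and gold letters."""
    correct = sum(1 for o, g in zip(outputs, gold) if extract_choice(o) == g)
    return correct / len(gold)
```

Papers like "When Benchmarks are Targets" (2402.01781) show that results can be sensitive to exactly this kind of formatting and extraction detail, which is one reason to keep the scorer explicit and inspectable.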

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 39
Note: Start of Prompt Engineering

Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 40

RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 11

OpenPrompt: An Open-source Framework for Prompt-learning
Paper • 2111.01998 • Published • 1

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 34

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Paper • 2309.16797 • Published

TextGrad: Automatic "Differentiation" via Text
Paper • 2406.07496 • Published • 30

Universal and Transferable Adversarial Attacks on Aligned Language Models
Paper • 2307.15043 • Published • 2

Jailbreaking Black Box Large Language Models in Twenty Queries
Paper • 2310.08419 • Published

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Paper • 2302.12173 • Published

How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Paper • 2402.13208 • Published

Gmail Smart Compose: Real-Time Assisted Writing
Paper • 1906.00080 • Published

Language Models as Knowledge Bases?
Paper • 1909.01066 • Published

Extracting Training Data from Large Language Models
Paper • 2012.07805 • Published • 1

Are Large Pre-Trained Language Models Leaking Your Personal Information?
Paper • 2205.12628 • Published

Scalable Extraction of Training Data from (Production) Language Models
Paper • 2311.17035 • Published • 3

Extracting Training Data from Diffusion Models
Paper • 2301.13188 • Published • 2

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Paper • 2306.04528 • Published • 3
Note: End of Prompt Engineering
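
Chain-of-thought prompting (2201.11903) works by showing exemplars whose answers include intermediate reasoning before the final answer. A minimal, illustrative prompt builder; the exact formatting ("Q:/A:", "The answer is …") follows the common few-shot convention and is an assumption of this sketch:

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt.

    examples: list of (question, reasoning, answer) triples; each exemplar
    shows the reasoning steps before the final answer. The new question is
    appended with an open "A:" for the model to complete.
    """
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

The same scaffold also makes the injection risk discussed in this chapter concrete: anything concatenated into the prompt string, including retrieved or user-supplied text, reaches the model with the same authority as the template.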

Reading Wikipedia to Answer Open-Domain Questions
Paper • 1704.00051 • Published
Note: Start of RAG and Agents

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper • 2005.11401 • Published • 10

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Paper • 2104.08663 • Published • 3

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
Paper • 2405.15793 • Published • 5

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Paper • 2304.09842 • Published • 1

Reasoning with Language Model is Planning with World Model
Paper • 2305.14992 • Published • 3

ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 22

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Paper • 1809.09600 • Published • 3

Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 5

Toolformer: Language Models Can Teach Themselves to Use Tools
Paper • 2302.04761 • Published • 11

Gorilla: Large Language Model Connected with Massive APIs
Paper • 2305.15334 • Published • 5

Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 9

Keep Me Updated! Memory Management in Long-term Conversations
Paper • 2210.08750 • Published

Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
Paper • 2311.08719 • Published
Note: End of RAG and Agents
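
The RAG pattern from this chapter (2005.11401) is: embed the query, retrieve the most similar documents, and stuff them into the prompt as context. A minimal, illustrative sketch using a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector store:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: lowercase bag-of-words counts (a real system would
    # use a learned embedding model such as those benchmarked by MTEB).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query, docs, k=2):
    """Assemble a prompt with the retrieved documents as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"
```

Swapping the retriever or the prompt template changes system quality independently of the model, which is why retrieval benchmarks like BEIR (2104.08663) evaluate the retrieval stage on its own.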

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Paper • 1611.04558 • Published
Note: Start of Finetuning

Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 25

Scaling Instruction-Finetuned Language Models
Paper • 2210.11416 • Published • 7

BloombergGPT: A Large Language Model for Finance
Paper • 2303.17564 • Published • 22

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4

Reducing Activation Recomputation in Large Transformer Models
Paper • 2205.05198 • Published

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper • 2208.07339 • Published • 4

QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 50

BinaryConnect: Training Deep Neural Networks with binary weights during propagations
Paper • 1511.00363 • Published

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Paper • 1603.05279 • Published

BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 97

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 610

On Stochastic Shell Models of Turbulence
Paper • 1712.05887 • Published

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Paper • 2305.17888 • Published • 1

Parameter-Efficient Transfer Learning for NLP
Paper • 1902.00751 • Published • 1

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4

LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 35

BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Paper • 2106.10199 • Published

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 88

Prefix-Tuning: Optimizing Continuous Prompts for Generation
Paper • 2101.00190 • Published • 6

GPT Understands, Too
Paper • 2103.10385 • Published • 9

The Power of Scale for Parameter-Efficient Prompt Tuning
Paper • 2104.08691 • Published • 10

Measuring the Intrinsic Dimension of Objective Landscapes
Paper • 1804.08838 • Published

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Paper • 2012.13255 • Published • 3

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Paper • 1602.07360 • Published • 1

Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Paper • 2307.05695 • Published • 23

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 185

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity
Paper • 2305.08252 • Published

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
Paper • 2401.17868 • Published

ConvLoRA and AdaBN based Domain Adaptation via Self-Training
Paper • 2402.04964 • Published

A Note on LoRA
Paper • 2404.05086 • Published

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44

ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Paper • 2309.16119 • Published • 1

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Paper • 2402.05445 • Published

Overcoming catastrophic forgetting in neural networks
Paper • 1612.00796 • Published • 1

Communication-Efficient Learning of Deep Networks from Decentralized Data
Paper • 1602.05629 • Published

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Paper • 2203.05482 • Published • 6

Editing Models with Task Arithmetic
Paper • 2212.04089 • Published • 6

Model Fusion via Optimal Transport
Paper • 1910.05653 • Published • 1

Git Re-Basin: Merging Models modulo Permutation Symmetries
Paper • 2209.04836 • Published • 1

Merging by Matching Models in Task Subspaces
Paper • 2312.04339 • Published • 2

Resolving Interference When Merging Models
Paper • 2306.01708 • Published • 14

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • 2311.03099 • Published • 29

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 5

Mixture-of-Agents Enhances Large Language Model Capabilities
Paper • 2406.04692 • Published • 56

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 58
Note: End of Finetuning
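
LoRA (2106.09685), which several papers in this chapter build on, freezes the pretrained weight W and trains a low-rank update, so the effective weight is W + (alpha / r) * B @ A with B of shape d_out x r and A of shape r x d_in. A minimal, illustrative merge step in pure Python (list-of-lists matrices; a real implementation would use tensors):

```python
def matmul(A, B):
    """Naive matrix multiply for list-of-lists matrices."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def lora_merged_weight(W, A, B, alpha, r):
    """Effective weight after merging a LoRA adapter: W + (alpha / r) * B @ A.

    W is d_out x d_in (frozen); B is d_out x r and A is r x d_in (trained).
    With r much smaller than d_out and d_in, only a tiny fraction of the
    parameters are trained, and merging adds no inference-time latency.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [
        [W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
        for i in range(len(W))
    ]
```

Because the update merges back into W, a LoRA-finetuned model serves exactly like the base model, which is the property the quantized variants in this chapter (QLoRA, QA-LoRA) try to preserve under low-precision weights.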

DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 50
Note: Start of Dataset Engineering

Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62

LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 23

Nemotron-4 340B Technical Report
Paper • 2406.11704 • Published

The Data Addition Dilemma
Paper • 2408.04154 • Published

Orthogonal Fold & Cut
Paper • 2202.01293 • Published

TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Paper • 2305.11171 • Published • 2

One pixel attack for fooling deep neural networks
Paper • 1710.08864 • Published

Maxout Networks
Paper • 1302.4389 • Published

DeepFool: a simple and accurate method to fool deep neural networks
Paper • 1511.04599 • Published

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Paper • 1903.12261 • Published

CARLA: An Open Urban Driving Simulator
Paper • 1711.03938 • Published

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Paper • 2403.07714 • Published • 1

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18

Self-Instruct: Aligning Language Model with Self Generated Instructions
Paper • 2212.10560 • Published • 9

LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
Paper • 2304.08460 • Published • 3

Self-Alignment with Instruction Backtranslation
Paper • 2308.06259 • Published • 42

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design
Paper • 2309.05447 • Published • 1

The False Promise of Imitating Proprietary LLMs
Paper • 2305.15717 • Published • 5

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Paper • 2404.01413 • Published

On the Stability of Iterative Retraining of Generative Models on their own Data
Paper • 2310.00429 • Published

A Tale of Tails: Model Collapse as a Change of Scaling Laws
Paper • 2402.07043 • Published • 15

Common 7B Language Models Already Possess Strong Math Capabilities
Paper • 2403.04706 • Published • 18

Distilling the Knowledge in a Neural Network
Paper • 1503.02531 • Published

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14

Instruction Tuning with GPT-4
Paper • 2304.03277 • Published

Deduplicating Training Data Makes Language Models Better
Paper • 2107.06499 • Published • 4

D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Paper • 2308.12284 • Published

Scaling Laws and Interpretability of Learning from Repeated Data
Paper • 2205.10487 • Published • 1

Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Paper • 2311.14212 • Published
Note: End of Dataset Engineering
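
Deduplication is a recurring theme in this chapter (Deduplicating Training Data Makes Language Models Better, 2107.06499; D4, 2308.12284). The simplest form is exact deduplication over normalized text; a minimal, illustrative sketch (real pipelines also use near-duplicate methods such as MinHash, which this sketch does not implement):

```python
import hashlib

def normalize(text):
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return " ".join(text.lower().split())

def dedup(docs):
    """Exact deduplication: keep the first occurrence of each normalized document."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept
```

Hashing the normalized text rather than storing it keeps the seen-set small when deduplicating at corpus scale.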

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Paper • 1802.04799 • Published
Note: Start of Inference Optimization

Rethinking the Value of Network Pruning
Paper • 1810.05270 • Published

To prune, or not to prune: exploring the efficacy of pruning for model compression
Paper • 1710.01878 • Published • 1

Blockwise Parallel Decoding for Deep Autoregressive Models
Paper • 1811.03115 • Published • 2

Accelerating Large Language Model Decoding with Speculative Sampling
Paper • 2302.01318 • Published • 2

Fast Inference from Transformers via Speculative Decoding
Paper • 2211.17192 • Published • 5

Inference with Reference: Lossless Acceleration of Large Language Models
Paper • 2304.04487 • Published

Revisiting fixed-point quantum search: proof of the quasi-Chebyshev lemma
Paper • 2403.02057 • Published

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 55

Efficiently Scaling Transformer Inference
Paper • 2211.05102 • Published • 2

Longformer: The Long-Document Transformer
Paper • 2004.05150 • Published • 3

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 32

Fast Transformer Decoding: One Write-Head is All You Need
Paper • 1911.02150 • Published • 6

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 5

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Paper • 2403.05527 • Published • 1

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Paper • 2310.01801 • Published • 3

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Paper • 2407.08608 • Published • 1

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Paper • 2401.09670 • Published • 1

Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads
Paper • 2401.11181 • Published
Note: End of Inference Optimization
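
Speculative decoding (2302.01318, 2211.17192) lets a cheap draft model propose several tokens that the target model then verifies, so most steps cost one target pass for multiple tokens. A minimal, illustrative sketch of one greedy round; real implementations verify all k proposals in a single batched forward pass and sample probabilistically rather than comparing greedy tokens, and they also emit a bonus token when every proposal is accepted, which this sketch omits:

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding.

    draft_next / target_next map a token sequence to that model's greedy
    next token. The draft proposes k tokens; the target verifies them left
    to right, and we keep the longest agreeing prefix plus one corrected
    token from the target at the first mismatch.
    """
    # Draft phase: propose k tokens autoregressively with the cheap model.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # Verify phase: accept while the target agrees; on mismatch, take the
    # target's token instead and stop (output matches target-only decoding).
    accepted = []
    ctx = list(prefix)
    for t in proposal:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)
            break
    return accepted
```

The key property is that the accepted output is identical to what the target model would have produced alone; the draft model only changes how many target invocations are needed.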

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Paper • 2404.12272 • Published • 1
Note: Start of AI Engineering Architecture and User Feedback

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Paper • 1902.07742 • Published

Using Natural Language for Reward Shaping in Reinforcement Learning
Paper • 1903.02020 • Published

Inverse Reinforcement Learning with Natural Language Goals
Paper • 2008.06924 • Published

Learning Rewards from Linguistic Feedback
Paper • 2009.14715 • Published

Feedback-Based Self-Learning in Large-Scale Conversational AI Agents
Paper • 1911.02557 • Published

A scalable framework for learning from implicit user feedback to improve natural language understanding in large-scale conversational AI systems
Paper • 2010.12251 • Published

Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
Paper • 2208.03270 • Published

System-Level Natural Language Feedback
Paper • 2306.13588 • Published • 10

Towards Understanding Sycophancy in Language Models
Paper • 2310.13548 • Published • 5