Foundation Models and Tools
- Paper • 2402.10986 • Published • 76
bigcode/starcoder2-15b
Text Generation • Updated • 23.7k • 568
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 121
Note: Zephyr is by far the best-aligned open-source LLM I've used. There are now -beta and -gemma (fine-tuned from Gemma) versions too.
mixedbread-ai/mxbai-rerank-large-v1
Text Classification • Updated • 25.1k • 105
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
Note: "We clean data"
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Paper • 2402.18334 • Published • 12
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 45
Note: Trades extra computation for lower memory use. It is much like a math class: if you do not want to memorize every corollary, you have to derive everything from the three axioms each time.
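The compute-for-memory trade described in the QLoRA note can be illustrated with a toy sketch. It stores weights as small integer codes plus one scale per block and rebuilds approximate floats every time they are needed. This uses plain uniform absmax quantization as a stand-in, not the NF4 data type from the paper, and the function names, block size, and weights are assumptions made for illustration.

```python
def quantize_block(weights):
    """Map floats to integer codes in [-7, 7] plus one float scale for the block."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def quantize(weights, block_size=4):
    """Split the weights into blocks and quantize each block independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Rebuild approximate float weights from the stored codes and scales."""
    out = []
    for codes, scale in blocks:
        out.extend(c * scale for c in codes)
    return out


def dot_quantized(blocks, x):
    """Dot product that re-derives the weights on every call (the extra compute)."""
    w = dequantize(blocks)
    return sum(wi * xi for wi, xi in zip(w, x))


# Illustrative weights, not taken from any real model.
weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
blocks = quantize(weights)
approx = dequantize(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, approx))
print(blocks)
print(max_error)
```

Only the codes and per-block scales are kept in memory; the float weights are recomputed on demand, which is the "derive it again from the axioms" side of the trade.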
Large language models surpass human experts in predicting neuroscience results
Paper • 2403.03230 • Published • 4
Note: Perplexity is used to decide which abstract makes more sense given all prior work in neuroscience, and it beats expert annotation. Advice to experts: pay attention when you annotate, or you might lose your job (!)
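The perplexity-based choice mentioned in the note can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes per-token natural-log probabilities have already been obtained from some language model, and the candidate names and numbers are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_more_plausible(candidates):
    """Return the name of the candidate text with the lowest perplexity."""
    return min(candidates, key=lambda name: perplexity(candidates[name]))


# Hypothetical per-token log-probabilities for two versions of an abstract.
candidates = {
    "abstract_a": [-1.2, -0.8, -2.0, -0.5],
    "abstract_b": [-2.5, -1.9, -3.1, -0.7],
}

choice = pick_more_plausible(candidates)
print(choice)
```

Lower perplexity means the model finds the text less surprising, so the candidate with the lowest score is treated as the more plausible one.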
Equall/Saul-7B-Instruct-v1
Text Generation • Updated • 2.86k • 76
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Paper • 2403.04696 • Published • 4
AutoMerger
Paused • 71
Representation Engineering: A Top-Down Approach to AI Transparency
Paper • 2310.01405 • Published • 5
Note: A fixed control vector is added to each layer's output. A convenient package is available at https://github.com/vgel/repeng; it is quite easy to use and allows linear control along a user-defined semantic dimension.
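The idea in the note, adding a fixed vector to each layer's output, can be sketched with toy layers in plain Python. This is not the repeng API; the layer functions, the single shared control vector, and the `strength` parameter are all assumptions made for illustration.

```python
def add_scaled(hidden, direction, strength):
    """Return hidden + strength * direction, element by element."""
    return [h + strength * d for h, d in zip(hidden, direction)]


def forward(hidden, layers, control_vector=None, strength=0.0):
    """Run the layers in order, adding the control vector after each layer's output."""
    for layer in layers:
        hidden = layer(hidden)
        if control_vector is not None:
            hidden = add_scaled(hidden, control_vector, strength)
    return hidden


# Hypothetical stand-ins for transformer layers (affine, so steering is exactly linear here).
def double(hidden):
    return [2.0 * x for x in hidden]


def shift(hidden):
    return [x + 1.0 for x in hidden]


layers = [double, shift]
x = [1.0, -1.0]
control = [1.0, 0.0]  # direction standing in for a user-defined semantic dimension

baseline = forward(x, layers)
steered_up = forward(x, layers, control, strength=0.5)
steered_down = forward(x, layers, control, strength=-0.5)
print(baseline, steered_up, steered_down)
```

Changing the sign or size of `strength` moves the output along the chosen direction. With real nonlinear layers the response would only be approximately linear.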
Editing Conceptual Knowledge for Large Language Models
Paper • 2403.06259 • Published • 1
Learning to Edit: Aligning LLMs with Knowledge Editing
Paper • 2402.11905 • Published • 1
Knowledge Editing on Black-box Large Language Models
Paper • 2402.08631 • Published • 3
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries
Paper • 2402.13043 • Published • 2
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 72
Note: Rethinks pre-training based on next-token prediction to enhance reasoning. Routing through multiple rationales for next-k-token prediction, combined with RL-based survival of the fittest rationales, achieves a significant improvement in reasoning at the cost of a huge increase in training compute. It likely lags behind Q* because it lacks adaptive control of the thought process.
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 54
Note: An arsenal for training a VLM-based front-end designer.
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 93
Note: Stanford's drop-in replacement model for automating front-end design. An image (or a sketch) of the target website goes in, front-end code comes out.
SALT-NLP/Design2Code-18B-v0
Updated • 34
Note: This LLM does front-end engineering for you at the cost of your electricity.
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124
Note: Apple's own VLM.
vikhyatk/moondream2
Image-Text-to-Text • Updated • 196k • 690
Note: A very performant small VLM. It appears an extra vision encoder might just do the trick?
prometheus-eval/prometheus-13b-v1.0
Text2Text Generation • Updated • 3.91k • 126
Note: An LLM fine-tuned to act as an LLM-as-a-Judge.
Gorilla: Large Language Model Connected with Massive APIs
Paper • 2305.15334 • Published • 4
Note: How to train a GPT-4-level function-calling LLM, from UC Berkeley.
gorilla-llm/gorilla-openfunctions-v2
Text Generation • Updated • 11.7k • 209
Note: A GPT-4-level function-calling LLM from UC Berkeley.
Learning to Compress Prompt in Natural Language Formats
Paper • 2402.18700 • Published • 2
Note: Soft prompt compression from Samsung.
MemGPT: Towards LLMs as Operating Systems
Paper • 2310.08560 • Published • 7
Note: LLM OS with MemGPT, from UC Berkeley.
Get an A in Math: Progressive Rectification Prompting
Paper • 2312.06867 • Published • 2
Qwen/Qwen-VL
Text Generation • Updated • 69.3k • 211
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Paper • 2110.03742 • Published • 3
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Paper • 2006.16668 • Published • 3
parler-tts/parler_tts_mini_v0.1
Text-to-Speech • Updated • 25.3k • 346
instruction-pretrain/instruction-synthesizer
Text Generation • Updated • 289 • 72
jinaai/jina-embeddings-v2-base-en
Feature Extraction • Updated • 105k • 693
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Paper • 2407.11062 • Published • 8