Foundation Models and Tools - a Temus Collection

Note Zephyr is by far the best-aligned open-source LLM I've used. There are now a -beta and a -gemma (fine-tuned from Gemma) version too.

mixedbread-ai/mxbai-rerank-large-v1

Text Classification • Updated Jul 22 • 25.1k • 105

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Note "We clean data"

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Paper • 2402.18334 • Published Feb 28 • 12

QLoRA: Efficient Finetuning of Quantized LLMs

Paper • 2305.14314 • Published May 23, 2023 • 45

Note Trades more computation for less memory. It is much like a math class: if you do not want to memorize all the corollaries, you have to derive everything from the 3 axioms each time.
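The memory-for-compute trade in this note can be illustrated with a toy block-wise 4-bit quantizer. This is only a sketch under simplifying assumptions: it uses plain absmax rounding rather than the NF4 data type from the paper, and the names `quantize_blockwise` and `dequantize_blockwise` are hypothetical. The weights are stored compactly and have to be re-derived (dequantized) every time they are used.

```python
import numpy as np

def quantize_blockwise(w, block_size=64):
    """Store each block as 4-bit-range integer codes plus one float scale."""
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid dividing by zero for all-zero blocks
    # Codes lie in [-7, 7]; kept in int8 here, a real kernel would pack two per byte.
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return codes, scales

def dequantize_blockwise(codes, scales, shape):
    """The extra computation paid at every use of the weights."""
    return (codes.astype(np.float32) * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)

codes, scales = quantize_blockwise(w)
w_hat = dequantize_blockwise(codes, scales, w.shape)

full_bits = w.size * 32                          # float32 storage
quant_bits = codes.size * 4 + scales.size * 32   # 4-bit codes + per-block scales
max_err = float(np.abs(w - w_hat).max())

print(f"compression: {full_bits / quant_bits:.1f}x, max abs error: {max_err:.3f}")
```

The rounding error per weight is bounded by half of its block's scale, which is why smaller blocks give better fidelity at the cost of storing more scales.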

Large language models surpass human experts in predicting neuroscience results

Paper • 2403.03230 • Published Mar 4 • 4

Note A perplexity score is used to decide which abstract makes more sense given all the previous work in the field of neuroscience, and it beats the experts' annotations. Advice to experts: pay attention when you annotate, otherwise you might lose your job (!)
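The perplexity comparison described in this note can be sketched as follows. The per-token log-probabilities are made-up numbers standing in for a language model's output, and `pick_more_plausible` is a hypothetical helper, not code from the paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_more_plausible(logprobs_a, logprobs_b):
    """Return the label of the abstract the model finds less surprising."""
    return "A" if perplexity(logprobs_a) < perplexity(logprobs_b) else "B"

# Hypothetical per-token log-probabilities for two versions of an abstract.
abstract_a = [-1.2, -0.4, -0.9, -0.3, -0.7]   # e.g. the real result
abstract_b = [-1.2, -0.4, -2.8, -1.9, -0.7]   # e.g. an altered result

choice = pick_more_plausible(abstract_a, abstract_b)
print(choice, round(perplexity(abstract_a), 3), round(perplexity(abstract_b), 3))
```

Lower perplexity means the model assigns higher average probability to the tokens, so the abstract with the lower score is taken as the model's prediction.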

Equall/Saul-7B-Instruct-v1

Text Generation • Updated Mar 10 • 2.86k • 76

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Paper • 2403.04696 • Published Mar 7 • 4

AutoMerger

Space • Paused • 71

Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 5

Note A fixed control vector gets added to each layer's output. A convenient package is available at https://github.com/vgel/repeng; it is quite easy to use and allows linear control along a user-defined semantic dimension.
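The idea of adding a fixed vector to each layer's output can be sketched with plain NumPy. This is a toy illustration and deliberately does not use the repeng API: the layers are treated as independent arrays, the control directions are random rather than learned, and `apply_control` and `mean_projection` are hypothetical names.

```python
import numpy as np

def apply_control(hidden_states, control_vectors, coeff):
    """Add coeff * (that layer's control vector) to every layer's output."""
    return [h + coeff * v for h, v in zip(hidden_states, control_vectors)]

def mean_projection(states, vectors):
    """Average projection of the hidden states onto each layer's direction."""
    return float(np.mean([(h @ v).mean() for h, v in zip(states, vectors)]))

rng = np.random.default_rng(0)
n_layers, seq_len, d_model = 4, 3, 8

# Stand-ins for per-layer outputs of shape (seq_len, d_model).
hidden_states = [rng.standard_normal((seq_len, d_model)) for _ in range(n_layers)]

# One unit-norm direction per layer (random here; extracted from data in practice).
control_vectors = []
for _ in range(n_layers):
    v = rng.standard_normal(d_model)
    control_vectors.append(v / np.linalg.norm(v))

base = mean_projection(hidden_states, control_vectors)
shifts = [
    mean_projection(apply_control(hidden_states, control_vectors, c), control_vectors) - base
    for c in (-2.0, 1.0, 2.0)
]
print(np.round(shifts, 6))
```

Because the directions have unit norm, the projection onto them moves by exactly the coefficient, which is the "linear control" the note refers to. In a real transformer the shift at one layer also propagates through the later layers, so the overall effect is not this clean.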

Editing Conceptual Knowledge for Large Language Models

Paper • 2403.06259 • Published Mar 10 • 1

Learning to Edit: Aligning LLMs with Knowledge Editing

Paper • 2402.11905 • Published Feb 19 • 1

Knowledge Editing on Black-box Large Language Models

Paper • 2402.08631 • Published Feb 13 • 3

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

Paper • 2311.06668 • Published Nov 11, 2023 • 5

Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries

Paper • 2402.13043 • Published Feb 20 • 2

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 72

Note Revolutionises next-token-prediction-based pre-training to enhance reasoning. Routing through multiple rationales for next-k-token prediction, combined with RL-based survival of the fittest rationales, achieves a significant improvement in reasoning at the cost of a huge increase in training cost. Likely lags behind Q* because it lacks adaptive control of the thought process.
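The "survival of the fittest rationales" idea can be sketched as a REINFORCE-style signal: each sampled rationale is scored by how well the model predicts the true next k tokens after it, relative to the average over all sampled rationales. This is a simplified sketch under that assumption, not the paper's full training objective; the log-likelihood values are made up and `rationale_advantages` is a hypothetical name.

```python
def rationale_advantages(future_logliks):
    """Score of each rationale minus the mean score over all sampled rationales."""
    baseline = sum(future_logliks) / len(future_logliks)
    return [ll - baseline for ll in future_logliks]

# Hypothetical log-likelihoods of the true next-k tokens
# after each of 4 sampled rationales at one position.
future_logliks = [-5.0, -3.0, -6.0, -2.0]

advantages = rationale_advantages(future_logliks)

# Rationales that helped more than average are the ones that get reinforced.
reinforced = [i for i, a in enumerate(advantages) if a > 0]
print(advantages, reinforced)
```

Sampling several rationales at every position and scoring each of them against k future tokens is where the large increase in training cost comes from.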