Michael Coppola

m18coppola

AI & ML interests

AI lobotomies, NLP

Recent Activity

upvoted a collection 18 days ago
Human-Like LLMs
upvoted a collection 3 months ago
OpenCoder

Organizations

Social Post Explorers

m18coppola's activity

reacted to Avelina's post with ❤️😔 9 months ago
Found out my ECCV paper is getting rejected because of a LaTeX compile error :(
reacted to mrm8488's post with 🔥 9 months ago
Working on a concept GPT-2 (small) that uses KANs instead of MLPs (a rough sketch of the idea is below).
The ckpt and training code will soon be on the Hub.
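For readers unfamiliar with the idea: a KAN (Kolmogorov-Arnold Network) layer learns a univariate function on every input-output edge instead of a fixed weight plus a shared nonlinearity. Below is a minimal sketch of what swapping GPT-2's MLP block for a KAN-style block could look like. It is not mrm8488's code, and it uses a Chebyshev-polynomial parameterization for brevity where the original KAN paper uses B-splines.

```python
# Minimal sketch (not the author's implementation): a KAN-style layer that could
# stand in for GPT-2's MLP block. Chebyshev polynomials are used here as the
# learnable univariate basis instead of the B-splines from the KAN paper.
import torch
import torch.nn as nn

class ChebyKANLayer(nn.Module):
    """Each input-output edge learns its own univariate function,
    parameterized by Chebyshev coefficients."""
    def __init__(self, in_dim: int, out_dim: int, degree: int = 4):
        super().__init__()
        self.degree = degree
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1) / (in_dim * (degree + 1)) ** 0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.tanh(x)                      # squash inputs into [-1, 1]
        # Build T_0(x)..T_degree(x) via the recurrence T_k = 2x*T_{k-1} - T_{k-2}
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])
        T = torch.stack(T[: self.degree + 1], dim=-1)   # (..., in_dim, degree+1)
        # Evaluate each edge's polynomial and sum over input features
        return torch.einsum("...id,iod->...o", T, self.coeffs)

class KANBlock(nn.Module):
    """Hypothetical drop-in replacement for GPT-2's two-layer MLP."""
    def __init__(self, n_embd: int, expansion: int = 4):
        super().__init__()
        self.up = ChebyKANLayer(n_embd, expansion * n_embd)
        self.down = ChebyKANLayer(expansion * n_embd, n_embd)

    def forward(self, x):
        return self.down(self.up(x))
```

In a GPT-2 block, `KANBlock` would take the place of the `c_fc`/`c_proj` MLP after attention; the Chebyshev recurrence keeps the per-edge functions cheap to evaluate.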
reacted to akhaliq's post with 🔥 10 months ago
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (2404.05961)

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 3 popular LLMs ranging from 1.3B to 7B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB). Moreover, when combining LLM2Vec with supervised contrastive learning, we achieve state-of-the-art performance on MTEB among models that train only on publicly available data. Our strong empirical results and extensive analysis demonstrate that LLMs can be effectively transformed into universal text encoders in a parameter-efficient manner without the need for expensive adaptation or synthetic GPT-4 generated data.
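As a concrete illustration of the end use: once the three LLM2Vec steps have been applied, the decoder-only model is used like any text encoder by pooling its hidden states. The snippet below is a minimal sketch of that pooling step using plain transformers, with gpt2 as a stand-in model (an assumption for brevity). It is not the official llm2vec package and omits the bidirectional-attention patch, MNTP training, and contrastive training the paper describes.

```python
# Minimal sketch: use a decoder-only LM as a sentence encoder by mean-pooling
# its hidden states over non-padding tokens. gpt2 is a stand-in; the paper
# works with 1.3B-7B decoder-only LLMs adapted with the LLM2Vec recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
model = AutoModel.from_pretrained(model_name).eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state             # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling

a, b = embed(["a photo of a cat", "an image of a kitten"])
print(F.cosine_similarity(a, b, dim=-1).item())
```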
reacted to alielfilali01's post with 🧠 10 months ago
Honestly, I don't understand how we as the open-source community haven't surpassed GPT-4 yet. It looks like everything is already out there and just needs to be exploited! Clearly, specialized small models outperform GPT-4 on downstream tasks! So why haven't we just trained a really strong 1B-2B general model, then continually pretrained and/or finetuned it on downstream-task datasets like math and code, well structured in the Textbooks format or other dataset formats that have proven to be really efficient? Once you have 100 finetuned models, just wrap them all into a FrankenMoE and voila ✨ (a rough routing sketch follows below).
And that's just what a noob like myself had in mind; I'm sure there are better, more efficient ways to do it! So the question again: why haven't we yet? I feel I'm missing something... Right?
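For what it's worth, here is a toy sketch of the routing half of that idea: keep several task-finetuned experts and send each prompt to the nearest one by embedding similarity. The `PromptRouter` class, the expert names, and the similarity gate are all illustrative assumptions, not an existing tool; a real FrankenMoE (e.g. built with mergekit) instead combines the experts into a single MoE model with learned per-layer routers rather than routing whole prompts.

```python
# Toy sketch (illustrative assumption, not a real library): route each prompt
# to the most relevant task-finetuned expert using a cosine-similarity gate.
import numpy as np
from sentence_transformers import SentenceTransformer

class PromptRouter:
    def __init__(self, experts: dict):
        """experts: {task description -> callable finetuned model}."""
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.names = list(experts)
        self.models = experts
        # Pre-embed the task descriptions as routing anchors
        self.anchors = self.encoder.encode(self.names, normalize_embeddings=True)

    def __call__(self, prompt: str) -> str:
        q = self.encoder.encode([prompt], normalize_embeddings=True)[0]
        best = int(np.argmax(self.anchors @ q))   # pick the closest expert
        return self.models[self.names[best]](prompt)

# Usage with placeholder experts (hypothetical stand-ins for finetuned models):
router = PromptRouter({
    "solve math word problems": lambda p: f"[math expert] {p}",
    "write and debug python code": lambda p: f"[code expert] {p}",
})
print(router("Write a function that reverses a string"))
```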