Instruction Pre-Training: Language Models are Supervised Multitask Learners • Paper • 2406.14491 • Published Jun 20, 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale • Paper • 2406.17557 • Published Jun 25, 2024
KTO: Model Alignment as Prospect Theoretic Optimization • Paper • 2402.01306 • Published Feb 2, 2024
NuminaMath • Collection • 6 items • Updated Jul 21, 2024 • Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize
Large Language Models Can Self-Improve in Long-context Reasoning • Paper • 2411.08147 • Published Nov 12, 2024
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? • Paper • 2411.16489 • Published Nov 25, 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking • Paper • 2403.09629 • Published Mar 14, 2024
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning • Paper • 2406.12050 • Published Jun 17, 2024
Top LLM Collection • Collection • 6 items • Updated Jul 26, 2024 • A collection of top open-source LLMs, sorted with the best at the top
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level • Paper • 2411.03562 • Published Nov 5, 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective • Paper • 2410.23743 • Published Oct 31, 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations • Paper • 2410.02707 • Published Oct 3, 2024