MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published 10 days ago
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems Paper • 2410.13716 • Published Oct 17, 2024
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track Paper • 2406.16828 • Published Jun 24, 2024
🦢 The SWIM-IR dataset contains 29 million text-retrieval training pairs across 27 diverse languages. It is one of the largest synthetic multilingual datasets, generated using PaLM 2 on Wikipedia! 🔥🔥
SWIM-IR contains three subsets:
- Cross-lingual: nthakur/swim-ir-cross-lingual
- Monolingual: nthakur/swim-ir-monolingual
- Indic Cross-lingual: nthakur/indic-swim-ir-cross-lingual
Check it out: https://huggingface.co/collections/nthakur/swim-ir-dataset-662ddaecfc20896bf14dd9b7
Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard Paper • 2306.07471 • Published Jun 13, 2023
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Paper • 2312.11361 • Published Dec 18, 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution Paper • 2307.16883 • Published Jul 31, 2023
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks Paper • 2010.08240 • Published Oct 16, 2020
Evaluating Embedding APIs for Information Retrieval Paper • 2305.06300 • Published May 10, 2023
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval Paper • 2112.07577 • Published Dec 14, 2021
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages Paper • 2210.09984 • Published Oct 18, 2022
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models Paper • 2104.08663 • Published Apr 17, 2021
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval Paper • 2311.05800 • Published Nov 10, 2023
income/cqadupstack-wordpress-top-20-gen-queries Dataset • Updated Jan 24, 2023 • 48.6k rows