Data Governance in the Age of Large-Scale Data-Driven Language Technology Paper • 2206.03216 • Published May 4, 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources Paper • 2201.10066 • Published Jan 25, 2022
A question-answering system for aircraft pilots' documentation Paper • 2011.13284 • Published Nov 26, 2020