Pierre-Carl Langlais

Pclanglais

AI & ML interests

Open data & open LLMs

Recent Activity

updated a dataset about 8 hours ago
Pclanglais/course-material
updated a dataset 1 day ago
PleIAs/post-ocr
published a dataset 1 day ago
PleIAs/post-ocr

Organizations

AgentPublic, BigScience Data, Kheops SAS, Blog-explorers, OpenLLM France, ZeroGPU Explorers, INAGUA, PleIAs, :probabl., Social Post Explorers, LLM - Digital Humanities

Pclanglais's activity

published an article 3 months ago
published an article 4 months ago

Releasing the largest multilingual open pretraining dataset

By Pclanglais and 2 others
100
published an article 7 months ago

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

29
published an article 8 months ago

Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing

20
published an article 11 months ago

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

16
published an article 11 months ago

Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data

22
published an article 12 months ago

Releasing Common Corpus: the largest public domain dataset for training LLMs

21