Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Recent Activity

liked a Space about 9 hours ago

Wan-AI/Wan2.1

updated a Space about 16 hours ago

JournalistsonHF/ai-toolkit

liked a Space about 16 hours ago

microsoft/PhineSpeechTranslator

Posts 119

Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?

Open-source olmOCR just dropped, and the results are impressive.

I tested the free demo with various documents, including a handwritten Claes Oldenburg letter. It's fast: 3,000 tokens/second on your own GPU, which works out to about $190 per million pages, or 1/32 the cost of GPT-4o. A game-changer for content extraction and digital archives.
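
For a rough sense of what that means for an archive, here is a back-of-the-envelope sketch. It assumes the $190 per million pages figure is olmOCR's cost and that the 1/32 ratio holds exactly, so the GPT-4o price below is only implied by those two numbers, not a quoted rate. The 50,000-page archive and the helper name `estimate_cost_usd` are hypothetical.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
OLMOCR_USD_PER_MILLION_PAGES = 190.0
GPT4O_COST_MULTIPLE = 32  # the post puts olmOCR at 1/32 the cost of GPT-4o


def estimate_cost_usd(pages: int, usd_per_million_pages: float) -> float:
    """Scale a per-million-pages price linearly to an arbitrary page count."""
    return pages * usd_per_million_pages / 1_000_000


# Implied GPT-4o price, derived only from the two numbers above (not a quoted rate).
implied_gpt4o_usd_per_million = OLMOCR_USD_PER_MILLION_PAGES * GPT4O_COST_MULTIPLE

archive_pages = 50_000  # hypothetical archive size
olmocr_cost = estimate_cost_usd(archive_pages, OLMOCR_USD_PER_MILLION_PAGES)
gpt4o_cost = estimate_cost_usd(archive_pages, implied_gpt4o_usd_per_million)
print(f"olmOCR: ${olmocr_cost:.2f}, GPT-4o (implied): ${gpt4o_cost:.2f}")
```

Linear scaling is a simplification: real costs depend on page density, GPU pricing and batching, so treat the output as an order-of-magnitude estimate.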

To achieve this, Ai2 trained a 7B vision-language model on 260K pages from 100K PDFs using "document anchoring", a technique that combines PDF metadata with page images.

Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.

👉 Try the demo: https://olmocr.allenai.org

Going right into the AI toolkit: JournalistsonHF/ai-toolkit

Articles 2

Bringing Open-Source Models to Spreadsheets 🚀

Collections 1

Spaces 11

Hf Blog Tags Classification

Explore and manage text data annotations

First Agent Template

Fetch and summarize news articles on any topic

Deepseek Download Stats

DeepSeek download stats

Meta Llama 3 Download Stats

Meta Llama 3 download stats

Nieman Lab 2025 Predictions Visualization

Mapping Nieman Lab's 2025 Journalism Predictions

Model Drops Tracker

Find recent high-liked Hugging Face models

Models 3

fdaudens/ModernBERT-domain-classifier

Text Classification • Updated Jan 11 • 20

fdaudens/ModernBERT-hf-posts-classifier

Text Classification • Updated Jan 10 • 34

fdaudens/SmolLM2-FT-MyDataset

Text Generation • Updated Dec 12, 2024 • 52 • 1

Datasets 13

fdaudens/blog_posts_classified

Viewer • Updated Jan 19 • 507 • 117

fdaudens/my-distiset-9c84f049

Viewer • Updated Jan 10 • 450 • 98

fdaudens/us-presidential-elections-with-electoral-college

Viewer • Updated Oct 26, 2024 • 4.29k • 128 • 1

fdaudens/us-presidential-elections

Viewer • Updated Sep 26, 2024 • 4.29k • 133

fdaudens/hf-blog-posts-dpo_raw

Viewer • Updated May 28, 2024 • 232 • 267 • 2

fdaudens/hf-blog-posts-split

Viewer • Updated May 28, 2024 • 287 • 500 • 1

fdaudens/aya_french_dpo

Viewer • Updated May 20, 2024 • 1.01k • 64 • 1

fdaudens/aya_french_dpo_raw

Viewer • Updated May 20, 2024 • 1.26k • 576 • 2

fdaudens/aya_dataset_french_example

Viewer • Updated May 15, 2024 • 1.42k • 103

fdaudens/hf-blog-posts

Viewer • Updated May 15, 2024 • 381 • 121 • 3