Florent Daudens's picture

Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Recent Activity

Organizations

Hugging Face's profile picture Hugging Face OSS Metrics's profile picture Hugging Face TB Research's profile picture ZeroGPU Explorers's profile picture LeRobot's profile picture Journalists on Hugging Face's profile picture Major TOM's profile picture MLX Community's profile picture Social Post Explorers's profile picture Projet Spinoza's profile picture Dev Mode Explorers's profile picture Hugging Face for Legal's profile picture Hugging Face Discord Community's profile picture Big Science Social Impact Evaluation for Bias and Stereotypes's profile picture Dataset Tools's profile picture Hugging Face Science's profile picture Coordination Nationale pour l'IA's profile picture Data Is Better Together Contributor's profile picture Sandbox's profile picture Open R1's profile picture

fdaudens's activity

posted an update about 1 hour ago
view post
Post
46
What if AI becomes as ubiquitous as the internet, but runs locally and transparently on our devices?

Fascinating TED talk by @thomwolf on open source AI and its future impact.

Imagine this for AI: instead of black box models running in distant data centers, we get transparent AI that runs locally on our phones and laptops, often without needing internet access. If the original team moves on? No problem - resilience is one of the beauties of open source. Anyone (companies, collectives, or individuals) can adapt and fix these models.

This is a compelling vision of AI's future that solves many of today's concerns around AI transparency and centralized control.

Watch the full talk here: https://www.ted.com/talks/thomas_wolf_what_if_ai_just_works
posted an update 1 day ago
view post
Post
2366
Is this the best tool to extract clean info from PDFs, handwriting and complex documents yet?

Open source olmOCR just dropped and the results are impressive.

Tested the free demo with various documents, including a handwritten Claes Oldenburg letter. The speed is impressive: 3000 tokens/second on your own GPU - that's 1/32 the cost of GPT-4o ($190/million pages). Game-changer for content extraction and digital archives.

To achieve this, Ai2 trained a 7B vision language model on 260K pages from 100K PDFs using "document anchoring" - combining PDF metadata with page images.

Best part: it actually understands document structure (columns, tables, equations) instead of just jumbling everything together like most OCR tools. Their human eval results back this up.

๐Ÿ‘‰ Try the demo: https://olmocr.allenai.org

Going right into the AI toolkit: JournalistsonHF/ai-toolkit
  • 3 replies
ยท
posted an update 4 days ago
view post
Post
3159
๐Ÿš€ Just launched: A toolkit of 20 powerful AI tools that journalists can use right now - transcribe, analyze, create. 100% free & open-source.

Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.

Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)

Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!

๐Ÿ‘‰ JournalistsonHF/ai-toolkit
posted an update 7 days ago
posted an update 10 days ago
view post
Post
5762
๐ŸŽฏ Perplexity drops their FIRST open-weight model on Hugging Face: A decensored DeepSeek-R1 with full reasoning capabilities. Tested on 1000+ examples for unbiased responses.

Check it out: perplexity-ai/r1-1776
Blog post: https://perplexity.ai/hub/blog/open-sourcing-r1-1776
  • 1 reply
ยท
reacted to clem's post with ๐Ÿ”ฅ 11 days ago
view post
Post
3411
We crossed 1B+ tokens routed to inference providers partners on HF, that we released just a few days ago.

Just getting started of course but early users seem to like it & always happy to be able to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co./blog/inference-providers
posted an update 12 days ago
view post
Post
2268
Will we soon all have our own personalized AI news agents? And what does it mean for journalism?

Just built a simple prototype based on the Hugging Face course. It lets you get customized news updates on any topic.

Not perfect yet, but you can see where things could go: we'll all be able to build personalized AI agents that curate & analyze news for each of us. And users who could decide to build custom news products for their needs, such as truly personalized newsletters or podcasts.

The implications for both readers & news organizations are significant. To name a few:
- Will news articles remain the best format for informing people?
- What monetization model will work for news organizations?
- How do you create an effective conversion funnel?

๐Ÿ‘‰ Try it here: fdaudens/my-news-agent (Code is open-source)
๐Ÿ‘‰ Check out the course: https://huggingface.co./learn/agents-course/unit0/introduction
reacted to davanstrien's post with ๐Ÿ‘ 13 days ago
reacted to merve's post with ๐Ÿ‘ 13 days ago
view post
Post
4639
Your weekly recap of open AI is here, and it's packed with models! merve/feb-14-releases-67af876b404cc27c6d837767

๐Ÿ‘€ Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released Ovis2 model family along with Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is sota in it's size
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning, long context video understanding

๐Ÿ’ฌ LLMs
A lot of math models!
> Open-R1 team released OpenR1-Math-220k large scale math reasoning dataset, along with Qwen2.5-220K-Math fine-tuned on the dataset, OpenR1-Qwen-7B
> Nomic AI released new Nomic Embed multilingual retrieval model, a MoE with 500 params with 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on Math

๐Ÿ—ฃ๏ธ Audio
> Zonos-v0.1 is a new family of speech recognition models, which contains the model itself and embeddings

๐Ÿ–ผ๏ธ Vision and Image Generation
> We have ported DepthPro of Apple to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
ยท
reacted to Kseniase's post with ๐Ÿ”ฅ 13 days ago
view post
Post
7701
8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhancesย  novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gap, improve realism and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step-by-step and adjusts it, also deciding how much compute power to use at test time. If needed it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG ->ย  CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within graph structure and capture citation relationships

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots
  • 1 reply
ยท
posted an update 14 days ago
view post
Post
2117
๐Ÿ”Š Meet Kokoro Web - Free, ML speech synthesis on your computer, that'll make you ditch paid services!

28 natural voices, unlimited generations, and WebGPU acceleration. Perfect for journalists and content creators.

Test it with full articlesโ€”sounds amazingly human! ๐ŸŽฏ๐ŸŽ™๏ธ

Xenova/kokoro-web
posted an update 15 days ago
view post
Post
2681
โญ๏ธ The AI Energy Score project just launched - this is a game-changer for making informed decisions about AI deployment.

You can now see exactly how much energy your chosen model will consume, with a simple 5-star rating system. Think appliance energy labels, but for AI.

Looking at transcription models on the leaderboard is fascinating: choosing between whisper-tiny or whisper-large-v3 can make a 7x difference. Real-time data on these tradeoffs changes everything.

166 models already evaluated across 10 different tasks, from text generation to image classification. The whole thing is public and you can submit your own models to test.

Why this matters:
- Teams can pick efficient models that still get the job done
- Developers can optimize for energy use from day one
- Organizations can finally predict their AI environmental impact

If you're building with AI at any scale, definitely worth checking out.

๐Ÿ‘‰ leaderboard: https://lnkd.in/esrSxetj
๐Ÿ‘‰ blog post: https://lnkd.in/eFJvzHi8

Huge work led by @sasha with @bgamazay @yjernite @sarahooker @regisss @meg
  • 1 reply
ยท
posted an update 18 days ago
view post
Post
1295
๐Ÿ”ฅ Video AI is taking over! Out of 17 papers dropped on Hugging Face today, 6 are video-focused - from Sliding Tile Attention to On-device Sora. The race for next-gen video tech is heating up! ๐ŸŽฌ๐Ÿš€
posted an update 22 days ago
reacted to merve's post with ๐Ÿ‘ 25 days ago
view post
Post
3847
This week in open AI was ๐Ÿ”ฅ Let's recap! ๐Ÿค— merve/january-31-releases-679a10669bd4030090c5de4d
LLMs ๐Ÿ’ฌ
> Huge: AllenAI released new Tรผlu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B ๐Ÿ”ฅ
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license ๐Ÿ˜ฑ
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license ๐Ÿ”ฅ
> Arcee AI released Virtuoso-medium, 32.8B LLMs distilled from DeepSeek V3 with dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is fine-tuned version of Qwen2.5-7B-Instruct on OpenThoughts dataset

VLMs & vision ๐Ÿ‘€
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization ๐Ÿ”ฅ
> NVIDIA released new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!

Audio ๐Ÿ—ฃ๏ธ
> YuE is a new open-source music generation foundation model, lyrics-to-song generation

Codebase ๐Ÿ‘ฉ๐Ÿปโ€๐Ÿ’ป
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is open-source reproduction of R1 by @huggingface science team https://huggingface.co./blog/open-r1
  • 1 reply
ยท
posted an update 26 days ago
view post
Post
2391
๐Ÿ“Š R1 just built its own download dashboard!

Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
posted an update 29 days ago
view post
Post
3354
๐ŸŽฏ Kokoro TTS just hit v1.0! ๐Ÿš€

Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities โœจ

Check it out: hexgrad/Kokoro-82M
  • 1 reply
ยท
reacted to hexgrad's post with ๐Ÿ”ฅ 29 days ago
posted an update 30 days ago
view post
Post
1289
๐Ÿ’ช The open-source community is really unstoppable:

+5M total downloads for DeepSeek models on @hf .co
+4M are from the 700 models created by the community
That's 30% more than yesterday!
posted an update about 1 month ago
view post
Post
1713
๐Ÿš€ The open source community is unstoppable: 4M total downloads for DeepSeek models on Hugging Face, with 3.2M coming from the +600 models created by the community.

That's 30% more than yesterday!
  • 1 reply
ยท