Adriel Martins

Martins6

https://github.com/Martins6

AI & ML interests

Graph Neural Networks (GNN) & Robot Learning & Multimodal AI

Organizations

None yet

Martins6's activity

liked a model 2 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Text Generation • Updated 1 day ago • 75.9k • 425

liked a model 10 days ago

kyutai/helium-1-preview-2b

Text Generation • Updated 10 days ago • 5.94k • 125

liked a model 16 days ago

nvidia/Cosmos-1.0-Diffusion-7B-Text2World

Updated 14 days ago • 187k • 192

reacted to s-emanuilov's post with 😎👍➕🤝👀 21 days ago

Post

2572

Hey HF community! 👋

Excited to share Monkt - a tool I built to solve the eternal headache of processing documents for ML/AI pipelines.

What it does: Converts PDFs, Word, PowerPoint, Excel, Web pages or raw HTML into clean Markdown or structured JSON.

Great for:
✔ LLM training dataset preparation;
✔ Knowledge base construction;
✔ Research paper processing;
✔ Technical documentation management.

It has API access for integration into ML pipelines.

Check it out at https://monkt.com/ if you want to save time on document processing infrastructure.

Looking forward to your feedback!

3 replies

liked 2 datasets about 1 month ago

HuggingFaceM4/DocumentVQA

Viewer • Updated Dec 18, 2023 • 50k • 1.62k • 26

HuggingFaceM4/WebSight

Viewer • Updated Mar 26, 2024 • 2.75M • 8.75k • 340

liked 2 models about 1 month ago

HuggingFaceM4/idefics2-8b

Image-Text-to-Text • Updated Oct 14, 2024 • 17.8k • 602

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated Dec 2, 2024 • 52k • 340

liked 3 datasets about 1 month ago

liked a Space about 1 month ago

Jupyter Agent

Running • 235

reacted to thomwolf's post with 🤗🔥🚀 about 2 months ago

Post

4957

We are proud to announce HuggingFaceFW/fineweb-2: a sparkling update to HuggingFaceFW/fineweb with thousands of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions in our chat space: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi

2 replies

liked a model about 2 months ago

alibaba-pai/VideoCLIP-XL

Updated Oct 7, 2024 • 10