Adriel Martins

Martins6

AI & ML interests

Graph Neural Networks (GNN) & Robot Learning & Multimodal AI

Recent Activity

Organizations

None yet

Martins6's activity

reacted to s-emanuilov's post with 😎👍➕🤝👀 21 days ago
Hey HF community! 👋

Excited to share Monkt - a tool I built to solve the eternal headache of processing documents for ML/AI pipelines.

What it does: Converts PDFs, Word, PowerPoint, Excel, Web pages or raw HTML into clean Markdown or structured JSON.

Great for:
✔ LLM training dataset preparation;
✔ Knowledge base construction;
✔ Research paper processing;
✔ Technical documentation management.

It has API access for integration into ML pipelines.

Check it out at https://monkt.com/ if you want to save time on document processing infrastructure.

Looking forward to your feedback!
  • 3 replies
liked a Space about 1 month ago
reacted to thomwolf's post with 🤗🔥🚀 about 2 months ago
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions on our chat place: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
  • 2 replies