
wb

whitebill

AI & ML interests

None yet

Recent Activity

reacted to as-cle-bert's post with 🔥 9 days ago

Organizations

medical-ai-linkdata

whitebill's activity

reacted to as-cle-bert's post with 🔥 9 days ago
🎉Early New Year releases🎉

Hi HuggingFacers🤗, I decided to ship early this year, and here's what I came up with:

PdfItDown (https://github.com/AstraBert/PdfItDown) - If you're like me and have your whole RAG pipeline optimized for PDFs but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, markdown sheets and (why not?) CSVs and XMLs to PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo 👉 https://github.com/AstraBert/PdfItDown
PyPI Package 👉 https://pypi.org/project/pdfitdown/

SenTrEv v1.0.0 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the retrieval performance of your text embedding models, I have good news for you🥳🥳
The new release of SenTrEv now supports dense and sparse retrieval (thanks to FastEmbed by Qdrant) with text-based file formats (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new relevance metrics!
GitHub repo 👉 https://github.com/AstraBert/SenTrEv
Release Notes 👉 https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPI Package 👉 https://pypi.org/project/sentrev/

Happy New Year and have fun!🥂
  • 2 replies
reacted to awacke1's post with 👍 9 days ago
reacted to cfahlgren1's post with 👍 12 days ago
reacted to singhsidhukuldeep's post with 🚀 21 days ago
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!

The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:

>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
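
To make the dynamic patching idea concrete, here is a minimal Python sketch. It is an illustration under loud assumptions, not Meta's implementation: the entropy estimate comes from a simple bigram count model fitted on the input itself (a stand-in for a learned byte-level entropy model), and the names `next_byte_entropies`, `entropy_patches` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _entropy_bits(counts: Counter) -> float:
    """Shannon entropy (in bits) of the distribution described by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def next_byte_entropies(data: bytes) -> list[float]:
    """Estimate how uncertain each byte is given the byte before it.

    Assumption: a bigram count model fitted on `data` stands in for the
    learned entropy model described in the post.
    """
    if not data:
        return []
    following = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        following[prev][nxt] += 1
    # Position 0 has no context, so fall back to the unigram entropy.
    entropies = [_entropy_bits(Counter(data))]
    for i in range(1, len(data)):
        entropies.append(_entropy_bits(following[data[i - 1]]))
    return entropies


def entropy_patches(data: bytes, threshold: float) -> list[bytes]:
    """Start a new patch wherever the estimated entropy exceeds threshold."""
    entropies = next_byte_entropies(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


if __name__ == "__main__":
    text = b"the cat sat on the mat, the cat sat on the hat"
    for patch in entropy_patches(text, threshold=1.0):
        print(patch)
```

Predictable stretches end up in longer patches and hard-to-predict positions open new ones, which is how compute can be concentrated where the data is more complex. Raising `threshold` yields fewer, longer patches.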

Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes

>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models

>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
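
The hash n-gram embedding idea mentioned above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the class name `HashNgramEmbedder`, the n-gram sizes, the table size, the use of CRC32 as the hash, and the random (untrained) tables. In a real model the tables would be learned parameters.

```python
import random
import zlib


class HashNgramEmbedder:
    """Toy byte embedder: byte embedding plus hashed n-gram embeddings.

    Assumptions (not from the post): CRC32 as the hash function, randomly
    initialised tables, and the default n-gram sizes below.
    """

    def __init__(self, dim=8, table_size=1024, ngram_sizes=(3, 4, 5), seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.table_size = table_size
        self.ngram_sizes = tuple(ngram_sizes)
        # One embedding row per possible byte value.
        self.byte_table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(256)
        ]
        # One fixed-size hashed table per n-gram length.
        self.ngram_tables = {
            n: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(table_size)]
            for n in self.ngram_sizes
        }

    def embed(self, data: bytes) -> list[list[float]]:
        """Return one vector per byte position."""
        vectors = []
        for i, byte in enumerate(data):
            vec = list(self.byte_table[byte])
            for n in self.ngram_sizes:
                if i + 1 >= n:
                    # The n-gram ending at position i is hashed into a bucket,
                    # so the table stays small however many n-grams exist.
                    gram = data[i + 1 - n : i + 1]
                    bucket = zlib.crc32(gram) % self.table_size
                    row = self.ngram_tables[n][bucket]
                    vec = [v + r for v, r in zip(vec, row)]
            vectors.append(vec)
        return vectors


if __name__ == "__main__":
    embedder = HashNgramEmbedder()
    vectors = embedder.embed(b"hello world")
    print(len(vectors), len(vectors[0]))
```

Hashing trades occasional collisions between unrelated n-grams for a fixed memory footprint, which is the usual reason to prefer it over an explicit n-gram vocabulary.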

This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
  • 2 replies
reacted to clem's post with 🚀 24 days ago
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party, but add your name to the waiting list, as we're trying to book the whole Passage du Caire for extra space for robots 🤖🦾🦿

https://t.co/enkFXjWndJ
  • 1 reply
updated a collection about 1 month ago
upvoted an article about 1 month ago

Use Models from the Hugging Face Hub in LM Studio

By yagilb • 129
reacted to averoo's post with 👍 3 months ago
Hello, researchers! I've tried to make reading HF Daily Papers easier and built a tool that writes reviews with LLMs like Claude 3.5, GPT-4o and sometimes FLUX.

📚 Classification by topic
📅 Sorting by publication date and HF addition date
🔄 Syncing every 2 hours
💻 Hosted on GitHub
🌐 English, Russian, and Chinese
📈 Top by week/month (in progress)

👉 https://hfday.ru

Let me know what you think of it.