Damar Jati

DamarJati

AI & ML interests

Indonesian - Multimodal, Compvis, NLP | Discord: @damarjati_

Organizations

DeepGHS, Stable Diffusion Dreambooth Concepts Library, Blog-explorers, Tensor Diffusion, That Time I got Reincarnated as a Hugging Face Organization, ZeroGPU Explorers, Cagliostro Lab Dataset, Cagliostro Research Lab, Social Post Explorers, Dev Mode Explorers, Ani Community, Hugging Face Discord Community, Cagliostrolab Archive

DamarJati's activity

reacted to ZennyKenny's post with 🔥 4 days ago
I've completed the first unit of the just-launched Hugging Face Agents Course. I would highly recommend it, even for experienced builders, because it is a great walkthrough of the smolagents library and toolkit.
reacted to tomaarsen's post with 🔥 4 days ago
📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x, AND Static Embeddings for 500x speedups at a 10-20% accuracy cost.

1๏ธโƒฃ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2๏ธโƒฃ OpenVINO Backend: This backend uses Intel their OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉
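
A minimal sketch of that one-liner in context; the model name comes from the post itself, and the OpenVINO call assumes the same pattern:

```python
from sentence_transformers import SentenceTransformer

# ONNX backend (needs the optional ONNX extras, e.g. pip install sentence-transformers[onnx]).
# If the repo has no ONNX file yet, one is exported automatically.
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!"])
print(embeddings.shape)  # (2, 384) for this model

# The OpenVINO backend follows the same pattern and can win on Intel CPUs.
ov_model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")
```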

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1๏ธโƒฃ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2๏ธโƒฃ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
reacted to m-ric's post with 🔥 4 days ago
Today we make the biggest release in smolagents so far: we enable vision models, which lets you build powerful web browsing agents! 🥳

Our agents can now casually open up a web browser, and navigate on it by scrolling, clicking elements on the webpage, going back, just like a user would.

The demo below shows Claude-3.5-Sonnet browsing GitHub for the task: "Find how many commits the author of the current top trending repo did over the last year."

Go try it out, it's the most cracked agentic stuff I've seen in a while 🤯 (well, along with OpenAI's Operator, which beat us by one day)

For more detail, read our announcement blog 👉 https://huggingface.co./blog/smolagents-can-see
The code for the web browser example is here 👉 https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py
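
For flavor, a stripped-down sketch of driving such an agent; the images= keyword and the LiteLLM model id are assumptions on my part, and the linked vlm_web_browser.py is the real, complete example:

```python
from PIL import Image
from smolagents import CodeAgent, LiteLLMModel

# A vision-capable model drives the agent; the demo used Claude-3.5-Sonnet.
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")
agent = CodeAgent(tools=[], model=model)

# Screenshots passed as images let the VLM see the page it is acting on.
screenshot = Image.open("trending_page.png")  # hypothetical saved screenshot
agent.run(
    "Find how many commits the author of the current top trending repo "
    "did over the last year.",
    images=[screenshot],
)
```
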
posted an update about 2 months ago
Happy New Year 2025 🤗
To the Hugging Face community.
reacted to victor's post with 🔥 6 months ago
🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇
posted an update 6 months ago
Improved ControlNet!
Now supports dynamic resolution for perfect landscape and portrait outputs. Generate stunning images without distortion, optimized for any aspect ratio!
...
DamarJati/FLUX.1-DEV-Canny
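
As a hedged sketch of what dynamic-resolution Canny conditioning looks like with diffusers: the ControlNet checkpoint name below is a stand-in (the link above points to a Space, not a weights repo), and the resolution values are arbitrary examples.

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Stand-in checkpoint name; swap in the ControlNet weights you actually use.
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

canny = load_image("landscape_canny.png")  # pre-computed Canny edge map
image = pipe(
    "a mountain lake at golden hour",
    control_image=canny,
    width=1344, height=768,  # landscape; swap the two for portrait
    controlnet_conditioning_scale=0.6,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("landscape_out.png")
```
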
reacted to victor's post with ➕ 9 months ago
Am I the only one who thinks Command R+ is a better daily assistant than ChatGPT-4? (and it's not even close :D)
replied to KingNish's post 10 months ago

Wow, this has quite a short processing time.
Awesome!

reacted to KingNish's post with 🔥 10 months ago
Introducing JARVIS, Tony's voice assistant, for you.

JARVIS responds to all your questions in audio format.
Must TRY -> KingNish/JARVIS

Jarvis is currently equipped to accept text input and provide audio output.
In the future, it may also support audio input.

DEMO Video:
reacted to dhuynh95's post with 🤯 11 months ago
Hello World! This post is written by the Large Action Model framework LaVague! Find out more on https://github.com/mithril-security/LaVague

Edit: Here is the video of 🌊LaVague posting this. This is quite meta.
reacted to macadeliccc's post with 👍 about 1 year ago
Benefits of imatrix quantization in place of QuIP#

QuIP# is a quantization method proposed by Cornell-RelaxML (https://github.com/Cornell-RelaxML) that claims tremendous performance gains using only 2-bit precision.

RelaxML shows that by quantizing a model from 16-bit down to 2-bit precision, Llama-2-70B can run on a single 24GB GPU.

QuIP# aims to revolutionize model quantization through a blend of incoherence processing and advanced lattice codebooks. By switching to a Hadamard transform-based incoherence approach, QuIP# enhances GPU efficiency, making weight matrices more Gaussian-like and ideal for quantization with its improved lattice codebooks.

This new method has already seen some adoption in projects like llama.cpp, where ideas from QuIP# have been implemented in the form of importance matrix (imatrix) calculations: the importance matrix is computed from a calibration dataset such as wiki.train.raw, and the tool also reports perplexity on that dataset.

This interim step can improve the results of the quantized model. If you would like to explore this process for yourself:

llama.cpp - https://github.com/ggerganov/llama.cpp/
QuIP# paper - https://cornell-relaxml.github.io/quip-sharp/
AutoQuIP# Colab - https://colab.research.google.com/drive/1rPDvcticCekw8VPNjDbh_UcivVBzgwEW?usp=sharing
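
To make the two-step workflow concrete, here is a rough sketch driving llama.cpp from Python. The binary names and flags (llama-imatrix, llama-quantize, --imatrix) are assumptions that have shifted across llama.cpp versions, so check your build's --help output:

```python
import subprocess

# Step 1: compute an importance matrix from a calibration dataset.
subprocess.run([
    "llama-imatrix",
    "-m", "model-f16.gguf",   # full-precision GGUF to calibrate
    "-f", "wiki.train.raw",   # calibration text, as mentioned above
    "-o", "imatrix.dat",      # resulting importance matrix
], check=True)

# Step 2: quantize to a low-bit type, guided by the importance matrix.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf", "model-iq2_xs.gguf", "IQ2_XS",
], check=True)
```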

Other impressive quantization projects to watch:
+ AQLM
https://github.com/Vahe1994/AQLM
https://arxiv.org/abs/2401.06118
reacted to alielfilali01's post with 👍 about 1 year ago
I love the new Viewer, and I didn't know how much I needed it until now.
@sylvain, @lhoestq and team, GREAT JOB 🔥 and THANK YOU 🤗
reacted to Xenova's post with 🤯❤️ about 1 year ago
Introducing Remove Background Web: in-browser background removal, powered by @briaai's new RMBG-v1.4 model and 🤗 Transformers.js!

Everything runs 100% locally, meaning none of your images are uploaded to a server! 🤯 At only ~45MB, the 8-bit quantized version of the model is perfect for in-browser usage (it even works on mobile).

Check it out! 👇
Demo: Xenova/remove-background-web
Model: briaai/RMBG-1.4
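
Outside the browser, the same model can be used from Python; this sketch assumes the custom image-segmentation pipeline shipped in the model repo (hence trust_remote_code=True):

```python
from transformers import pipeline

# RMBG-1.4 ships a custom pipeline; trust_remote_code opts into its code.
pipe = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

result = pipe("portrait.jpg")  # assumed to return a PIL image with the background removed
result.save("portrait_no_bg.png")
```
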
reacted to fffiloni's post with ❤️ about 1 year ago
Quick build of the day: LCM Supa Fast Image Variation
—
We take the opportunity to combine moondream1's vision capabilities with LCM SDXL's speed to generate a variation of the subject of an input image.
All that thanks to Gradio APIs 🤗

Try the space: https://huggingface.co./spaces/fffiloni/lcm-img-variations
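
If you want to script it rather than click through the UI, something like the following should work via gradio_client; the endpoint name and argument order are hypothetical, so check the Space's "Use via API" panel for the real signature:

```python
from gradio_client import Client

client = Client("fffiloni/lcm-img-variations")
# "/infer" and the single image-path argument are assumptions; the real
# signature is listed under the Space's "Use via API" link.
result = client.predict("my_photo.jpg", api_name="/infer")
print(result)  # typically file path(s) to the generated variations
```
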
reacted to joaogante's post with ❤️ about 1 year ago
Up to 3x faster LLM generation with no extra resources/requirements - ngram speculation has landed in 🤗 transformers! 🏎️💨

All you need to do is to add prompt_lookup_num_tokens=10 to your generate call, and you'll get faster LLMs 🔥


How does it work? 🤔

Start with assisted generation, where a smaller model generates candidate sequences. The net result is a significant speedup if the model agrees with the candidate sequences! However, we do require a smaller model trained similarly 😕

The idea introduced (and implemented) by Apoorv Saxena consists of gathering the candidate sequences from the input text itself. If the latest generated ngram is in the input, use the continuation therein as a candidate! No smaller model is required while still achieving significant speedups 🔥

In fact, the penalty of gathering and testing the candidates is so small that you should use this technique whenever possible!

Here is the code example that produces the outputs shown in the video: https://pastebin.com/bms6XtR4
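
Not the pastebin above, but a minimal sketch of the one-line change; the checkpoint is an arbitrary example, and input-grounded tasks (summarization, document QA) benefit most since generated ngrams tend to reappear in the prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "mistralai/Mistral-7B-Instruct-v0.2"  # example model choice
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize the following article:\n" + open("article.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    prompt_lookup_num_tokens=10,  # enables ngram / prompt-lookup speculation
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```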

Have fun 🤗
reacted to alvarobartt's post with 🤯 about 1 year ago
💨 Notux 8x7b was just released!

At Argilla, we recently fine-tuned Mixtral 8x7B Instruct from Mistral AI using DPO on a binarized and curated version of UltraFeedback, and found that it outperforms every other MoE-based model on the Hub.

- argilla/notux-8x7b-v1
- argilla/ultrafeedback-binarized-preferences-cleaned
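
For orientation, a rough sketch of that DPO recipe with trl. This is not Argilla's actual training script: the hyperparameters are placeholders, and the DPOTrainer signature has changed across trl releases, so treat it as approximate.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# The preference data linked above; the model is the DPO target.
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="notux-style-dpo", beta=0.1),  # placeholder hyperparameters
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions call this `tokenizer`
)
trainer.train()
```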