SmolVLM speeding locally on a laptop thanks to mlx-vlm and @Gradio ! Try it with two lines: pip install git+https://github.com/andimarafioti/mlx-vlm.git@stream-generate-fix python -m mlx_vlm.chat_ui --model mlx-community/SmolVLM-Instruct-8bit
Gotta love the MLX community! Big thanks to @pcuenq and @prince_canuma !
reacted to danielhanchen's
post with π₯about 2 months ago
This is no Woodstock AI but will be fun nonetheless haha. Iβll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.
1,000 spots available first-come first serve with some surprises during the stream!
π€ Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)
Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And weβre not going to stop here β stay tuned as we enable more experiences to build AI with open models on Google Cloud!
It's a multimodal model based on Llama 3.1 that accepts an arbitrary number of interleaved images with text with a huge context window (10k tokens!) β¨
Supported by Hugging Face transformers π€
2 replies
Β·
reacted to dvilasuero's
post with π€ππ₯7 months ago
Today is a huge day in Argillaβs history. We couldnβt be more excited to share this with the community: weβre joining Hugging Face!
Weβre embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.
Over the past year, weβve been collaborating with Hugging Face on countless projects: launching partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyrβs learnings, the Data is Better Together initiative with hundreds of community contributors, or releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets
After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, weβre now the same team.
To those of you whoβve been following us, this wonβt be a huge surprise, but it will be a big deal in the coming months. This acquisition means weβll double down on empowering the community to build and collaborate on high quality datasets, weβll bring full support for multimodal datasets, and weβll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.
βΌοΈSentence Transformers v3.0 is out! You can now train and finetune embedding models with multi-GPU training, bf16 support, loss logging, callbacks & much more. I also release 50+ datasets to train on.
1οΈβ£ Training Refactor Embedding models can now be trained using an extensive trainer with a lot of powerful features: - MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP)) - bf16 training support; loss logging - Evaluation datasets + evaluation loss - Improved callback support + an excellent Weights & Biases integration - Gradient checkpointing, gradient accumulation - Improved model card generation - Resuming from a training checkpoint without performance loss - Hyperparameter Optimization and much more! Read my detailed blogpost to learn about the components that make up this new training approach: https://huggingface.co./blog/train-sentence-transformers
2οΈβ£ Similarity Score Not sure how to compare embeddings? Don't worry, you can now use model.similarity(embeddings1, embeddings2) and you'll get your similarity scores immediately. Model authors can specify their desired similarity score, so you don't have to worry about it anymore!
3οΈβ£ Additional Kwargs Sentence Transformers relies on various Transformers instances (AutoModel, AutoTokenizer, AutoConfig), but it was hard to provide valuable keyword arguments to these (like 'torch_dtype=torch.bfloat16' to load a model a lower precision for 2x inference speedup). This is now easy!
4οΈβ£ Hyperparameter Optimization Sentence Transformers now ships with HPO, allowing you to effectively choose your hyperparameters for your data and task.
π₯ Prometheus 2 was recently released by Kaist AI as an alternative and closely mirroring both human and GPT-4 evaluation, and surpassing the former Prometheus!
π¬οΈFine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1 ποΈThe datasets used for fine-tuning have been publicly released i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection π€π»Unified LM evaluator for absolute (a single prompt-completion pair) and relative (two completions for a given prompt) due to model merging βNo longer needs a mandatory reference / golden answer, but can still be provided optionally πSurpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs πApache 2.0 license
Long-story short, an amazing job from Kaist AI bridging the gap with LLM evaluators other than proprietary and bigger models!
This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator using distilabel, so we implemented PrometheusEval.
π± Using PrometheusEval running their 7B variant with vLLM in a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!
As part of the Data is Better Together MPEP project, we are now at the point where some translation efforts have successfully translated 500 highly ranked prompts into a new target language (amazing work from @Rijgersberg et al!)
Our next step is to use these translated prompts to evaluate the performance of LLMs for non English languages.
Does LLM, as a judge, work outside of English?
Ideally, it would be compelling to leverage LLMs to judge models for non-English since this significantly lowers the barrier to evaluating models (although it doesn't remove this barrier altogether).
What we want to know is: - does auto/LLM eval work in general for a particular language - which model(s) works best as a judge - do LLMs' judgments of non-English models match human preferences?
Can you create domain-specific synthetic datasets in under 20 minutes?
@burtenshaw recently launched the Domain Specific Dataset Project as part of Data is Better Together. As part of this, Ben created a Space that you can use to define some key perspectives and concepts from a domain. This seed dataset can then be used to generate a synthetic dataset for a particular domain.
In less than 30 minutes this afternoon, I created a domain-specific dataset focused on data-centric machine learning using these tools: davanstrien/data-centric-ml-sft.
π Sentence Transformers v2.7.0 is out! Featuring a new loss function, easier Matryoshka model inference & evaluation, CrossEncoder improvements & Intel Gaudi2 Accelerator support. Details:
1οΈβ£ A new loss function: CachedGISTEmbedLoss This loss function is a combination of CachedMultipleNegativesRankingLoss and the GISTEmbedLoss, both of which are already excellent. The caching mechanism allows for much higher batch sizes with constant memory usage, which boosts training performance. The GIST part introduces a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.
2οΈβ£ Automatic Matryoshka model truncation Matryoshka models produce embeddings that are still useful after truncation. However, this truncation always had to be done manually, until now! We've added a truncate_dim option to the Sentence Transformer constructor. This also allows truncation when using HuggingFaceEmbeddings from LlamaIndex or LangChain.
3οΈβ£ Additionally, you can now specify truncate_dim in evaluators to get the performance after truncation. (Hint: it's surprisingly good, even for models not trained with MatryoshkaLoss, and it can speed up e.g. clustering, retrieval, etc.)
4οΈβ£ CrossEncoder improvements The CrossEncoder now supports 'push_to_hub' to upload trained reranker models to Hugging Face. Additionally, CrossEncoders now support trust_remote_code to load models with custom modelling code.
5οΈβ£ Inference on Intel Gaudi2 If you have an Intel Gaudi2 Accelerator, Sentence Transformers now uses it automatically for even faster inference. No changes are necessary to your code, the device is automatically detected!
I'm very excited for the upcoming releases: I'm making great progress with a notable v3 refactor that should heavily improve the training process for embedding models!
2 replies
Β·
reacted to sayakpaul's
post with π€π₯9 months ago
We're introducing experimental support for device_map in Diffusers π€
If you have multiple GPUs you want to use to distribute the pipeline models, you can do so. Additionally, this becomes more useful when you have multiple low-VRAM GPUs.