optimum (Hugging Face Optimum)

regisss

posted an update 14 days ago

Nice paper comparing the FP8 inference efficiency of NVIDIA H100 and Intel Gaudi 2: "An Investigation of FP8 Across Accelerators for LLM Inference" (2502.01070)

The conclusion is interesting: "Our findings highlight that the Gaudi 2, by leveraging FP8, achieves higher throughput-to-power efficiency during LLM inference"

One often-overlooked aspect of AI hardware accelerators is how much less energy they consume than GPUs. It's nice to see researchers starting to carry out experiments to measure this!
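As a rough illustration of the metric behind that conclusion, throughput-to-power efficiency can be expressed as generated tokens per second divided by average power draw in watts, which works out to tokens per joule. This is a minimal sketch assuming that definition (the paper's exact methodology may differ); the `InferenceRun` record and all the numbers in it are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One measured inference run (hypothetical record, illustration only)."""
    name: str
    tokens_generated: int  # total output tokens produced during the run
    elapsed_s: float       # wall-clock duration in seconds
    avg_power_w: float     # average accelerator power draw in watts


def throughput_tokens_per_s(run: InferenceRun) -> float:
    """Raw throughput: tokens per second."""
    return run.tokens_generated / run.elapsed_s


def tokens_per_joule(run: InferenceRun) -> float:
    """Throughput-to-power efficiency: (tokens / s) / (J / s) = tokens / J."""
    return throughput_tokens_per_s(run) / run.avg_power_w


if __name__ == "__main__":
    # Placeholder measurements, NOT taken from the paper.
    runs = [
        InferenceRun("accelerator_a", tokens_generated=120_000, elapsed_s=60.0, avg_power_w=500.0),
        InferenceRun("accelerator_b", tokens_generated=90_000, elapsed_s=60.0, avg_power_w=350.0),
    ]
    # Rank runs from most to least energy-efficient.
    for run in sorted(runs, key=tokens_per_joule, reverse=True):
        print(f"{run.name}: {throughput_tokens_per_s(run):.0f} tok/s, "
              f"{tokens_per_joule(run):.2f} tok/J")
```

With these placeholder numbers, `accelerator_b` ranks first (about 4.29 vs 4.00 tokens per joule) despite its lower raw throughput, which is exactly the kind of trade-off a throughput-only comparison would hide.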

Gaudi3 results soon...

echarlaix

updated a dataset 24 days ago

optimum/documentation-images

baptistecolle

in optimum/llm-perf-leaderboard 25 days ago

fix-broken-graph

#37 opened 25 days ago by

baptistecolle

updated a Space 25 days ago

LLM-Perf Leaderboard

Explore LLM performance across hardware

pagezyhf

posted an update 29 days ago

We published https://huggingface.co./blog/deepseek-r1-aws!

If you are using AWS, give it a read. It is a living document showcasing how to deploy and fine-tune DeepSeek R1 models with Hugging Face on AWS.

We're working hard to enable every scenario, whether you want to deploy to Inference Endpoints, SageMaker, or EC2, on GPUs or on Trainium & Inferentia.

We have full support for the distilled models, and DeepSeek-R1 support is coming soon! I'll keep you posted.

Cheers

baptistecolle

in optimum/llm-perf-leaderboard 29 days ago

fix-memory-requirements-for-cpu

#36 opened 29 days ago by

baptistecolle

pagezyhf

posted an update about 2 months ago

Learn how to deploy multiple LoRA adapters on Vertex AI with this blog post, using Hugging Face Deep Learning Containers on GCP.

https://medium.com/google-cloud/open-models-on-vertex-ai-with-hugging-face-serving-multiple-lora-adapters-on-vertex-ai-e3ceae7b717c
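The blog post covers the Vertex AI setup end to end. As a minimal sketch of selecting an adapter per request, the snippet below builds a Text Generation Inference (TGI) `/generate` request body. It assumes a TGI server launched with multi-LoRA support, where the adapter is chosen through an `adapter_id` field in `parameters`. The helper name `build_generate_request`, the URL, and the adapter ID are hypothetical, and on Vertex AI the same `inputs`/`parameters` body may instead need to go through the endpoint's prediction API rather than being posted to `/generate` directly.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    base_url: str,
    prompt: str,
    adapter_id: Optional[str] = None,
    max_new_tokens: int = 64,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for TGI's /generate route.

    `adapter_id` assumes the server was started with multi-LoRA support;
    leave it as None to query the base model.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and adapter ID; nothing is sent over the network here.
    req = build_generate_request("http://localhost:8080", "Hello!", adapter_id="my-org/my-adapter")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Against a running server, send it with: urllib.request.urlopen(req)
```

Omitting `adapter_id` should route the request to the base model, so one deployment can serve both base and adapter-specific traffic.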

jeffboudier

posted an update about 2 months ago

NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release also includes tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co./blog/mingyuliutw/nvidia-cosmos

regisss

posted an update 2 months ago

Nice to see day-one support for Falcon 3 on Gaudi with Optimum Habana!

👉 https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-falcon-3-fdn-models.html

baptistecolle

in optimum/llm-perf-leaderboard 3 months ago

Add torchao int4 weight only quantization as an option

#34 opened 3 months ago by

jerryzh168

pagezyhf

posted an update 3 months ago

You can now access some of the most popular models from the Hugging Face community in Amazon Bedrock 🤯

With Bedrock Marketplace, Amazon Bedrock expands its model catalog to hundreds of specialized models.

https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/model-catalog

pagezyhf

posted an update 3 months ago

It's December 2nd, and here's your Cyber Monday present 🎁!

We're cutting our prices on Hugging Face Inference Endpoints and Spaces!

Our folks at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months. We also have reductions ranging from 20 to 50% on all instances.

Sounds like the right time to give Inference Endpoints a try? Get started today, and find the full pricing details in our documentation:
https://ui.endpoints.huggingface.co/
https://huggingface.co./pricing

pagezyhf

posted an update 3 months ago

Hello Hugging Face Community,

If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs in less than 10 minutes! Thank you @wietse-venema-demo!

To watch in this order:
1. Learn what Hugging Face Deep Learning Containers are
https://youtu.be/aWMp_hUUa0c?si=t-LPRkRNfD3DDNfr

2. Learn how to deploy an LLM with our Deep Learning Container using Text Generation Inference
https://youtu.be/Q3oyTOU1TMc?si=V6Dv-U1jt1SR97fj

3. Learn how to scale your inference endpoint based on traffic
https://youtu.be/QjLZ5eteDds?si=nDIAirh1r6h2dQMD

If you'd like more of these short tutorials and have a topic in mind, let me know!

jeffboudier

posted an update 3 months ago

New: add your Bluesky account to your HF profile:
https://huggingface.co./settings/profile

Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social

By the way, HF people starter pack https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

pagezyhf

posted an update 3 months ago

Hello Hugging Face Community,

I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!

With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:

⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!

Find the documentation at https://huggingface.co./docs/google-cloud/en/index
If you need support, open a conversation on the forum: https://discuss.huggingface.co/c/google-cloud/69
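For a quick sense of how one of these containers is queried once deployed, here is a minimal sketch for Text Embeddings Inference (TEI). It assumes TEI's `/embed` route, which accepts a JSON body with an `inputs` field (a string or a list of strings) and returns one embedding vector per input. The helper names and URL are hypothetical, the payload may need to be wrapped differently when the DLC is served behind Vertex AI, and the cosine-similarity helper is an illustrative addition for comparing the returned vectors.

```python
import json
import math
import urllib.request
from typing import List, Sequence


def build_embed_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for TEI's /embed route."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=json.dumps({"inputs": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(raw: bytes) -> List[List[float]]:
    """Decode the JSON response body: a list holding one vector per input text."""
    return json.loads(raw)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Against a running container, you would send the request with `urllib.request.urlopen(req)`, pass `resp.read()` to `parse_embeddings`, and compare vectors with `cosine_similarity`. Keeping request building and parsing separate from the network call makes both easy to check offline.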

echarlaix

updated a model 4 months ago

optimum/mobilenetv3_large_100.ra_in1k

Updated Oct 21, 2024

regisss

posted an update 4 months ago

Interested in performing inference with an ONNX model?⚡️

The Optimum docs about model inference with ONNX Runtime are now much clearer and simpler!

Want to deploy your favorite model from the Hub but don't know how to export it to the ONNX format? You can do it in one line of code:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the model from the Hub and export it to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
```

Check out the whole guide 👉 https://huggingface.co./docs/optimum/onnxruntime/usage_guides/models

jeffboudier

posted an update 5 months ago

This week in Inference Endpoints - thx @erikkaum for the update!

👀 https://huggingface.co./blog/erikkaum/endpoints-changelog

IlyasMoutawwakil

updated a Space 5 months ago

Auto Benchmark

jeffboudier

posted an update 5 months ago

Inference Endpoints got a bunch of cool updates yesterday; here are my top 3