Can multiple NVIDIA T4 GPUs be used to deploy Gemma2-27B-IT?

#36
by armanZhou - opened

If so, how many T4 GPUs are needed?

Deploying Gemma2-27B-IT across multiple T4 GPUs is not recommended. Each T4 has only 16 GB of memory, while the model's weights alone are roughly 54 GB at 16-bit precision, so the model would have to be sharded across many cards; the resulting cross-GPU communication overhead and the need to choose and tune a parallelism strategy make this setup impractical. Gemma 2 27B is designed to run inference efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU. Please refer to the Gemma 2 blog post for more details.
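If you still want to experiment on T4 hardware anyway, the usual approach is to quantize the weights and let accelerate shard the layers across the visible GPUs. Below is a minimal sketch, not an officially supported configuration: the model ID, the per-GPU memory cap, and the 4-bit quantization choice are my assumptions, and fit and throughput are not guaranteed. Note that T4s (Turing) lack bfloat16 support, so float16 is used for compute.

```python
# Sketch: attempt to load Gemma 2 27B IT across several 16 GB T4 GPUs
# using Hugging Face transformers + accelerate with 4-bit quantization.
# Assumptions: model ID "google/gemma-2-27b-it", ~14 GiB usable per GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-27b-it"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # shrink the 27B weights to roughly 14-15 GB
    bnb_4bit_compute_dtype=torch.float16, # T4 (Turing) has no bfloat16 support
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                    # let accelerate shard layers across visible GPUs
    max_memory={i: "14GiB" for i in range(torch.cuda.device_count())},
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even when this fits, generation will be slow because activations hop between GPUs over PCIe for every layer boundary, which is why a single A100 80GB or H100 remains the recommended deployment target.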
