Can multiple NVIDIA T4 GPUs be used to deploy Gemma2-27B-IT?

#36
by armanZhou - opened

If so, how many T4 GPUs are needed?

Deploying Gemma2-27B-IT across multiple T4 GPUs is not recommended. Each T4 has only 16 GB of memory, while the model's weights alone are roughly 54 GB at 16-bit precision, so the model would have to be sharded across many cards; the resulting cross-GPU communication overhead and the need to choose and tune a parallelism strategy make this setup impractical. Gemma 2 27B is designed to run inference efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU. Please refer to the Gemma 2 blog post for more details.
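If you still want to experiment on T4 hardware anyway, the usual approach is to quantize the weights and let accelerate shard the layers across the visible GPUs. Below is a minimal sketch, not an officially supported configuration: the model ID, the per-GPU memory cap, and the 4-bit quantization choice are my assumptions, and fit and throughput are not guaranteed. Note that T4s (Turing) lack bfloat16 support, so float16 is used for compute.

```python
# Sketch: attempt to load Gemma 2 27B IT across several 16 GB T4 GPUs
# using Hugging Face transformers + accelerate with 4-bit quantization.
# Assumptions: model ID "google/gemma-2-27b-it", ~14 GiB usable per GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-27b-it"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # shrink the 27B weights to roughly 14-15 GB
    bnb_4bit_compute_dtype=torch.float16, # T4 (Turing) has no bfloat16 support
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                    # let accelerate shard layers across visible GPUs
    max_memory={i: "14GiB" for i in range(torch.cuda.device_count())},
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even when this fits, generation will be slow because activations hop between GPUs over PCIe for every layer boundary, which is why a single A100 80GB or H100 remains the recommended deployment target.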
