text-embeddings-inference documentation

Build a custom container for TEI

You can build your own CPU or CUDA TEI container using Docker. To build a CPU container, run the following command in the directory containing your custom Dockerfile:

docker build .

To build a CUDA container, you must first determine the compute capability (compute cap) of the GPU that will be used at runtime, since it configures the CUDA build. The following are examples of runtime compute capabilities for various GPU types:

  • Turing (T4, RTX 2000 series, …) - runtime_compute_cap=75
  • A100 - runtime_compute_cap=80
  • A10 - runtime_compute_cap=86
  • Ada Lovelace (RTX 4000 series, …) - runtime_compute_cap=89
  • H100 - runtime_compute_cap=90
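The mapping above can be sketched as a small helper function. This is not part of TEI; the GPU name strings are assumptions for illustration, so adjust them to match your hardware:

```shell
# Minimal sketch: map a GPU product name to the runtime compute cap
# listed above. The name patterns are illustrative assumptions.
gpu_compute_cap() {
  case "$1" in
    T4|"RTX 20"*)  echo 75 ;;  # Turing
    A100)          echo 80 ;;
    A10)           echo 86 ;;
    "RTX 40"*)     echo 89 ;;  # Ada Lovelace
    H100)          echo 90 ;;
    *) echo "unknown GPU: $1" >&2; return 1 ;;
  esac
}

gpu_compute_cap "A100"   # prints 80
```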

Once you have determined the compute capability, set it as the runtime_compute_cap variable and build the container as shown in the example below:

runtime_compute_cap=80

docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap
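To illustrate how the build argument is assembled, here is a self-contained sketch that composes the command into a variable and echoes it instead of executing it (actually running it requires Docker and a checkout containing Dockerfile-cuda):

```shell
# Sketch: assemble the CUDA build command for a GPU with compute cap 80
# (e.g. an A100). Echoed rather than executed so the example runs
# without Docker installed.
runtime_compute_cap=80

build_cmd="docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap"
echo "$build_cmd"
```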