igor committed
Commit: 8245433
Parent(s): 554833e

updated README

README.md CHANGED
@@ -18,12 +18,7 @@ tags:
 
 GPT-J 6B is a transformer model trained using Ben Wang's [Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax/). "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.
 
-This repository contains TensorRT engines
-* RTX 4090
-* RTX 3080 Ti
-* RTX 2080 Ti
-
-ONNX model generated by [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) and build script will be published soon.
+This repository contains a GPT-J 6B ONNX model suitable for building TensorRT int8+fp32 engines. The model was quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework. Code for building the TensorRT engines, together with examples, is published on [github](https://github.com/ENOT-AutoDL/ENOT-transformers).
 
 ## Metrics:
 
@@ -62,7 +57,7 @@ ONNX model generated by [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) and
 
 ## How to use
 
-Example of inference and accuracy test [published on github](https://github.com/ENOT-AutoDL/
+An example of inference and an accuracy test are [published on github](https://github.com/ENOT-AutoDL/ENOT-transformers):
 ```shell
-git clone https://github.com/ENOT-AutoDL/
+git clone https://github.com/ENOT-AutoDL/ENOT-transformers
 ```
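The int8+fp32 scheme named in the updated description stores weights as 8-bit integers plus an fp32 scale, while compute falls back to fp32 where needed. A minimal sketch of symmetric per-tensor int8 quantization, to make the idea concrete (illustrative only; this is not ENOT-AutoDL's actual algorithm, and the function names are hypothetical):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: fp32 list -> (int8 values, fp32 scale)."""
    # Map the largest-magnitude weight to 127; everything else scales linearly.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate fp32 weights from the int8 values and scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is why calibrating the scale per tensor (or per channel) matters for accuracy.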