ivkalgin committed on
Commit
2db91ae
1 Parent(s): 292111d

updated README.md (added latency table)

  1. README.md +32 -5
README.md CHANGED
@@ -25,13 +25,40 @@ This repository contains TensorRT engines with mixed precision int8 + fp32. You

  ONNX model generated by [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) and build script will be published soon.

- ## Test result
+ ## Metrics:

- | |INT8|FP32|
- |---|:---:|:---:|
- | **Lambada Acc** |78.50%|79.17%|
- | **Model size (GB)** |8.5|24.2|
+ | |TensorRT INT8+FP32|torch FP16|torch FP32|
+ |---|:---:|:---:|:---:|
+ | **Lambada Acc** |78.79%|79.17%|-|
+ | **Model size (GB)** |8.5|12.1|24.2|

+ ### Test environment
+
+ * GPU RTX 4090
+ * CPU 11th Gen Intel(R) Core(TM) i7-11700K
+ * TensorRT 8.5.3.1
+ * pytorch 1.13.1+cu116
+
+ ## Latency:
+
+ |Input sequence length|Number of generated tokens|TensorRT INT8+FP32 ms|torch FP16 ms|Acceleration|
+ |:---:|:---:|:---:|:---:|:---:|
+ |64|64|1040|1610|1.55|
+ |64|128|2089|3224|1.54|
+ |64|256|4236|6479|1.53|
+ |128|64|1060|1619|1.53|
+ |128|128|2120|3241|1.53|
+ |128|256|4296|6510|1.52|
+ |256|64|1109|1640|1.49|
+ |256|128|2204|3276|1.49|
+ |256|256|4443|6571|1.49|
+
+ ### Test environment
+
+ * GPU RTX 4090
+ * CPU 11th Gen Intel(R) Core(TM) i7-11700K
+ * TensorRT 8.5.3.1
+ * pytorch 1.13.1+cu116

  ## How to use

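
The Acceleration column in the latency table appears to be the torch FP16 latency divided by the TensorRT INT8+FP32 latency (for example, 1610 / 1040 ≈ 1.55). The sketch below is not part of the commit. It recomputes that ratio and a rough average per-token latency for a few rows copied from the table. The helper names `acceleration` and `ms_per_token` are hypothetical, and the ratio definition is an assumption inferred from the numbers. Ratios recomputed from the rounded millisecond values can differ from the published column in the last digit (1640 / 1109 ≈ 1.479, listed as 1.49).

```python
# Rows copied from the latency table:
# (input sequence length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms)
ROWS = [
    (64, 64, 1040, 1610),
    (128, 128, 2120, 3241),
    (256, 64, 1109, 1640),
]


def acceleration(trt_ms: float, torch_ms: float) -> float:
    """Speedup of TensorRT over torch FP16 (assumed definition: torch / TensorRT)."""
    return torch_ms / trt_ms


def ms_per_token(total_ms: float, generated_tokens: int) -> float:
    """Average latency per generated token; prompt processing time is folded in."""
    return total_ms / generated_tokens


if __name__ == "__main__":
    for seq_len, new_tokens, trt_ms, torch_ms in ROWS:
        print(
            f"in={seq_len:>3} out={new_tokens:>3} "
            f"speedup={acceleration(trt_ms, torch_ms):.2f} "
            f"trt={ms_per_token(trt_ms, new_tokens):.2f} ms/token "
            f"torch={ms_per_token(torch_ms, new_tokens):.2f} ms/token"
        )
```

Averaged this way, the TensorRT engine takes roughly 16 to 17 ms per generated token in these rows, against roughly 25 ms for torch FP16.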