Update README.md
README.md (CHANGED)
@@ -94,6 +94,33 @@ print(generated_text)
# perspective. The puppy is sitting on a wooden deck, which is composed ...
```

To make inference more efficient, run with autocast:

```python
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )
```

We did most of our evaluations in this setting (autocast on, but float32 weights).

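To make the distinction concrete: with autocast the parameters themselves stay in float32, and only the operations executed inside the context run in bfloat16. A minimal illustration of this behaviour (not from the original README; the random tensor is just for demonstration):

```python
import torch

# The stored weights are still float32; autocast only changes the dtype
# used for matmuls and other autocast-eligible ops inside the context.
print(next(model.parameters()).dtype)  # torch.float32

with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    x = torch.randn(8, 8, device="cuda")
    y = x @ x  # matmul runs in bfloat16 under autocast
print(y.dtype)  # torch.bfloat16
```
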
To further reduce the memory requirements, the model can be run with bfloat16 weights:

```python
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer
)
```

Note that we have observed that this can (rarely) change the output of the model compared to running with float32 weights.

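If you want to check whether the bfloat16 weights change a particular generation, one option is to run the same batch in both precisions and compare the decoded text. This is only an illustrative sketch, not part of the README: the `run_generation` helper is hypothetical, and it reuses the `model`, `inputs`, and `processor` from the quick start above, decoding the output the same way as that example.

```python
import torch
from transformers import GenerationConfig

def run_generation(model, inputs, processor):
    # Same generation call as above, decoded to text for easy comparison.
    with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer,
        )
    new_tokens = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

text_fp32 = run_generation(model, inputs, processor)    # float32 weights, autocast on

model.to(dtype=torch.bfloat16)                           # switch to bfloat16 weights
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = run_generation(model, inputs, processor)

print(text_fp32 == text_bf16)  # usually True; in rare cases the outputs differ
```
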
## Evaluations

| Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |