Update README.md
README.md CHANGED
@@ -22,6 +22,21 @@ Initial subjective testing has shown that this model can chat reasonably well in
- **Context length:** 256K
- **Knowledge cutoff date:** March 5, 2024

## Prerequisites

Jamba requires that you use `transformers` version 4.39.0 or higher:

```bash
pip install "transformers>=4.39.0"
```

In order to run optimized Mamba implementations, you first need to install `mamba-ssm` and `causal-conv1d`:

```bash
pip install mamba-ssm "causal-conv1d>=1.2.0"
```

You also need the model to be on a CUDA device.

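
For example, loading the model onto a GPU might look like this (a minimal sketch; the `ai21labs/Jamba-v0.1` checkpoint name and `bfloat16` dtype are illustrative assumptions, not taken from this README):

```python
import torch
from transformers import AutoModelForCausalLM

# With mamba-ssm and causal-conv1d installed, the optimized Mamba kernels
# are used by default; the model must be on a CUDA device for them to run.
# NOTE: the checkpoint name below is an assumption for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1", torch_dtype=torch.bfloat16
).to("cuda")
```
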
You can run the model without the optimized Mamba kernels, but this is **not** recommended, as it will result in significantly higher latency. To do so, specify `use_mamba_kernels=False` when loading the model.
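
For example (again a sketch, with the same assumed checkpoint name):

```python
from transformers import AutoModelForCausalLM

# Fall back to the pure-PyTorch Mamba implementation: slower, but it does
# not require mamba-ssm / causal-conv1d or a CUDA device.
# NOTE: the checkpoint name below is an assumption for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1", use_mamba_kernels=False
)
```
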
## How to use
*Note:* this code automatically appends the `<|startoftext|>` special token to any input.
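
The usage snippet this note refers to is not reproduced in this hunk; as a rough sketch, basic generation with `transformers` typically looks like the following (the checkpoint name, prompt, and generation settings are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: the checkpoint name below is an assumption for illustration.
model = AutoModelForCausalLM.from_pretrained("ai21labs/Jamba-v0.1")
tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")

# The tokenizer prepends the "<|startoftext|>" special token automatically,
# so it does not need to be added to the prompt by hand.
input_ids = tokenizer("A pangram is a sentence that", return_tensors="pt").input_ids

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```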