Update README.md
README.md
CHANGED
@@ -54,9 +54,17 @@ Developed by: Replit, Inc.
 The training mixture includes **20 different languages**, listed here in descending order of number of tokens:
 <br/>
 `Markdown`, `Java`, `JavaScript`, `Python`, `TypeScript`, `PHP`, `SQL`, `JSX`, `reStructuredText`, `Rust`, `C`, `CSS`, `Go`, `C++`, `HTML`, `Vue`, `Ruby`, `Jupyter Notebook`, `R`, `Shell`
-
+<br/>
 In total, the training dataset contains 175B tokens, which were repeated over 3 epochs -- in total, `replit-code-v1-3b` has been trained on **525B** tokens (~195 tokens per parameter).
 
+The model has been trained on the [MosaicML](https://www.mosaicml.com/) platform with 256 x A100-40GB GPUs, leveraging their latest [LLM examples repo](https://github.com/mosaicml/examples/tree/release/v0.0.4/examples/llm).
+<br/>
+`replit-code-v1-3b` is powered by state-of-the-art LLM techniques, such as:
+[Flash Attention](https://arxiv.org/abs/2205.14135) for fast training and inference,
+[AliBi positional embeddings](https://arxiv.org/abs/2108.12409) to support variable context length at inference time,
+[LionW optimizer](https://arxiv.org/abs/2302.06675),
+etc.
+
 ## Intended Use
 Replit intends this model be used by anyone as a foundational model for application-specific fine-tuning without strict limitations on commercial use.
 
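The token figures in the README text are linked by simple arithmetic: 175B unique tokens seen over 3 epochs gives the 525B total, and dividing by the quoted "~195 tokens per parameter" gives the parameter count that ratio implies. The README text here does not state the parameter count directly, so the value below is only what the ratio implies, not an official figure.

```python
# Figures quoted in the README text of this commit.
unique_tokens = 175e9   # tokens in the training dataset
epochs = 3              # number of passes over the dataset
tokens_per_param = 195  # "~195 tokens per parameter" (approximate)

# Total tokens seen during training.
total_tokens = unique_tokens * epochs

# Parameter count implied by the approximate ratio (not stated in the README).
implied_params = total_tokens / tokens_per_param

print(f"total tokens: {total_tokens / 1e9:.0f}B")                  # total tokens: 525B
print(f"implied parameter count: {implied_params / 1e9:.2f}B")     # implied parameter count: 2.69B
```

Because the ratio is rounded ("~195"), the implied count is only accurate to a couple of significant figures.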
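The README credits ALiBi with supporting variable context length at inference time. In the linked ALiBi paper, there are no learned position embeddings. Instead, each attention head adds a fixed penalty to its attention logits that grows linearly with the distance between query and key. Because the bias depends only on that distance, it can be built for any sequence length. The following is a minimal pure-Python sketch of that bias for a causal model, assuming a power-of-two head count (the paper gives a separate recipe for other counts). It is an illustration of the technique, not the implementation used to train `replit-code-v1-3b`. The function names are hypothetical.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes for a power-of-two head count.

    Head h (1-indexed) gets slope 2 ** (-8 * h / num_heads), a geometric
    sequence that starts at 2 ** (-8 / num_heads) and ends at 2 ** -8.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2 ** (-8 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Bias added to causal attention logits: bias[h][i][j] = slope_h * (j - i).

    Query i may only attend to keys j <= i, so future keys get -inf (the causal
    mask). Nothing here is learned, so seq_len can be chosen freely at inference.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [
            [m * (j - i) if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


print(alibi_slopes(8))
# [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
bias = alibi_bias(num_heads=8, seq_len=4)
print(bias[0][3])  # [-1.5, -1.0, -0.5, 0.0]  (first head, last query position)
```

In a real model, these values are added to the query-key scores before the softmax. Each head's slope sets how quickly it discounts distant tokens.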
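The README lists the "LionW" optimizer and links the Lion paper. The sketch below follows the update rule described there: interpolate the momentum and the current gradient, then take only the *sign* of the result, so every coordinate moves by the same magnitude (the learning rate). Weight decay is applied in decoupled form, and the momentum is updated with a second coefficient. It operates on flat Python lists for readability. The defaults `beta1=0.9` and `beta2=0.99` follow the paper. `lr=1e-4` is a placeholder, and the function names are hypothetical. The exact "LionW" variant used in MosaicML's training code may differ in details, such as how weight decay is scaled.

```python
def sign(x: float) -> float:
    # sign(0) == 0, matching the elementwise sign used in the Lion update.
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update over flat lists of floats; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient; only the sign of this is used.
        c = beta1 * m + (1 - beta1) * g
        # Sign update plus decoupled weight decay, both scaled by the learning rate.
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        # Momentum tracks the gradient with a separate, longer-memory coefficient.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


params, momentum = [1.0, -2.0, 0.5], [0.0, 0.0, 0.0]
grads = [0.3, -0.01, 0.0]
params, momentum = lion_step(params, grads, momentum, lr=0.1)
print(params)  # about [0.9, -1.9, 0.5]: nonzero-gradient coords move by lr
```

Note that the large gradient (0.3) and the tiny one (-0.01) produce steps of the same size. This is the defining difference from Adam-style optimizers.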