update datasets
datasets/incoder.txt +3 -3
datasets/incoder.txt
CHANGED
@@ -1,8 +1,8 @@
-[InCoder](https://huggingface.co/facebook/incoder-6B) was trained on **216 GB** of data from Github and Stackoverflow from 28 programming languages. 52 GB is in Python, 107GB in other programming languages and 57GB is content from Stackoverflow that isn't code.
+[InCoder](https://huggingface.co/facebook/incoder-6B) was trained on **216 GB** of data, after preprocessing, from Github and Stackoverflow from 28 programming languages. 52 GB is in Python, 107GB in other programming languages and 57GB is content from Stackoverflow that isn't code.
 
 The Github data used the following filtering:
-- Average line length < 100
-- Maximum line length < 3000
+- Average line length < 100 tokens
+- Maximum line length < 3000 MB
 - Alphanumeric characters fraction > 0.4
 - Remove auto-generated files (keyword search)
 
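
The four filtering rules listed in this diff could be sketched in Python roughly as follows. This is a minimal illustration, not InCoder's actual preprocessing code: the function names `alnum_fraction` and `keep_file` and the keyword list `AUTOGEN_KEYWORDS` are hypothetical, and line length is measured in characters here purely for simplicity (the changed file states "tokens" and "MB" as units). Only the thresholds 100, 3000, and 0.4 come from the text.

```python
# Hypothetical keyword list: the source only says "keyword search"
# and does not name the phrases that were actually used.
AUTOGEN_KEYWORDS = (
    "auto-generated",
    "autogenerated",
    "automatically generated",
    "do not edit",
)


def alnum_fraction(text: str) -> float:
    """Return the fraction of characters in `text` that are alphanumeric."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)


def keep_file(
    text: str,
    max_avg_line_len: int = 100,   # "Average line length < 100"
    max_line_len: int = 3000,      # "Maximum line length < 3000"
    min_alnum_frac: float = 0.4,   # "Alphanumeric characters fraction > 0.4"
) -> bool:
    """Return True if a file passes all four filters (lengths in characters, an assumption)."""
    lines = text.splitlines()
    if not lines:
        return False

    lengths = [len(line) for line in lines]
    if sum(lengths) / len(lengths) >= max_avg_line_len:
        return False
    if max(lengths) >= max_line_len:
        return False
    if alnum_fraction(text) <= min_alnum_frac:
        return False

    # Case-insensitive substring match for auto-generated markers.
    lowered = text.lower()
    if any(keyword in lowered for keyword in AUTOGEN_KEYWORDS):
        return False
    return True
```

Passing the thresholds as keyword arguments keeps the sketch easy to adapt if the units are tokens rather than characters; in that case the `len(line)` call would be replaced by a tokenizer-based count.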