update
- datasets/github_code.txt +2 -2
datasets/github_code.txt
CHANGED
@@ -1,4 +1,4 @@
-We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories
+We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
 
 ```python
 from datasets import load_dataset
@@ -17,6 +17,6 @@ print(next(iter(ds)))
 }
 
 ```
-You can see that in addition to the code, the samples include
+You can see that in addition to the code, the samples include some metadata: repo name, path, language, license, and the size of the file.
 
 For model-specific information about the pretraining dataset, please select a model below:
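The changed text describes iterating over samples lazily and reading their metadata. As a rough sketch of that access pattern, the following uses a hypothetical in-memory generator (`fake_stream`) in place of the real streamed dataset, so it runs without any download. The key names in `METADATA_KEYS` and the sample values are assumptions chosen to mirror the fields named in the text (repo name, path, language, license, size); they are not taken from the dataset itself.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

# Assumed key names for the metadata fields mentioned in the text.
METADATA_KEYS = ("repo_name", "path", "language", "license", "size")


def fake_stream() -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a streamed dataset: yields samples one by one."""
    samples = [
        {"code": "print('hi')", "repo_name": "user/a", "path": "a.py",
         "language": "Python", "license": "mit", "size": 11},
        {"code": "int main(){}", "repo_name": "user/b", "path": "b.c",
         "language": "C", "license": "apache-2.0", "size": 12},
        {"code": "x = 1", "repo_name": "user/c", "path": "c.py",
         "language": "Python", "license": "mit", "size": 5},
    ]
    for sample in samples:
        yield sample


def filter_by_language(
    samples: Iterable[Dict[str, Any]], language: str
) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the samples whose 'language' field matches."""
    for sample in samples:
        if sample.get("language") == language:
            yield sample


# Take the first two matching samples without materialising the whole stream.
first_python = list(islice(filter_by_language(fake_stream(), "Python"), 2))

for sample in first_python:
    # Print only the metadata, leaving out the code itself.
    print({key: sample[key] for key in METADATA_KEYS})
```

With the real dataset, `fake_stream()` would be replaced by the iterable that `load_dataset` returns in streaming mode; the filtering and `islice` steps stay the same because both only rely on plain iteration.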