loubnabnl (HF staff) committed on
Commit
0b86e83
·
1 Parent(s): 17d4234

Update README.md

Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -10,18 +10,19 @@ pinned: false
 <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/codeparrot_logo.png" alt="drawing" width="440"/>
 </p>
 
-
- This organization is dedicated to language models for code generation. In particular CodeParrot is a GPT-2 model trained to generate Python code.
-
- ## Table of contents:
-
- * Interactive blog: [Code generation with 🤗](https://huggingface.co/spaces/loubnabnl/code-generation-models), where we compare different code models and explain how they are trained and evaluated.
- * Spaces: code generation with: [CodeParrot](https://huggingface.co/codeparrot/codeparrot) (1.5B), [InCoder](https://huggingface.co/facebook/incoder-6B) (6B) and [CodeGen](https://github.com/salesforce/CodeGen) (6B)
- * Models: CodeParrot (1.5B) and CodeParrot-small (110M), each repo has different ongoing experiments in the branches.
- * Datasets:
-   * [codeparrot-clean](https://huggingface.co/datasets/codeparrot/codeparrot-clean), dataset on which we trained and evaluated CodeParrot, the splits are available under [codeparrot-clean-train](https://huggingface.co/datasets/codeparrot/codeparrot-clean-train) and [codeparrot-clean-valid](https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid).
-   * A more filtered version of codeparrot-clean under [codeparrot-train-more-filtering](https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering) and [codeparrot-train-more-filtering](https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering).
-   * CodeParrot dataset after near deduplication since initially only exact match deduplication was performed, it's available under [codeparrot-train-near-deduplication](https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication) and [codeparrot-train-near-deduplication](https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication).
-   * [GitHub-Code](https://huggingface.co/datasets/codeparrot/github-code), a 1TB dataset of 32 programming languages with 60 from GitHub files.
-   * [GitHub-Jupyter](https://huggingface.co/datasets/codeparrot/github-jupyter), a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.
-   * [APPS](https://huggingface.co/datasets/codeparrot/apps), a benchmark for code generation with 10000 problems.
+ <p>This organization is dedicated to language models for code generation. In particular, CodeParrot is a GPT-2 model trained to generate Python code.</p>
+ <h2 id="table-of-contents-">Table of contents:</h2>
+ <ul>
+ <li>Interactive blog: <a href="https://huggingface.co/spaces/loubnabnl/code-generation-models">Code generation with 🤗</a>, where we compare different code models and explain how they are trained and evaluated.</li>
+ <li>Spaces: code generation with <a href="https://huggingface.co/codeparrot/codeparrot">CodeParrot</a> (1.5B), <a href="https://huggingface.co/facebook/incoder-6B">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen">CodeGen</a> (6B).</li>
+ <li>Models: CodeParrot (1.5B) and CodeParrot-small (110M); each repo hosts different ongoing experiments in its branches.</li>
+ <li>Datasets:<ul>
+ <li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid">codeparrot-clean-valid</a>.</li>
+ <li>A more heavily filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering">codeparrot-valid-more-filtering</a>.</li>
+ <li>The CodeParrot dataset after near deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication">codeparrot-valid-near-deduplication</a>.</li>
+ <li><a href="https://huggingface.co/datasets/codeparrot/github-code">GitHub-Code</a>, a 1TB dataset of GitHub files covering 32 programming languages with 60 file extensions.</li>
+ <li><a href="https://huggingface.co/datasets/codeparrot/github-jupyter">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter notebooks from GitHub on BigQuery.</li>
+ <li><a href="https://huggingface.co/datasets/codeparrot/apps">APPS</a>, a benchmark for code generation with 10,000 problems.</li>
+ </ul>
+ </li>
+ </ul>
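
The models listed in the README can be tried directly. A minimal sketch, assuming the `transformers` library (with a PyTorch backend) is installed; the model ID comes from the links above, and the prompt and generation settings are illustrative:

```python
# Sketch: generate Python code with CodeParrot-small (110M), the smaller
# of the two models listed in the README above.
from transformers import pipeline

# Downloads the model from the Hub on first use.
generator = pipeline("text-generation", model="codeparrot/codeparrot-small")

prompt = "def fibonacci(n):"
outputs = generator(prompt, max_new_tokens=32, num_return_sequences=1, do_sample=False)

# The pipeline returns the prompt followed by the model's completion.
print(outputs[0]["generated_text"])
```

The same call works with the full 1.5B model by swapping in `codeparrot/codeparrot`, at the cost of a much larger download.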