---
license: llama2
library_name: peft
tags:
- typescript
- instruction-tuning
- code-generation
- lora
- peft
base_model: codellama/CodeLlama-13b-hf
model-index:
- name: lora-out
results: []
datasets:
- mhhmm/typescript-instruct-20k
language:
- en
metrics:
- code_eval
pipeline_tag: text-generation
---
## Architecture
![The Architecture](https://github.com/LeVuMinhHuy/brocode/blob/master/.pics/about-the-model.png?raw=true)
## About
This model is a fine-tuned version of [codellama/CodeLlama-13b-hf](https://huggingface.co./codellama/CodeLlama-13b-hf).
It achieves the following results on the evaluation set:
- Loss: 0.4268
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
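
The `cosine` scheduler with 10 warmup steps can be sketched in plain Python. Here `lr_at` is a hypothetical helper, not part of the training code, and the 140 total steps correspond to the one epoch shown in the results table below:

```python
import math

def lr_at(step, total_steps=140, warmup=10, base_lr=2e-4):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup:
        return base_lr * step / warmup  # linear ramp from 0 up to base_lr
    progress = (step - warmup) / (total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

# lr_at(0) == 0.0; lr_at(10) == 2e-4 (peak); lr_at(140) ~ 0.0
```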
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.7555 | 0.01 | 1 | 0.7062 |
| 0.7036 | 0.05 | 7 | 0.6673 |
| 0.5422 | 0.1 | 14 | 0.5152 |
| 0.5351 | 0.15 | 21 | 0.4866 |
| 0.495 | 0.2 | 28 | 0.4688 |
| 0.5651 | 0.25 | 35 | 0.4587 |
| 0.5146 | 0.3 | 42 | 0.4486 |
| 0.4955 | 0.35 | 49 | 0.4469 |
| 0.5117 | 0.4 | 56 | 0.4432 |
| 0.5245 | 0.45 | 63 | 0.4410 |
| 0.5003 | 0.5 | 70 | 0.4371 |
| 0.4502 | 0.55 | 77 | 0.4340 |
| 0.527 | 0.6 | 84 | 0.4315 |
| 0.48 | 0.65 | 91 | 0.4305 |
| 0.448 | 0.7 | 98 | 0.4289 |
| 0.5427 | 0.75 | 105 | 0.4289 |
| 0.4715 | 0.8 | 112 | 0.4279 |
| 0.5584 | 0.85 | 119 | 0.4276 |
| 0.4936 | 0.9 | 126 | 0.4267 |
| 0.4788 | 0.95 | 133 | 0.4268 |
| 0.476 | 1.0 | 140 | 0.4268 |
### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
- PEFT 0.6.0
### Evaluation
I evaluated with the MultiPL-E benchmark, the same harness the Code Llama paper uses.
| Model | Pass@k | Estimate | Num problems |
|-----------------------------------------|--------|----------|---------------|
| Code LLama - Instruct 13B | 1 | 39.0% | 159 |
| Our 13B | 1 | 42.4% | 159 |
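
Pass@1 above is the standard unbiased pass@k estimator from the Codex paper, computed from n sampled completions per problem of which c pass the tests (this is, to my understanding, what MultiPL-E's `pass_k.py` reports). A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n completions (c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# e.g. with 20 completions per problem (--completion-limit 20), 8 correct:
# pass_at_k(20, 8, 1) ~ 0.4
```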
How to reproduce the evaluation? Follow the official MultiPL-E tutorial (https://nuprl.github.io/MultiPL-E/tutorial.html) and swap in this model's name: `mhhmm/typescript-instruct-20k`.
This is the code I ran on Google Colab with an A100 40GB (yes, it requires that much GPU RAM).
If you have a stronger GPU, increase `--batch-size` or `--completion-limit`.
```
!pip install --upgrade pip
!pip install aiohttp numpy tqdm pytest datasets torch transformers sentencepiece
!git clone https://github.com/nuprl/MultiPL-E
%cd MultiPL-E
!mkdir typescript
# generate 20 completions per HumanEval problem, translated to TypeScript
!python3 automodel.py --name mhhmm/typescript-instruct-20k-v2 --root-dataset humaneval --lang ts --temperature 0.2 --batch-size 10 --completion-limit 20 --output-dir-prefix typescript
# execute the generated completions against the test suites
%cd evaluation/src
!python3 main.py --dir ../../typescript --output-dir ../../typescript --recursive
# back to the repo root, then compute pass@k from the execution results
%cd ../..
!python3 pass_k.py ./typescript/*
```