---
license: apache-2.0
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot
inference: false
---
# pszemraj/opt-peter-2.7B
This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b) on about 80k of my own WhatsApp/text messages. Please use it responsibly :)
Test it out on Google Colab [here](https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb)!
![chatdemo](https://i.imgur.com/1EgQYat.png)
## Model description
- Explores how well OPT performs in dialogue/conversational applications
- Subjectively, it seems to do a lot better than GPT-Neo fine-tuned with similar training parameters
## Intended uses & limitations
> The base model has a custom license that propagates to this one. Most importantly, it cannot be used commercially. Read more here: [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b)
- The model is probably too large to run via the hosted Inference API here. Use it in Python with more than 12 GB of GPU RAM or CPU RAM, e.g. via the Colab notebook linked above.
- Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind that it is a _fine-tuned_ version of the base model trained on my data, so content learned during pre-training also shows up in its outputs.
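As a rough sanity check on the 12 GB figure, the sketch below estimates the memory needed just to hold the weights. It assumes about 2.7 billion parameters (taken from the model name; the exact count differs slightly) and deliberately ignores activations, the attention cache, and framework overhead, so real usage will be higher.

```python
# Rough estimate of the memory needed only for the model weights.
# ASSUMPTION: ~2.7B parameters (from the model name). Activations, the attention
# cache, and framework overhead are NOT included, so actual usage is higher.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

NUM_PARAMS = 2.7e9  # nominal size of facebook/opt-2.7b


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

In float32 the weights alone come to roughly 10.8 GB, which is consistent with needing more than 12 GB in total once runtime overhead is added; half precision would roughly halve the weight footprint.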
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
**SESSION ONE**
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
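The effective batch size and warmup length follow directly from the values above. The sketch below approximates how a `Trainer`-style loop turns them into optimizer steps. The dataset size passed in is hypothetical (the card does not say how many training sequences the messages produced), and the step bookkeeping is an approximation, not the exact code used for this model.

```python
import math


def schedule_summary(num_examples, train_batch_size, grad_accum_steps,
                     num_epochs, warmup_ratio):
    """Approximate step bookkeeping for one training session.

    ASSUMPTION: num_examples is a hypothetical value supplied by the caller.
    Returns (examples per optimizer step, total optimizer steps, warmup steps).
    """
    total_batch = train_batch_size * grad_accum_steps  # examples per optimizer update
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    # One optimizer update happens every grad_accum_steps forward/backward batches.
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_batch, total_steps, warmup_steps


if __name__ == "__main__":
    # Session one values from the list above, with a HYPOTHETICAL 10,000 examples.
    print(schedule_summary(10_000, 8, 16, 3, 0.01))
```

With these inputs the function returns `(128, 234, 3)`: the listed `total_train_batch_size` of 128 is simply 8 × 16, and a 1% warmup ratio amounts to only a handful of optimizer steps.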
**SESSION TWO**
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4
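Both sessions use `lr_scheduler_type: cosine` together with a warmup ratio. A minimal pure-Python sketch of that shape (linear warmup to the peak rate, then half a cosine period down to zero) is below. It is written to follow the behaviour of Transformers' `get_cosine_schedule_with_warmup` with its default settings, but it is a re-implementation for illustration, and the 1,000-step total is hypothetical.

```python
import math


def cosine_lr(step, peak_lr, total_steps, warmup_steps):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress=0, zero at progress=1.
    return max(0.0, peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress)))


# Session two: peak learning rate 1e-5, warmup_ratio 0.05.
# HYPOTHETICAL total of 1,000 optimizer steps.
TOTAL_STEPS = 1_000
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.05)  # 50

if __name__ == "__main__":
    for step in (0, 25, 50, 525, 1_000):
        print(step, cosine_lr(step, 1e-5, TOTAL_STEPS, WARMUP_STEPS))
```

The rate climbs to 1e-5 at the end of warmup, is back to about half the peak midway through the decay phase, and reaches zero at the final step.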
### Framework versions
- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1