---
license:
- other
- apache-2.0
tags:
- generated_from_trainer
- text-generation
- OPT
- non-commercial
- dialogue
- chatbot
- ai-msgbot
library_name: transformers
pipeline_tag: text-generation
widget:
- text: 'If you could live anywhere, where would it be? peter szemraj:'
example_title: live anywhere
- text: 'What would you sing at Karaoke night? peter szemraj:'
example_title: Karaoke
- text: >-
If you could hire someone to help you, would it be with cleaning, cooking,
or yard work? peter szemraj:
example_title: help
- text: >-
What form of public transportation do you prefer? (air, boat, train, bus,
car, etc.) peter szemraj:
example_title: transportation
- text: 'What''s your favorite zoo animal? peter szemraj:'
example_title: animal
- text: 'Do you like or dislike surprises? Why or why not? peter szemraj:'
example_title: surprises
- text: >-
What celebrity would you like to meet at Starbucks for a cup of coffee?
peter szemraj:
example_title: 'celebrity '
inference:
parameters:
min_length: 2
max_length: 64
temperature: 0.5
no_repeat_ngram_size: 2
repetition_penalty: 4.5
---
# pszemraj/opt-peter-2.7B
<a href="https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b) on about 80k WhatsApp/text messages (mine). Please use responsibly :)
Test it out on Google Colab by clicking the button above.
![chatdemo](https://i.imgur.com/1EgQYat.png)
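To run it locally instead, here is a minimal sketch using the `transformers` pipeline; the generation parameters mirror the widget settings above, and `do_sample=True` is an assumption added so that `temperature` takes effect.

```python
# minimal sketch: local generation with the same settings as the hosted widget
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="pszemraj/opt-peter-2.7B",
    device=0,  # drop or set to -1 if no GPU is available
)

# prompts follow the "<question> peter szemraj:" format used in the widget examples
prompt = "What's your favorite zoo animal? peter szemraj:"

out = generator(
    prompt,
    min_length=2,
    max_length=64,
    do_sample=True,  # assumption: sampling enabled so temperature has an effect
    temperature=0.5,
    no_repeat_ngram_size=2,
    repetition_penalty=4.5,
)
print(out[0]["generated_text"])
```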
## Model description
- Exploring how OPT performs in dialogue/conversational applications
- Seems to do a lot better than GPT-Neo with similar training parameters
- You can create your own digital clone and deploy it using [this repository I am working on](https://github.com/pszemraj/ai-msgbot).
### sharded checkpoint
As the model checkpoint is 10+ GB, it can be slow to download and difficult to load on runtimes with limited RAM. To help with this, a sharded checkpoint of this model is available [here](https://huggingface.co./pszemraj/opt-peter-2.7B-sharded).
The `pszemraj/opt-peter-2.7B-sharded` model can be used as a drop-in replacement for this one for all use cases.
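For example, here is a hedged sketch of loading the sharded checkpoint on a memory-constrained runtime; half precision and the memory-saving flag are assumptions, not requirements.

```python
# sketch: load the sharded checkpoint while keeping peak RAM usage down
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pszemraj/opt-peter-2.7B-sharded"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    low_cpu_mem_usage=True,     # load shards incrementally instead of all at once
)
```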
## Intended uses & limitations
> The base model has a custom license that propagates to this one. **Most importantly, it cannot be used commercially**. Read more here: [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b)
- the model is probably too large to use via the hosted inference API here. Use it in Python on a machine with more than 12 GB of GPU/CPU RAM, e.g. via the Colab notebook linked above or the loading sketch earlier in this card.
- alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot), where I test LLMs for dialogue generation
- **any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind it is a _fine-tuned_ version of the model on my data, so things from pre-training are also present in outputs.
## Training and evaluation data
WhatsApp & iMessage data were parsed using [ai-msgbot](https://github.com/pszemraj/ai-msgbot) and then fed as a text dataset to the HF trainer.
## Training procedure
### Training hyperparameters
**SESSION ONE**
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
**SESSION TWO**
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4
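For illustration only, here is a rough sketch of how the SESSION ONE settings above would map onto `transformers` `TrainingArguments`; the output directory is hypothetical and the actual training script is not reproduced here.

```python
# hedged sketch: SESSION ONE hyperparameters expressed as TrainingArguments
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-peter-2.7B-session-one",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,  # yields the reported total train batch size of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=3,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 match the library defaults
)
```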
### Framework versions
- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1 |