---
license:
- apache-2.0
- other
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot
- ai-msgbot
inference: false
---

# pszemraj/opt-peter-2.7B

Open In Colab

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b) on about 80k of my own WhatsApp/text messages. Please use responsibly :)

Test it out on Google Colab by clicking the button above.

![chatdemo](https://i.imgur.com/1EgQYat.png)

## Model description

- An exploration of how OPT performs in dialogue/conversational applications
- Seems to do a lot better than GPT-Neo with similar training parameters
- You can create your own digital clone and deploy it using [this repository I am working on](https://github.com/pszemraj/ai-msgbot).

### sharded checkpoint

As this model file is 10+ GB, it can be difficult to load on lower-RAM runtimes and slow to download. To help with this, a sharded checkpoint of this model is available [here](https://huggingface.co./pszemraj/opt-peter-2.7B-sharded).

The `pszemraj/opt-peter-2.7B-sharded` model can be used as a drop-in replacement for this one for all use cases.

## Intended uses & limitations

> The base model has a custom license that propagates to this one. **Most importantly, it cannot be used commercially**. Read more here: [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b)

- The model is probably too large to use via the API here. Use it in Python with more than 12 GB of GPU RAM / CPU RAM, or via the Colab notebook linked above.
- Alternatively, you can message [a bot on telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind it is a _fine-tuned_ version of the model on my data, so things from pre-training are also present in outputs.
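
For local use in Python, a minimal sketch along the following lines may work. It assumes the `transformers` and `torch` packages are installed; `choose_checkpoint` and `generate_reply` are hypothetical helper names introduced here, and the sampling settings and plain-text prompt are illustrative assumptions rather than the settings used in the Colab notebook or in ai-msgbot.

```python
MODEL_ID = "pszemraj/opt-peter-2.7B"
SHARDED_ID = "pszemraj/opt-peter-2.7B-sharded"


def choose_checkpoint(low_ram: bool) -> str:
    """Return the sharded repo on constrained runtimes, else the main one.

    Hypothetical helper: the card only states that the sharded repo is a
    drop-in replacement, so the choice is left to the caller.
    """
    return SHARDED_ID if low_ram else MODEL_ID


def generate_reply(prompt: str, low_ram: bool = True, max_new_tokens: int = 64) -> str:
    """Download the checkpoint (10+ GB) and sample a continuation of `prompt`."""
    # Imported lazily so that defining these helpers does not require
    # transformers/torch to be installed or any weights to be downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = choose_checkpoint(low_ram)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampling parameters are assumptions chosen for illustration only.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (commented out because it triggers the full download):
# print(generate_reply("hey, what are you up to this weekend?"))
```

Keeping the import and the `from_pretrained` calls inside `generate_reply` means nothing heavy happens until the function is actually called, which is convenient on runtimes close to the 12 GB limit mentioned above.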

## Training and evaluation data

WhatsApp & iMessage data were parsed using [ai-msgbot](https://github.com/pszemraj/ai-msgbot) and then fed as a text dataset to the HF trainer.

## Training procedure

### Training hyperparameters

**SESSION ONE**

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3

**SESSION TWO**

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4

### Framework versions

- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1
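
As a sanity check on the two sessions under "Training hyperparameters", the listed `total_train_batch_size` can be recomputed from `train_batch_size` and `gradient_accumulation_steps`. The sketch below assumes the usual relation (batch size × accumulation steps × number of devices) and a device count of 1; the `Session` class and `effective_batch_size` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Batch-related hyperparameters copied from the lists in this card."""
    name: str
    train_batch_size: int
    gradient_accumulation_steps: int
    total_train_batch_size: int


def effective_batch_size(train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not list a device count.
    """
    return train_batch_size * gradient_accumulation_steps * num_devices


SESSIONS = [
    Session("one", train_batch_size=8, gradient_accumulation_steps=16,
            total_train_batch_size=128),
    Session("two", train_batch_size=16, gradient_accumulation_steps=4,
            total_train_batch_size=64),
]

for s in SESSIONS:
    computed = effective_batch_size(s.train_batch_size,
                                    s.gradient_accumulation_steps)
    print(f"session {s.name}: computed {computed}, "
          f"listed {s.total_train_batch_size}")
```

In both sessions the product of the two listed values already matches the listed total, so session two halves the number of samples per optimizer step relative to session one while also lowering the learning rate.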