---
license:
- other
- apache-2.0
tags:
- generated_from_trainer
- text-generation
- OPT
- non-commercial
- dialogue
- chatbot
- ai-msgbot
library_name: transformers
pipeline_tag: text-generation
widget:
- text: 'If you could live anywhere, where would it be? peter szemraj:'
example_title: live anywhere
- text: 'What would you sing at Karaoke night? peter szemraj:'
example_title: Karaoke
- text: >-
If you could hire someone to help you, would it be with cleaning, cooking,
or yard work? peter szemraj:
example_title: help
- text: >-
What form of public transportation do you prefer? (air, boat, train, bus,
car, etc.) peter szemraj:
example_title: transportation
- text: 'What''s your favorite zoo animal? peter szemraj:'
example_title: animal
- text: 'Do you like or dislike surprises? Why or why not? peter szemraj:'
example_title: surprises
- text: >-
What celebrity would you like to meet at Starbucks for a cup of coffee?
peter szemraj:
example_title: 'celebrity '
inference:
parameters:
min_length: 2
max_length: 64
temperature: 0.5
no_repeat_ngram_size: 2
repetition_penalty: 4.5
---
# pszemraj/opt-peter-2.7B
<a href="https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b) on about 80k WhatsApp/text messages (mine). Please use responsibly :)
Test it out on Google Colab by clicking the button above.
![chatdemo](https://i.imgur.com/1EgQYat.png)
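To run it locally instead, here is a minimal sketch using the `transformers` pipeline; the generation parameters mirror the widget settings above, and `do_sample=True` is an assumption added so that `temperature` takes effect.

```python
# minimal sketch: local generation with the same settings as the hosted widget
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="pszemraj/opt-peter-2.7B",
    device=0,  # drop or set to -1 if no GPU is available
)

# prompts follow the "<question> peter szemraj:" format used in the widget examples
prompt = "What's your favorite zoo animal? peter szemraj:"

out = generator(
    prompt,
    min_length=2,
    max_length=64,
    do_sample=True,  # assumption: sampling enabled so temperature has an effect
    temperature=0.5,
    no_repeat_ngram_size=2,
    repetition_penalty=4.5,
)
print(out[0]["generated_text"])
```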
## Model description
- Exploring how OPT performs in dialogue/conversational applications
- Seems to do a lot better than GPT-Neo with similar training parameters
- You can create your own digital clone and deploy it using [this repository I am working on](https://github.com/pszemraj/ai-msgbot).
### sharded checkpoint
As the model checkpoint is 10+ GB, it can be slow to download and difficult to load on runtimes with limited RAM. To help with this, a sharded checkpoint of this model is available [here](https://huggingface.co./pszemraj/opt-peter-2.7B-sharded).
The `pszemraj/opt-peter-2.7B-sharded` model can be used as a drop-in replacement for this one for all use cases.
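For example, here is a hedged sketch of loading the sharded checkpoint on a memory-constrained runtime; half precision and the memory-saving flag are assumptions, not requirements.

```python
# sketch: load the sharded checkpoint while keeping peak RAM usage down
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pszemraj/opt-peter-2.7B-sharded"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    low_cpu_mem_usage=True,     # load shards incrementally instead of all at once
)
```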
## Intended uses & limitations
> The base model has a custom license that propagates to this one. **Most importantly, it cannot be used commercially**. Read more here: [facebook/opt-2.7b](https://huggingface.co./facebook/opt-2.7b)
- the model is probably too large to use via the hosted inference API here. Use it in Python on a machine with more than 12 GB of GPU/CPU RAM, e.g. via the Colab notebook linked above or the loading sketch earlier in this card.
- alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot), where I test LLMs for dialogue generation
- **any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind it is a _fine-tuned_ version of the model on my data, so things from pre-training are also present in outputs.
## Training and evaluation data
WhatsApp & iMessage data were parsed using [ai-msgbot](https://github.com/pszemraj/ai-msgbot) and then fed as a text dataset to the HF trainer.
## Training procedure
### Training hyperparameters
**SESSION ONE**
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
**SESSION TWO**
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4
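For illustration only, here is a rough sketch of how the SESSION ONE settings above would map onto `transformers` `TrainingArguments`; the output directory is hypothetical and the actual training script is not reproduced here.

```python
# hedged sketch: SESSION ONE hyperparameters expressed as TrainingArguments
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-peter-2.7B-session-one",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,  # yields the reported total train batch size of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=3,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 match the library defaults
)
```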
### Framework versions
- Transformers 4.19.2
- Pytorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1 |