---
license: 
- apache-2.0
- other
tags:
- generated_from_trainer
- text-generation
- opt
- non-commercial
- dialogue
- chatbot

inference: false
---

# pszemraj/opt-peter-2.7B

This model is a fine-tuned version of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b), trained on about 80k of my own WhatsApp/text messages. Please use responsibly :)

Test it out on Google Colab [here](https://colab.research.google.com/gist/pszemraj/26a69775c9d012051396ab5ae980f5c1/example-text-gen-pszemraj-opt-peter-2-7b.ipynb)!

![chatdemo](https://i.imgur.com/1EgQYat.png)

## Model description

- An exploration of how OPT performs in dialogue/conversational applications.
- Seems to do a lot better than GPT-Neo with similar training parameters.
- You can create your own digital clone and deploy it using [this repository I am working on](https://github.com/pszemraj/ai-msgbot).

## Intended uses & limitations

> The base model has a custom license which propagates to this one. Most importantly, it cannot be used commercially. Read more here: [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)

- The model is probably too large to use via API here. Run it in Python with more than 12 GB of GPU RAM or CPU RAM, or use the Colab notebook linked above.
  - Alternatively, you can message [a bot on Telegram](http://t.me/GPTPeter_bot) where I test LLMs for dialogue generation.
- **Any statements or claims made by this model do not reflect actual claims/statements by me.** Keep in mind it is a _fine-tuned_ version of the model on my data, so things from pre-training are also present in outputs.
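
For running the model locally in Python, a minimal sketch using the `transformers` text-generation pipeline might look like the following. The sampling settings, the helper names `build_generation_kwargs` and `generate_reply`, and the example prompt are illustrative assumptions, not settings taken from this card; the Colab notebook linked above shows the setup actually used for testing.

```python
MODEL_ID = "pszemraj/opt-peter-2.7B"


def build_generation_kwargs(max_new_tokens: int = 64) -> dict:
    """Return sampling settings for generation (hypothetical defaults)."""
    return {
        "max_new_tokens": max_new_tokens,  # cap on the length of the reply
        "do_sample": True,                 # sample instead of greedy decoding
        "top_p": 0.95,                     # assumed nucleus-sampling value
    }


def generate_reply(prompt: str, device: int = -1) -> str:
    """Load the model and continue `prompt`.

    device=-1 runs on CPU, device=0 on the first GPU. Loading downloads
    the weights, so expect to need more than 12 GB of RAM.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID, device=device)
    outputs = generator(prompt, **build_generation_kwargs())
    return outputs[0]["generated_text"]


# Example call (placeholder prompt, triggers the model download):
# print(generate_reply("Hey, how was your weekend?\n"))
```

The returned text includes the prompt followed by the model's continuation, so you may want to strip the prompt before displaying a reply.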

## Training and evaluation data

WhatsApp and iMessage messages were parsed using [ai-msgbot](https://github.com/pszemraj/ai-msgbot) and then fed as a text dataset to the HF trainer.

## Training procedure

### Training hyperparameters

**SESSION ONE**

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
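
To make the `cosine` scheduler and warmup ratio more concrete, here is a rough sketch of a linear-warmup-then-cosine-decay schedule in plain Python. It illustrates the general shape of such a schedule and is not claimed to be the exact trainer implementation. The total step count (`TOTAL_STEPS = 1000`) is a made-up value, since the card does not report the number of optimizer steps.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine wave: base_lr at progress=0, zero at progress=1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical; not stated in this card

# Session one values from the list above: learning_rate=4e-05, warmup_ratio=0.01
for step in (0, 5, 10, 500, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS, 4e-05, 0.01))
```

With a warmup ratio of 0.01, only the first 1% of steps ramp up the learning rate, so nearly the whole run is spent on the decaying part of the curve.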

**SESSION TWO**

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4
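
The `total_train_batch_size` values in both sessions follow from the per-step batch size multiplied by the number of gradient accumulation steps. The short check below reproduces that arithmetic. The `world_size` factor is an assumption: the card does not state how many processes were used, and the listed totals are consistent with a factor of 1.

```python
def total_train_batch_size(
    train_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int = 1,  # assumed; not stated in this card
) -> int:
    """Number of samples contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * world_size


session_one = total_train_batch_size(8, 16)  # 8 * 16
session_two = total_train_batch_size(16, 4)  # 16 * 4
print(session_one, session_two)  # 128 64
```

Session two therefore updates the weights with half as many samples per step as session one, alongside its lower learning rate.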


### Framework versions

- Transformers 4.19.2
- PyTorch 1.10.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1