
# ai-msgbot GPT2-L + daily dialogues

NOTE: this model card is a work in progress.

GPT2-L (774M parameters) was fine-tuned on the Wizard of Wikipedia dataset for 40k steps with 34 of 36 layers frozen, using aitextgen. It was then fine-tuned on the Daily Dialogues dataset for a further 40k steps, this time with 35 of 36 layers frozen.
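For reference, a minimal sketch of what the second fine-tuning stage might look like in aitextgen, using its layer-freezing options. The training file name, output directory, and all hyperparameters other than the step count and frozen-layer count are assumptions, not taken from this card:

```python
from aitextgen import aitextgen

# Load GPT2-L (774M). This sketch starts from the base Hugging Face
# checkpoint; the actual run continued from the Wizard of Wikipedia
# fine-tuned weights.
ai = aitextgen(model="gpt2-large", to_gpu=True)

# Second stage: Daily Dialogues for 40k steps, 35 of 36 layers frozen.
# "daily_dialogues.txt" is a hypothetical plain-text training file.
ai.train(
    "daily_dialogues.txt",
    num_steps=40_000,
    freeze_layers=True,      # freeze the lower transformer blocks
    num_layers_freeze=35,    # 35 of GPT2-L's 36 layers
    output_dir="trained_model",
)
```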

Designed for use with ai-msgbot.
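The checkpoint can also be queried directly like any GPT-2 model via transformers. A minimal sketch follows; the repo ID and the two-speaker prompt labels below are illustrative placeholders, so see the ai-msgbot project for the exact prompting scheme:

```python
from transformers import pipeline

# Hypothetical repo ID; substitute this model's actual Hugging Face ID.
generator = pipeline("text-generation", model="pszemraj/ai-msgbot-gpt2-L")

# ai-msgbot formats conversations as alternating speaker turns; the
# speaker labels here are assumptions, not taken from this card.
prompt = "person alpha:\nhi, how has your day been?\nperson beta:\n"
result = generator(prompt, max_new_tokens=32, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```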