|
--- |
|
license: mit |
|
datasets: |
|
- Skylion007/openwebtext |
|
language: |
|
- en |
|
metrics: |
|
- perplexity |
|
pipeline_tag: text-generation |
|
--- |
|
# GPT-2 Mini |
|
|
|
A smaller GPT-2 model with only 39M parameters. It was pretrained on a subset of OpenWebText, the open-source reproduction of the dataset OpenAI used to pretrain the original GPT-2 models.
|
|
|
## Uses |
|
|
|
This model is intended mainly for research and education. Its small size allows for fast experiments in resource-limited settings, while still being able to generate complex and coherent text.
|
|
|
## Getting Started |
|
|
|
Use the code below to get started with the model: |
|
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini")
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("erwanf/gpt2-mini")

# Generate text
prompt = "Hello, I'm a language model,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(input_ids, do_sample=True, max_length=50, num_return_sequences=5)
output_text = tokenizer.batch_decode(output, skip_special_tokens=True)
print(output_text)
```
|
|
|
Output: |
|
```
["Hello, I'm a language model, I can't be more efficient in words.\n\nYou can use this as a point to find out the next bit in your system, and learn more about me.\n\nI think a lot of the",
"Hello, I'm a language model, my teacher is a good teacher - a good school teacher – and one thing you have to remember:\n\nIt's not perfect. A school is not perfect; it isn't perfect at all!\n\n",
'Hello, I\'m a language model, but if I can do something for you then go for it (for a word). Here is my blog, the language:\n\nI\'ve not used "normal" in English words, but I\'ve always',
'Hello, I\'m a language model, I\'m talking to you the very first time I used a dictionary and it can be much better than one word in my dictionary. What would an "abnormal" English dictionary have to do with a dictionary and',
'Hello, I\'m a language model, the most powerful representation of words and phrases in the language I\'m using."\n\nThe new rules change that makes it much harder for people to understand a language that does not have a native grammar (even with']
```
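
Since perplexity is listed as the evaluation metric for this model, here is a minimal sketch of how it could be computed with this checkpoint, reusing the `model` and `tokenizer` loaded above. The sample text is purely illustrative and is not part of the original evaluation.

```py
import torch

# Illustrative sample text (not from the evaluation set)
text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

# With labels, the model returns the mean next-token cross-entropy loss;
# perplexity is its exponential.
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

print(f"Perplexity: {torch.exp(out.loss).item():.2f}")
```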
|
|
|
## Training Details |
|
|
|
The architecture follows GPT-2, with smaller dimensions and fewer layers, and uses the same tokenizer as GPT-2. We used the first 2M rows of the OpenWebText dataset, holding out 1k rows for the validation and test sets.
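
For reference, the split described above could be reproduced along the lines below with the `datasets` library. The seed and the exact held-out sizes (1k rows each for validation and test) are assumptions, and the preprocessing that turns raw rows into the tokenized training samples counted in the table below is not shown.

```py
from datasets import load_dataset

# First 2M rows of OpenWebText, as described above
ds = load_dataset("Skylion007/openwebtext", split="train[:2000000]")

# Hold out 1k rows each for validation and test (sizes and seed are assumptions)
held_out = ds.train_test_split(test_size=2_000, seed=42)
eval_splits = held_out["test"].train_test_split(test_size=0.5, seed=42)

train_ds = held_out["train"]
val_ds = eval_splits["train"]
test_ds = eval_splits["test"]
```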
|
|
|
### Hyperparameters |
|
|
|
| **Hyperparameter**        | **Value**   |
|---------------------------|-------------|
| **Model Parameters**      |             |
| Vocabulary Size           | 50,257      |
| Context Length            | 512         |
| Number of Layers          | 4           |
| Hidden Size               | 512         |
| Number of Attention Heads | 8           |
| Intermediate Size         | 2048        |
| Activation Function       | GELU        |
| Dropout                   | No          |
| **Training Parameters**   |             |
| Learning Rate             | 5e-4        |
| Batch Size                | 256         |
| Optimizer                 | AdamW       |
| beta1                     | 0.9         |
| beta2                     | 0.98        |
| Weight Decay              | 0.1         |
| Training Steps            | 100,000     |
| Warmup Steps              | 4,000       |
| Learning Rate Scheduler   | Cosine      |
| Training Dataset Size     | 1M samples  |
| Validation Dataset Size   | 1k samples  |
| Float Type                | bf16        |
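
To illustrate the model parameters above, here is a sketch of how an equivalent architecture could be instantiated with the `GPT2Config` class from `transformers`. Details such as the exact GELU variant are assumptions; to load the actual trained weights, use `AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini")` as shown in Getting Started.

```py
from transformers import GPT2Config, GPT2LMHeadModel

# Architecture from the table above; the released checkpoint's exact config
# may differ in details such as the GELU variant.
config = GPT2Config(
    vocab_size=50257,
    n_positions=512,              # context length
    n_embd=512,                   # hidden size
    n_layer=4,
    n_head=8,
    n_inner=2048,                 # intermediate (MLP) size
    activation_function="gelu_new",
    resid_pdrop=0.0,              # dropout disabled
    embd_pdrop=0.0,
    attn_pdrop=0.0,
)

model = GPT2LMHeadModel(config)

# Should print roughly 39M parameters
print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")
```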
|
|
|
|