|
---
{}
---
|
|
|
# Model Card for *NondeterministicShuffle* GPT-2 |
|
|
|
|
|
|
This is one model in a collection of models trained on the impossible |
|
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416). |
|
|
|
This model is a GPT-2 Small model trained from scratch on the ***NondeterministicShuffle*** |
|
language. We include a total of 30 checkpoints over the course of |
|
model training, from step 100 to 3000 in increments of 100 steps. |
|
The main branch contains the final checkpoint (3000), and the other |
|
checkpoints are accessible as revisions. |
|
|
|
![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png) |
|
|
|
## Model Details |
|
|
|
- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts |
|
- **Model type:** Causal Language Model |
|
- **Language(s) (NLP):** English |
|
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models |
|
- **Paper:** https://arxiv.org/pdf/2401.06416 |
|
|
|
## Uses |
|
|
|
This artefact is solely intended for the study of language learning |
|
and acquisition in computational models. It should not be |
|
used in any production setting. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python |
|
from transformers import GPT2LMHeadModel, GPT2Tokenizer |
|
import torch |
|
|
|
# Load model and tokenizer |
|
model_id = "mission-impossible-lms/nondeterministic-shuffle-gpt2" |
|
model = GPT2LMHeadModel.from_pretrained(model_id) |
|
tokenizer = GPT2Tokenizer.from_pretrained(model_id) |
|
|
|
# Set up the prompt and encode it |
|
prompt = "He clean" |
|
inputs = tokenizer(prompt, return_tensors="pt") |
|
|
|
# Generate text |
|
output = model.generate(inputs.input_ids, max_length=20) |
|
|
|
# Decode and print the generated text |
|
generated_text = tokenizer.decode(output[0], skip_special_tokens=True) |
|
print(generated_text) |
|
``` |
|
|
|
By default, the `main` branch of this model repo loads the |
|
last model checkpoint (3000). To access the other checkpoints, |
|
use the `revision` argument: |
|
|
|
```python
|
model = GPT2LMHeadModel.from_pretrained(model_id, revision="checkpoint-500") |
|
``` |
|
This loads the model at checkpoint 500. |
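
To study learning dynamics over the course of training, you can load each checkpoint in turn. The sketch below assumes the revision names follow the `checkpoint-<step>` pattern shown above for all 30 checkpoints (steps 100 through 3000 in increments of 100):

```python
from transformers import GPT2LMHeadModel

model_id = "mission-impossible-lms/nondeterministic-shuffle-gpt2"

# Load each of the 30 checkpoints (steps 100, 200, ..., 3000) in turn.
for step in range(100, 3001, 100):
    checkpoint = GPT2LMHeadModel.from_pretrained(
        model_id, revision=f"checkpoint-{step}"
    )
    # ... evaluate the checkpoint here, e.g., compute surprisal or
    # perplexity on a held-out set to trace the learning curve.
```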
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/). |
|
Before training, we transform the dataset into the corresponding
impossible language, as described in our paper.
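
As a rough illustration of the ***NondeterministicShuffle*** transformation (not the exact preprocessing code, which is available in the GitHub repository linked above), each sentence's tokens are reordered with a freshly sampled random permutation:

```python
import random
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def nondeterministic_shuffle(sentence: str) -> str:
    # Tokenize the sentence, then reorder its tokens with a random
    # permutation that is sampled anew for every sentence.
    token_ids = tokenizer.encode(sentence)
    random.shuffle(token_ids)
    return tokenizer.decode(token_ids)

print(nondeterministic_shuffle("The children are playing in the park."))
```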
|
|
|
### Training Procedure |
|
|
|
This model was trained for 3,000 gradient steps with |
|
a batch size of 2^19 tokens. We train with a learning |
|
rate that linearly warms up from 0 to 6e-4 over 300 steps. |
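
For reference, the warmup portion of this schedule can be written as a simple multiplier on the peak learning rate. The sketch below uses PyTorch's `LambdaLR`; holding the learning rate constant after step 300 is an assumption, since only the warmup is specified above:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # stand-in for the GPT-2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)  # peak LR from above

warmup_steps = 300

def lr_lambda(step: int) -> float:
    # Linear warmup from 0 to the peak LR over the first 300 steps;
    # the constant LR afterwards is an assumption, not from the paper.
    return min(1.0, step / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda)
```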
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs. |
|
- **Hours used:** ~24 hours. |
|
|
|
## Citation |
|
|
|
```bibtex |
|
@inproceedings{kallini-etal-2024-mission, |
|
title = "Mission: Impossible Language Models", |
|
author = "Kallini, Julie and |
|
Papadimitriou, Isabel and |
|
Futrell, Richard and |
|
Mahowald, Kyle and |
|
Potts, Christopher", |
|
editor = "Ku, Lun-Wei and |
|
Martins, Andre and |
|
Srikumar, Vivek", |
|
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", |
|
month = aug, |
|
year = "2024", |
|
address = "Bangkok, Thailand", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2024.acl-long.787", |
|
doi = "10.18653/v1/2024.acl-long.787", |
|
pages = "14691--14714", |
|
} |
|
``` |
|
|
|
## Model Card Authors |
|
|
|
Julie Kallini |
|
|
|
## Model Card Contact |
|
|
|
[email protected] |