---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for *DeterministicShuffle(s=57)* GPT-2 (without Positional Encodings)

<!-- Provide a quick summary of what the model is/does. -->

This is one model in a collection of models trained on the impossible
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

This model is a GPT-2 Small model trained *without positional encodings*
from scratch on the ***DeterministicShuffle(s=57)***
language. We include a total of 30 checkpoints over the course of
model training, from step 100 to 3000 in increments of 100 steps.
The main branch contains the final checkpoint (3000), and the other
checkpoints are accessible as revisions.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is solely intended for the study of language learning
and acquisition in computational models. It should not be
used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

**Important:** This will download our modified GPT-2 code that does
not have absolute positional encodings. If using this model in the
same environment as another GPT-2 model with positional encodings,
load the second model as a `GPT2Model` explicitly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "mission-impossible-lms/deterministic-shuffle-s57-gpt2-no-pos"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output = model.generate(inputs.input_ids, max_length=20)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

By default, the `main` branch of this model repo loads the
last model checkpoint (3000). To access the other checkpoints,
use the `revision` argument:

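```python
# Reuses model_id and AutoModelForCausalLM from the snippet above.
# trust_remote_code=True is kept so the modified GPT-2 code (no positional encodings) is used.
model = AutoModelForCausalLM.from_pretrained(
    model_id, revision="checkpoint-500", trust_remote_code=True
)
```

This loads the model at checkpoint 500.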
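
The full set of revision names can be sketched as follows. This assumes every checkpoint follows the same `checkpoint-<step>` naming pattern as `checkpoint-500`; the names `checkpoint_steps` and `revisions` are illustrative only.

```python
# Steps 100, 200, ..., 3000: one checkpoint every 100 gradient steps.
checkpoint_steps = list(range(100, 3001, 100))

# Assumption: each revision is named "checkpoint-<step>", as in the example above.
revisions = [f"checkpoint-{step}" for step in checkpoint_steps]

print(len(revisions))                # 30
print(revisions[0], revisions[-1])   # checkpoint-100 checkpoint-3000
```

Any of these strings can then be passed as the `revision` argument, provided the corresponding revision exists in the repo.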

## Training Details

### Training Data

This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/).
Before training, we transformed the dataset into
the corresponding impossible language, as described in
our paper.
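
As a rough illustration of the transformation, the sketch below permutes a token sequence in an order that depends only on the seed and the sequence length. That matches our reading of the paper's description of DeterministicShuffle. The function name `deterministic_shuffle` and the way the seed is combined with the length are assumptions for illustration, not the code used to build the training data; see the GitHub repository for the actual implementation.

```python
import random

def deterministic_shuffle(tokens, seed=57):
    """Return tokens permuted in an order fixed by (seed, len(tokens)).

    Hypothetical sketch: the seeding scheme below is an assumption,
    not the scheme used in the paper's code.
    """
    n = len(tokens)
    rng = random.Random(seed * 1_000_003 + n)  # same seed and length -> same order
    order = list(range(n))
    rng.shuffle(order)
    return [tokens[i] for i in order]

# Two different sequences of the same length are reordered the same way.
first = deterministic_shuffle(["He", "cleans", "his", "very", "messy", "bookshelf"])
second = deterministic_shuffle(["one", "two", "three", "four", "five", "six"])
print(first)
print(second)
```

Because the permutation depends on length alone, position `i` of every six-token output comes from the same input position.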

### Training Procedure

This model was trained for 3,000 gradient steps with
a batch size of 2^19 (524,288) tokens. The learning
rate warms up linearly from 0 to 6e-4 over the first 300 steps.
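
The numbers above can be worked through in a short sketch. The warmup follows the description in this section. What the learning rate does after step 300 is not stated here, so holding it at the peak value is an assumption, and `warmup_lr` is a hypothetical helper name.

```python
BATCH_TOKENS = 2 ** 19   # tokens per gradient step
TOTAL_STEPS = 3_000      # gradient steps
PEAK_LR = 6e-4           # learning rate at the end of warmup
WARMUP_STEPS = 300       # length of the linear warmup

def warmup_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR over WARMUP_STEPS.

    Assumption: the rate stays at PEAK_LR after warmup; the schedule
    beyond step 300 is not specified in this card.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return PEAK_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS

# Total tokens processed over the whole run.
total_tokens = BATCH_TOKENS * TOTAL_STEPS
print(total_tokens)      # 1572864000
print(warmup_lr(150))    # halfway through warmup, about 3e-4
```

In total the run processes roughly 1.57 billion tokens.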

## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
- **Hours used:** ~24 hours.

## Citation 

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

[email protected]