Model Card for DeterministicShuffle(s=84) GPT-2
This is one model in a collection of models trained on the impossible languages of Kallini et al. (2024).
This model is a GPT-2 Small model trained from scratch on the DeterministicShuffle(s=84) language. We include a total of 30 checkpoints over the course of model training, from step 100 to 3000 in increments of 100 steps. The main branch contains the final checkpoint (3000), and the other checkpoints are accessible as revisions.
Model Details
- Developed by: Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- Model type: Causal Language Model
- Language(s) (NLP): English
- GitHub Repository: https://github.com/jkallini/mission-impossible-language-models
- Paper: https://arxiv.org/pdf/2401.06416
Uses
This artefact is solely intended for the study of language learning and acquisition in computational models. It should not be used in any production setting.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
# Load model and tokenizer
model_id = "mission-impossible-lms/deterministic-shuffle-s84-gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate text
output = model.generate(inputs.input_ids, max_length=20)
# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
By default, the main branch of this model repo loads the last model checkpoint (3000). To access the other checkpoints, use the revision argument:
model = GPT2LMHeadModel.from_pretrained(model_id, revision="checkpoint-500")
This loads the model at checkpoint 500.
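To compare several checkpoints, it can help to enumerate the revision names first. The sketch below assumes every checkpoint follows the same `checkpoint-<step>` naming pattern as the `checkpoint-500` example above; that pattern is extrapolated, not confirmed for every step. The helper names `checkpoint_revisions` and `load_checkpoint` are hypothetical and not part of any library.

```python
from typing import List

MODEL_ID = "mission-impossible-lms/deterministic-shuffle-s84-gpt2"


def checkpoint_revisions(first: int = 100, last: int = 3000, step: int = 100) -> List[str]:
    """Build revision names for steps first..last (inclusive).

    ASSUMPTION: all revisions are named "checkpoint-<step>", as in the
    "checkpoint-500" example from the model card.
    """
    return [f"checkpoint-{s}" for s in range(first, last + 1, step)]


def load_checkpoint(step: int):
    """Load the model weights saved at the given training step."""
    # Imported lazily so the name helper above works without transformers installed.
    from transformers import GPT2LMHeadModel

    return GPT2LMHeadModel.from_pretrained(MODEL_ID, revision=f"checkpoint-{step}")


if __name__ == "__main__":
    revisions = checkpoint_revisions()
    print(len(revisions), revisions[0], revisions[-1])
```

Calling `load_checkpoint(500)` is then equivalent to the single-line example above; looping over `checkpoint_revisions()` downloads each checkpoint in turn, so expect one download per revision.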
Training Details
Training Data
This model was trained on the 100M-word BabyLM dataset. Before training, we transformed the dataset into the corresponding impossible language, as described in our paper.
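As a rough illustration only, and not the authors' preprocessing code, a deterministic shuffle can be sketched as a permutation of a token sequence that depends only on a seed (here `s=84`) and the sequence length, so that all sequences of the same length are reordered in the same way. The function name `deterministic_shuffle`, the way the seed and length are combined, and the choice of Python's `random` module are all assumptions made for this sketch; see the paper and GitHub repository for the actual transformation.

```python
import random
from typing import List, Sequence


def deterministic_shuffle(tokens: Sequence[int], seed: int = 84) -> List[int]:
    """Reorder tokens with a permutation fixed by (seed, sequence length).

    ASSUMPTION: combining seed and length into one integer is a hypothetical
    seeding scheme for illustration; the original implementation may differ.
    """
    n = len(tokens)
    rng = random.Random(seed * 1_000_003 + n)
    permutation = list(range(n))
    rng.shuffle(permutation)
    return [tokens[i] for i in permutation]


if __name__ == "__main__":
    # Two different sequences of equal length receive the same reordering.
    print(deterministic_shuffle([10, 20, 30, 40, 50]))
    print(deterministic_shuffle([1, 2, 3, 4, 5]))
```

Because the permutation never depends on the token values, the mapping is consistent across the corpus but bears no relation to linguistic structure.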
Training Procedure
This model was trained for 3,000 gradient steps with a batch size of 2^19 tokens. The learning rate warms up linearly from 0 to 6e-4 over the first 300 steps.
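The figures above can be checked with a short calculation. The sketch below computes the total number of training tokens and the learning rate during warmup. What the schedule does after step 300 is not stated here, so holding the rate constant at its peak is an assumption made purely for this illustration; the names `learning_rate` and `total_tokens` are likewise hypothetical.

```python
PEAK_LR = 6e-4             # peak learning rate reached at the end of warmup
WARMUP_STEPS = 300         # number of linear warmup steps
TOTAL_STEPS = 3000         # total gradient steps
TOKENS_PER_BATCH = 2 ** 19  # 524,288 tokens per gradient step


def learning_rate(step: int) -> float:
    """Learning rate at a given step under linear warmup from 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / WARMUP_STEPS)
    # ASSUMPTION: the post-warmup schedule is not specified in the model card;
    # the rate is simply held at its peak in this sketch.
    return PEAK_LR


# Total tokens processed over the whole run.
total_tokens = TOTAL_STEPS * TOKENS_PER_BATCH

if __name__ == "__main__":
    print(f"total tokens: {total_tokens:,}")
    for step in (0, 150, 300):
        print(step, learning_rate(step))
```

The run therefore processes 3,000 x 524,288 = 1,572,864,000 tokens, roughly 1.57 billion, which amounts to many passes over a 100M-word corpus.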
Environmental Impact
- Hardware Type: NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
- Hours used: ~24 hours.
Citation
@inproceedings{kallini-etal-2024-mission,
title = "Mission: Impossible Language Models",
author = "Kallini, Julie and
Papadimitriou, Isabel and
Futrell, Richard and
Mahowald, Kyle and
Potts, Christopher",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.787",
doi = "10.18653/v1/2024.acl-long.787",
pages = "14691--14714",
}
Model Card Authors
Julie Kallini
Model Card Contact