Model Card for Model ID

Model Details

Model Description

This model card presents details for the gpt2-xl model, a large autoregressive language model suited to text generation tasks. The model uses the GPT-2 architecture developed by OpenAI.

  • Model type: Autoregressive Language Model
  • Language(s) (NLP): English
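Concretely, "autoregressive" means the model factors the probability of a token sequence x_1, ..., x_T into next-token conditionals, each predicted from the tokens before it. This is the standard formulation for GPT-style models, written here as a sketch:

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```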

Uses

Direct Use

The model can be used for text generation tasks, such as completing sentences or generating coherent paragraphs.

Bias, Risks, and Limitations

The model may exhibit biases present in the training data and could generate inappropriate or sensitive content. Users should exercise caution when deploying the model in production.

Recommendations

Users should be aware of potential biases and limitations of the model, particularly when used in applications that involve sensitive or high-stakes content.

How to Get Started with the Model

Use the code below to get started with the model.

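import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the prompt into a batch containing one sequence of token IDs
input_txt = "Bananas are a great"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"]

# Greedy decoding: always pick the most likely next token, up to 200 tokens in total
output = model.generate(input_ids, max_length=200, do_sample=False)
print(tokenizer.decode(output[0]))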
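The call above uses greedy decoding (do_sample=False). The loop below is a minimal, self-contained sketch of that idea: at each step a scoring function rates candidate next tokens, the highest-scoring one is appended, and the extended sequence is fed back in as context. `greedy_generate`, `toy_scores`, and the bigram table are hypothetical stand-ins for gpt2-xl (which works on token IDs, not words); this is not how `transformers` implements `generate`. As with `max_length` in the call above, the length limit counts the prompt tokens too.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the model: maps the tokens so far to scores
# (logits) for each candidate next token. A real setup would run gpt2-xl here.
ScoreFn = Callable[[List[str]], Dict[str, float]]


def greedy_generate(prompt: List[str], score_next: ScoreFn, max_length: int,
                    eos: str = "<eos>") -> List[str]:
    """Greedy autoregressive decoding; max_length includes the prompt."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        scores = score_next(tokens)
        # Greedy choice: take the highest-scoring next token.
        next_token = max(scores, key=scores.get)
        tokens.append(next_token)  # feed the choice back in as context
        if next_token == eos:
            break
    return tokens


# Toy bigram "model": scores depend only on the last token.
TOY_BIGRAMS = {
    "a": {"great": 2.0, "good": 1.0},
    "great": {"snack": 1.5, "fruit": 1.2},
    "snack": {"<eos>": 3.0, "and": 0.5},
}


def toy_scores(tokens: List[str]) -> Dict[str, float]:
    # Unknown context: only the end-of-sequence token is offered.
    return TOY_BIGRAMS.get(tokens[-1], {"<eos>": 0.0})


print(greedy_generate(["Bananas", "are", "a"], toy_scores, max_length=10))
# ['Bananas', 'are', 'a', 'great', 'snack', '<eos>']
```

Sampling (do_sample=True) would instead draw the next token at random from the softmax of the scores, so repeated runs can differ.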

Training Details

Training Data

The model was trained on WebText, roughly 40 GB of diverse internet text that OpenAI collected from web pages linked from Reddit posts.

Training Hyperparameters

  • Training regime: autoregressive (causal) language modeling, i.e. predicting each next token from the tokens before it
  • Compute infrastructure: GPUs (specific details not disclosed)
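The causal language modeling objective can be sketched as the mean next-token cross-entropy: the labels are the input tokens shifted left by one position. The pure-Python illustration below is not OpenAI's training code; `causal_lm_loss` and the toy probability table are hypothetical, and in a real model `probs` would come from the softmax over the vocabulary at each position.

```python
import math
from typing import List, Sequence


def causal_lm_loss(token_ids: Sequence[int], probs: List[List[float]]) -> float:
    """Mean next-token cross-entropy in nats.

    probs[t] is an assumed model distribution over the vocabulary after
    reading token_ids[: t + 1]; it is scored against token_ids[t + 1].
    """
    targets = token_ids[1:]  # labels = inputs shifted left by one
    nll = [-math.log(probs[t][target]) for t, target in enumerate(targets)]
    return sum(nll) / len(nll)


# Toy example with a 4-token vocabulary.
token_ids = [0, 2, 1]
probs = [
    [0.1, 0.1, 0.7, 0.1],      # prediction after token 0 (target: 2)
    [0.25, 0.25, 0.25, 0.25],  # prediction after token 2 (target: 1)
]
loss = causal_lm_loss(token_ids, probs)
print(round(loss, 4))  # 0.8715
```

Training minimizes this quantity averaged over many sequences; the last position has no target, which is why only len(token_ids) - 1 predictions are scored.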

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on standard language modeling benchmarks, with perplexity on held-out data as a key metric.
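Perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. A minimal sketch, assuming natural-log probabilities that the model assigned to each token of the held-out text (the function name is illustrative):

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood per token), natural log."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 1/4 has perplexity 4.
print(perplexity([math.log(0.25)] * 8))
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens.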

Model Size

  • Weights format: Safetensors
  • Model size: 1.56B params
  • Tensor type: F32
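The 1.56B figure can be reproduced from the GPT-2 architecture as a sanity check. The sketch below assumes the standard gpt2-xl configuration (48 layers, hidden size 1600, a 50,257-token vocabulary, 1,024 positions, output layer tied to the token embedding); these values are not stated elsewhere in this card, and `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab: int, n_ctx: int) -> int:
    """Count GPT-2 parameters, assuming the LM head is tied to the token embedding."""
    d = d_model
    per_layer = (
        2 * d                    # ln_1 (scale + bias)
        + d * 3 * d + 3 * d      # attention c_attn (Q, K, V projections)
        + d * d + d              # attention c_proj
        + 2 * d                  # ln_2
        + d * 4 * d + 4 * d      # MLP c_fc
        + 4 * d * d + d          # MLP c_proj
    )
    embeddings = vocab * d + n_ctx * d  # token (wte) + position (wpe) embeddings
    final_ln = 2 * d                    # ln_f
    return n_layer * per_layer + embeddings + final_ln


# Assumed gpt2-xl hyperparameters.
n_params = gpt2_param_count(n_layer=48, d_model=1600, vocab=50257, n_ctx=1024)
print(n_params)                  # 1557611200
print(n_params * 4 / 1e9, "GB")  # F32 = 4 bytes per parameter
```

At F32 the weights alone take roughly 6.2 GB, before activations or other runtime overhead.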