---
library_name: transformers
tags:
- text-generation-inference
license: mit
language:
- en
---
|
|
|
# Model Card for gpt2-xl
|
|
|
GPT-2 XL (gpt2-xl) is a 1.5-billion-parameter autoregressive transformer language model for English text generation.
|
|
|
|
|
|
|
## Model Details

### Model Description

This model card presents details for gpt2-xl, the largest (1.5-billion-parameter) variant of the GPT-2 family of autoregressive language models developed by OpenAI, optimized for open-ended text generation tasks.

- **Model type:** Autoregressive language model
- **Language(s) (NLP):** English
- **License:** MIT
|
|
|
## Uses
|
|
|
|
|
|
### Direct Use

The model can be used for text generation tasks, such as completing sentences or generating coherent paragraphs.
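For quick experimentation, the `pipeline` API wraps tokenization and generation in a single call. A minimal sketch, assuming the full-precision gpt2-xl weights (roughly 6 GB) fit in memory; the prompt and generation length are illustrative:

```python
from transformers import pipeline

# Load gpt2-xl into a text-generation pipeline (downloads the weights on first use).
generator = pipeline("text-generation", model="gpt2-xl")

# Generate a short continuation; the prompt and max_new_tokens are illustrative.
result = generator("Bananas are a great", max_new_tokens=40)
print(result[0]["generated_text"])
```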
|
|
|
## Bias, Risks, and Limitations

The model may exhibit biases present in the training data and could generate inappropriate or sensitive content. Users should exercise caution when deploying the model in production.
|
|
|
### Recommendations

Users should be aware of potential biases and limitations of the model, particularly when used in applications that involve sensitive or high-stakes content.
|
|
|
## How to Get Started with the Model

Use the code below to get started with the model.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the gpt2-xl tokenizer and weights from the Hugging Face Hub.
model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Run on a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Tokenize a prompt and generate a greedy (deterministic) continuation.
input_txt = "Bananas are a great"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)

output = model.generate(input_ids, max_length=200, do_sample=False)
print(tokenizer.decode(output[0]))
```
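Greedy decoding tends to repeat itself on longer generations. The sketch below reuses `model`, `tokenizer`, and `input_ids` from the snippet above and switches to nucleus sampling; the `top_p` and `temperature` values are illustrative defaults, not tuned recommendations:

```python
# Nucleus sampling: sample from the smallest set of tokens whose cumulative
# probability exceeds top_p, with temperature controlling randomness.
output = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```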
|
|
|
|
|
## Training Details

### Training Data

The model was trained on WebText, a dataset of roughly 40 GB of internet text scraped from outbound Reddit links, covering news articles, blogs, and a wide range of other websites.
|
|
|
#### Training Hyperparameters

- **Training regime:** Autoregressive training with a large-scale language modeling (next-token prediction) objective
- **Compute infrastructure:** GPUs (specific details not disclosed)
|
|
|
## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated on standard language modeling benchmarks, including perplexity scores on held-out data.
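Perplexity is the exponential of the model's average per-token cross-entropy loss on the evaluation text. A minimal sketch, reusing `model`, `tokenizer`, and `device` from the getting-started snippet; the sample sentence is illustrative, and a real benchmark run would stride over a full held-out corpus:

```python
import torch

# Passing the input ids as labels makes the model return the mean
# cross-entropy loss over the sequence; perplexity = exp(loss).
text = "Language models are evaluated by how well they predict held-out text."
enc = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```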