---
library_name: transformers
tags:
- text-generation-inference
license: mit
language:
- en
---
# Model Card for gpt2-xl
gpt2-xl is the 1.5B-parameter version of GPT-2, an autoregressive transformer language model released by OpenAI for English text generation.
## Model Details
### Model Description
gpt2-xl is a large autoregressive language model optimized for text generation tasks. It uses the GPT-2 architecture developed by OpenAI and is the largest of the publicly released GPT-2 checkpoints.
- **Model type:** Autoregressive Language Model
- **Language(s) (NLP):** English
## Uses
### Direct Use
The model can be used for text generation tasks, such as completing sentences or generating coherent paragraphs.
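For a quick first try, the high-level `pipeline` API in `transformers` wraps tokenization and generation in one call. This is a minimal sketch; the prompt and the `max_length` value are illustrative choices, not recommended settings:

```python
from transformers import pipeline

# Load gpt2-xl behind the high-level text-generation pipeline
generator = pipeline("text-generation", model="gpt2-xl")

# Prompt and max_length are illustrative, not tuned values
result = generator("Bananas are a great", max_length=50, num_return_sequences=1)
print(result[0]["generated_text"])
```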
## Bias, Risks, and Limitations
The model may exhibit biases present in the training data and could generate inappropriate or sensitive content. Users should exercise caution when deploying the model in production.
### Recommendations
Users should be aware of potential biases and limitations of the model, particularly when used in applications that involve sensitive or high-stakes content.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt into input token IDs
input_txt = "Bananas are a great"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"]

# Greedy decoding: do_sample=False always picks the most likely next token
output = model.generate(input_ids, max_length=200, do_sample=False)
print(tokenizer.decode(output[0]))
```
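Greedy decoding tends to repeat itself on longer continuations. For more varied output, sampling can be enabled via `generate`'s sampling parameters; the values below are a sketch, not tuned settings for this model:

```python
# Nucleus sampling with a top-k cutoff; parameter values are illustrative
output = model.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.9,
)
print(tokenizer.decode(output[0]))
```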
## Training Details
### Training Data
The model was trained on WebText, OpenAI's dataset of internet text collected from outbound Reddit links, spanning a diverse range of sources such as news articles, blogs, and other websites.
#### Training Hyperparameters
- **Training regime:** Autoregressive language modeling (next-token prediction) at large scale
- **Compute infrastructure:** GPUs (specific details not disclosed)
## Evaluation
### Testing Data, Factors & Metrics
The model was evaluated on standard language modeling benchmarks, with perplexity on held-out data as the primary metric.
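As an illustration of how perplexity is computed, here is a minimal sketch that scores a single piece of text with the model. The sample sentence is arbitrary; benchmark evaluation runs over held-out corpora rather than one sentence:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Arbitrary sample text; real evaluation uses a held-out corpus
text = "Bananas are a great source of potassium."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # over next-token predictions
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average cross-entropy loss
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```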