---
language:
- en
tags:
- gpt2
license: apache-2.0
widget:
- text: >-
    It was a bright cold day in April, and the clocks were striking thirteen.
    Winston Smith,
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
    - type: wikitext
      value: 19.31
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-2
      type: wikitext
    metrics:
    - type: wikitext
      value: 17.151
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: LAMBADA
      type: lambada
    metrics:
    - type: lambada
      value: 5.509
      name: Perplexity
    - type: lambada
      value: 68.31%
      name: Accuracy
---
This is an archive of nvidia/megatron-gpt2-345m that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31 [1]. For comparison, GPT-2 XL (1.5B) reaches 17.48 and GPT-2 large (762M) reaches 22.05 [2]; lower is better.
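The benchmark figures above are perplexities. Perplexity is the exponential of the average negative log-likelihood per token, so a model that assigns every token probability 1/20 has perplexity 20. The minimal sketch below uses a hypothetical `perplexity` helper that takes a list of natural-log token probabilities. Published WikiText-103 numbers also depend on tokenization and on how the average is normalized, so this sketch is not meant to reproduce 19.31 exactly.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Every token predicted with probability 1/20, so perplexity is 20.
print(perplexity([math.log(0.05)] * 10))
```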
## References

1. Shoeybi, Mohammad, et al. *Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism.* arXiv, 2019. https://doi.org/10.48550/ARXIV.1909.08053
2. Radford, Alec, et al. *Language Models are Unsupervised Multitask Learners.* OpenAI, 2019. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
## Description

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2, trained on text from Wikipedia, RealNews, OpenWebText, and CC-Stories. It contains 345 million parameters.
Find more information at https://github.com/NVIDIA/Megatron-LM
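As a sanity check on the parameter count, the size of a GPT-2-style decoder can be worked out from its hyperparameters. The sketch below assumes the configuration commonly used for the 345M Megatron model (24 layers, hidden size 1024, 1,024 positions), the unpadded GPT-2 vocabulary of 50,257 tokens, and an output head tied to the token embedding, as in Transformers' `GPT2LMHeadModel`. The function name `gpt2_param_count` is hypothetical. Under these assumptions the total comes to about 354.8M, so "345M" is best read as the model's conventional label; the exact figure shifts with vocabulary padding.

```python
def gpt2_param_count(n_layer=24, d_model=1024, vocab_size=50257, n_positions=1024):
    """Count trainable parameters of a GPT-2-style decoder with a tied output head."""
    # Token-embedding and learned position-embedding tables.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Per block: QKV projection (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d) and down (4d^2 + d), plus two LayerNorms (2 * 2d).
    per_layer = 12 * d_model**2 + 13 * d_model
    # Final LayerNorm (scale and bias) after the last block.
    final_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_norm


print(f"{gpt2_param_count():,}")  # 354,823,168
```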
## How to run Megatron GPT2 using Transformers

### Text generation
The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Megatron GPT-2 uses the GPT-2 BPE vocabulary, so the stock "gpt2" tokenizer is loaded.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Use half precision on GPU; keep full precision on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
# Tokenize to input_ids plus attention_mask, both moved to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(
    **inputs,
    max_new_tokens=128,  # generate up to 128 tokens beyond the prompt
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
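The `generate` call above combines several sampling controls. To make their roles concrete, here is a pure-Python sketch of one decoding step that applies temperature, then top-k, then top-p (nucleus) filtering to a list of logits. It is an illustration, not Transformers' implementation: the function name `sample_next_token` is hypothetical, it ignores `repetition_penalty`, and its edge-case handling may differ from the library's logits warpers.

```python
import math
import random


def sample_next_token(logits, temperature=0.8, top_k=64, top_p=0.9, rng=random):
    """Pick one token id from raw logits (temperature must be > 0)."""
    # 1. Temperature: dividing by a value < 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    # 2. Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors, subtracting the max for numerical stability.
    best = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - best) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in zip(ranked, probs):
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 4. Sample from the survivors; random.choices normalizes the weights itself.
    tokens, kept_probs = zip(*kept)
    return rng.choices(tokens, weights=kept_probs, k=1)[0]
```

For example, `sample_next_token([0.1, 2.0, 0.5], top_k=1)` always returns `1`, because top-k with k=1 reduces sampling to greedy decoding.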
## Original code
The original Megatron code can be found here: https://github.com/NVIDIA/Megatron-LM.