|
--- |
|
language: |
|
- en |
|
tags: |
|
- gpt2 |
|
license: apache-2.0 |
|
widget: |
|
- text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, |
|
datasets: |
|
- wikitext |
|
- openwebtext |
|
- spacemanidol/cc-stories |
|
model-index: |
|
- name: megatron-gpt2-345m |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text generation |
|
dataset: |
|
name: WikiText-103 |
|
type: wikitext |
|
metrics: |
|
- type: perplexity
|
value: 19.31 |
|
name: Perplexity |
|
- task: |
|
type: text-generation |
|
name: Text generation |
|
dataset: |
|
name: WikiText-2 |
|
type: wikitext |
|
metrics: |
|
- type: perplexity
|
value: 17.151 |
|
name: Perplexity |
|
- task: |
|
type: text-generation |
|
name: Text generation |
|
dataset: |
|
name: LAMBADA |
|
type: lambada |
|
metrics: |
|
- type: perplexity
|
value: 5.509 |
|
name: Perplexity |
|
- type: accuracy
|
value: 68.31
|
name: Accuracy |
|
--- |
|
|
|
<!--- |
|
# ############################################################################################## |
|
# |
|
# Copyright (c) 2021-, NVIDIA CORPORATION. All rights reserved. |
|
# |
|
# Licensed under the Apache License, Version 2.0 (the "License"); |
|
# you may not use this file except in compliance with the License. |
|
# You may obtain a copy of the License at |
|
# |
|
# http://www.apache.org/licenses/LICENSE-2.0 |
|
# |
|
# Unless required by applicable law or agreed to in writing, software |
|
# distributed under the License is distributed on an "AS IS" BASIS, |
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
|
# See the License for the specific language governing permissions and |
|
# limitations under the License. |
|
# |
|
# ############################################################################################## |
|
--> |
|
|
|
This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (345M parameters). Its perplexity on WikiText-103 is 19.31.<sup>1</sup> In comparison, GPT-2 XL (1.5B) achieves 17.48 and GPT-2 large (762M) achieves 22.05.<sup>2</sup>
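
For reference, a perplexity figure in the same ballpark can be reproduced with the common sliding-window evaluation recipe from the Transformers documentation. The sketch below is illustrative only: it assumes the `wikitext-2-raw-v1` test split from the `datasets` library and a 512-token stride, while the exact preprocessing behind the numbers above may differ, so expect a similar value rather than an exact match.

```python
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")
model.eval()

# Concatenate the WikiText-2 test split into one long token sequence.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = model.config.n_positions  # 1024-token context window
stride = 512                           # overlap between evaluation windows
seq_len = encodings.input_ids.size(1)

nlls = []
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # number of tokens scored in this window
    input_ids = encodings.input_ids[:, begin:end]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping context tokens
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    prev_end = end
    if end == seq_len:
        break

print("Perplexity:", torch.exp(torch.stack(nlls).sum() / seq_len).item())
```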
|
|
|
### References |
|
|
|
1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053). |
|
2. Radford, Alec, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019, [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
|
|
|
## Description |
|
|
|
[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2 with 345 million parameters. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories.
|
|
|
Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) |
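
As a quick sanity check of the parameter count, the checkpoint can be loaded with the standard `GPT2LMHeadModel` class (as in the generation example below) and its parameters summed. This is a minimal sketch; the exact total printed may differ slightly from the nominal 345M depending on how embeddings are counted.

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Total parameter count including embeddings; expect a figure close to the nominal 345M.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```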
|
|
|
# How to run Megatron GPT2 using Transformers |
|
|
|
## Text generation |
|
|
|
The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text. |
|
|
|
```python |
|
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# The checkpoint uses the standard GPT-2 tokenizer and architecture.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Run in fp16 on GPU when available, otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    max_length=input_ids.size(1) + 128,  # prompt length plus up to 128 new tokens
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sequence in enumerate(output):
    text = tokenizer.decode(sequence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
|
``` |
|
|
|
# Original code |
|
|
|
The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM). |
|
|