---
title: Nanogpt2 Text Generator
emoji: 🏢
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 4.37.1
app_file: app.py
pinned: false
license: mit
---

## Dataset

The training data is a collection of William Shakespeare's plays.

- Tokenization uses the GPT-2 tokenizer from tiktoken.
- Total number of tokens: 338025

## Model

The model is available [here](https://huggingface.co/sayanbanerjee32/nanogpt2_test)

## The HuggingFace Spaces Gradio App

The app takes the following inputs:

1. Seed Text (Prompt) - The input text passed to the GPT model, from which it generates further content. If no text is provided, a single space (" ") is used as the input.

2. Max tokens to generate - Controls the number of tokens the model generates. The default value is 100.

3. Temperature - Accepts values between 0 and 1. A higher value introduces more randomness into next-token generation. The default value is 0.7.

4. Select Top N in each step - An optional field. If no value is provided (or the value is <= 0), all available tokens are considered for the next-token prediction, based on their softmax probabilities. If a number is set, only that many top tokens are considered for the next-token prediction.
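
The following is a minimal sketch of how the Temperature and Top N inputs could shape next-token sampling, assuming the model returns a list of raw logits at each step. The helper names `normalize_inputs` and `sample_next_token` are hypothetical and are not taken from the app's code. Clamping a temperature of 0 to a tiny positive value is also an assumption, made here only to avoid a division by zero.

```python
import math
import random


def normalize_inputs(prompt, top_n):
    """Apply the input defaults described above (hypothetical helper)."""
    # An empty prompt falls back to a single space.
    seed_text = prompt if prompt else " "
    # A missing or non-positive Top N means "consider every token".
    k = top_n if top_n is not None and top_n > 0 else None
    return seed_text, k


def sample_next_token(logits, temperature=0.7, top_n=None, rng=random):
    """Sample one token index from raw logits using temperature and Top N."""
    # Assumption: clamp the temperature so a value of 0 does not divide by zero.
    t = max(temperature, 1e-8)
    scaled = [x / t for x in logits]

    # Keep only the indices of the top_n largest logits, if requested.
    indices = list(range(len(scaled)))
    if top_n is not None and 0 < top_n < len(scaled):
        indices = sorted(indices, key=lambda i: scaled[i], reverse=True)[:top_n]

    # Softmax weights over the surviving logits (subtract the max for stability).
    m = max(scaled[i] for i in indices)
    weights = [math.exp(scaled[i] - m) for i in indices]

    # Draw one index in proportion to its softmax weight.
    return rng.choices(indices, weights=weights, k=1)[0]
```

For example, `sample_next_token([0.1, 2.0, -1.0, 1.5], temperature=0.7, top_n=2)` can only return index 1 or 3, the two largest logits. Lowering the temperature concentrates probability on the largest logit, while raising it spreads probability more evenly across the candidates.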