---
library_name: transformers
tags:
- lyrics
- text
- text-to-lyrics
- artist-to-lyrics
- text-generation
datasets:
- smgriffin/modern-pop-lyrics
language:
- en
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---
# Model Card for pop-lyrics-generator-v1
<!-- Provide a quick summary of what the model is/does. -->
Fine-tuned from openai-community/gpt2 on smgriffin/modern-pop-lyrics; generates lyrics in the style of specific pop artists.
### Model Description
<!-- Provide a longer summary of what this model is. -->
The model is pretty good at generating song structure and artist-specific, stylized lyrics, but bad at rhyming. It sometimes repeats the same line over and over, but so do pop artists.
It might be useful for inspiration while writing lyrics. Some of the generated content can be really silly and potentially offensive - especially if you prompt with Lil Wayne.
- **Developed by:** Scott Griffin
- **Model type:** Generative Language
- **Language(s) (NLP):** English, Spanish
- **Finetuned from model:** openai-community/gpt2
Check out the W&B run here: [https://wandb.ai/scottgriffinm-scott-griffin-industrial-complex/pop-lyrics-generator-v1?nw=nwuserscottgriffinm](https://wandb.ai/scottgriffinm-scott-griffin-industrial-complex/pop-lyrics-generator-v1?nw=nwuserscottgriffinm),
and my blog post on making it here: [https://scottsblog.glitch.me#pop-lyrics-generator-v1](https://scottsblog.glitch.me#pop-lyrics-generator-v1).
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model is not for commercial use. The lyrical content remains the property of the individual artists whose work the model was fine-tuned on.
It is intended for research purposes only.
## How to Use
Use the code below to generate lyrics:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline
# load model
model_name = "smgriffin/pop-lyrics-generator-v1"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# create text generation pipeline
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
# prompt for justin bieber lyrics
artist_name = "Justin Bieber"
prompt = f"Artist: {artist_name}\nLyrics:"
# generate and print
generated_texts = text_generator(
    prompt,
    max_length=150,
    num_return_sequences=1,
    temperature=0.9,  # less than 0.9 results in a lot of repeated lyrics
    top_k=50,
    top_p=0.95,
    do_sample=True,
)
print("Generated Lyrics:")
print(generated_texts[0]["generated_text"])
```
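The model conditions on whatever artist name you put in the `Artist: ...\nLyrics:` prompt. Below is a minimal sketch that reuses the `text_generator` pipeline from the snippet above to generate lyrics for several artists and strips the prompt from the output; the specific artist names are assumptions, so swap in any artist that appears in smgriffin/modern-pop-lyrics.

```python
# minimal sketch: generate lyrics for a few artists with the pipeline above
# (artist names are assumptions - use any artist present in the dataset)
for artist_name in ["Justin Bieber", "Taylor Swift", "Dua Lipa"]:
    prompt = f"Artist: {artist_name}\nLyrics:"
    output = text_generator(
        prompt,
        max_length=150,
        num_return_sequences=1,
        temperature=0.9,
        top_k=50,
        top_p=0.95,
        do_sample=True,
    )[0]["generated_text"]
    # drop the conditioning prompt so only the generated lyrics remain
    lyrics = output[len(prompt):].strip()
    print(f"--- {artist_name} ---\n{lyrics}\n")
```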
## How to Fine-Tune Your Own Lyric Generation Model
Use the code below to fine-tune your own GPT-2 model (for example, on the smgriffin/modern-pop-lyrics dataset):
```python
import os
import pandas as pd
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments, DataCollatorForLanguageModeling
# output directory
output_dir = "/your/output/directory"
os.makedirs(output_dir, exist_ok=True)
# load dataset
dataset = load_dataset("smgriffin/modern-pop-lyrics")
# preprocess dataset
def preprocess_function(example):
    # combine artist name with lyrics for conditioning
    combined = [f"Artist: {artist}\nLyrics: {lyrics}\n\n" for artist, lyrics in zip(example['artist'], example['lyrics'])]
    return {"text": combined}
processed_dataset = dataset.map(preprocess_function, batched=True)
# split to train and test sets
train_test_split = processed_dataset["train"].train_test_split(test_size=0.1, seed=42)
train_dataset = train_test_split["train"]
val_dataset = train_test_split["test"]
# load tokenizer, model
model_name = "gpt2" # Base GPT-2 model for fine-tuning
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# set pad_token to eos_token (gpt2 doesn't have a padding token)
tokenizer.pad_token = tokenizer.eos_token
# tokenize dataset
def tokenize_function(example):
    tokenized = tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )
    return {
        "input_ids": tokenized["input_ids"],
        "attention_mask": tokenized["attention_mask"],
        "labels": tokenized["input_ids"],
    }
train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["artist", "lyrics", "text"])
val_dataset = val_dataset.map(tokenize_function, batched=True, remove_columns=["artist", "lyrics", "text"])
# data collator
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
# load GPT-2
model = GPT2LMHeadModel.from_pretrained(model_name)
# training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    save_steps=1000,
    save_total_limit=1,
    logging_dir=f"{output_dir}/logs",
    logging_steps=50,
    gradient_accumulation_steps=2,
    fp16=True,
    max_grad_norm=1.0,
    push_to_hub=False,
)
# init trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
# start fine-tuning
trainer.train()
# save model
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)
```
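After training, the model and tokenizer are written to `output_dir`, so you can load them back the same way the hosted checkpoint is loaded above. A minimal sketch, assuming the `output_dir` path from the training script and the same `Artist: ...\nLyrics:` prompt format used during preprocessing:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

# load the fine-tuned checkpoint saved by trainer.save_model / save_pretrained
output_dir = "/your/output/directory"  # same path used during training
model = GPT2LMHeadModel.from_pretrained(output_dir)
tokenizer = GPT2Tokenizer.from_pretrained(output_dir)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# same conditioning format the training data was preprocessed into
prompt = "Artist: Justin Bieber\nLyrics:"
print(generator(prompt, max_length=150, do_sample=True, temperature=0.9)[0]["generated_text"])
```

If you want to publish your own checkpoint, the standard `push_to_hub` methods on the model and tokenizer work the same way as for any other transformers model.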