---
language: en
tags:
- text-generation
- YouTube-scripts
- fine-tuned
- causal-lm
datasets:
- custom
license: mit
model_name: Gemma 2 Scripter
---

# Gemma 2 Scripter

**Gemma 2 Scripter** is a fine-tuned version of the Gemma 2 2B instruct model designed to generate high-quality YouTube scripts from provided keywords, delivering coherent and contextually relevant output.

## Model Details

- **Model Name**: `Sidharthan/gemma2_scripter`
- **Architecture**: Causal Language Model
- **Base Model**: Gemma 2 2B
- **Fine-tuning Objective**: Script generation from keyword prompts.

## How to Use

### Installation

Ensure you have the following dependencies installed:

```bash
pip install torch transformers peft
```

### Code Sample

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "Sidharthan/gemma2_scripter"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model = AutoPeftModelForCausalLM.from_pretrained(
    model_name,
    device_map=None,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
    low_cpu_mem_usage=True
).to(device)

# Generate a script
def generate_script(prompt):
    formatted_prompt = f"<bos><start_of_turn>keywords\n{prompt}<end_of_turn>\n<start_of_turn>script\n"
    # The prompt string already contains <bos>; skip the tokenizer's automatic
    # BOS insertion so the token does not appear twice.
    inputs = tokenizer(formatted_prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    
    outputs = model.generate(
        **inputs,
        max_length=1024,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        top_k=50,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    
    # Return only the newly generated tokens, not the echoed prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
prompt = "crosshatch waffle texture, dark chocolate, four bar crispy wafers, kat, milk chocolate"
response = generate_script(prompt)
print(f"Generated Script:\n{response}")
```
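
Note that `max_length` bounds the prompt and the generated script together; to cap only the newly generated tokens, pass `max_new_tokens` to `model.generate` instead.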

### Input Format

The model expects prompts in the following format:

```
<bos><start_of_turn>keywords
<your_keywords_here><end_of_turn>
<start_of_turn>script

```

Example:
```
<bos><start_of_turn>keywords
crosshatch waffle texture, dark chocolate, four bar crispy wafers, kat, milk chocolate<end_of_turn>
<start_of_turn>script

```
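
If you prefer to assemble the prompt programmatically, a small helper like the following (hypothetical, not part of the model repository) produces the same string:

```python
def build_prompt(keywords):
    """Join keywords and wrap them in the turn markers the model was trained on."""
    return (
        "<bos><start_of_turn>keywords\n"
        + ", ".join(keywords)
        + "<end_of_turn>\n<start_of_turn>script\n"
    )

print(build_prompt(["dark chocolate", "four bar crispy wafers"]))
```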

### Output

The output is a YouTube script generated from the provided keywords.

### Performance

- CPU: Slower inference; the loading code above falls back to FP32 weights.
- GPU: Faster inference using FP16 weights.
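
On GPUs with limited memory, the adapter can also be loaded on top of a 4-bit quantized base model. This is an optional variation on the example above, assuming `bitsandbytes` and `accelerate` are installed:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

# Optional: quantize the base weights to 4-bit NF4 to reduce GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    "Sidharthan/gemma2_scripter",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```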

### Applications

- Generating structured scripts for video content
- Keyword-based text generation for creative tasks

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of YouTube scripts paired with their corresponding keywords.

### Training Procedure

- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Optimization**: AdamW optimizer
- **Learning Rate**: 2e-4
- **Batch Size**: 4
- **Training Steps**: 1000
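
For reference, a minimal LoRA fine-tuning sketch with `peft` and the `transformers` `Trainer` is shown below. Only the learning rate, batch size, and step count come from the list above; the base checkpoint (`google/gemma-2-2b-it`), the LoRA rank/alpha/target modules, and the placeholder dataset are illustrative assumptions:

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

base_id = "google/gemma-2-2b-it"  # assumed instruct base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# LoRA hyperparameters (r, alpha, dropout, target modules) are illustrative;
# only learning rate, batch size, and step count come from the list above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, peft_config)

# Placeholder dataset: keyword/script pairs in the turn format shown earlier.
texts = [
    "<bos><start_of_turn>keywords\nmilk chocolate, crispy wafers<end_of_turn>\n"
    "<start_of_turn>script\nHey everyone, today we are tasting...<end_of_turn>"
]
dataset = Dataset.from_dict({"text": texts}).map(
    # <bos> is already in the text, so skip automatic special tokens.
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024,
                            add_special_tokens=False),
    batched=True,
)

args = TrainingArguments(
    output_dir="gemma2_scripter_lora",
    learning_rate=2e-4,              # from the list above
    per_device_train_batch_size=4,   # from the list above
    max_steps=1000,                  # from the list above
    optim="adamw_torch",
    fp16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```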

## Limitations

- The model's output quality depends on the clarity and relevance of input keywords
- May occasionally generate repetitive content
- Performance may vary based on hardware capabilities

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{gemma2_scripter,
  author = {Sidharthan},
  title = {Gemma 2 Scripter: Fine-tuned YouTube Script Generator},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Sidharthan/gemma2_scripter}}
}
```

### License

This model is released under the MIT License.