---
language:
- fr
library_name: transformers
tags:
- music
- rap
- lyrics
---
# Kichtral-7B-v0.3: a Mistral-7B Causal LM for French Rap Lyrics

## Overview

__Kichtral-7B-v0.3__ is a causal language model fine-tuned from the __Mistral 7B__ model on __French rap lyrics__. The training dataset consists of cleaned, deduplicated French verses from songs that have at least 10k streams on Spotify. It contains a total of __36M tokens__.

This model aims to __understand and generate__ French rap lyrics, making it a useful tool for __research__ on __French slang__ and __music lyrics generation__.

## Model Details

Kichtral-7B-v0.3 is based on the Mistral 7B v0.3 architecture and has been fine-tuned with the following hyperparameters:

| Parameter           | Value    |
|---------------------|----------|
| Epochs              | 1        |
| LoRA Rank           | 64       |
| LoRA Alpha          | 128      |
| LoRA Dropout        | 0.1      |
| Learning Rate       | 1e-4     |
| LR Scheduler        | Cosine   |
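
To make the table more concrete, here is a minimal sketch of what these values imply. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / rank, and a plain cosine decay with no warmup. The source does not state whether warmup was used or how many optimizer steps were run, so `TOTAL_STEPS` below is a hypothetical placeholder.

```python
import math

# Values taken from the hyperparameter table above.
LORA_RANK = 64
LORA_ALPHA = 128
PEAK_LR = 1e-4

# In the standard LoRA formulation the low-rank update is scaled by alpha / rank.
lora_scaling = LORA_ALPHA / LORA_RANK


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical: the real number of training steps is not documented

print("LoRA scaling factor:", lora_scaling)
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

With these settings the adapter update is weighted by a factor of 2, and the learning rate falls to half its peak at the midpoint of training.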

### Versions

The model was trained with AWS SageMaker on a single ml.g5.2xlarge instance for 15 hours, using the following software versions:

| Requirement            | Version   |
|------------------------|-----------|
| Transformers           | 4.28      |
| PyTorch                | 2.0       |
| Python                 | 3.10      |

## Installation

Install the required Python libraries:

```bash
pip install transformers torch
```

## Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rapminerz/Kichtral-7B-v0.3")
model = AutoModelForCausalLM.from_pretrained("rapminerz/Kichtral-7B-v0.3")
```
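
Before loading, it can help to estimate how much memory the weights need. The sketch below is a back-of-the-envelope calculation only: the parameter count is an approximate figure for a Mistral 7B class model (an assumption, not a number from this card), and the estimate ignores activations and the key-value cache.

```python
# Approximate parameter count for a Mistral 7B class model (assumption, rounded).
APPROX_PARAMS = 7.25e9

# Bytes used to store one parameter in each common dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(dtype: str, n_params: float = APPROX_PARAMS) -> float:
    """Memory needed for the weights alone, in GiB (activations and KV cache excluded)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gib(dtype):.1f} GiB")
```

If the full-precision footprint is too large for your hardware, passing a half-precision `torch_dtype` to `from_pretrained` is one common way to roughly halve it.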

## Using the Model

```python
def generate_lyrics(prompt):
    # Tokenize the prompt; this also returns the attention mask.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=300,
        do_sample=True,  # required for top_k, top_p and temperature to take effect
        num_return_sequences=1,
        top_k=10,
        top_p=0.95,
        temperature=1.0,
        repetition_penalty=1.2,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

generate_lyrics("Okay ça fait")
"""
Okay ça fait un moment que tu m'appelles
Sans t'écouter, j'ai dû me tailler
Jusqu'à présent, je sais pas qui t'es mais je peux pas t'oublier
Tu m'as laissé des images dans l'crâne
Quand je repense à ce soir-là
"""

generate_lyrics("Je viens de là où")
"""
Je viens de là où ça tire
Je fais la loi je suis pas le roi
Et je sais que tu penses à moi quand t'as besoin d'aide
Quand y a trop d'ennemis autour de toi qui se mêlent
"""
```
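
The `top_k` and `top_p` arguments passed to `generate` restrict which tokens can be sampled at each step. The following toy sketch illustrates the idea on a hand-written distribution. It is a simplified illustration, not the library's implementation, and the function name `top_k_top_p_filter` and the example probabilities are made up for this purpose.

```python
def top_k_top_p_filter(probs, top_k=10, top_p=0.95):
    """Return the renormalized distribution after top-k, then top-p (nucleus) filtering."""
    # Rank tokens from most to least probable and keep only the k best.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Renormalize over the top-k survivors before applying the nucleus cutoff.
    total = sum(p for _, p in ranked)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total
        # Stop once the kept tokens cover at least top_p of the probability mass.
        if cumulative >= top_p:
            break
    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}


# Hypothetical next-token distribution, for illustration only.
toy_probs = {"un": 0.5, "le": 0.3, "la": 0.15, "des": 0.04, "du": 0.01}
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.8)
print(filtered)
```

Here top-k first drops the least likely token, then top-p keeps only the two most likely ones, since together they already cover more than 80% of the remaining mass. A `temperature` of 1.0 leaves the distribution unchanged, so it is omitted from the sketch.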

## Purpose and Disclaimer

This model is designed for academic and research purposes only. It is not intended for commercial use. The creators of this model do not endorse or promote any specific views or opinions that may be represented in the dataset.

__Please mention @RapMinerz if you use our models__


## Improvements

This model does not fully capture rhymes. A different method would be needed to steer generation toward, for example, specific rhymes and topics.


## Contact

For any questions or issues, please contact the repository owner, __RapMinerz__, at [email protected].