---
language:
- fr
tags:
- music
- rap
- lyrics
- word2vec
library_name: gensim
---
# Word2Bezbar: Word2Vec Models for French Rap Lyrics

## Overview

__Word2Bezbar__ is a family of __Word2Vec__ models trained on __French rap lyrics__ sourced from __Genius__. The lyrics were tokenized with the __NLTK__ `word_tokenize` function set to French, after a preprocessing step that removes __French oral contractions__. The training dataset is __323MB__, corresponding to __77M tokens__.

The model captures the __semantic relationships__ between words in the context of __French rap__, making it a useful tool for studying __French slang__ and for __music lyrics analysis__.
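
The preprocessing described above can be sketched roughly as follows. This is only an illustration: the contraction list below is hypothetical (the list actually used for training is not given here), and the helper names `expand_oral_contractions` and `tokenize` are made up for this example. Only `word_tokenize` comes from NLTK.

```python
import re

# Hypothetical examples of French oral contractions. The list actually used
# to build the training corpus is not documented in this README.
ORAL_CONTRACTIONS = {
    "j'suis": "je suis",
    "j'sais": "je sais",
    "t'es": "tu es",
    "t'as": "tu as",
    "y'a": "il y a",
}

# One alternation over all contractions, longest first, on word boundaries.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(c) for c in sorted(ORAL_CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def expand_oral_contractions(text):
    """Replace each known oral contraction with its expanded form."""
    return _PATTERN.sub(lambda m: ORAL_CONTRACTIONS[m.group(0).lower()], text)


def tokenize(text):
    """Expand contractions, then tokenize with NLTK's French tokenizer."""
    # Imported lazily: requires NLTK and its punkt tokenizer data.
    from nltk.tokenize import word_tokenize

    return word_tokenize(expand_oral_contractions(text), language="french")
```

Applying the same normalisation to your own text before querying the model should keep your tokens closer to the training vocabulary, although the exact rules used by the authors may differ.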

## Model Details

This is the __large__ variant of Word2Bezbar.

| Parameter      | Value        |
|----------------|--------------|
| Dimensionality | 300          |
| Window Size    | 10           |
| Epochs         | 20           |
| Algorithm      | CBOW         |

## Versions

This model was trained with the following software versions:

| Requirement    | Version      |
|----------------|--------------|
| Python         | 3.8.5        |
| Gensim library | 4.3.2        |
| NLTK library   | 3.8.1        |

## Installation

1. **Install Required Python Libraries**:

    ```bash
    pip install gensim
    ```

2. **Clone the Repository**:

    ```bash
    git clone https://github.com/rapminerz/Word2Bezbar-large.git
    ```

3. **Navigate to the Model Directory**:

    ```bash
    cd Word2Bezbar-large
    ```
    
## Loading the Model

To load the Word2Bezbar Word2Vec model, use the following Python code:

```python
import gensim

# Load the Word2Vec model
model = gensim.models.Word2Vec.load("word2vec.model")
```
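
A slightly more defensive variant is sketched below. It assumes the model file is named `word2vec.model` as in the snippet above and that you run it from the cloned directory. The wrapper name `load_word2bezbar` is hypothetical. Note that gensim may store large arrays in companion `.npy` files next to the main file, so the whole directory should be kept together.

```python
from pathlib import Path


def load_word2bezbar(model_path="word2vec.model"):
    """Load the model, failing early with a clear message if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found: run this from the cloned model directory "
            "or pass the full path to the model file."
        )
    # Imported lazily so the path check works even before gensim is installed.
    from gensim.models import Word2Vec

    return Word2Vec.load(str(path))
```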

## Using the Model

Once the model is loaded, you can use it as shown below:

1. **To get the words most similar to a given word**

```python
model.wv.most_similar("bendo")
[('binks', 0.7082766890525818),
 ('bando', 0.684855043888092),
 ('tieks', 0.664956271648407),
 ('hall', 0.6226587295532227),
 ('ghetto', 0.6097022294998169),
 ('barrio', 0.5864858627319336),
 ('hood', 0.5714126229286194),
 ('block', 0.5666197538375854),
 ('quartier', 0.557117760181427),
 ('bloc', 0.5540688037872314)]

model.wv.most_similar("kichta")
[('liasse', 0.7318882942199707),
 ('sse-lia', 0.7186722755432129),
 ('kishta', 0.6604368686676025),
 ('kich', 0.6188479661941528),
 ('moula', 0.570914626121521),
 ('sacoche', 0.553415834903717),
 ('skalape', 0.5243070125579834),
 ('Kichta', 0.49806657433509827),
 ('ppe-fra', 0.49229520559310913),
 ('valise', 0.49089524149894714)]
```

2. **To find the word that doesn't match in a list of words**

```python
model.wv.doesnt_match(["racli","gow","gadji","fimbi","boug"])
'boug'

model.wv.doesnt_match(["Zidane","Mbappé","Ronaldo","Messi","Jordan"])
'Jordan'
```

3. **To find the similarity between two words**

```python
model.wv.similarity("kichta", "moula")
0.57091457

model.wv.similarity("bonheur", "moula")
0.09769239
```

4. **To get the vector representation of a word**

```python
model.wv['ekip']
array([ 1.4757039e-01,  ... 1.1260221e+00],
      dtype=float32)
```
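
To make the scores above easier to interpret, here is a toy pure-Python sketch of what these queries compute. `similarity` is the cosine similarity between two word vectors, `most_similar` ranks the other words by that score, and `doesnt_match` roughly picks the word farthest from the mean of the normalised vectors. The vectors and word names below are invented for illustration, and gensim's actual implementation is vectorised and differs in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word`."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


def doesnt_match(vectors, words):
    """Return the word least similar to the mean of the unit-length vectors."""
    units = {}
    for w in words:
        norm = math.sqrt(sum(a * a for a in vectors[w]))
        units[w] = [a / norm for a in vectors[w]]
    dim = len(vectors[words[0]])
    mean = [sum(units[w][i] for w in words) / len(words) for i in range(dim)]
    return min(words, key=lambda w: cosine(units[w], mean))


# Invented 2-dimensional vectors; the real model uses 300 dimensions.
TOY_VECTORS = {
    "word_a": [1.0, 0.0],
    "word_b": [0.9, 0.1],
    "word_c": [0.8, 0.2],
    "word_d": [0.0, 1.0],
}

print(most_similar(TOY_VECTORS, "word_a", topn=2))
print(doesnt_match(TOY_VECTORS, ["word_a", "word_b", "word_c", "word_d"]))
```

A score near 1 means the two vectors point in almost the same direction, while a score near 0 (as for "bonheur" and "moula" above) means the words rarely share contexts in the corpus.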

## Purpose and Disclaimer

This model is designed for academic and research purposes only. It is not intended for commercial use. The creators of this model do not endorse or promote any specific views or opinions that may be represented in the dataset.

__Please mention @RapMinerz if you use our models__

## Contact

For any questions or issues, please contact the repository owner, __RapMinerz__, at [email protected].