Update README.md
README.md
@@ -9,23 +9,21 @@ license: mit
 metrics:
 - accuracy
 ---
-#
 
-
 
-The model is
 
-## Training corpora
 
-
 
 
-With the Tokenizers library, I created a 52K BPE vocab based on the training corpus.
 
-
 
-
-https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars
 
 
 
@@ -35,16 +33,17 @@ The model itself can be used in this way:
 
 ``` python
 from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained("ahmet1338/
-model = AutoModelWithLMHead.from_pretrained("ahmet1338/
 ```
 
-
 
 ``` python
 from transformers import pipeline
-pipe = pipeline('text-generation', model="ahmet1338/
-tokenizer="ahmet1338/
 text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
 print(text)
 ```
 metrics:
 - accuracy
 ---
+# Turkish GPT-2 Model (Experimental)
 
+I've released a GPT-2 model for Turkish, trained on a variety of texts.
 
+The model is intended as a starting point for fine-tuning on more specific texts.
 
 
+## Training Source
 
+I used a Turkish corpus drawn from a variety of written and spoken sources.
 
 
+I trained a custom tokenizer with a 50K-token vocabulary on the training corpus, using the Tokenizers library.
 
+After building the vocabulary, I trained GPT-2 for Turkish on the entire training corpus for ten epochs.
 
 
 
 
 ``` python
 from transformers import AutoTokenizer, AutoModelWithLMHead
+tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
+model = AutoModelWithLMHead.from_pretrained("ahmet1338/gpt-2-experimental")
 ```
 
+
+To generate text, you can use a Transformers pipeline:
 
 ``` python
 from transformers import pipeline
+pipe = pipeline('text-generation', model="ahmet1338/gpt-2-experimental",
+                tokenizer="ahmet1338/gpt-2-experimental", max_length=800)
 text = pipe("Akşamüstü yolda ilerlerken, ")[0]["generated_text"]
 print(text)
 ```
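The updated README says a custom tokenizer with a 50K vocabulary was trained on the corpus with the Tokenizers library, and the removed text calls it a BPE vocabulary. The library's own trainer is not reproduced here. As a rough sketch of what training a BPE vocabulary involves, the hypothetical `train_bpe` function below starts from single characters and repeatedly merges the most frequent adjacent pair until a target vocabulary size is reached. It is a simplification under stated assumptions: it splits on whitespace and works on characters, whereas GPT-2's tokenizer uses byte-level BPE, and the tiny corpus is made up.

```python
from collections import Counter


def train_bpe(corpus: str, vocab_size: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Toy character-level BPE trainer, for illustration only.

    Hypothetical helper, not part of the Tokenizers library. Splits the corpus
    on whitespace, starts from single characters, and repeatedly merges the
    most frequent adjacent pair until the vocabulary reaches `vocab_size`
    (or no pairs remain). Returns the vocabulary and the ordered merge list.
    """
    word_counts = Counter(corpus.split())
    # Each distinct word starts out as a tuple of single-character symbols.
    segmented = {word: tuple(word) for word in word_counts}
    vocab = sorted({ch for word in segmented for ch in word})
    merges: list[tuple[str, str]] = []

    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for word, symbols in segmented.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += word_counts[word]
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the lexicographically largest pair.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        merged = best[0] + best[1]
        if merged not in vocab:
            vocab.append(merged)
        # Rewrite every word, replacing occurrences of `best` with `merged`.
        for word, symbols in list(segmented.items()):
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            segmented[word] = tuple(out)

    return vocab, merges


vocab, merges = train_bpe("ab ab ab ac", vocab_size=4)
print(merges)  # [('a', 'b')]
print(vocab)   # ['a', 'b', 'c', 'ab']
```

The merge list is what a real BPE tokenizer replays, in order, to segment new text; the 50K figure in the README plays the role of `vocab_size` here.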
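In Transformers text generation, `max_length` bounds the total length of the sequence in tokens, prompt included, while `max_new_tokens` counts only newly generated tokens. So the 800 in the pipeline example is a budget in tokens, not characters, and the tokenized prompt uses part of it. The toy greedy loop below illustrates that stopping rule. It is a sketch, not the library's `generate()` implementation, and `next_token` and `count_up` are hypothetical stand-ins for the model.

```python
from typing import Callable, List, Optional


def greedy_generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    max_length: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Toy greedy decoding loop that stops once the sequence reaches `max_length`.

    `next_token` is a hypothetical stand-in for a language model: it receives the
    token ids so far and returns the id of the next token.
    """
    ids = list(prompt_ids)
    # The cap applies to the whole sequence, prompt included, so a long prompt
    # leaves fewer tokens for generation (and none if it already fills max_length).
    while len(ids) < max_length:
        token = next_token(ids)
        ids.append(token)
        if eos_id is not None and token == eos_id:
            break  # stop early on an end-of-sequence token
    return ids


def count_up(ids: List[int]) -> int:
    """Hypothetical 'model' that always predicts the previous id plus one."""
    return ids[-1] + 1


print(greedy_generate([1, 2, 3], count_up, max_length=6))  # [1, 2, 3, 4, 5, 6]
```

With a three-token prompt and `max_length=6`, only three new tokens are produced; expressing the same limit as "three new tokens" is what `max_new_tokens` is for.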