|
--- |
|
language: tr |
|
tags: |
|
- turkish |
|
- tr |
|
- gpt2-tr |
|
- gpt2-turkish |
|
license: mit |
|
metrics: |
|
- accuracy |
|
--- |
|
# Turkish GPT-2 Model (Experimental) |
|
|
|
I've released a GPT-2 model for Turkish that I trained on a variety of texts.
|
|
|
The model is intended to serve as a starting point for fine-tuning on task- or domain-specific texts.
|
|
|
|
|
## Training Source |
|
|
|
I used a Turkish corpus drawn from a variety of written and spoken sources.
|
|
|
|
|
Using these training resources, I built a custom vocabulary of 50k tokens with the Tokenizers library.
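
For reference, a minimal sketch of this step with the Hugging Face Tokenizers library might look like the following; the corpus file names, output directory, and the byte-level BPE choice are assumptions, not the exact setup used for this model:

``` python
# Sketch of building a 50k vocabulary with the `tokenizers` library.
# File names and the byte-level BPE choice are assumptions.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()

tokenizer.train(
    files=["turkish_corpus_part1.txt", "turkish_corpus_part2.txt"],  # placeholders
    vocab_size=50_000,
    min_frequency=2,
    special_tokens=["<|endoftext|>"],  # GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt into the target directory.
tokenizer.save_model("tokenizer_tr")
```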
|
|
|
After building the vocabulary, I trained the GPT-2 model for Turkish on the entire training corpus for ten epochs.
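
A minimal, illustrative version of such a from-scratch training run with the Transformers `Trainer` is sketched below; the hyperparameters, file names, and paths are assumptions, not the actual configuration used for this model:

``` python
# Illustrative training setup; hyperparameters and paths are assumptions.
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2Config,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    TextDataset,
    Trainer,
    TrainingArguments,
)

# Load the 50k-token vocabulary built in the previous step.
tokenizer = GPT2TokenizerFast.from_pretrained("tokenizer_tr")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# A fresh GPT-2 model sized to the custom vocabulary.
config = GPT2Config(vocab_size=tokenizer.vocab_size)
model = GPT2LMHeadModel(config)

# "turkish_corpus.txt" is a placeholder for the training corpus.
dataset = TextDataset(tokenizer=tokenizer, file_path="turkish_corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2-turkish",
    num_train_epochs=10,  # the card mentions ten epochs
    per_device_train_batch_size=8,
)

Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=dataset,
).train()
```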
|
|
|
|
|
|
|
## Using the model |
|
|
|
The model can be loaded as follows:
|
|
|
``` python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("ahmet1338/gpt-2-experimental")
model = AutoModelForCausalLM.from_pretrained("ahmet1338/gpt-2-experimental")
|
``` |
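
Once loaded, you can also generate text directly with `generate`; the prompt and sampling settings below are just illustrative:

``` python
# Encode a Turkish prompt and sample a continuation (illustrative settings).
input_ids = tokenizer.encode("Akşamüstü yolda ilerlerken, ", return_tensors="pt")

output = model.generate(input_ids, max_length=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```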
|
|
|
|
|
To generate text, you can use the Transformers pipeline API:
|
|
|
``` python |
|
from transformers import pipeline |
|
pipe = pipeline('text-generation', model="ahmet1338/gpt-2-experimental",
                tokenizer="ahmet1338/gpt-2-experimental")

# Generate a continuation for a Turkish prompt; max_length is passed at call time.
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
|
``` |
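
Generation keyword arguments can also be passed at call time, for example to enable sampling (the values here are just examples):

``` python
# Sampling instead of greedy decoding; parameter values are illustrative.
text = pipe("Akşamüstü yolda ilerlerken, ", do_sample=True, top_p=0.95, max_length=200)[0]["generated_text"]
```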
|
|
|
### How to clone the model repo? |
|
``` bash
|
git lfs install |
|
git clone https://huggingface.co/ahmet1338/gpt-2-experimental
|
``` |
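
After cloning, the model can also be loaded from the local directory, e.g. with `AutoModelForCausalLM.from_pretrained("./gpt-2-experimental")` (assuming the default clone directory name).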
|
|
|
|