gpt2-arnavaz-beta / README.md
rezalatifi's picture
Config Files, Model, Tokenizer and README added.
237509d
|
raw
history blame
1.96 kB
---
language: fa
license: apache-2.0
tags:
- Farsi
---
# Arnavāz (ارنواز)
**Model Description:** Arnavaz/gpt-arnavaz-beta is gpt2 language model that is fine-tuned using [bolbolzaban/gpt2-persian](https://huggingface.co/bolbolzaban/gpt2-persian) pretrained model.
[bolbolzaban/gpt2-persian](https://huggingface.co/bolbolzaban/gpt2-persian) has been trained similar to [gpt2-medium](https://huggingface.co/gpt2-medium) with differences in context size, tokenizer and language [(Read more)](https://medium.com/@khashei/a-not-so-dangerous-ai-in-the-persian-language-39172a641c84).
- **Developed by:** [Rezā Latifi](https://rezalatifi.ir)
- **Model Type:** Transformer-based language model
- **Language:** Persian (All characters other than the Persian alphabet are replaced with special tokens)
- **License:** [Modified MIT License](https://github.com/openai/gpt-2/blob/master/LICENSE)
- **Related Models:** [bolbolzaban/gpt2-persian](https://huggingface.co/bolbolzaban/gpt2-persian), [gpt2-medium](https://huggingface.co/gpt2-medium)
- **Resources for more information:**
- [Arnavaz Website](https://openai.com/blog/better-language-models/)
## How to utilize
Using a pipeline for text generation, Arnavaz can be utilized like this:
```python
from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel, AutoConfig
tokenizer = AutoTokenizer.from_pretrained('Arnavaz/gpt2-arnavaz-beta')
model = GPT2LMHeadModel.from_pretrained('Arnavaz/gpt2-arnavaz-beta')
config = AutoConfig.from_pretrained('Arnavaz/gpt2-arnavaz-beta', max_length=512)
generator = pipeline('text-generation', model, tokenizer=tokenizer, config=config)
def getEloquent(ineloquent):
result = generator(f"[BOS]{ineloquent}[SEP]")[0]['generated_text']
return result[result.find('[SEP]')+5:]
sample = getEloquent('استفاده از کاغذ پاپیروس برای نوشتن کتاب از حدود دو هزار سال قبل از میلاد در مصر رایج شد.')
```