---
license: apache-2.0
language: fa
widget:
- text: "این بود [MASK] های ما؟"
- text: "داداچ داری [MASK] میزنی"
- text: 'به علی [MASK] میگفتن جادوگر'
- text: 'آخه محسن [MASK] هم شد خواننده؟'
- text: 'پسر عجب [MASK] زد'
tags:
- BERTweet
model-index:
- name: BERTweet-FA
results: []
---
What is BERTweet-FA?
---
BERTweet-FA is a transformer-based model trained on 20,665,964 Persian tweets. Although it was trained for only one epoch (322,906 steps), it can capture the meaning of most conversational sentences used in Farsi. The model follows the original BERT architecture.
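The tweet and step counts above can be used to estimate the effective batch size. The sketch below assumes that each tweet is one training example, that no sequence packing was used, and that the final partial batch was kept, so that steps = ceil(tweets / batch size). The card does not state the batch size, so treat the result as an inference rather than a documented setting. The helper `implied_batch_size` is hypothetical.

```python
import math

# Figures reported in this model card.
NUM_TWEETS = 20_665_964
NUM_STEPS = 322_906


def implied_batch_size(num_examples: int, num_steps: int) -> int:
    """Estimate the per-step (effective) batch size for a single epoch.

    Assumption: one example per tweet, no packing, and the last partial
    batch is kept, so num_steps == ceil(num_examples / batch_size).
    """
    estimate = round(num_examples / num_steps)
    # Sanity check: the estimate must reproduce the reported step count.
    if estimate <= 0 or math.ceil(num_examples / estimate) != num_steps:
        raise ValueError("no integer batch size reproduces the step count")
    return estimate


print(implied_batch_size(NUM_TWEETS, NUM_STEPS))  # → 64
```

Under these assumptions, the reported numbers are consistent with an effective batch size of 64 (for example, per-device batch size × number of devices × gradient accumulation steps).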
How to Use the Model
---
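```python
from transformers import BertForMaskedLM, BertTokenizer, pipeline

# Load the model and its tokenizer from the Hugging Face Hub
model = BertForMaskedLM.from_pretrained('arm-on/BERTweet-FA')
tokenizer = BertTokenizer.from_pretrained('arm-on/BERTweet-FA')
# Build a fill-mask pipeline that predicts the token hidden behind [MASK]
fill_sentence = pipeline('fill-mask', model=model, tokenizer=tokenizer)
# The Persian input reads: "Write your sentence here and [MASK] the word you want"
predictions = fill_sentence('اینجا جمله مورد نظر خود را بنویسید و کلمه موردنظر را [MASK] کنید')
print(predictions)
```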
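For a single `[MASK]`, the `fill-mask` pipeline returns a list of dictionaries that include keys such as `score`, `token_str`, and `sequence`. The helper below is a minimal, hypothetical sketch that ranks those candidates and drops low-scoring ones. It assumes only that output shape and is not part of this model or of the `transformers` API.

```python
from typing import Any, Dict, List, Tuple


def rank_predictions(
    predictions: List[Dict[str, Any]], min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs sorted by descending score.

    Assumes `predictions` has the shape produced by the fill-mask
    pipeline for one [MASK]: a list of dicts, each containing at least
    "token_str" and "score". Candidates below `min_score` are dropped.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [
        (p["token_str"].strip(), round(float(p["score"]), 4))
        for p in ranked
        if p["score"] >= min_score
    ]
```

With recent versions of `transformers`, you can request more candidates at call time, for example `rank_predictions(fill_sentence('پسر عجب [MASK] زد', top_k=5), min_score=0.05)`.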
The Training Data
---
The first version of the model was trained on the [Large Scale Colloquial Persian Dataset](https://iasbs.ac.ir/~ansari/lscp/), which contains more than 20 million Farsi tweets. The dataset was collected by Khojasteh et al. and published in 2020.
Evaluation
---
| Training Loss | Epoch | Step |
|:-------------:|:-----:|:-----:|
| 0.0036 | 1.0 | 322906 |
Contributors
---
- [Arman Malekzadeh](http://ce.sharif.edu/~malekzaadeh/), PhD Student in AI @ Sharif University of Technology | [LinkedIn](https://www.linkedin.com/in/arman-malekzadeh/) | [GitHub](https://github.com/arm-on)