---
language:
- ar
pipeline_tag: text-generation
---

# Model Card for alsubari/aragpt2-mega-pos-msa

## Model Details

### Model Description

- **Language(s) (NLP):** Arabic
- **Finetuned from model:** [aragpt2-mega](https://huggingface.co/aubmindlab/aragpt2-mega)

## Uses

1. Part-of-speech (POS) tagging for Arabic; it may also be usable for other languages.
2. The model can help students and researchers of the Arabic language, since it provides sentence analysis (اعراب الجملة, i.e. grammatical parsing of the sentence) in context.
3. Arabic word tokenization.
4. It may be usable for translating Arabic dialects into Modern Standard Arabic (MSA).

## Main Labels

| Arabic label | English label |
| --- | --- |
| حرف جر | preposition |
| اسم | noun |
| اسم علم | proper noun |
| لام التعريف | determiner |
| صفة | adjective |
| ضمير | personal pronoun |
| فعل | verb |
| حرف عطف | conjunction |
| اسم موصول | relative pronoun |
| حرف نفي | negative particle |
| حروف مقطعة | quranic initials |
| اسم اشارة | demonstrative pronoun |
| حرف استئنافية | resumption |
| حرف نصب | accusative particle |
| حرف تسوية | equalization particle |
| حرف حال | circumstantial particle |
| أداة حصر | restriction particle |
| ظرف زمان | time adverb |
| حرف نهي | prohibition particle |
| حرف كاف | preventive particle |
| حرف ابتداء | inceptive particle |
| حرف زائد | supplemental particle |
| حرف استدراك | amendment particle |
| حرف مصدري | subordinating conjunction |
| حرف استفهام | interrogative particle |
| ظرف مكان | location adverb |
| حرف شرط | conditional particle |
| لام التوكيد | emphatic |
| حرف نداء | vocative particle |
| حرف واقع في جواب الشرط | result particle |
| حرف تفصيل | explanation particle |
| أداة استثناء | exceptive particle |
| حرف سببية | particle of cause |
| التوكيد - النون الثقيلة | heavy noon emphasis |
| حرف استقبال | future particle |
| حرف تحقيق | particle of certainty |
| لام التعليل | purpose |
| حرف جواب | answer particle |
| حرف اضراب | retraction particle |
| حرف تحضيض | exhortation particle |
| حرف تفسير | particle of interpretation |
| لام الامر | imperative |
| واو المعية | comitative particle |
| حرف فجاءة | surprise particle |
| حرف ردع | aversion particle |
| اسم فعل أمر | imperative verbal noun |
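
Since the model emits the Arabic labels, a small lookup can turn them into their English equivalents. The following is a minimal sketch, not part of the model's own code: `LABELS_AR_TO_EN` and `to_english` are hypothetical names, and the dictionary holds only an excerpt of the labels listed above.

```python
# Excerpt of the label list above (hypothetical helper, not shipped with the model).
# Extend it with the remaining entries as needed.
LABELS_AR_TO_EN = {
    "حرف جر": "preposition",
    "اسم": "noun",
    "فعل": "verb",
    "ضمير": "personal pronoun",
}


def to_english(label: str) -> str:
    """Return the English label, or the (stripped) input if it is not in the mapping."""
    label = label.strip()
    return LABELS_AR_TO_EN.get(label, label)
```

Falling back to the original string keeps unknown or malformed labels visible instead of raising an error, which seems preferable for generated output.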

## How to Get Started with the Model

```python
import re

from transformers import GPT2Tokenizer, pipeline
from pyarabic.araby import strip_diacritics, strip_tatweel
from arabert.aragpt2.grover.modeling_gpt2 import GPT2LMHeadModel

model_name = 'alsubari/aragpt2-mega-pos-msa'

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name).to("cuda")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)


def generate(text):
    # Build the Instruction/Answer prompt around the input sentence.
    prompt = f'<|startoftext|>Instruction: {text}<|pad|>Answer:'
    pred_text = generator(prompt,
                          pad_token_id=tokenizer.eos_token_id,
                          num_beams=20,
                          max_length=256,
                          # min_length=200,
                          do_sample=False,  # deterministic beam search
                          top_p=0.5,  # only takes effect when do_sample=True
                          top_k=1,    # only takes effect when do_sample=True
                          repetition_penalty=3.0,
                          # temperature=0.8,
                          no_repeat_ngram_size=3)[0]['generated_text']
    # Keep only the text that follows the "Answer:" marker.
    matches = re.findall("Answer:(.*)", pred_text, re.S)
    return matches[-1] if matches else "None"


text = 'تعلَّمْ من أخطائِكَ'
# Remove diacritics and tatweel before passing the sentence to the model.
generate(strip_tatweel(strip_diacritics(text)))
#' تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )'
```
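
The generated answer is a flat string in which each word is followed by its analysis in parentheses. A minimal sketch of turning it into structured data follows. It assumes the output always has the `word ( segment : label ، segment : label )` shape of the example comment above, and `parse_analysis` is a hypothetical helper, not part of the model's code.

```python
import re

# A word followed by a parenthesised analysis, e.g. "من ( من : حرف جر )".
SEGMENT_RE = re.compile(r"(\S+)\s*\(([^()]*)\)")


def parse_analysis(answer: str):
    """Parse the answer string into a list of (word, [(segment, label), ...]) tuples."""
    parsed = []
    for word, inner in SEGMENT_RE.findall(answer):
        segments = []
        # Segments inside the parentheses are separated by the Arabic comma.
        for part in inner.split("،"):
            if ":" not in part:
                continue  # skip pieces that do not look like "segment : label"
            form, label = part.split(":", 1)
            segments.append((form.strip(), label.strip()))
        parsed.append((word, segments))
    return parsed


example = " تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )"
for word, segments in parse_analysis(example):
    print(word, segments)
```

Because the model is generative, real outputs may deviate from this shape, so pieces that do not match are skipped rather than treated as errors.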

### Results

| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | 0.108500 | 0.082612 |

## Model Card Contact

[[email protected]]