Model Card for alsubari/aragpt2-mega-pos-msa

Model Details

Model Description

Language(s) (NLP): Arabic
Finetuned from model: aragpt2-mega

Uses

Part-of-speech (POS) tagging for Arabic text; it may also be adaptable to other languages.
Support for students and researchers of Arabic, since the model provides sentence analysis (اعراب الجملة) in context.
Arabic word tokenization, splitting each word into its stem and attached clitics.
Possibly, translating Arabic dialects into Modern Standard Arabic (MSA).

Main Labels

{'حرف جر': 'preposition', 'اسم': 'noun', 'اسم علم': 'proper noun', 'لام التعريف': 'determiner', 'صفة': 'adjective', 'ضمير': 'personal pronoun', 'فعل': 'verb', 'حرف عطف': 'conjunction', 'اسم موصول': 'relative pronoun', 'حرف نفي': 'negative particle', 'حروف مقطعة': 'quranic initials', 'اسم اشارة': 'demonstrative pronoun', 'حرف استئنافية': 'resumption', 'حرف نصب': 'accusative particle', 'حرف تسوية': 'equalization particle', 'حرف حال': 'circumstantial particle', 'أداة حصر': 'restriction particle', 'ظرف زمان': 'time adverb', 'حرف نهي': 'prohibition particle', 'حرف كاف': 'preventive particle', 'حرف ابتداء': 'inceptive particle', 'حرف زائد': 'supplemental particle', 'حرف استدراك': 'amendment particle', 'حرف مصدري': 'subordinating conjunction', 'حرف استفهام': 'interrogative particle', 'ظرف مكان': 'location adverb', 'حرف شرط': 'conditional particle', 'لام التوكيد': 'emphatic', 'حرف نداء': 'vocative particle', 'حرف واقع في جواب الشرط': 'result particle', 'حرف تفصيل': 'explanation particle', 'أداة استثناء': 'exceptive particle', 'حرف سببية': 'particle of cause', 'التوكيد - النون الثقيلة': 'heavy noon emphasis', 'حرف استقبال': 'future particle', 'حرف تحقيق': 'particle of certainty', 'لام التعليل': 'purpose', 'حرف جواب': 'answer particle', 'حرف اضراب': 'retraction particle', 'حرف تحضيض': 'exhortation particle', 'حرف تفسير': 'particle of interpretation', 'لام الامر': 'imperative', 'واو المعية': 'comitative particle', 'حرف فجاءة': 'surprise particle', 'حرف ردع': 'aversion particle', 'اسم فعل أمر': 'imperative verbal noun'}
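
Since the model emits the Arabic label names, a reader who prefers the English glosses can look them up in the mapping above. The following is a minimal sketch, not part of the model's code: `LABELS` is only a four-entry subset copied from the mapping for brevity, and `to_english` is a hypothetical helper name.

```python
# A small subset of the label mapping above, copied here for illustration only.
LABELS = {
    'حرف جر': 'preposition',
    'اسم': 'noun',
    'فعل': 'verb',
    'ضمير': 'personal pronoun',
}

def to_english(arabic_label, labels=LABELS):
    """Return the English gloss for an Arabic tag, or the tag itself if it is unknown."""
    key = arabic_label.strip()
    return labels.get(key, key)

print(to_english('فعل'))
print(to_english(' حرف جر '))
```

Falling back to the original tag for unknown labels keeps the lookup safe if the model produces a label that is missing from the dictionary.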

How to Get Started with the Model

import re

from transformers import GPT2Tokenizer, pipeline
from pyarabic.araby import strip_diacritics, strip_tatweel
# The model class comes from the arabert package, not from transformers.
from arabert.aragpt2.grover.modeling_gpt2 import GPT2LMHeadModel

model_name = 'alsubari/aragpt2-mega-pos-msa'

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name).to("cuda")  # requires a CUDA GPU

generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)


def generate(text):
    """Return the POS-tagged analysis of `text`, or "None" if no answer is found."""
    # Prompt format used by this model: instruction, <|pad|> separator, then "Answer:".
    prompt = f'<|startoftext|>Instruction: {text}<|pad|>Answer:'
    pred_text = generator(
        prompt,
        pad_token_id=tokenizer.eos_token_id,
        num_beams=20,
        max_length=256,
        # min_length=200,
        do_sample=False,  # deterministic beam search
        top_p=0.5,        # top_p and top_k have no effect when do_sample=False
        top_k=1,
        repetition_penalty=3.0,
        # temperature=0.8,
        no_repeat_ngram_size=3,
    )[0]['generated_text']
    try:
        # Keep only the text that follows the last "Answer:" marker.
        pred_tags = re.findall("Answer:(.*)", pred_text, re.S)[-1]
    except IndexError:
        pred_tags = "None"
    return pred_tags


text = 'تعلَّمْ من أخطائِكَ'
# Remove diacritics and tatweel before tagging.
generate(strip_tatweel(strip_diacritics(text)))
# Expected output:
# ' تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )'
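
The tagged string shown above can be turned into a structured result for downstream use. The sketch below is an assumption-laden illustration rather than part of the model's API: the output grammar (`word ( segment : tag ، segment : tag )`) is inferred only from the single sample output above, and `parse_tags` is a hypothetical helper name.

```python
import re

# Matches "word ( segment : tag ، segment : tag )".
# The format is inferred from the one sample output above, so treat it as an assumption.
TOKEN_RE = re.compile(r'(\S+)\s*\(\s*([^()]*?)\s*\)')

def parse_tags(output):
    """Parse the tagged string into a list of (word, [(segment, tag), ...]) pairs."""
    parsed = []
    for word, inner in TOKEN_RE.findall(output):
        segments = []
        # Segments of one word are separated by the Arabic comma.
        for part in inner.split('،'):
            if ':' not in part:
                continue  # skip anything that is not a "segment : tag" pair
            segment, tag = part.split(':', 1)
            segments.append((segment.strip(), tag.strip()))
        parsed.append((word, segments))
    return parsed

sample = ' تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )'
for word, segments in parse_tags(sample):
    print(word, segments)
```

Each word maps to one or more (segment, tag) pairs, so a word with an attached pronoun, such as the last word in the sample, yields two entries.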

Results

Epoch    Training Loss    Validation Loss
1        0.108500         0.082612