ArGTClass is a BLOOMZ-based classification model, fine-tuned to categorize Arabic text into fourteen subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports, Science and Technology, Anthropology and Sociology, Art and Literature, Education, History, Language and Linguistics, Law, and Philosophy.

For more details, check out our paper

The fine-tuning code is available in the following notebook: Open In Colab

Full classification example (CPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

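# Tokenize the input and return PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# Pick the highest-scoring class index and map it to its label
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]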
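The example above keeps only the single best class. When the ranking of several subjects matters, the logits can be turned into probabilities with a softmax and sorted. The helper below is a minimal sketch of that step in plain Python; `top_k_labels`, the demo logits, and the demo label map are hypothetical and only illustrate the idea (the exact label strings stored in `model.config.id2label` may differ).

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one example."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map, for illustration only.
demo_id2label = {0: "Politics", 1: "Sports", 2: "Medical"}
demo = top_k_labels([2.0, 0.5, 1.0], demo_id2label, k=2)
```

With the real model, the same helper could be called as `top_k_labels(outputs.logits[0].detach().cpu().tolist(), model.config.id2label)`.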

Full classification example (GPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
# device_map="auto" requires the accelerate package and places the model on the available device
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

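# Move the tokenized inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
# Pick the highest-scoring class index and map it to its label
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]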

Pipeline example (CPU & GPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map = 'auto')

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

classifier(text)
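A text-classification pipeline returns predictions as dictionaries with `label` and `score` keys. When scores for all labels are requested (for example with `classifier(text, top_k=None)`), it can be useful to keep only the subjects above a confidence threshold. The sketch below assumes a flat list of such dictionaries for one input; `confident_predictions`, the threshold value, and the sample scores are hypothetical.

```python
def confident_predictions(results, threshold=0.5):
    """Keep predictions whose score meets the threshold, best first."""
    kept = [r for r in results if r["score"] >= threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# Hypothetical pipeline output for one text, for illustration only.
sample = [
    {"label": "History", "score": 0.06},
    {"label": "Politics", "score": 0.91},
    {"label": "Law", "score": 0.03},
]
kept = confident_predictions(sample, threshold=0.05)
```

Depending on the transformers version, the pipeline may wrap the per-input results in an extra list, in which case the inner list is the one to pass to the helper.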