---
language:
- tr
tags:
- translation
license: apache-2.0
---

About the model

The model was trained on 15,451 real job advertisements.

It classifies text into the following classes:

  • Uygun İlan (suitable advertisement)
  • Is Ilani Degil (not a job advertisement)
  • Mustehcen (obscene)
  • Cift Pozisyon (double position)

The training setup and results are as follows:

  • The model is based on Turkish BERT.
  • Validation used StratifiedKFold(5).
  • Per-fold results: [0.806858621805241, 0.8912621359223301, 0.9440129449838188, 0.9750809061488673, 0.9851132686084142]

Mean-Precision: 0.9204655754937342
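
The mean above can be reproduced from the five per-fold results. The following is a minimal sketch using only the standard library; the variable names are illustrative and do not come from the original training code.

```python
from statistics import fmean

# Per-fold results reported above for StratifiedKFold(5).
fold_results = [
    0.806858621805241,
    0.8912621359223301,
    0.9440129449838188,
    0.9750809061488673,
    0.9851132686084142,
]

# Arithmetic mean across the five folds.
mean_result = fmean(fold_results)
print(f"{mean_result:.4f}")  # 0.9205
```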

            Uygun İlan   Is Ilani Degil   Mustehcen   Cift Pozisyon
Precision   0.986        0.996            0.966       0.970
Recall      0.992        0.986            0.966       0.959
F1 Score    0.989        0.991            0.966       0.965

Accuracy: 0.975
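
As a sanity check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed per class. The sketch below assumes the standard F1 definition; the helper name `f1_score` and the `reported` dictionary are illustrative, not part of the original evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, copied from the figures above.
reported = {
    "Uygun İlan": (0.986, 0.992, 0.989),
    "Is Ilani Degil": (0.996, 0.986, 0.991),
    "Mustehcen": (0.966, 0.966, 0.966),
    "Cift Pozisyon": (0.970, 0.959, 0.965),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: recomputed {f1_score(p, r):.3f}, reported {f1:.3f}")
```

Because the inputs are already rounded to three decimals, a recomputed value can differ from the reported one in the last digit; for Cift Pozisyon this yields 0.964 against the reported 0.965.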

Example

Important: The sentence passed to pipe must not contain Turkish characters. Convert them to their ASCII equivalents first, as set_sentence does in the code below.

from transformers import AutoTokenizer, TextClassificationPipeline, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("nanelimon/bert-base-turkish-job-advertisement")
model = AutoModelForSequenceClassification.from_pretrained("nanelimon/bert-base-turkish-job-advertisement")
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)


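def set_sentence(sentence: str) -> str:
    # Replace 'İ' before lower(): Python lowercases it to 'i' plus a combining dot (U+0307).
    result = sentence.replace('İ', 'i').lower().replace('ö', 'o').replace('ı', 'i').replace('ü', 'u').replace('ç', 'c').replace('ğ', 'g').replace('ş', 's')
    return result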


print(pipe(set_sentence('Fiziği düzgün 17 yaş kızlar aranıyor')))

Result:

output: [{'label': 'Mustehcen', 'score': 0.9992677569389343}]
  • label: the class the model assigns to the submitted text.
  • score: the model's confidence that the text belongs to that class, between 0 and 1.
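
One way to act on the label and score is to accept a prediction only when the score clears a threshold. The sketch below is a hypothetical helper, not part of the model or of transformers; the name `accept_prediction` and the default threshold of 0.9 are assumptions chosen for illustration.

```python
def accept_prediction(results, min_score=0.9):
    """Return the top label, or None when its score is below min_score.

    `results` is a list of {'label': ..., 'score': ...} dicts, the shape
    shown in the output above. The 0.9 default is an arbitrary assumption.
    """
    top = max(results, key=lambda item: item["score"])
    return top["label"] if top["score"] >= min_score else None


# Sample output copied from the result above.
sample = [{'label': 'Mustehcen', 'score': 0.9992677569389343}]

print(accept_prediction(sample))          # Mustehcen
print(accept_prediction(sample, 0.9999))  # None
```

A suitable threshold depends on how costly a wrong label is for the application, so it is best tuned on held-out data rather than taken from this sketch.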

Authors

License

apache-2.0

Free Software, Hell Yeah!
