Model overview

This model is the baseline model for awesome-japanese-nlp-classification-dataset. It was trained on this dataset, the saved checkpoint was selected using the development data, and the model was evaluated on the test data. The following table shows the evaluation results on the test data.

Label          Precision  Recall  F1-Score  Support
0                   0.98    0.99      0.98      796
1                   0.79    0.70      0.74       60
Accuracy                              0.97      856
Macro Avg           0.89    0.84      0.86      856
Weighted Avg        0.96    0.97      0.97      856
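The averaged rows can be recomputed from the per-label rows. The following is a minimal sketch, not part of the original evaluation script: it derives F1 from precision and recall, then takes the macro (unweighted) and weighted (by support) averages. It starts from the rounded values in the table, so results may differ from the table in the second decimal place. The helper names are illustrative assumptions.

```python
# Per-label rows copied from the table above: (precision, recall, support).
rows = {
    "0": (0.98, 0.99, 796),
    "1": (0.79, 0.70, 60),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean weighted by the number of test examples per label."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


f1_values = [f1_score(p, r) for p, r, _ in rows.values()]
supports = [s for _, _, s in rows.values()]

macro_f1 = macro_average(f1_values)
weighted_f1 = weighted_average(f1_values, supports)

print(f"macro F1:    {macro_f1:.2f}")
print(f"weighted F1: {weighted_f1:.2f}")
```

Label 1 accounts for only 60 of the 856 test examples, so the weighted average is dominated by label 0. The macro average is the more informative figure for how well the model handles the minority label.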

Usage

Please install the following library.

pip install transformers

You can easily use the classification model with the transformers pipeline function.

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Relevant sample
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label) # [{'label': '1', 'score': 0.9910495281219482}]

# Not Relevant sample
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label) # [{'label': '0', 'score': 0.9986791014671326}]
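The pipeline returns the label as the string '0' or '1'. The following is a minimal sketch of turning that output into a readable name. The mapping (1 = relevant, 0 = not relevant) is an assumption inferred from the sample comments above, and LABEL_NAMES and describe are hypothetical helper names, not part of the model or of transformers.

```python
# Assumed mapping, inferred from the sample comments in the usage example.
LABEL_NAMES = {"0": "not relevant", "1": "relevant"}


def describe(pipeline_output):
    """Return (readable name, rounded score) for the first entry of a result list."""
    top = pipeline_output[0]
    return LABEL_NAMES[top["label"]], round(top["score"], 4)


# Output copied from the relevant sample above.
print(describe([{"label": "1", "score": 0.9910495281219482}]))
```

Because the helper only takes the list the pipeline returns, it can be applied directly as describe(pipe(text)).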

Evaluation

Please install the following libraries.

pip install evaluate scikit-learn datasets transformers torch

import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Evaluation metric
f1 = evaluate.load("f1")

# Predict a label for each test example
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]
    predicted_labels.append(int(predicted_label))

score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)

report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
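The prediction loop in the script calls the pipeline once per text. A transformers pipeline also accepts a list of texts together with a batch_size argument, which can be faster on a GPU. The following is a minimal sketch under that assumption; predict_labels is a hypothetical helper name and the batch size of 16 is an arbitrary choice, not a tuned value.

```python
def predict_labels(pipe, texts, batch_size=16):
    """Classify all texts in one call and return integer labels.

    `pipe` is expected to behave like a transformers text-classification
    pipeline: given a list of strings, it returns one
    {"label": ..., "score": ...} dict per string.
    """
    predictions = pipe(list(texts), batch_size=batch_size)
    return [int(prediction["label"]) for prediction in predictions]
```

With the objects defined in the evaluation script, the loop could then be replaced by predicted_labels = predict_labels(pipe, dataset["test"]["text"]).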

License

This model was trained on a dataset collected via the GitHub API under the GitHub Acceptable Use Policies (7. Information Usage Restrictions) and the GitHub Terms of Service (H. API Terms). It should be used solely for research verification purposes, and users must comply with GitHub's regulations.

Model size: 178M params (Safetensors, tensor type F32)