---
license: other
datasets:
- taishi-i/awesome-japanese-nlp-classification-dataset
language:
- en
- ja
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
---

# Model overview

This model is the baseline model for [awesome-japanese-nlp-classification-dataset](https://huggingface.co/datasets/taishi-i/awesome-japanese-nlp-classification-dataset). It was trained on this dataset, saved using the development data, and evaluated on the test data. The following table shows the evaluation results on the test data.

| Label        | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.98      | 0.99   | 0.98     | 796     |
| 1            | 0.79      | 0.70   | **0.74** | 60      |
| Accuracy     |           |        | 0.97     | 856     |
| Macro Avg    | 0.89      | 0.84   | 0.86     | 856     |
| Weighted Avg | 0.96      | 0.97   | 0.97     | 856     |

# Usage

Please install the following libraries. The pipeline needs a deep learning backend; `torch` is used here, as in the evaluation setup.

```bash
pip install transformers torch
```

You can use the classification model through the `pipeline` function.

```python
from transformers import pipeline

# Load the classifier from the Hugging Face Hub
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Relevant sample
# (English: "Support page for 'Natural Language Processing with Deep Learning' (Kyoritsu Shuppan)")
text = "ディープラーニングによる自然言語処理(共立出版)のサポートページです"
label = pipe(text)
print(label)  # [{'label': '1', 'score': 0.9910495281219482}]

# Not Relevant sample
# (English: "Desktop app for managing AI illustrations")
text = "AIイラストを管理するデスクトップアプリ"
label = pipe(text)
print(label)  # [{'label': '0', 'score': 0.9986791014671326}]
```

# Evaluation

Please install the following libraries.

```bash
pip install evaluate scikit-learn datasets transformers torch
```

```python
import evaluate
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

# Evaluation dataset
dataset = load_dataset("taishi-i/awesome-japanese-nlp-classification-dataset")

# Text classification model
pipe = pipeline(
    "text-classification",
    model="taishi-i/awesome-japanese-nlp-classification-model",
)

# Evaluation metric
f1 = evaluate.load("f1")

# Predict a label for every text in the test split
predicted_labels = []
for text in dataset["test"]["text"]:
    prediction = pipe(text)
    predicted_label = prediction[0]["label"]
    # The pipeline returns the label as a string ("0" or "1")
    predicted_labels.append(int(predicted_label))

# F1 score
score = f1.compute(
    predictions=predicted_labels, references=dataset["test"]["label"]
)
print(score)

# Per-label precision, recall, and F1
report = classification_report(
    y_true=dataset["test"]["label"], y_pred=predicted_labels
)
print(report)
```

# License

This model was trained on a dataset collected from the GitHub API under [GitHub Acceptable Use Policies - 7. Information Usage Restrictions](https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies#7-information-usage-restrictions) and [GitHub Terms of Service - H. API Terms](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#h-api-terms). It should be used solely for research verification purposes. Adhering to GitHub's regulations is mandatory.
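
# Interpreting the output

The pipeline returns labels as the strings `'0'` and `'1'`. Going by the comments in the Usage samples, `'1'` marks a relevant text and `'0'` a not relevant one. The following is a minimal sketch of turning a result into readable text. `LABEL_NAMES` and `describe` are hypothetical names introduced here for illustration; they are not part of the model or of `transformers`.

```python
# Hypothetical mapping, inferred from the comments in the Usage samples.
LABEL_NAMES = {"0": "not relevant", "1": "relevant"}


def describe(prediction):
    """Format one pipeline result, e.g. {'label': '1', 'score': 0.99}."""
    name = LABEL_NAMES[prediction["label"]]
    return f"{name} ({prediction['score']:.3f})"


# Results copied from the Usage section, so no model download is needed here.
samples = [
    {"label": "1", "score": 0.9910495281219482},
    {"label": "0", "score": 0.9986791014671326},
]

for sample in samples:
    print(describe(sample))
```

With a loaded pipeline, the same helper could be applied as `describe(pipe(text)[0])`, since the pipeline returns a list with one dictionary per input text.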