---
library_name: transformers
tags:
  - sentiment-analysis
  - aspect-based-sentiment-analysis
  - transformers
  - bert
language:
  - tr
metrics:
  - accuracy
base_model:
  - dbmdz/bert-base-turkish-cased
pipeline_tag: text-classification
datasets:
  - Sengil/Turkish-ABSA-Wsynthetic
---

# Aspect-Based Sentiment Analysis with Turkish 🇹🇷 Data

This model performs Aspect-Based Sentiment Analysis (ABSA) 🚀 for Turkish text. It predicts sentiment polarity (Positive, Neutral, Negative) towards specific aspects within a given sentence.


## Model Details

### Model Description

This model is fine-tuned from the dbmdz/bert-base-turkish-cased pretrained BERT model. It is trained on the Turkish-ABSA-Wsynthetic dataset, which contains Turkish restaurant reviews annotated with aspect-based sentiments. The model is capable of identifying the sentiment polarity for specific aspects (e.g., "servis," "fiyatlar") mentioned in Turkish sentences.

- Developed by: Sengil
- Language(s): Turkish 🇹🇷
- License: Apache-2.0
- Finetuned from model: dbmdz/bert-base-turkish-cased
- Number of labels: 3 (Negative, Neutral, Positive)

### Sources

- Repository: https://huggingface.co/Sengil/ABSA_Turkish_BERT_Based_Small

## Uses

### Direct Use

This model can be used directly for analyzing aspect-specific sentiment in Turkish text, especially in domains like restaurant reviews.
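For quick experiments, the model can also be loaded through the 🤗 `pipeline` API. A minimal sketch follows; note that the raw labels returned by the pipeline follow the `id2label` mapping stored in the model config (e.g. `LABEL_0`/`LABEL_1`/`LABEL_2`, corresponding to Negative/Neutral/Positive as listed above):

```python
from transformers import pipeline

# Load the model as a generic text-classification pipeline
classifier = pipeline("text-classification", model="Sengil/ABSA_Turkish_BERT_Based_Small")

# Format the input as sentence + aspect, matching the example in the next section
text = "Servis çok yavaştı ama yemekler lezzetliydi."
aspect = "yemekler"
print(classifier(f"[CLS] {text} [SEP] {aspect} [SEP]"))
```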

### Downstream Use

It can be fine-tuned for similar tasks in different domains (e.g., e-commerce, hotel reviews, or customer feedback analysis).
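A minimal sketch of how such domain adaptation could be prepared, assuming a hypothetical CSV file (`hotel_reviews_absa.csv`) with illustrative `text`, `aspect`, and `label` columns that are not part of this repository:

```python
import pandas as pd
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Sengil/ABSA_Turkish_BERT_Based_Small")

# Hypothetical domain data: "text", "aspect", "label" (0=Negative, 1=Neutral, 2=Positive)
df = pd.read_csv("hotel_reviews_absa.csv")
dataset = Dataset.from_pandas(df)

def encode(example):
    # Reuse the same sentence/aspect input format used at inference time
    return tokenizer(
        f"[CLS] {example['text']} [SEP] {example['aspect']} [SEP]",
        truncation=True,
        max_length=128,
    )

encoded = dataset.map(encode)
```

The resulting `encoded` dataset could then be fed to a `Trainer` setup like the one sketched in the Training Details section.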

### Out-of-Scope Use

- Not suitable for tasks unrelated to sentiment analysis or for languages other than Turkish.
- May not perform well on datasets with significantly different domain-specific vocabulary.

## Limitations

- May struggle with rare or ambiguous aspects not covered in the training data.
- May exhibit biases present in the training dataset.

## How to Get Started with the Model

```bash
pip install -U transformers
```

Use the code below to get started with the model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Sengil/ABSA_Turkish_BERT_Based_Small")
tokenizer = AutoTokenizer.from_pretrained("Sengil/ABSA_Turkish_BERT_Based_Small")
model.eval()

# Example inference: the sentence plus the aspect whose sentiment we want
text = "Servis çok yavaştı ama yemekler lezzetliydi."
aspect = "servis"
formatted_text = f"[CLS] {text} [SEP] {aspect} [SEP]"

inputs = tokenizer(formatted_text, return_tensors="pt", padding="max_length", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=1).item()

# Map the predicted class id to its label
labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Sentiment for '{aspect}': {labels[predicted_class]}")
```
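The same sentence can be scored against several aspects by repeating the call once per aspect. A small sketch, reusing the `model`, `tokenizer`, and `labels` defined above:

```python
# Score each aspect of the same sentence separately
text = "Servis çok yavaştı ama yemekler lezzetliydi."
for aspect in ["servis", "yemekler"]:
    inputs = tokenizer(
        f"[CLS] {text} [SEP] {aspect} [SEP]",
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    print(f"{aspect}: {labels[logits.argmax(dim=1).item()]}")
```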

## Training Details

### Training Data

The model was fine-tuned on the Sengil/Turkish-ABSA-Wsynthetic dataset, which contains semi-synthetic Turkish sentences annotated for aspect-based sentiment analysis.

### Training Procedure

- Optimizer: AdamW
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 5
- Max sequence length: 128
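The original training script is not included in this repository; the sketch below only illustrates how the hyperparameters listed above would map onto `TrainingArguments` with the 🤗 `Trainer` (the `encoded_train`/`encoded_test` splits and `output_dir` are illustrative assumptions):

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Start from the Turkish BERT base model with 3 sentiment labels
model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-turkish-cased", num_labels=3
)

# Hyperparameters as listed above; AdamW is the Trainer's default optimizer
args = TrainingArguments(
    output_dir="absa-turkish-bert",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded_train,  # tokenized training split (illustrative name)
    eval_dataset=encoded_test,    # tokenized evaluation split (illustrative name)
)
trainer.train()
```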

## Evaluation

The model achieved the following scores on the test set:

- Accuracy: 95.48%
- F1 score (weighted): 95.46%
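For reference, these metrics can be computed with scikit-learn from the model's test-set predictions. A minimal sketch with illustrative placeholder arrays:

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative placeholders: replace with gold label ids and model predictions (0/1/2)
y_true = [0, 2, 2, 1, 0]
y_pred = [0, 2, 1, 1, 0]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Weighted F1: {f1_score(y_true, y_pred, average='weighted'):.4f}")
```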

## Citation

```bibtex
@misc{absa_turkish_bert_based_small,
  title={Aspect-Based Sentiment Analysis for Turkish},
  author={Sengil},
  year={2024},
  url={https://huggingface.co/Sengil/ABSA_Turkish_BERT_Based_Small}
}
```

## Model Card Contact

For any questions or issues, please open an issue in the repository or reach out via LinkedIn.