---
tags:
- wangchanberta
- sentiment-analysis
- thai
- simpletransformers
---
# WangchanBERTa Base for Sentiment Analysis
This is a fine-tuned version of the [WangchanBERTa](https://huggingface.co/airesearch/wangchanberta-base-att-spm-uncased) model, trained for **sentiment analysis** of Thai text with `simpletransformers`.
## Model Details
- **Model Name**: WangchanBERTa Base Sentiment Analysis
- **Pretrained Base Model**: `airesearch/wangchanberta-base-att-spm-uncased`
- **Architecture**: CamemBERT
- **Language**: Thai
- **Task**: Sentiment Classification
## Training Configuration
- **Training Dataset**: Not specified
- **Number of Training Epochs**: 6
- **Train Batch Size**: 16
- **Eval Batch Size**: 32
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **Scheduler**: Cosine
- **Gradient Accumulation Steps**: 2
- **Seed**: 42
- **Training Framework**: `simpletransformers` (see the fine-tuning sketch below)
- **FP16**: Disabled
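The configuration above maps directly onto `simpletransformers` arguments. The following is a minimal sketch of what the training script may have looked like, not the author's actual script; the training DataFrame (`train_df`, with `text` and `labels` columns, read from a hypothetical `train.csv`) is an assumption, since the dataset is not specified.
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd

# Hypothetical training data: simpletransformers expects a DataFrame with
# "text" and "labels" columns. The actual dataset is not specified.
train_df = pd.read_csv("train.csv")  # columns: text, labels (0=pos, 1=neu, 2=neg)

# Hyperparameters taken from the Training Configuration section above.
model_args = ClassificationArgs(
    num_train_epochs=6,
    train_batch_size=16,
    eval_batch_size=32,
    learning_rate=2e-5,
    optimizer="AdamW",
    scheduler="cosine_schedule_with_warmup",
    gradient_accumulation_steps=2,
    manual_seed=42,
    fp16=False,
)

# WangchanBERTa uses the CamemBERT architecture, so model_type is "camembert".
model = ClassificationModel(
    "camembert",
    "airesearch/wangchanberta-base-att-spm-uncased",
    num_labels=3,
    args=model_args,
)

model.train_model(train_df)
```
Note that with a train batch size of 16 and `gradient_accumulation_steps=2`, the effective batch size per optimizer step is 32.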
## Model Performance
No evaluation metrics (e.g., accuracy or F1-score) are currently reported for this model.
## Usage
To use this model, load it with `transformers`. Note that the input text is pre-segmented into space-separated words with `pythainlp` before being passed to the tokenizer:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
import numpy as np
from pythainlp.tokenize import word_tokenize

# Load the fine-tuned tokenizer and classification model.
tokenizer = AutoTokenizer.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")

# Mapping from class IDs to sentiment labels.
id2label = {
    0: "pos",
    1: "neu",
    2: "neg",
}

# Example input: "The staff service is very good, and the signal is good too,
# but where is the shop? I'd like more information so I can post it on the
# website correctly."
input_text = "พนักงานบริการดีมาก สัญญาณก็ดี แต่ร้านอยู่ที่ไหน อยากได้ข้อมูลเพิ่มเติม จะได้ประกาศบนเว็บถูก"

# Pre-segment the Thai text into space-separated words before tokenization.
segmented_text = word_tokenize(input_text, engine="longest")
preprocessed_text = " ".join(segmented_text)

inputs = tokenizer(preprocessed_text, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to class probabilities and pick the most likely label.
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
predicted_label = id2label[predicted_class]

print("Predicted Label (ID):", predicted_class)
print("Predicted Label (Description):", predicted_label)

max_prob = np.max(probs.numpy())
print(f"Maximum Probability: {max_prob:.4f}")
```
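To score several texts at once, the same per-sentence segmentation can be applied and the results batched through the tokenizer. This is a minimal sketch; the two Thai sentences below are hypothetical example inputs, not taken from the training data.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
from pythainlp.tokenize import word_tokenize

tokenizer = AutoTokenizer.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model.eval()

id2label = {0: "pos", 1: "neu", 2: "neg"}

texts = [
    "อาหารอร่อยมาก",  # "The food is delicious." (hypothetical example)
    "บริการแย่มาก",    # "The service is terrible." (hypothetical example)
]

# Apply the same word segmentation used in the single-sentence example.
segmented = [" ".join(word_tokenize(t, engine="longest")) for t in texts]

# Padding aligns the batch to the longest sequence.
inputs = tokenizer(segmented, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1)

for text, p in zip(texts, probs):
    label = id2label[p.argmax().item()]
    print(f"{text} -> {label} ({p.max().item():.4f})")
```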