---
tags:
- wangchanberta
- sentiment-analysis
- thai
- simpletransformers
---

# WangchanBERTa Base for Sentiment Analysis

This is a fine-tuned version of the [WangchanBERTa](https://huggingface.co/airesearch/wangchanberta-base-att-spm-uncased) model, trained for **sentiment analysis** of Thai text using `simpletransformers`.

## Model Details

- **Model Name**: WangchanBERTa Base Sentiment Analysis
- **Pretrained Base Model**: `airesearch/wangchanberta-base-att-spm-uncased`
- **Architecture**: CamemBERT
- **Language**: Thai
- **Task**: Sentiment Classification

## Training Configuration

- **Training Dataset**: (e.g., your dataset name or a public dataset if applicable)
- **Number of Training Epochs**: 6
- **Train Batch Size**: 16
- **Eval Batch Size**: 32
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **Scheduler**: Cosine
- **Gradient Accumulation Steps**: 2
- **Seed**: 42
- **Training Framework**: `simpletransformers`
- **FP16**: Disabled
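
The training script and dataset are not included in this card, but the hyperparameters above roughly correspond to a `simpletransformers` setup like the minimal sketch below. The `train_df` contents and the exact scheduler string are assumptions for illustration only.

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd

# Hypothetical training data: a text column and an integer label column
# (0 = pos, 1 = neu, 2 = neg), matching the id2label mapping in the Usage section.
train_df = pd.DataFrame(
    [["อาหารอร่อยมาก", 0], ["ก็โอเคนะ", 1], ["บริการแย่มาก", 2]],
    columns=["text", "labels"],
)

model_args = ClassificationArgs(
    num_train_epochs=6,
    train_batch_size=16,
    eval_batch_size=32,
    learning_rate=2e-5,
    optimizer="AdamW",
    scheduler="cosine_schedule_with_warmup",
    gradient_accumulation_steps=2,
    manual_seed=42,
    fp16=False,
)

# WangchanBERTa uses the CamemBERT architecture, so the "camembert" model type is used.
model = ClassificationModel(
    "camembert",
    "airesearch/wangchanberta-base-att-spm-uncased",
    num_labels=3,
    args=model_args,
)

model.train_model(train_df)
```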

## Model Performance

Performance metrics (accuracy, F1-score, etc.) are not reported in this card; results will depend on the evaluation dataset used.
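
As a hypothetical illustration, such metrics could be computed on a held-out set with `scikit-learn`; the `eval_df` below is an assumption, not data from this card, and `model` refers to the `ClassificationModel` from the training sketch above.

```python
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd

# Hypothetical held-out evaluation set with gold labels (0 = pos, 1 = neu, 2 = neg).
eval_df = pd.DataFrame(
    [["พนักงานน่ารักมาก", 0], ["เฉย ๆ", 1], ["ของมาช้าและแตก", 2]],
    columns=["text", "labels"],
)

# Predict labels for the evaluation texts and compare against the gold labels.
predictions, raw_outputs = model.predict(eval_df["text"].tolist())

print("Accuracy:", accuracy_score(eval_df["labels"], predictions))
print("Macro F1:", f1_score(eval_df["labels"], predictions, average="macro"))
```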

## Usage

To use this model, you can load it as follows:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
from pythainlp.tokenize import word_tokenize

tokenizer = AutoTokenizer.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")

id2label = {
    0: "pos",
    1: "neu",
    2: "neg",
}

input_text = "พนักงานบริการดีมาก สัญญาณก็ดี แต่ร้านอยู่ที่ไหน อยากได้ข้อมูลเพิ่มเติม จะได้ประกาศบนเว็บถูก"

# Pre-segment the Thai text with PyThaiNLP and rejoin with spaces so the
# tokenizer sees explicit word boundaries.
segmented_text = word_tokenize(input_text, engine="longest")
preprocessed_text = " ".join(segmented_text)

inputs = tokenizer(preprocessed_text, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to class probabilities and pick the most likely label.
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
predicted_label = id2label[predicted_class]

print("Predicted Label (ID):", predicted_class)
print("Predicted Label (Description):", predicted_label)
print(f"Maximum Probability: {probs.max().item():.4f}")
```