philipp-zettl's picture
Upload folder using huggingface_hub
79c7b39 verified
---
license: mit
language: multilingual
library_name: torch
tags: []
base_model: BAAI/bge-m3
datasets:
- philipp-zettl/GGU-xx
- philipp-zettl/sentiment
metrics:
- accuracy
- precision
- recall
- f1-score
model_name: Multi-Head Sequence Classification Model
pipeline_tag: text-classification
widget:
- text: "Hello, how are you?"
label: "[GGU] Greeting"
- text: "Thank you for your help"
label: "[GGU] Gratitude"
- text: "Hallo, wie geht es dir?"
label: "[GGU] Greeting (de)"
- text: "Danke dir."
label: "[GGU] Gratitude (de)"
- text: "I am not sure what you mean"
label: "[GGU] Other"
- text: "Generate me an image of a dog!"
label: "[GGU] Other"
- text: "What is the weather like today?"
label: "[GGU] Other"
- text: "Wie ist das Wetter heute?"
label: "[GGU] Other (de)"
---
# Multi-Head Sequence Classification Model
## Model description
The model is a simple sequence classification model based on hidden output layers of a pre-trained transformer model. Multiple heads are added to the output of the backbone to classify the input sequence.
### Model architecture
The model is a simple sequence classification model based on hidden output layers of a pre-trained transformer model.
The backbone of the model is BAAI/bge-m3 with 1024 output dimensions.
An additional layer of (GGU: 3, sentiment: 3) is added to the output of the backbone to classify the input sequence.
You can find a mapping for the labels here:
**GGU**
- 0: Greeting
- 1: Gratitude
- 2: Other
**sentiment**
- 0: Positive
- 1: Negative
- 2: Neutral
The joint architecture was trained using the provided implementation (in repository) of `MultiHeadClassificationTrainer`.
### Use cases
Use cases: text classification, sentiment analysis.
## Model Inference
Inference code:
```python
from transformers import AutoModel, AutoTokenizer
from .model import MultiHeadSequenceClassificationModel
import torch
model = MultiHeadSequenceClassificationModel.from_pretrained('philipp-zettl/multi-head-sequence-classification-model')
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')
def predict(text):
inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs)
return outputs
```
## Model Training
#### Confusion Matrix
**GGU**
![Confusion Matrix GGU](assets/confusion_matrix_GGU.png)
**sentiment**
![Confusion Matrix sentiment](assets/confusion_matrix_sentiment.png)
#### Training Loss
**GGU**
![Loss GGU](assets/loss_plot_GGU.png)
**sentiment**
![Loss sentiment](assets/loss_plot_sentiment.png)
### Training data
The model has been trained on the following datasets:
- [philipp-zettl/GGU-xx](https://huggingface.co/datasets/philipp-zettl/GGU-xx)
- [philipp-zettl/sentiment](https://huggingface.co/datasets/philipp-zettl/sentiment)
Using the implementation provided by MultiHeadClassificationTrainer
### Training procedure
The following code has been executed to train the model:
```python
def train_classifier():
backbone = AutoModel.from_pretrained('BAAI/bge-m3').to(torch.float16)
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-m3')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
ggu_label_map = {
0: 'Greeting',
1: 'Gratitude',
2: 'Other'
}
sentiment_label_map = {
0: 'Positive',
1: 'Negative',
2: 'Neutral'
}
num_labels = len(ggu_label_map.keys())
# HParams
dropout = 0.25
learning_rate = 3e-5
momentum = 0.9
l2_reg = 0.25
l2_loss_weight = 0.25
model_conf = {
'backbone': backbone,
'head_config': {
'GGU': num_labels,
},
'dropout': dropout,
'l2_reg': l2_reg,
}
optimizer_conf = {
'lr': learning_rate,
'momentum': momentum
}
scheduler_conf = {
'factor': 0.2,
'patience': 3,
'min_lr': 1e-8
}
train_run = 1000
trainer = MultiHeadClassificationTrainer(
model_conf=model_conf,
optimizer_conf={**optimizer_conf, 'lr': 1e-4},
scheduler_conf=scheduler_conf,
num_epochs=35,
l2_loss_weight=l2_loss_weight,
use_lr_scheduler=True,
train_run=train_run,
auto_find_batch_size=False
)
new_model, history = trainer.train(dataset_name='philipp-zettl/GGU-xx', target_heads=['GGU'])
metrics = history['metrics']
history['loss_plot'] = trainer._plot_history(**metrics)
res = trainer.eval({'GGU': ggu_label_map})
history['evaluation'] = res['GGU']
total_history = {
'GGU': deepcopy(history),
}
trainer.classifier.add_head('sentiment', 3)
trainer.auto_find_batch_size = False
new_model, history = trainer.train(dataset_name='philipp-zettl/sentiment', target_heads=['sentiment'], sample_key='text', num_epochs=10, lr=1e-4)
metrics = history['metrics']
history['loss_plot'] = trainer._plot_history(**metrics)
res = trainer.eval({'sentiment': sentiment_label_map}, sample_key='text')
history['evaluation'] = res['sentiment']
total_history['sentiment'] = deepcopy(history)
label_maps = {
'GGU': ggu_label_map,
'sentiment': sentiment_label_map,
}
return new_model, total_history, trainer, label_maps
```
### Evaluation
### Evaluation data
For model evaluation, a 20% validation split was used from the training data.
### Evaluation procedure
The model was evaluated using the `eval` method provided by the `MultiHeadClassificationTrainer` class:
```python
def _eval_model(self, dataloader, label_map, sample_key, label_key):
self.classifier.train(False)
eval_heads = list(label_map.keys())
y_pred = {h: [] for h in eval_heads}
y_test = {h: [] for h in eval_heads}
for sample in tqdm(dataloader, total=len(dataloader), desc='Evaluating model...'):
labels = {name: sample[label_key] for name in eval_heads}
embeddings = BatchEncoding({k: torch.stack(v, dim=1).to(self.device) for k, v in sample.items() if k not in [label_key, sample_key]})
output = self.classifier(embeddings.to('cuda'), head_names=eval_heads)
for head in eval_heads:
y_pred[head].extend(output[head].argmax(dim=1).cpu())
y_test[head].extend(labels[head])
torch.cuda.empty_cache()
accuracies = {h: accuracy_score(y_test[h], y_pred[h]) for h in eval_heads}
f1_scores = {h: f1_score(y_test[h], y_pred[h], average="macro") for h in eval_heads}
recalls = {h: recall_score(y_test[h], y_pred[h], average='macro') for h in eval_heads}
report = {}
for head in eval_heads:
cm = confusion_matrix(y_test[head], y_pred[head], labels=list(label_map[head].keys()))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(label_map[head].values()))
clf_report = classification_report(
y_test[head], y_pred[head], output_dict=True, target_names=list(label_map[head].values())
)
del clf_report["accuracy"]
clf_report = pd.DataFrame(clf_report).T.reset_index()
report[head] = dict(
clf_report=clf_report, confusion_matrix=disp, metrics={'accuracy': accuracies[head], 'f1': f1_scores[head], 'recall': recalls[head]}
)
return report
```
### Metrics
For evaluation, we used the following metrics: accuracy, precision, recall, f1-score. You can find a detailed classification report here:
**GGU:**
| | index | precision | recall | f1-score | support |
|---:|:-------------|------------:|---------:|-----------:|----------:|
| 0 | Greeting | 0.904762 | 0.974359 | 0.938272 | 39 |
| 1 | Gratitude | 0.958333 | 0.851852 | 0.901961 | 27 |
| 2 | Other | 1 | 1 | 1 | 39 |
| 3 | macro avg | 0.954365 | 0.94207 | 0.946744 | 105 |
| 4 | weighted avg | 0.953912 | 0.952381 | 0.951862 | 105 |
**sentiment:**
| | index | precision | recall | f1-score | support |
|---:|:-------------|------------:|---------:|-----------:|----------:|
| 0 | Positive | 0.783088 | 0.861878 | 0.820596 | 12851 |
| 1 | Negative | 0.802105 | 0.819524 | 0.810721 | 14229 |
| 2 | Neutral | 0.7874 | 0.6913 | 0.736227 | 13126 |
| 3 | macro avg | 0.790864 | 0.790901 | 0.789181 | 40206 |
| 4 | weighted avg | 0.791226 | 0.7912 | 0.789557 | 40206 |