Model Performance

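At the macro level, this model performs quite poorly: its macro-averaged F1 (0.604) and precision (0.593) fall well below the corresponding weighted scores. Prefer the other model, which takes only text data as input; this model additionally takes many non-text features.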

| Model | Epoch | Learning Rate | Grad Norm | Training Loss | Validation Loss | Accuracy | F1 Score (Weighted) | F1 Score (Macro) | Precision (Weighted) | Precision (Macro) | Recall (Weighted) | Recall (Macro) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeBERTaV3 (Text & Non-text) | 15 | 0.0000100 | 264.891 | 0.441 | 0.494 | 0.883 | 0.908 | 0.604 | 0.948 | 0.593 | 0.883 | 0.758 |
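
The gap between the weighted and macro scores is what one would expect if frequent categories are predicted well and rare ones are not. The following is a minimal pure-Python sketch of the two averaging schemes, not the evaluation code used for this model; the labels `groceries` and `rent` and the toy predictions are hypothetical, and only labels present in `y_true` are scored.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by each class's support."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical toy data: a frequent class predicted well, a rare class always missed.
y_true = ["groceries"] * 9 + ["rent"]
y_pred = ["groceries"] * 10

print(round(macro_f1(y_true, y_pred), 3))     # 0.474
print(round(weighted_f1(y_true, y_pred), 3))  # 0.853
```

The rare class contributes an F1 of zero, which halves the macro average but barely moves the weighted one.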

How To Use

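# Shell: download the sample test set into data/ without changing directory
mkdir -p data
wget -q -O data/test.csv https://raw.githubusercontent.com/wanadzhar913/bank-transaction-classification/refs/heads/master/data/test.csv

# Python: run the rest from the directory that contains data/
import pandas as pd
from datasets import Dataset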

import torch
from transformers import AutoTokenizer

from classifier import CustomSequenceClassification

### Load sample dataset ###
test = pd.read_csv('data/test.csv')
df_test = Dataset.from_pandas(test)

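# Identifier, date and label columns that are not fed to the model
cols_to_remove = ['client_id', 'bank_id', 'account_id', 'txn_id', 'txn_date', 'description_stem', 'description_clean_len', 'category']
# Sort so the remaining columns always appear in the same order
cols_to_preprocess = sorted(set(df_test.column_names) - set(cols_to_remove))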

def transform_row(row):
    """Extract the text description and collect the non-text features into a list."""

    # Separate the description (the text input)
    description = row["description"]

    # Collect the remaining features as a plain list; it is converted to a tensor at inference time
    extra_data = [row[key] for key in cols_to_preprocess if key != "description"]

    return {
        "description": description,
        "extra_data": extra_data,  # all the non-text columns
    }

test_X = df_test.remove_columns(cols_to_remove) \
                .map(transform_row, num_proc=2) \
                .select_columns(['description', 'extra_data'])

### Load Model ###
model = CustomSequenceClassification.from_pretrained(
    "wanadzhar913/debertav3-finetuned-banking-transaction-classification",
    num_labels=33,
    num_extra_dims=44,
)

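# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.eval().to(device)

tokenizer = AutoTokenizer.from_pretrained('wanadzhar913/debertav3-finetuned-banking-transaction-classification')

### Inference ###
# Tokenize the first description and move the tensors to the model's device
padded = tokenizer(test_X['description'][0], padding='longest', return_tensors='pt').to(device)

with torch.no_grad():
    pred = model(
        **padded,
        extra_data=torch.tensor([test_X['extra_data'][0]], dtype=torch.float32, device=device),
        return_dict=False,
    )
    # The first element of the output holds the per-class scores; argmax gives the predicted label id
    print(pred[0].argmax(dim=1).cpu().numpy()[0])
# Output: 19, i.e. Restaurants (refer to `config.json`)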
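
To turn a predicted id such as 19 into a category name programmatically, one option is to read the label mapping from `config.json`. The sketch below assumes the file exposes the conventional Hugging Face `id2label` field and that a local copy of the file is available; the helper names `load_config` and `label_name` and the one-entry example config are hypothetical.

```python
import json


def load_config(path):
    """Read a config.json file from disk into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_name(config, class_id):
    """Return the label for a predicted class id.

    Assumes `config` has an `id2label` mapping. JSON object keys are
    always strings, so the integer id is converted before the lookup.
    """
    return config["id2label"][str(class_id)]


# Hypothetical one-entry excerpt standing in for the real config.json
example_config = json.loads('{"id2label": {"19": "Restaurants"}}')
print(label_name(example_config, 19))  # Restaurants
```

With a downloaded copy of the file, the call would be `label_name(load_config("config.json"), 19)`.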