FinBERT Sentiment Analysis on English/Quotes Dataset

📌 Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text as negative, neutral, or positive.

🏗 Model Details

Model Architecture: FinBERT (a BERT-based model pre-trained on financial text)
Task: Sentiment Analysis
Dataset: English/Quotes dataset
Fine-tuning Framework: Hugging Face Transformers

🚀 Usage

Installation

pip install transformers torch

Loading the Model

from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hub repository IDs take the form "namespace/name"; adjust this value if loading fails
model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)

Sentiment Classification Inference

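def predict_sentiment(text):
    """Return the predicted sentiment label for a single string."""
    # Tokenize, truncating or padding the input to 128 tokens
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():  # Inference only, so skip gradient tracking
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    # Index-to-label mapping assumed by this README; check it against model.config.id2label
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]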
 
# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]
 
print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")
 
# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
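
Returning Confidence Scores

predict_sentiment returns only the top label. When a confidence score is also useful, the logits can be converted to probabilities with a softmax. The sketch below is a minimal, dependency-free illustration. `logits_to_scores` and `top_label` are hypothetical helpers, not part of this repository, and the label order (0 = negative, 1 = neutral, 2 = positive) is assumed to match `label_map` above. With the real model, the input would be something like `outputs.logits[0].tolist()`.

```python
import math

# Assumed label order, copied from label_map in predict_sentiment.
LABELS = ["negative", "neutral", "positive"]


def logits_to_scores(logits, labels=LABELS):
    """Convert the raw logits for one text into a {label: probability} dict."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def top_label(scores):
    """Return the (label, probability) pair with the highest probability."""
    return max(scores.items(), key=lambda item: item[1])


# Illustrative logits only; these are not real model outputs.
example_scores = logits_to_scores([-1.0, 0.5, 2.0])
print(top_label(example_scores))
```

The same probabilities could be obtained with `torch.softmax(logits, dim=-1)`; the pure-Python version is used here only so the example runs without downloading the model.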

📊 Evaluation Metrics: Accuracy & F1 Score

The model is evaluated with accuracy and F1 score, and achieves:

Accuracy: 88%
F1 Score: 0.85
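
To make the two metrics concrete, here is a minimal sketch of how accuracy and F1 can be computed from gold labels and predictions. It assumes macro-averaged F1 over the three classes; this README does not state which averaging was used for the reported figure, so treat that choice as an assumption. The function names and the toy labels are illustrative only and are not the evaluation data behind the numbers above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of the per-class F1 scores (macro averaging, assumed)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels for illustration only.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "positive"]
print(f"accuracy={accuracy(gold, pred):.2f}, macro_f1={macro_f1(gold, pred):.2f}")
```

In practice, the predictions would come from running `predict_sentiment` over a held-out labelled set.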

📂 Repository Structure

.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation

⚠️ Limitations

The model may struggle with ambiguous phrases.
Performance may vary across different domains and terminologies.
The dataset primarily contains English text, so the model is less effective for multilingual applications.
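
One common way to handle the ambiguous-phrase limitation is to flag predictions whose class probabilities are close together, so they can be reviewed instead of trusted. The sketch below assumes per-class probabilities are already available as a dict (for example from a softmax over the logits). `needs_review` is a hypothetical helper, and both thresholds are arbitrary illustrative values that have not been tuned for this model.

```python
def needs_review(scores, min_confidence=0.6, min_margin=0.15):
    """Return True when a prediction looks too uncertain to trust.

    scores: {label: probability} for a single text.
    min_confidence, min_margin: illustrative thresholds (assumptions).
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need probabilities for at least two classes")
    best, runner_up = ranked[0], ranked[1]
    # Flag when the top class is weak or barely ahead of the second class.
    return best < min_confidence or (best - runner_up) < min_margin


# Illustrative probabilities only; these are not real model outputs.
print(needs_review({"negative": 0.05, "neutral": 0.05, "positive": 0.90}))
print(needs_review({"negative": 0.40, "neutral": 0.35, "positive": 0.25}))
```

Suitable thresholds depend on the data and should be chosen on a validation set.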

🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.