# FinBERT Sentiment Analysis on English/Quotes Dataset

## 📌 Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into one of three sentiment categories: positive, negative, or neutral.

## 🏗 Model Details

- **Model Architecture:** FinBERT (a BERT-based model pre-trained on financial text)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### Sentiment Classification Inference

```python
def predict_sentiment(text):
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    # Label order assumed here; confirm it against model.config.id2label for this checkpoint
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]

# Test on the five original quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]
 
print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")
 
# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```
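
The function above returns only the top label. A confidence score can be attached by applying softmax to the logits. The following is a minimal sketch in plain Python, assuming the logits have already been converted to a list (for example with `outputs.logits[0].tolist()`). The helper names `softmax` and `label_with_confidence` and the example logits are illustrative and not part of this repository.

```python
import math

# Label order copied from the inference example above; whether it matches
# the checkpoint's own config is an assumption.
LABEL_MAP = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Made-up logits for illustration only
example_logits = [-1.2, 0.3, 2.5]
label, confidence = label_with_confidence(example_logits)
print(f"{label} ({confidence:.2f})")
```

A low top probability can be a useful signal that a quote is ambiguous and that the predicted label should be treated with caution.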

## 📊 Evaluation Metrics: Accuracy & F1 Score

For sentiment analysis, accuracy and F1-score are key evaluation metrics. The model achieves:
- **Accuracy:** 88%
- **F1 Score:** 0.85
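
To make the two metrics concrete, here is a minimal sketch of how accuracy and F1 can be computed from true and predicted labels. It uses macro-averaged F1, which is an assumption: this README does not state which averaging was used for the reported score. The labels below are toy data invented for illustration, not the model's evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
labels = ["negative", "neutral", "positive"]

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"Macro F1: {macro_f1(y_true, y_pred, labels):.2f}")
```

Accuracy treats every example equally, while macro F1 gives each class the same weight, so the two can diverge when the classes are imbalanced.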

## 📂 Repository Structure

```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation
```

## ⚠️ Limitations

- The model may struggle with ambiguous phrases.
- Performance may vary across domains and terminology.
- The training data is primarily English, so the model is less effective for multilingual applications.

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.