# FinBERT Sentiment Analysis on English/Quotes Dataset

## 📌 Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into one of three sentiment categories: negative, neutral, or positive.

## πŸ— Model Details

- **Model Architecture:** FinBERT (a BERT-based model pre-trained on financial text)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

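# Repo id as published in this README. Hub ids take the form "namespace/repo_name",
# so the extra "/" may need adjusting to match the actual repository name.
model_name = "Aventiq-AI/finbert-english/quotes"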
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### Sentiment Classification Inference

```python
def predict_sentiment(text):
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]
 
# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]
 
print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")
 
# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```
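
`predict_sentiment` returns only the top label, so the model's confidence is lost. The following is a minimal sketch of how the three logits could be turned into probabilities with a softmax. It is written in plain Python so it runs without the model; the helper names `softmax` and `label_with_confidence` and the example logits are illustrative assumptions, not part of this repository. With PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math

# Label order copied from the label_map used in predict_sentiment.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one input, e.g. outputs.logits[0].tolist()
label, confidence = label_with_confidence([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

A low top probability can be used to flag quotes that the model finds ambiguous instead of forcing them into one class.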

## 📊 Evaluation Metrics: Accuracy & F1 Score

For sentiment analysis, accuracy and F1-score are key evaluation metrics. The model achieves:
- **Accuracy:** 88%
- **F1 Score:** 0.85
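
For readers who want to compute the same kind of metrics on their own labelled data, here is a small sketch in plain Python. This README does not say how the F1 score was averaged across the three classes, so the macro average used here is an assumption, and the `gold` and `pred` lists are made-up toy data, not results from this model.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only
labels = ["negative", "neutral", "positive"]
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]

print(f"Accuracy: {accuracy(gold, pred):.2f}")
print(f"Macro F1: {macro_f1(gold, pred, labels):.2f}")
```

Macro averaging weights every class equally, so it drops sharply when one class (here `neutral`) is never predicted correctly, even if accuracy stays fairly high.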

## 📂 Repository Structure

```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors/   # Model weights
└── README.md            # Model documentation
```

## ⚠️ Limitations

- The model may struggle with ambiguous phrases.
- Performance may vary across domains and terminologies that differ from the training data.
- The training data is primarily English, so the model is less effective for multilingual applications.

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.