# FinBERT Sentiment Analysis on English/Quotes Dataset

## Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into three sentiment categories: positive, negative, or neutral.

## Model Details

- **Model Architecture:** FinBERT (BERT-based model for sentiment analysis)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers (a reproduction sketch follows this list)
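The exact training script is not included in this repository. The snippet below is only a minimal sketch of how a comparable fine-tune could be reproduced with the Transformers `Trainer` API: the base checkpoint (`ProsusAI/finbert`), the toy labelled examples, the column names, and the hyperparameters are all assumptions for illustration, since the raw English/Quotes data does not ship with sentiment labels. It also assumes the `datasets` package is installed.

```python
from datasets import Dataset
from transformers import (BertTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

# Toy stand-in for a labelled English/Quotes training split
# (0 = negative, 1 = neutral, 2 = positive, matching the label map used below).
train_data = Dataset.from_dict({
    "text": [
        "So many books, so little time.",
        "A room without books is like a body without a soul.",
    ],
    "label": [1, 2],
})

base_checkpoint = "ProsusAI/finbert"  # assumed base FinBERT checkpoint
tokenizer = BertTokenizer.from_pretrained(base_checkpoint)
model = BertForSequenceClassification.from_pretrained(base_checkpoint, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finbert-english-quotes",  # illustrative output path and hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```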
## Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```
### Sentiment Classification Inference

```python
def predict_sentiment(text):
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]

# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]

print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")

# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```
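For scoring many quotes at once, a batched variant avoids one forward pass per text. The sketch below reuses the `model`, `tokenizer`, `device`, and `original_quotes` objects from the blocks above; the `predict_sentiments` helper and its `batch_size` default are illustrative additions, not part of the released model card.

```python
def predict_sentiments(texts, batch_size=16):
    """Classify a list of texts in batches instead of one at a time."""
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(batch, padding=True, truncation=True, max_length=128, return_tensors="pt")
        inputs = {key: val.to(device) for key, val in inputs.items()}
        with torch.no_grad():
            logits = model(**inputs).logits
        results.extend(label_map[i] for i in torch.argmax(logits, dim=-1).tolist())
    return results

print(predict_sentiments(original_quotes))
```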
## Evaluation Metrics: Accuracy & F1 Score

For sentiment analysis, accuracy and F1 score are the key evaluation metrics; a sketch for computing them on a labelled split follows the list below. The model achieves:

- **Accuracy:** 88%
- **F1 Score:** 0.85
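The numbers above are as reported for this model and are not reproduced here. As an illustration only, metrics like these can be computed with scikit-learn on any labelled split, reusing the `predict_sentiment` helper from above; the example texts, their gold labels, and the macro averaging choice are assumptions, and scikit-learn is assumed to be installed.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labelled evaluation examples: (quote, gold sentiment) pairs.
eval_texts = [
    "So many books, so little time.",
    "A room without books is like a body without a soul.",
]
eval_labels = ["neutral", "positive"]

predictions = [predict_sentiment(text) for text in eval_texts]

accuracy = accuracy_score(eval_labels, predictions)
macro_f1 = f1_score(eval_labels, predictions, average="macro")  # averaging mode assumed
print(f"Accuracy: {accuracy:.2f}  F1: {macro_f1:.2f}")
```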
## Repository Structure

```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation
```
## Limitations

- The model may struggle with ambiguous phrases.
- Performance may vary across domains and terminologies that differ from the fine-tuning data.
- The dataset primarily contains English text, making the model less effective for multilingual applications.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.