# FinBERT Sentiment Analysis on English/Quotes Dataset

## Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into one of three sentiment categories: negative, neutral, or positive.

## Model Details

- **Model Architecture:** FinBERT (BERT-based model for sentiment analysis)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Repository ID as given in this README. Hub IDs normally take the form
# <namespace>/<name>, so adjust this string if loading fails.
model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### Sentiment Classification Inference

```python
def predict_sentiment(text):
    """Return the predicted sentiment label for a single string."""
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():  # Inference only, so skip gradient tracking
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    # Map the predicted class index to its sentiment label
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]


# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”",
]

print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")

# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```
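
The inference function returns only the top label. If a confidence score is also useful, the logits can be passed through a softmax first. The following is a minimal, dependency-free sketch of that step, not part of the repository: the helper names `logits_to_probabilities` and `label_with_confidence` are hypothetical, the example logits are made up, and the label order is assumed to match the `label_map` used in the inference example.

```python
import math

# Label order copied from the inference example; treat it as an assumption
LABEL_MAP = {0: "negative", 1: "neutral", 2: "positive"}


def logits_to_probabilities(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # Subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, label_map=LABEL_MAP):
    """Return (label, probability) for the highest-scoring class."""
    probs = logits_to_probabilities(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_map[best], probs[best]


# Made-up logits, for illustration only
label, confidence = label_with_confidence([-1.2, 0.3, 2.1])
print(f"{label} ({confidence:.2f})")
```

With the real model, a list such as `outputs.logits[0].tolist()` could be passed in, or `torch.softmax(logits, dim=-1)` could be applied directly to the tensor.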

## Evaluation Metrics: Accuracy & F1 Score

For sentiment analysis, accuracy and F1-score are key evaluation metrics. The model achieves:

- **Accuracy:** 88%
- **F1 Score:** 0.85
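
For readers who want to compute the same kind of metrics on their own labeled data, here is a small sketch in plain Python. It is not the evaluation script used for the figures above. The README does not say how the F1 score was averaged, so macro-averaging over the three classes is an assumption here, and the sample labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented labels, for illustration only
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]
print(f"accuracy={accuracy(y_true, y_pred):.2f}, macro F1={macro_f1(y_true, y_pred):.2f}")
```

If scikit-learn is installed, `accuracy_score` and `f1_score(..., average="macro")` from `sklearn.metrics` compute the same quantities.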

## Repository Structure

```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation
```

## Limitations

- The model may struggle with ambiguous phrases.
- Performance might vary across different domains and terminologies.
- The dataset primarily contains English text, making the model less effective for multilingual applications.
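
One possible way to soften the first limitation is to abstain when the model's top two classes are close. The sketch below is an illustration under assumptions, not a feature of this repository: `predict_or_abstain` is a hypothetical helper, the `min_margin` value of 0.2 is arbitrary and would need tuning, and the probabilities are assumed to come from a softmax over the model's logits.

```python
def predict_or_abstain(probabilities, min_margin=0.2):
    """Return the top label, or "uncertain" when the top two classes are too close.

    probabilities: dict mapping each label to its probability.
    min_margin: smallest accepted gap between the two highest probabilities
                (arbitrary default, chosen only for illustration).
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    return top_label if top_p - second_p >= min_margin else "uncertain"


# Invented probabilities, for illustration only
clear = {"negative": 0.05, "neutral": 0.10, "positive": 0.85}
ambiguous = {"negative": 0.05, "neutral": 0.45, "positive": 0.50}
print(predict_or_abstain(clear))
print(predict_or_abstain(ambiguous))
```

Texts flagged as uncertain could then be routed to manual review rather than being assigned a label.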

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.