Commit 58559aa by developerPushkal (verified · 1 parent: e46c108)

Create README.md

Files changed (1): README.md (added, +92 -0)
 
# FinBERT Sentiment Analysis on English/Quotes Dataset

## 📌 Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into three sentiment classes: positive, negative, or neutral.

## 🏗 Model Details

- **Model Architecture:** FinBERT (a BERT-based model for financial sentiment analysis)
- **Task:** Sentiment analysis (three classes: negative, neutral, positive)
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hub repo IDs take the form "owner/name" (a single slash); adjust this if the ID does not resolve
model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### Sentiment Classification Inference

```python
def predict_sentiment(text):
    # Tokenize, truncating/padding to 128 tokens, and return PyTorch tensors
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():  # Inference only, so skip gradient tracking
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()
    # Assumes this index order matches the labels used during fine-tuning;
    # check it against model.config.id2label
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]

# Test on the five original quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]

print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")

# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```

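The snippet above hides two steps inside library calls: the tokenizer turns each quote into a fixed-length sequence, and `torch.argmax` picks a class from the raw logits. The stdlib-only sketch below illustrates both on plain Python lists so the behaviour is easy to inspect. It is a simplified illustration, not the Transformers implementation: `encode_fixed_length` and `decode_prediction` are hypothetical helpers, the special-token IDs (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102) assume the standard BERT uncased vocabulary, and the label order is copied from `label_map`, whose match with the fine-tuning labels is an assumption best confirmed via `model.config.id2label`.

```python
import math

# --- Before the model call: fixed-length encoding (illustrative only) ---
# Special-token IDs assume the standard BERT uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102

def encode_fixed_length(token_ids, max_length=128):
    """Wrap token IDs in [CLS] ... [SEP], truncate to fit, then right-pad."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = list(token_ids)[: max_length - 2]  # truncation keeps the first tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad  # 0 = padding

# --- After the model call: logits -> probabilities -> label ---
# Index order copied from predict_sentiment's label_map (assumed, not verified).
LABEL_MAP = {0: "negative", 1: "neutral", 2: "positive"}

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_prediction(logits, label_map=LABEL_MAP):
    """Return (label, probability) for the highest-scoring class (argmax)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return label_map[best], probs[best]

ids, mask = encode_fixed_length([7, 8, 9], max_length=8)  # made-up token IDs
print(ids)   # [101, 7, 8, 9, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]

label, prob = decode_prediction([-1.2, 0.3, 2.1])  # made-up logits
print(f"{label} ({prob:.2f})")  # positive (0.83)
```

In the PyTorch code, the same probabilities come from `torch.softmax(outputs.logits, dim=-1)`, which is a cheap way to report a confidence alongside each predicted label.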
## 📊 Evaluation Metrics: Accuracy and F1 Score

Accuracy and F1 score are standard evaluation metrics for sentiment classification. The model achieves:

- **Accuracy:** 88%
- **F1 Score:** 0.85

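Accuracy is the fraction of correct predictions, while F1 is the harmonic mean of precision and recall. With three classes, the per-class F1 scores have to be averaged; the stdlib-only sketch below uses macro averaging (an unweighted mean over classes). That choice is an assumption, since weighted or micro averaging would give different numbers, and the labels are toy data, not the model's evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Treat 0/0 as 0 when a class is never predicted or never present
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only
y_true = ["positive", "neutral", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "positive"]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 2))  # 0.75 0.6
```

With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same value on this toy data.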
## 📂 Repository Structure

```
.
├── model/              # Contains the fine-tuned model files
├── tokenizer_config/   # Tokenizer configuration and vocabulary files
├── model.safetensors   # Model weights
└── README.md           # Model documentation
```

## ⚠️ Limitations

- The model may struggle with ambiguous phrasing.
- Performance may vary across jurisdictions and terminology.
- The training data is primarily English, so the model is less effective for multilingual applications.

## 🤝 Contributing

Contributions are welcome! Feel free to ope