---
license: mit
datasets:
- stanfordnlp/imdb
---
# Binary Sentiment Classification Using Transformers

## Introduction

This project demonstrates fine-tuning a pre-trained transformer model to perform binary sentiment classification using the IMDb dataset. The task involves classifying movie reviews as either negative (0) or positive (1). The implementation leverages the Hugging Face Transformers and Datasets libraries, along with PyTorch, to preprocess the data, fine-tune a pre-trained DistilBERT model, evaluate it, and save the final model for future use.

## Task Description

The assignment includes the following key steps:

1. **Dataset Selection and Preprocessing**
   - Using the IMDb dataset from Hugging Face, which contains text reviews and their corresponding binary sentiment labels.
   - Tokenizing the dataset with a pre-trained DistilBERT tokenizer.
   - Splitting the data into training, validation, and test sets.

2. **Model Selection and Fine-Tuning**
   - Loading a pre-trained DistilBERT model for sequence classification.
   - Fine-tuning the model on the processed dataset using the Hugging Face `Trainer` API.
   - Configuring training parameters, including learning rate, batch size, number of epochs, and evaluation strategy.

3. **Evaluation**
   - Evaluating model performance using metrics such as accuracy, F1-score, precision, and recall.
   - Analyzing the model's performance on the test set.

4. **Saving the Model**
   - Saving the fine-tuned model for later use.

## Requirements

- Python 3.x
- [Transformers](https://github.com/huggingface/transformers)
- [Datasets](https://github.com/huggingface/datasets)
- [Scikit-Learn](https://scikit-learn.org/)
- [PyTorch](https://pytorch.org/)
- (Optional) Google Colab for easy experimentation

## Installation

Install the necessary libraries using pip:

```bash
pip install -U transformers datasets scikit-learn torch
```