dhruthick committed
Commit 55eb78d · 1 Parent(s): ba4d86e

updated backend readme

backend/README.md CHANGED
@@ -1 +1,13 @@
  Placeholder - backend + model stuff.
+
+ ## Models
+
+ ### Mood classification
+
+ - *Model*: BERT (Bidirectional Encoder Representations from Transformers), a pre-trained transformer model that achieves state-of-the-art performance across a range of NLP tasks by capturing bidirectional context from text. Here, the pre-trained model is fine-tuned to take a song's lyrics as input and predict one of four moods: happy, relaxed, sad, or angry.
+ - *Data*: The model is trained on MoodyLyrics, a labelled lyrics dataset of 2,595 songs, each annotated with one of the four moods above.
+ Because of copyright restrictions on lyrics, the dataset includes only song titles and artist names; the lyrics themselves were gathered through the popular lyrics site Genius via its API (see the fetch sketch after this list). We were able to retrieve lyrics for 2,523 songs.
+ - *Data Preprocessing*: Lyrics are tokenized with the BERT tokenizer, producing input IDs and attention masks. The special tokens [CLS] and [SEP] are added at the beginning and end of each example, and the [PAD] token pads every input to a fixed maximum length (see the tokenization sketch below).
+ - *Training Environment*: The model was trained on a Google Colab GPU using PyTorch, with the data accessed through Google Drive. The pre-trained BERT model was imported from HuggingFace's transformers library, and Pandas and NumPy were used for data transformation and splitting. Of the 2,523 examples, 2,018 (80%) were used for training, 253 (10%) for validation, and 252 (10%) for testing.
+ - *Hyperparameters*: The best results were achieved with a batch size of 32, a maximum input length of 256 tokens, a learning rate of 3e-5, and 2 training epochs (see the training sketch below). Due to memory constraints, we were not able to experiment with higher values for the maximum input length.
+ - *Results*: The model achieved 93% accuracy on the test set (see the evaluation sketch below).
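+
+ The snippets below are minimal sketches of each stage, not the project's exact training code. First, fetching lyrics through the Genius API; the `lyricsgenius` client and the `GENIUS_TOKEN` placeholder are assumptions, since the section above only says the Genius API was used:
+
+ ```python
+ import lyricsgenius
+
+ # GENIUS_TOKEN is a hypothetical placeholder for a Genius API access token.
+ genius = lyricsgenius.Genius("GENIUS_TOKEN")
+
+ def fetch_lyrics(title, artist):
+     """Return the lyrics for one song, or None if Genius finds no match."""
+     song = genius.search_song(title, artist)
+     return song.lyrics if song else None
+
+ lyrics = fetch_lyrics("Bohemian Rhapsody", "Queen")
+ ```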
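+
+ A sketch of the preprocessing step, assuming the `bert-base-uncased` checkpoint (the exact BERT variant is not pinned down above):
+
+ ```python
+ from transformers import BertTokenizer
+
+ tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+
+ lyrics_list = ["first song's lyrics ...", "second song's lyrics ..."]
+
+ # Adds [CLS]/[SEP] around each example, truncates, and pads with [PAD]
+ # to a fixed maximum length of 256 tokens.
+ encoded = tokenizer(
+     lyrics_list,
+     padding="max_length",
+     truncation=True,
+     max_length=256,
+     return_tensors="pt",
+ )
+ input_ids = encoded["input_ids"]            # token IDs, shape (batch, 256)
+ attention_mask = encoded["attention_mask"]  # 1 = real token, 0 = [PAD]
+ ```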
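+
+ A sketch of the fine-tuning setup with the hyperparameters above, continuing from the tokenization sketch applied to all 2,523 songs; `mood_ids` is a hypothetical list of integer labels, and the AdamW optimizer is an assumption (the README does not name the optimizer, but it is the usual default for BERT fine-tuning):
+
+ ```python
+ import torch
+ from torch.utils.data import DataLoader, TensorDataset, random_split
+ from transformers import BertForSequenceClassification
+
+ # Pre-trained BERT with a 4-way head for happy/relaxed/sad/angry.
+ model = BertForSequenceClassification.from_pretrained(
+     "bert-base-uncased", num_labels=4
+ )
+
+ labels = torch.tensor(mood_ids)  # hypothetical mood IDs in 0..3, one per song
+ dataset = TensorDataset(input_ids, attention_mask, labels)
+ # 80/10/10 split: 2,018 train, 253 validation, 252 test.
+ train_set, val_set, test_set = random_split(dataset, [2018, 253, 252])
+ train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
+
+ optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
+
+ model.train()
+ for epoch in range(2):
+     for batch_ids, batch_mask, batch_labels in train_loader:
+         optimizer.zero_grad()
+         out = model(input_ids=batch_ids, attention_mask=batch_mask,
+                     labels=batch_labels)
+         out.loss.backward()  # cross-entropy loss from the classification head
+         optimizer.step()
+ ```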
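+
+ Finally, a sketch of the test-set evaluation behind the 93% figure, continuing from the training sketch:
+
+ ```python
+ model.eval()
+ test_loader = DataLoader(test_set, batch_size=32)
+ correct = 0
+ with torch.no_grad():
+     for batch_ids, batch_mask, batch_labels in test_loader:
+         logits = model(input_ids=batch_ids, attention_mask=batch_mask).logits
+         correct += (logits.argmax(dim=-1) == batch_labels).sum().item()
+ print(f"Test accuracy: {correct / len(test_set):.1%}")
+ ```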
backend/models/train/train-bert-classifier-pytorch.py CHANGED
@@ -14,7 +14,7 @@ Imports and Setup
  from google.colab import drive
  drive.mount('/content/drive')
 
- !pip install transformers
+ # !pip install transformers
 
  import torch