---
license: mit
---
# CNN Musical Note Classifier

A deep learning model that classifies musical notes by **pitch** and **length**. It reaches a test accuracy of **99.66%**.

---

## Model Overview

### Architecture
The model is a Convolutional Neural Network (CNN) with the following features:
- Input size: `(64, 64, 1)`
- Number of parameters: **696,255**
- Layers include:
  - Multiple `Conv2D` and `BatchNormalization` layers for feature extraction
  - `GlobalAveragePooling2D` and `Dense` layers for classification
  - Regularization via `Dropout` layers
- Output: **85 classes**, representing combinations of pitch and note length.
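
The parameter count of a stack like this follows from simple per-layer formulas. The sketch below applies them to a hypothetical configuration (four 3x3 `Conv2D` blocks with 32, 64, 128, and 256 filters, then a 128-unit `Dense` layer). These sizes and the helper names are assumptions for illustration, not the published architecture, so the total will not match 696,255.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def batchnorm_params(channels):
    """Keras tracks gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels

def dense_params(in_features, units):
    """Weight matrix plus one bias per unit."""
    return in_features * units + units

# Hypothetical configuration: NOT the published architecture.
conv_filters = [32, 64, 128, 256]
hidden_units = 128
num_classes = 85  # from the model card

total = 0
channels = 1  # grayscale input of shape (64, 64, 1)
for filters in conv_filters:
    total += conv2d_params(3, 3, channels, filters)
    total += batchnorm_params(filters)
    channels = filters

# GlobalAveragePooling2D and Dropout add no parameters;
# pooling leaves one value per channel for the Dense layers.
total += dense_params(channels, hidden_units)
total += dense_params(hidden_units, num_classes)
print(total)
```

Because global average pooling reduces each feature map to a single value, the first `Dense` layer's size depends only on the channel count, not on the 64x64 spatial resolution.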

### Dataset
- **Original dataset size**: 1,785 samples
- **Augmented dataset size**: 71,400 samples
- **Total size**: 73,185 samples
- Labels include various combinations of pitch (`A3`, `B4`, `C6`, etc.) and note lengths (`16th`, `quarter`, `whole`, etc.).

---

## Training Details

- **Optimizer**: Adam
- **Loss function**: Categorical Crossentropy
- **Epochs**: 400+
- **Batch size**: Optimized for balanced training speed and accuracy
- Final evaluation results:
  - **Test Loss**: `0.1286`
  - **Test Accuracy**: `99.66%`
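
To make the two reported metrics concrete, here is a minimal pure-Python sketch of categorical crossentropy for one-hot labels and of accuracy as an argmax match. The function names and the toy three-class data are illustrative assumptions; the actual training presumably relied on the framework's built-in loss and metric.

```python
import math

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one sample: -sum(y_i * log(p_i)); eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

def accuracy(true_indices, predicted_probs):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = sum(
        1
        for t, probs in zip(true_indices, predicted_probs)
        if max(range(len(probs)), key=probs.__getitem__) == t
    )
    return hits / len(true_indices)

# Toy 3-class example (the real model has 85 classes).
one_hot = [0, 1, 0]
probs = [0.1, 0.8, 0.1]
loss = categorical_crossentropy(one_hot, probs)   # equals -log(0.8)
acc = accuracy([1, 0], [probs, [0.2, 0.5, 0.3]])  # first correct, second wrong
print(round(loss, 4), acc)
```

With one-hot labels only the true class contributes, so the per-sample loss is the negative log of the probability assigned to the correct note.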

---

## Examples of Feature Detection

### Filter Visualization
Filters from the first convolutional layers demonstrate the features captured by the network.

![Filter 1](https://huggingface.co/dongim04/musical-note-classifier/resolve/main/assets/filter1.png)
![Filter 2](https://huggingface.co/dongim04/musical-note-classifier/resolve/main/assets/filter2.png)
![Filter 3](https://huggingface.co/dongim04/musical-note-classifier/resolve/main/assets/filter3.png)
![Filter 4](https://huggingface.co/dongim04/musical-note-classifier/resolve/main/assets/filter4.png)

### Training and Validation Loss
The following graph shows the training and validation loss during model training:

![Loss Curves](https://huggingface.co/dongim04/musical-note-classifier/resolve/main/assets/loss.png)

---

## Labels
The model supports **85 classes**, which include:
- **Pitches**: `A3`, `B4`, `C6`, etc.
- **Note Lengths**: `16th`, `quarter`, `whole`, etc.

Full label list:
```python
['A316th', 'A3eighth', 'A3half', 'A3quarter', 'A3whole', 'A416th', 'A4eighth', 'A4half', 'A4quarter', 'A4whole', 'A516th', 'A5eighth', 'A5half', 'A5quarter', 'A5whole', 'B316th', 'B3eighth', 'B3half', 'B3quarter', 'B3whole', 'B416th', 'B4eighth', 'B4half', 'B4quarter', 'B4whole', 'B516th', 'B5eighth', 'B5half', 'B5quarter', 'B5whole', 'C416th', 'C4eighth', 'C4half', 'C4quarter', 'C4whole', 'C516th', 'C5eighth', 'C5half', 'C5quarter', 'C5whole', 'C616th', 'C6eighth', 'C6half', 'C6quarter', 'C6whole', 'D416th', 'D4eighth', 'D4half', 'D4quarter', 'D4whole', 'D516th', 'D5eighth', 'D5half', 'D5quarter', 'D5whole', 'E416th', 'E4eighth', 'E4half', 'E4quarter', 'E4whole', 'E516th', 'E5eighth', 'E5half', 'E5quarter', 'E5whole', 'F416th', 'F4eighth', 'F4half', 'F4quarter', 'F4whole', 'F516th', 'F5eighth', 'F5half', 'F5quarter', 'F5whole', 'G416th', 'G4eighth', 'G4half', 'G4quarter', 'G4whole', 'G516th', 'G5eighth', 'G5half', 'G5quarter', 'G5whole']
```
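
Each label is a two-character pitch followed by a note length, so the list can be rebuilt and a prediction decoded with a few lines. This is a sketch under one assumption: that output index `i` of the model corresponds to position `i` in the label list above. `split_label` and `decode` are illustrative helper names, not part of the released code.

```python
PITCHES = ["A3", "A4", "A5", "B3", "B4", "B5", "C4", "C5", "C6",
           "D4", "D5", "E4", "E5", "F4", "F5", "G4", "G5"]
LENGTHS = ["16th", "eighth", "half", "quarter", "whole"]

# Same order as the full label list: pitch-major, lengths alphabetical.
LABELS = [pitch + length for pitch in PITCHES for length in LENGTHS]

def split_label(label):
    """Split e.g. 'C6quarter' into ('C6', 'quarter'); every pitch is two characters."""
    return label[:2], label[2:]

def decode(scores):
    """Map a length-85 score vector to (pitch, length), assuming index i means LABELS[i]."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return split_label(LABELS[best])

# Fake score vector with all mass on one class, for demonstration only.
scores = [0.0] * len(LABELS)
scores[LABELS.index("C6quarter")] = 1.0
print(len(LABELS), decode(scores))
```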
---

## Datasets
To help the model generalize, we augmented the data using the following techniques:
- Cropping the notes in different ways to simulate varied positioning on the musical staff.
- Applying random transformations such as rotation, zooming, and shifting with TensorFlow's `ImageDataGenerator`.

Augmentation expanded the dataset from 1,785 images to a total of 55,335 images.

Augmented dataset: https://huggingface.co/datasets/dongim04/augmented_musical_note_dataset
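
The cropping idea can be sketched in plain Python, assuming the source images are somewhat larger than the 64x64 model input. The function `random_crops`, the 80x72 image size, and the seed are hypothetical choices for illustration and do not reproduce the actual pipeline; the random rotation, zoom, and shift step is not shown.

```python
import random

def random_crops(image, crop_h, crop_w, n, seed=0):
    """Return n crops of size crop_h x crop_w taken at random offsets.

    `image` is a 2-D list of grayscale pixel rows. Each crop shifts the
    note relative to the frame, imitating different staff positions.
    """
    img_h, img_w = len(image), len(image[0])
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("crop is larger than the image")
    rng = random.Random(seed)
    crops = []
    for _ in range(n):
        top = rng.randint(0, img_h - crop_h)    # inclusive bounds
        left = rng.randint(0, img_w - crop_w)
        crops.append([row[left:left + crop_w] for row in image[top:top + crop_h]])
    return crops

# Hypothetical 80x72 grayscale image filled with a simple gradient.
image = [[(r + c) % 256 for c in range(72)] for r in range(80)]
crops = random_crops(image, 64, 64, n=5)
print(len(crops), len(crops[0]), len(crops[0][0]))
```

Every crop already matches the model's 64x64 input, so no resizing is needed before further transformations are applied.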