---
license: mit
---
# CNN Musical Note Classifier
A deep learning model that classifies musical notes by their **pitch** and **length**. On the held-out test set it reaches an accuracy of **99.66%**.
---
## Model Overview
### Architecture
This model is built using a Convolutional Neural Network (CNN) architecture with the following features:
- Input size: `(64, 64, 1)`
- Number of parameters: **696,255**
- Layers include:
  - Multiple `Conv2D` and `BatchNormalization` layers for feature extraction
  - `GlobalAveragePooling2D` and `Dense` layers for classification
  - Regularization via `Dropout` layers
- Output: **85 classes**, representing combinations of pitch and note length.
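The layer types listed above can be combined roughly as follows. This is a minimal sketch, not the released architecture: the filter counts, the number of blocks, the `MaxPooling2D` layers, the dense width, and the dropout rate are all assumptions chosen for illustration, so the parameter count will not match 696,255.

```python
import tensorflow as tf

layers = tf.keras.layers
NUM_CLASSES = 85  # from the model card


def build_model(input_shape=(64, 64, 1), num_classes=NUM_CLASSES):
    """Build a small CNN using the layer types named in the model card."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # Hypothetical filter sizes; the card does not list them.
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        # Downsampling is an assumption; the card does not mention pooling here.
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)  # hypothetical width
    x = layers.Dropout(0.3)(x)                   # hypothetical rate
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
```

Calling `model.summary()` prints the per-layer shapes and parameter counts, which is a quick way to compare a candidate layout against the figure reported above.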
### Dataset
- **Original dataset size**: 1,785 samples
- **Augmented dataset size**: 71,400 samples
- **Total size**: 73,185 samples
- Labels include various combinations of pitch (`A3`, `B4`, `C6`, etc.) and note lengths (`16th`, `quarter`, `whole`, etc.).
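As a quick sanity check on the figures above, the arithmetic can be reproduced in a few lines. The observation that the augmented count is an exact multiple of the original count is derived from the numbers, not stated by the authors, and the per-class figure is only an average (a balanced dataset is an assumption).

```python
ORIGINAL = 1_785    # original samples, from the model card
AUGMENTED = 71_400  # augmented samples, from the model card
NUM_CLASSES = 85    # from the model card

# Total reported in the card: original plus augmented.
total = ORIGINAL + AUGMENTED

# How many augmented variants per original image the figures imply.
copies_per_original, remainder = divmod(AUGMENTED, ORIGINAL)

# Average number of original samples per class (exact only if balanced).
avg_originals_per_class = ORIGINAL / NUM_CLASSES

print(total, copies_per_original, remainder, avg_originals_per_class)
```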
---
## Training Details
- **Optimizer**: Adam
- **Loss function**: Categorical Crossentropy
- **Epochs**: 400+
- **Batch size**: Optimized for balanced training speed and accuracy
- Final evaluation results:
  - **Test Loss**: `0.1286`
  - **Test Accuracy**: `99.66%`
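To make the loss figure easier to interpret, here is a minimal pure-Python sketch of categorical cross-entropy for a single one-hot sample. The three-class vectors are made-up illustration values, not model outputs, and the clipping constant `eps` is an assumption to keep the logarithm finite.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip probabilities so log() never sees exactly 0 or 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))


# Hypothetical 3-class example: the true class is index 1.
y_true = [0.0, 1.0, 0.0]
confident = [0.05, 0.90, 0.05]
unsure = [0.40, 0.30, 0.30]

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the true class, so confident correct predictions give a loss near zero and hesitant ones give a larger value.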
---
## Examples of Feature Detection
### Filter Visualization
Filters from the first convolutional layers demonstrate the features captured by the network.




### Training and Validation Loss
The following graph shows the training and validation loss during model training:

---
## Labels
The model supports **85 classes**, which include:
- **Pitches**: `A3`, `B4`, `C6`, etc.
- **Note Lengths**: `16th`, `quarter`, `whole`, etc.
Full label list:
```python
['A316th', 'A3eighth', 'A3half', 'A3quarter', 'A3whole', 'A416th', 'A4eighth', 'A4half', 'A4quarter', 'A4whole', 'A516th', 'A5eighth', 'A5half', 'A5quarter', 'A5whole', 'B316th', 'B3eighth', 'B3half', 'B3quarter', 'B3whole', 'B416th', 'B4eighth', 'B4half', 'B4quarter', 'B4whole', 'B516th', 'B5eighth', 'B5half', 'B5quarter', 'B5whole', 'C416th', 'C4eighth', 'C4half', 'C4quarter', 'C4whole', 'C516th', 'C5eighth', 'C5half', 'C5quarter', 'C5whole', 'C616th', 'C6eighth', 'C6half', 'C6quarter', 'C6whole', 'D416th', 'D4eighth', 'D4half', 'D4quarter', 'D4whole', 'D516th', 'D5eighth', 'D5half', 'D5quarter', 'D5whole', 'E416th', 'E4eighth', 'E4half', 'E4quarter', 'E4whole', 'E516th', 'E5eighth', 'E5half', 'E5quarter', 'E5whole', 'F416th', 'F4eighth', 'F4half', 'F4quarter', 'F4whole', 'F516th', 'F5eighth', 'F5half', 'F5quarter', 'F5whole', 'G416th', 'G4eighth', 'G4half', 'G4quarter', 'G4whole', 'G516th', 'G5eighth', 'G5half', 'G5quarter', 'G5whole']
```
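Each label is a pitch immediately followed by a note length, so the list can be rebuilt and parsed programmatically. The sketch below assumes the model's output indices follow the sorted order shown above, which the card does not state; `split_label` and `decode` are hypothetical helper names.

```python
import re

# Pitches and lengths that appear in the label list above.
PITCHES = ["A3", "A4", "A5", "B3", "B4", "B5", "C4", "C5", "C6",
           "D4", "D5", "E4", "E5", "F4", "F5", "G4", "G5"]
LENGTHS = ["16th", "eighth", "half", "quarter", "whole"]

# 17 pitches x 5 lengths = 85 labels, sorted as in the list above.
LABELS = sorted(pitch + length for pitch in PITCHES for length in LENGTHS)

_LABEL_RE = re.compile(r"^([A-G]\d)(16th|eighth|half|quarter|whole)$")


def split_label(label):
    """Split a combined label such as 'C6quarter' into (pitch, length)."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return match.group(1), match.group(2)


def decode(class_index):
    """Map a class index to (pitch, length), assuming sorted label order."""
    return split_label(LABELS[class_index])
```

For example, `split_label("C6quarter")` returns `("C6", "quarter")`. In practice `decode` would be applied to the argmax of the model's 85-way output, provided the index order assumption holds.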
---
## Datasets
To ensure the model generalizes well, we augmented the data using the following techniques:
- Cropping the notes in different ways to simulate varied positioning on the musical staff.
- Applying random transformations such as rotation, zooming, and shifting with TensorFlow's `ImageDataGenerator`.
- Augmentation expanded the dataset from 1,785 images to a total of 55,335 images.
The augmented dataset is available at https://huggingface.co/datasets/dongim04/augmented_musical_note_dataset
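The random transformations mentioned above could be configured along these lines. This is a sketch under stated assumptions: the ranges are illustrative guesses, not the values used to build the dataset, and `augment` is a hypothetical helper. The random transforms rely on SciPy being installed.

```python
import numpy as np
import tensorflow as tf

ImageDataGenerator = tf.keras.preprocessing.image.ImageDataGenerator

# Hypothetical ranges; the card does not state the actual settings.
datagen = ImageDataGenerator(
    rotation_range=5,        # degrees
    zoom_range=0.1,          # zoom in or out by up to 10%
    width_shift_range=0.1,   # fraction of image width
    height_shift_range=0.1,  # fraction of image height
    fill_mode="nearest",
)


def augment(images, copies, seed=0):
    """Return `copies` randomly transformed variants of every image.

    `images` must have shape (N, 64, 64, 1).
    """
    flow = datagen.flow(images, batch_size=len(images), shuffle=False, seed=seed)
    # Each pass over the iterator yields one freshly transformed copy of the batch.
    return np.concatenate([next(flow) for _ in range(copies)], axis=0)
```

Calling `augment(images, copies=40)` on an array of shape `(N, 64, 64, 1)` would return `40 * N` transformed images.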