CNN Musical Note Classifier
A deep learning model for classifying musical notes by pitch and note length. The model achieves a test accuracy of 99.66% on this dataset.
Model Overview
Architecture
This model is built using a Convolutional Neural Network (CNN) architecture with the following features:
- Input size: `(64, 64, 1)`
- Number of parameters: 696,255
- Layers include:
  - Multiple `Conv2D` and `BatchNormalization` layers for feature extraction
  - `GlobalAveragePooling2D` and `Dense` layers for classification
  - Regularization via `Dropout` layers
- Output: 85 classes, representing combinations of pitch and note length.
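The model card does not give the exact layer configuration, so the sketch below is a hypothetical Keras reconstruction of the described architecture: stacked `Conv2D` + `BatchNormalization` blocks, `GlobalAveragePooling2D` and `Dense` layers for classification, and `Dropout` for regularization. The filter counts, kernel sizes, and dropout rates are assumptions, so the parameter count will not match 696,255 exactly.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=85):
    """Hypothetical reconstruction of the described CNN (exact config unknown)."""
    return models.Sequential([
        layers.Input(shape=(64, 64, 1)),            # 64x64 grayscale input
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.GlobalAveragePooling2D(),            # collapse spatial dims
        layers.Dropout(0.5),                        # regularization
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
```

Global average pooling in place of `Flatten` keeps the parameter count low, which is consistent with the modest total of ~700k parameters reported above.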
Dataset
- Original dataset size: 1,785 samples
- Augmented dataset size: 71,400 samples
- Total size: 73,185 samples
- Labels include various combinations of pitch (`A3`, `B4`, `C6`, etc.) and note length (`16th`, `quarter`, `whole`, etc.)
Training Details
- Optimizer: Adam
- Loss function: Categorical Crossentropy
- Epochs: 400+
- Batch size: Tuned to balance training speed and accuracy
- Final evaluation results:
  - Test Loss: 0.1286
  - Test Accuracy: 99.66%
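The reported test loss is categorical crossentropy over the 85 one-hot classes. As a minimal numpy illustration of how that number is computed (independent of the model itself):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean crossentropy between one-hot labels and softmax probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred), axis=-1).mean())

# One sample, 3 classes: the true class is predicted with probability 0.9,
# so the loss is -log(0.9) ~= 0.105
y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.05, 0.9, 0.05]])
loss = categorical_crossentropy(y_true, y_pred)
```

A test loss of 0.1286 alongside 99.66% accuracy suggests the model assigns high probability to the correct class on almost all test samples.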
Examples of Feature Detection
Filter Visualization
Filters from the first convolutional layers demonstrate the features captured by the network.
Training and Validation Loss
The following graph shows the training and validation loss during model training:
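A plot like this is typically produced from the history returned by `model.fit`. Below is a minimal matplotlib sketch; the `history` dict here is a stand-in with made-up values, whereas in practice it would come from `model.fit(...).history`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Stand-in for model.fit(...).history (illustrative values only)
history = {"loss": [1.8, 0.9, 0.4, 0.2], "val_loss": [1.9, 1.0, 0.5, 0.3]}

plt.plot(history["loss"], label="training loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("categorical crossentropy")
plt.legend()
plt.savefig("loss_curve.png")
```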
Labels
The model supports 85 classes, which include:
- Pitches: `A3`, `B4`, `C6`, etc.
- Note lengths: `16th`, `quarter`, `whole`, etc.
Full label list:
```python
['A316th', 'A3eighth', 'A3half', 'A3quarter', 'A3whole', 'A416th', 'A4eighth', 'A4half', 'A4quarter', 'A4whole', 'A516th', 'A5eighth', 'A5half', 'A5quarter', 'A5whole', 'B316th', 'B3eighth', 'B3half', 'B3quarter', 'B3whole', 'B416th', 'B4eighth', 'B4half', 'B4quarter', 'B4whole', 'B516th', 'B5eighth', 'B5half', 'B5quarter', 'B5whole', 'C416th', 'C4eighth', 'C4half', 'C4quarter', 'C4whole', 'C516th', 'C5eighth', 'C5half', 'C5quarter', 'C5whole', 'C616th', 'C6eighth', 'C6half', 'C6quarter', 'C6whole', 'D416th', 'D4eighth', 'D4half', 'D4quarter', 'D4whole', 'D516th', 'D5eighth', 'D5half', 'D5quarter', 'D5whole', 'E416th', 'E4eighth', 'E4half', 'E4quarter', 'E4whole', 'E516th', 'E5eighth', 'E5half', 'E5quarter', 'E5whole', 'F416th', 'F4eighth', 'F4half', 'F4quarter', 'F4whole', 'F516th', 'F5eighth', 'F5half', 'F5quarter', 'F5whole', 'G416th', 'G4eighth', 'G4half', 'G4quarter', 'G4whole', 'G516th', 'G5eighth', 'G5half', 'G5quarter', 'G5whole']
```
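The 85 classes are the Cartesian product of 17 pitches and 5 note lengths, and each label is simply the pitch concatenated with the length. A small sketch reconstructing the list and splitting a label back into its parts:

```python
# Pitches and note lengths, in the order they appear in the full label list above
pitches = ['A3', 'A4', 'A5', 'B3', 'B4', 'B5', 'C4', 'C5', 'C6',
           'D4', 'D5', 'E4', 'E5', 'F4', 'F5', 'G4', 'G5']
lengths = ['16th', 'eighth', 'half', 'quarter', 'whole']

# 17 pitches x 5 lengths = 85 classes
labels = [p + l for p in pitches for l in lengths]

def split_label(label):
    """Split e.g. 'A316th' into ('A3', '16th'); the pitch is always 2 chars."""
    return label[:2], label[2:]
```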
Datasets
To ensure the model generalizes well, we augmented the data using the following techniques:
- Cropping the notes in different ways to simulate varied positioning on the musical staff.
- Using random transformations like rotation, zooming, and shifting with TensorFlow's ImageDataGenerator.
- Augmentation expanded the dataset from 1,785 images to a total of 55,335 images.

The augmented dataset is available at https://huggingface.co/datasets/dongim04/augmented_musical_note_dataset
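The random-transformation step above can be sketched with Keras's `ImageDataGenerator`. The rotation, zoom, and shift ranges below are illustrative assumptions; the card does not state the exact values used.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical parameter values (the exact ranges used are not documented)
datagen = ImageDataGenerator(
    rotation_range=5,        # small random rotations, in degrees
    zoom_range=0.1,          # random zoom in/out
    width_shift_range=0.1,   # random horizontal shift
    height_shift_range=0.1,  # random vertical shift
)

# Dummy 64x64 grayscale images matching the model's input shape
images = np.random.rand(8, 64, 64, 1).astype("float32")
labels = np.zeros((8, 85), dtype="float32")

# Each call yields a freshly transformed batch
batch_x, batch_y = next(datagen.flow(images, labels, batch_size=8))
```

Because transformations are sampled on the fly, iterating repeatedly over `flow` produces many distinct variants of each original image, which is how a small set of 1,785 source images can be expanded into tens of thousands of samples.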