license: mit

CNN Musical Note Classifier

A deep learning model that classifies musical notes by pitch and length. It reaches a test accuracy of 99.66% on the dataset described below.


Model Overview

Architecture

This model is built using a Convolutional Neural Network (CNN) architecture with the following features:

  • Input size: (64, 64, 1)
  • Number of parameters: 696,255
  • Layers include:
    • Multiple Conv2D and BatchNormalization layers for feature extraction
    • GlobalAveragePooling2D and Dense layers for classification
    • Regularization via Dropout layers
  • Output: 85 classes, representing combinations of pitch and note length.
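As a rough sanity check on the layer types listed above, the standard per-layer parameter formulas can be worked out by hand. The sketch below uses a hypothetical stack (three 3x3 Conv2D layers with 32, 64, and 128 filters, and one hidden Dense layer of 256 units). These sizes are assumptions chosen for illustration, not the model's actual configuration, so the total is not expected to match the 696,255 parameters reported above.

```python
# Parameter-count formulas for the layer types named in the architecture list.
# All layer sizes below are hypothetical; the real configuration is not given here.

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of weights per filter, plus one bias per filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * channels

def dense_params(in_units, out_units):
    # Weight matrix plus one bias per output unit.
    return (in_units + 1) * out_units

NUM_CLASSES = 85  # from the model card

layer_params = [
    ("conv1", conv2d_params(3, 3, 1, 32)),   # input is (64, 64, 1): one channel
    ("bn1", batchnorm_params(32)),
    ("conv2", conv2d_params(3, 3, 32, 64)),
    ("bn2", batchnorm_params(64)),
    ("conv3", conv2d_params(3, 3, 64, 128)),
    ("bn3", batchnorm_params(128)),
    ("global_avg_pool", 0),                  # pooling has no parameters
    ("dense1", dense_params(128, 256)),
    ("dropout", 0),                          # dropout has no parameters
    ("output", dense_params(256, NUM_CLASSES)),
]

total_params = sum(count for _, count in layer_params)

for name, count in layer_params:
    print(f"{name:>16}: {count:,}")
print(f"{'total':>16}: {total_params:,}")
```

Because GlobalAveragePooling2D reduces each feature map to a single value, the first Dense layer's input width equals the number of filters in the last convolution, which keeps the classifier head small compared with flattening the feature maps.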

Dataset

  • Original dataset size: 1,785 samples
  • Augmented dataset size: 71,400 samples
  • Total size: 73,185 samples
  • Labels include various combinations of pitch (A3, B4, C6, etc.) and note lengths (16th, quarter, whole, etc.).
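The dataset figures above can be cross-checked with a few lines of arithmetic. The per-sample and per-class numbers below are averages only: they assume augmentation was applied evenly and that classes are balanced, neither of which is stated in this card.

```python
# Figures taken from the dataset list above.
original_samples = 1_785
augmented_samples = 71_400
num_classes = 85

total_samples = original_samples + augmented_samples

# Average number of augmented variants generated per original sample.
augmented_per_original = augmented_samples / original_samples

# Average number of original samples per class (assumes balanced classes).
original_per_class = original_samples / num_classes

print(total_samples)            # 73185
print(augmented_per_original)   # 40.0
print(original_per_class)       # 21.0
```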

Training Details

  • Optimizer: Adam
  • Loss function: Categorical Crossentropy
  • Epochs: 400+
  • Batch size: tuned to balance training speed and accuracy
  • Final evaluation results:
    • Test Loss: 0.1286
    • Test Accuracy: 99.66%
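To put the reported test loss in context, here is a minimal pure-Python sketch of categorical cross-entropy for a single one-hot target. The function name, the `eps` clipping value, and the example probability vectors are illustrative assumptions, not taken from the training code.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 so that log() stays finite.
    return -sum(
        t * math.log(min(max(p, eps), 1.0))
        for t, p in zip(y_true, y_pred)
    )

NUM_CLASSES = 85   # from the model card
TRUE_INDEX = 3     # arbitrary class chosen for the example

# One-hot target vector.
target = [0.0] * NUM_CLASSES
target[TRUE_INDEX] = 1.0

# An uninformed model: equal probability for every class.
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES

# A confident, correct model: 0.999 on the true class, the rest spread evenly.
confident = [0.001 / (NUM_CLASSES - 1)] * NUM_CLASSES
confident[TRUE_INDEX] = 0.999

loss_uniform = categorical_crossentropy(target, uniform)      # equals ln(85), about 4.44
loss_confident = categorical_crossentropy(target, confident)  # equals -ln(0.999), about 0.001

print(f"uniform guess:   {loss_uniform:.4f}")
print(f"confident guess: {loss_confident:.4f}")
```

With 85 classes, a uniform guess gives a loss of ln(85), roughly 4.44, so the reported test loss of 0.1286 (presumably averaged over the test set) sits far below the uninformed baseline.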

Examples of Feature Detection

Filter Visualization

Filters from the first convolutional layers demonstrate the features captured by the network.

[Images: Filter 1, Filter 2, Filter 3, Filter 4]

Training and Validation Loss

The following graph shows the training and validation loss during model training:

[Image: Loss Curves]


Labels

The model supports 85 classes, each combining one of 17 pitches with one of 5 note lengths:

  • Pitches (17): the natural notes from A3 to C6
  • Note lengths (5): 16th, eighth, half, quarter, whole

Full label list:

['A316th', 'A3eighth', 'A3half', 'A3quarter', 'A3whole', 'A416th', 'A4eighth', 'A4half', 'A4quarter', 'A4whole', 'A516th', 'A5eighth', 'A5half', 'A5quarter', 'A5whole', 'B316th', 'B3eighth', 'B3half', 'B3quarter', 'B3whole', 'B416th', 'B4eighth', 'B4half', 'B4quarter', 'B4whole', 'B516th', 'B5eighth', 'B5half', 'B5quarter', 'B5whole', 'C416th', 'C4eighth', 'C4half', 'C4quarter', 'C4whole', 'C516th', 'C5eighth', 'C5half', 'C5quarter', 'C5whole', 'C616th', 'C6eighth', 'C6half', 'C6quarter', 'C6whole', 'D416th', 'D4eighth', 'D4half', 'D4quarter', 'D4whole', 'D516th', 'D5eighth', 'D5half', 'D5quarter', 'D5whole', 'E416th', 'E4eighth', 'E4half', 'E4quarter', 'E4whole', 'E516th', 'E5eighth', 'E5half', 'E5quarter', 'E5whole', 'F416th', 'F4eighth', 'F4half', 'F4quarter', 'F4whole', 'F516th', 'F5eighth', 'F5half', 'F5quarter', 'F5whole', 'G416th', 'G4eighth', 'G4half', 'G4quarter', 'G4whole', 'G516th', 'G5eighth', 'G5half', 'G5quarter', 'G5whole']
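The label list follows a regular pattern: each two-character pitch name is concatenated with a note length, and the result is in alphabetical order. The sketch below rebuilds the list from those two parts and maps a probability vector back to a (pitch, length) pair. The helper names `split_label` and `decode_prediction` are hypothetical, and the mapping assumes that output index i of the model corresponds to position i in this list, which should be confirmed against the training code.

```python
# Pitches and note lengths as they appear in the full label list.
PITCHES = [
    "A3", "A4", "A5", "B3", "B4", "B5", "C4", "C5", "C6",
    "D4", "D5", "E4", "E5", "F4", "F5", "G4", "G5",
]
LENGTHS = ["16th", "eighth", "half", "quarter", "whole"]

# Pitch-major order reproduces the alphabetical order of the published list.
LABELS = [pitch + length for pitch in PITCHES for length in LENGTHS]

def split_label(label):
    # Every pitch name is exactly two characters (note letter + octave digit).
    return label[:2], label[2:]

def decode_prediction(probabilities):
    # Assumes output index i corresponds to LABELS[i].
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return split_label(LABELS[best_index])

# Usage: a made-up probability vector that peaks at index 42.
example_probs = [0.0] * len(LABELS)
example_probs[42] = 1.0

print(len(LABELS))                        # 85
print(decode_prediction(example_probs))   # ('C6', 'half')
```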

Datasets

To ensure the model generalizes well, we augmented the data using the following techniques: