---
license: mit
datasets:
- ccmusic-database/pianos
language:
- en
tags:
- music
- art
metrics:
- accuracy
pipeline_tag: audio-classification
library_name: https://github.com/monetjoe/pianos
---

# Intro
This study draws on classical backbone architectures from computer vision to build an 8-class piano timbre discriminator with deep learning. The model covers eight piano brands and types: Kawai, Kawai Grand, YOUNG CHANG, HSINGHAI, Steinway Theatre, Steinway Grand, Pearl River, and Yamaha. Audio recordings are converted to Mel spectrograms and used for supervised fine-tuning, after which the model accurately distinguishes the timbres of the different pianos and performs well in practical testing. Training uses a large-scale annotated audio dataset, with the network progressively learning to extract key features from the audio. The resulting discriminator has broad potential applications in music assessment, audio engineering, and related fields, offering a reliable solution for piano timbre discrimination and a useful reference for future applications of deep learning in the audio domain.
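The front end of the pipeline described above is the conversion of raw audio into a Mel spectrogram. A minimal NumPy-only sketch of that step is shown below; the frame, hop, and mel-band sizes are illustrative assumptions, not the exact parameters used to train this model:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            if c > l:
                fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):
            if r > c:
                fb[i - 1, k] = (r - k) / (r - c)
    return fb

def mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=128):
    # frame the signal, window it, take the FFT power, project onto mel filters
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))      # log power in dB

# one second of a 440 Hz sine as a stand-in for a piano recording
sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
S = mel_spectrogram(y, sr=sr)
print(S.shape)
```

In practice a library such as `librosa` or `torchaudio` would be used for this step; the resulting log-Mel image is what the fine-tuned backbone classifies.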

## Demo (inference code)
<https://huggingface.co/spaces/ccmusic-database/pianos>

## Usage
```python
from huggingface_hub import snapshot_download

# download the full model repository and get the local cache path
model_dir = snapshot_download("ccmusic-database/pianos")
```
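The model outputs one score per piano class. A minimal sketch of turning raw logits into a label and confidence is shown below; the class list matches the eight pianos named above, but the index order used by the released checkpoint is an assumption here:

```python
import numpy as np

# The eight piano classes from the model card; the exact index order
# used by the released checkpoint is an assumption.
CLASSES = ["Kawai", "Kawai Grand", "YOUNG CHANG", "HSINGHAI",
           "Steinway Theatre", "Steinway Grand", "Pearl River", "Yamaha"]

def softmax(logits):
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()

def decode(logits):
    """Map raw model outputs to a (label, confidence) pair."""
    probs = softmax(np.asarray(logits, dtype=float))
    i = int(probs.argmax())
    return CLASSES[i], float(probs[i])

# hypothetical logits for one audio clip
label, conf = decode([0.1, 2.9, -1.0, 0.3, 0.0, 4.2, -0.5, 1.1])
print(label, round(conf, 3))
```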

## Maintenance
```bash
# clone without downloading the large LFS files up front
GIT_LFS_SKIP_SMUDGE=1 git clone [email protected]:ccmusic-database/pianos
cd pianos
```

## Results
Example results of SqueezeNet fine-tuning:
<style>
  #pianos td {
    vertical-align: middle !important;
    text-align: center;
  }
  #pianos th {
    text-align: center;
  }
</style>
<table id="pianos">
    <tr>
        <th>Loss curve</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/pianos/resolve/master/loss.jpg"></td>
    </tr>
    <tr>
        <th>Training and validation accuracy</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/pianos/resolve/master/acc.jpg"></td>
    </tr>
    <tr>
        <th>Confusion matrix</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/pianos/resolve/master/mat.jpg"></td>
    </tr>
</table>

## Dataset
<https://huggingface.co/datasets/ccmusic-database/pianos>

## Mirror
<https://www.modelscope.cn/models/ccmusic-database/pianos>

## Evaluation
<https://github.com/monetjoe/pianos>

## Cite
```bibtex
@inproceedings{zhou2023holistic,
  title        = {A Holistic Evaluation of Piano Sound Quality},
  author       = {Monan Zhou and Shangda Wu and Shaohua Ji and Zijin Li and Wei Li},
  booktitle    = {National Conference on Sound and Music Technology},
  pages        = {3--17},
  year         = {2023},
  organization = {Springer}
}
```