Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,47 +1,52 @@
|
|
1 |
---
|
2 |
-
|
3 |
-
license: mit
|
4 |
-
tags:
|
5 |
-
- audio
|
6 |
-
- captioning
|
7 |
-
- text
|
8 |
-
- audio-captioning
|
9 |
-
- automated-audio-captioning
|
10 |
-
task_categories:
|
11 |
-
- audio-captioning
|
12 |
---
|
|
|
13 |
|
14 |
-
# CoNeTTE
|
15 |
|
16 |
-
<
17 |
|
18 |
-
19 |
|
20 |
## Installation
|
21 |
```bash
|
22 |
-
pip install conette
|
|
|
23 |
```
|
24 |
|
25 |
-
## Usage
|
26 |
```py
|
27 |
from conette import CoNeTTEConfig, CoNeTTEModel
|
28 |
|
29 |
config = CoNeTTEConfig.from_pretrained("Labbeti/conette")
|
30 |
model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config)
|
31 |
|
32 |
-
path = "/
|
33 |
outputs = model(path)
|
34 |
candidate = outputs["cands"][0]
|
35 |
print(candidate)
|
36 |
```
|
37 |
|
38 |
-
The model can also accept several audio files at the same time (list[str]), or a list of pre-loaded audio files (list[Tensor]).
|
39 |
|
40 |
```py
|
41 |
import torchaudio
|
42 |
|
43 |
-
path_1 = "/
|
44 |
-
path_2 = "/
|
45 |
|
46 |
audio_1, sr_1 = torchaudio.load(path_1)
|
47 |
audio_2, sr_2 = torchaudio.load(path_2)
|
@@ -63,11 +68,19 @@ candidate = outputs["cands"][0]
|
|
63 |
print(candidate)
|
64 |
```
|
65 |
|
66 |
## Performance
|
67 |
-
|
68 |
-
|
|
69 |
-
|
|
70 |
-
|
|
|
|
71 |
|
72 |
This model checkpoint has been trained for the Clotho dataset, but it can also reach good performance on AudioCaps with the "audiocaps" task.
|
73 |
|
@@ -89,7 +102,9 @@ The preprint version of the paper describing CoNeTTE is available on arxiv: http
|
|
89 |
|
90 |
## Additional information
|
91 |
|
92 |
-
|
93 |
-
More precisely, the encoder weights used are named "convnext_tiny_465mAP_BL_AC_70kit.pth", available on Zenodo: https://zenodo.org/record/8020843.
|
94 |
|
95 |
-
|
|
|
|
|
|
1 |
---
|
2 |
+
{}
3 |
---
|
4 |
+
<div align="center">
|
5 |
|
6 |
+
# CoNeTTE model source
|
7 |
|
8 |
+
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/-Python 3.10+-blue?style=for-the-badge&logo=python&logoColor=white"></a>
|
9 |
+
<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch 1.10.1+-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a>
|
10 |
+
<a href="https://black.readthedocs.io/en/stable/"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-black.svg?style=for-the-badge&labelColor=gray"></a>
|
11 |
+
<a href="https://github.com/Labbeti/conette-audio-captioning/actions">
|
12 |
+
<img alt="Build" src="https://img.shields.io/github/actions/workflow/status/Labbeti/conette-audio-captioning/python-package-pip.yaml?branch=main&style=for-the-badge&logo=github">
|
13 |
+
</a>
|
14 |
+
<!-- <a href='https://aac-metrics.readthedocs.io/en/stable/?badge=stable'>
|
15 |
+
<img src='https://readthedocs.org/projects/aac-metrics/badge/?version=stable&style=for-the-badge' alt='Documentation Status' />
|
16 |
+
</a> -->
|
17 |
|
18 |
+
CoNeTTE is an audio captioning system that generates a short textual description of the sound events in any audio file.
|
19 |
+
|
20 |
+
</div>
|
21 |
+
|
22 |
+
CoNeTTE was developed by me ([Étienne Labbé](https://labbeti.github.io/)) during my PhD. CoNeTTE stands for ConvNeXt-Transformer model with Task Embedding; the architecture and training are explained in the corresponding [paper](https://arxiv.org/pdf/2309.00454.pdf).
|
23 |
|
24 |
## Installation
|
25 |
```bash
|
26 |
+
python -m pip install conette
|
27 |
+
python -m spacy download en_core_web_sm
|
28 |
```
|
29 |
|
30 |
+
## Usage with Python
|
31 |
```py
|
32 |
from conette import CoNeTTEConfig, CoNeTTEModel
|
33 |
|
34 |
config = CoNeTTEConfig.from_pretrained("Labbeti/conette")
|
35 |
model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config)
|
36 |
|
37 |
+
path = "/your/path/to/audio.wav"
|
38 |
outputs = model(path)
|
39 |
candidate = outputs["cands"][0]
|
40 |
print(candidate)
|
41 |
```
|
42 |
|
43 |
+
The model can also accept several audio files at the same time (list[str]), or a list of pre-loaded audio files (list[Tensor]). In the second case, you also need to provide the sampling rate of these files:
|
44 |
|
45 |
```py
|
46 |
import torchaudio
|
47 |
|
48 |
+
path_1 = "/your/path/to/audio_1.wav"
|
49 |
+
path_2 = "/your/path/to/audio_2.wav"
|
50 |
|
51 |
audio_1, sr_1 = torchaudio.load(path_1)
|
52 |
audio_2, sr_2 = torchaudio.load(path_2)
|
|
|
68 |
print(candidate)
|
69 |
```
|
70 |
|
71 |
+
## Usage with the command line
|
72 |
+
Use the `conette-predict` command with the `--audio PATH1 PATH2 ...` option. You can also export the results to a CSV file with `--csv_export PATH`.
|
73 |
+
|
74 |
+
```bash
|
75 |
+
conette-predict --audio "/your/path/to/audio.wav"
|
76 |
+
```
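
If you want to call the command from a script, the two options described above can be combined into a single invocation. The following is a minimal sketch, not part of the `conette` package: the helper name `build_conette_command` and the example paths are hypothetical, and only the `--audio` and `--csv_export` options mentioned above are assumed to exist.

```py
import shlex
from typing import List, Optional, Sequence


def build_conette_command(
    audio_paths: Sequence[str],
    csv_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `conette-predict`.

    Hypothetical helper for illustration; it only assembles the
    `--audio` and `--csv_export` options described in this README.
    """
    if not audio_paths:
        raise ValueError("at least one audio path is required")
    # `--audio` takes one or more paths.
    cmd = ["conette-predict", "--audio", *audio_paths]
    # `--csv_export` is optional and takes a single output path.
    if csv_path is not None:
        cmd += ["--csv_export", csv_path]
    return cmd


if __name__ == "__main__":
    cmd = build_conette_command(
        ["/your/path/to/audio_1.wav", "/your/path/to/audio_2.wav"],
        csv_path="/your/path/to/results.csv",
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True)
    # to execute it once `conette` is installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when audio paths contain spaces.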
|
77 |
+
|
78 |
## Performance
|
79 |
+
|
80 |
+
| Test data | SPIDEr (%) | SPIDEr-FL (%) | FENSE (%) | Vocab | Outputs | Scores |
|
81 |
+
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
|
82 |
+
| AC-test | 44.14 | 43.98 | 60.81 | 309 | [:clipboard:](results/conette/outputs_audiocaps_test.csv) | [:chart_with_upwards_trend:](results/conette/scores_audiocaps_test.yaml) |
|
83 |
+
| CL-eval | 30.97 | 30.87 | 51.72 | 636 | [:clipboard:](results/conette/outputs_clotho_eval.csv) | [:chart_with_upwards_trend:](results/conette/scores_clotho_eval.yaml) |
|
84 |
|
85 |
This model checkpoint has been trained for the Clotho dataset, but it can also reach good performance on AudioCaps with the "audiocaps" task.
|
86 |
|
|
|
102 |
|
103 |
## Additional information
|
104 |
|
105 |
+
- Model weights are available on HuggingFace: https://huggingface.co/Labbeti/conette
|
106 |
+
- The encoder part of the architecture is based on a ConvNeXt model for audio classification, available here: https://huggingface.co/topel/ConvNeXt-Tiny-AT. More precisely, the encoder weights used are named "convnext_tiny_465mAP_BL_AC_70kit.pth", available on Zenodo: https://zenodo.org/record/8020843.
|
107 |
|
108 |
+
## Contact
|
109 |
+
Maintainer:
|
110 |
+
- Etienne Labbé "Labbeti": [email protected]
|