superb-hidden-set committed
Commit 1ff2c02 · 1 Parent(s): c7d9dc5
move model interface functions description from website to here
README.md CHANGED
@@ -19,13 +19,71 @@ If you are unable to submit the pre-trained model, please [fill this form]

## Quickstart

-### 1.
+### 1. Add model interfaces
+
+#### forward
+
+Extract features from waveforms.
+
+- **Input:** A list of waveforms sampled at 16000 Hz
+
+```python
+import torch
+
+SAMPLE_RATE = 16000
+BATCH_SIZE = 8
+EXAMPLE_SEC = 10
+wavs = [torch.randn(SAMPLE_RATE * EXAMPLE_SEC).cuda() for _ in range(BATCH_SIZE)]
+results = upstream(wavs)  # upstream is your pretrained model instance
+```
+
+- **Output:** A dictionary with a key for each task. If a task-specific key is not present, a "hidden_states" key must be provided as the default. The value for each key is **a list** of padded sequences, each with the same shape **(batch_size, max_sequence_length_of_batch, hidden_size)**, so that weighted-sum works. You are welcome to preprocess the upstream's raw hidden states, including upsampling and downsampling. However, all the values must come from **a single upstream model**:
+
+```python
+assert isinstance(results, dict)
+tasks = ["PR", "SID", "ER", "ASR", "ASV", "SD", "QbE", "ST", "SS", "SE"]
+for task in tasks:
+    # Fall back to the default "hidden_states" key when there is no task-specific key.
+    hidden_states = results.get(task, results["hidden_states"])
+    assert isinstance(hidden_states, list)
+
+    for state in hidden_states:
+        assert isinstance(state, torch.Tensor)
+        assert state.dim() == 3, "(batch_size, max_sequence_length_of_batch, hidden_size)"
+        assert state.shape == hidden_states[0].shape
+```
+
+#### get_downsample_rates
+
+Provide the downsample rate **from 16000 Hz waveforms** for each task's representation in the dict. For the standard 10 ms stride representation, the downsample rate is 160.
+
+```python
+SAMPLE_RATE = 16000
+MSEC_PER_SEC = 1000
+downsample_rate = SAMPLE_RATE * 10 // MSEC_PER_SEC  # 160
+```
+
+The downsample rate will be used to:
+
+1. Calculate the valid representation length of each utterance in the padded output (see the sketch after this list).
+2. Prepare the training materials according to the representation's downsample rate for frame-level tasks, e.g. SD, SE, and SS.
+
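+For instance, a minimal sketch of the length computation (illustrative only; `wavs` and `downsample_rate` come from the snippets above):
+
+```python
+# Number of valid (non-padding) frames per utterance in the padded batch.
+wav_lengths = [len(wav) for wav in wavs]
+valid_feat_lengths = [length // downsample_rate for length in wav_lengths]
+```
+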
+- **Input:** the task key (str)
+- **Output:** the downsample rate (int) of the representation for that task
+
+```python
+for task in tasks:
+    assert isinstance(task, str)
+    downsample_rate = upstream.get_downsample_rates(task)
+    assert isinstance(downsample_rate, int)
+    print(f"The upstream's representation for {task}"
+          f" has a downsample rate of {downsample_rate}.")
+```
+
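+Putting the two interfaces together, a minimal sketch of a conforming upstream could look like this (the class name, layer sizes, and single-conv extractor are illustrative assumptions, not requirements):
+
+```python
+import torch.nn as nn
+
+class SketchUpstream(nn.Module):
+    """A toy upstream producing 10 ms-stride features (downsample rate 160)."""
+
+    def __init__(self, hidden_size=768):
+        super().__init__()
+        self.extractor = nn.Conv1d(1, hidden_size, kernel_size=320, stride=160)
+
+    def get_downsample_rates(self, key: str) -> int:
+        return 160  # this sketch uses the same rate for every task key
+
+    def forward(self, wavs):
+        # Pad the variable-length waveforms into one (batch, time) tensor.
+        padded = nn.utils.rnn.pad_sequence(wavs, batch_first=True)
+        features = self.extractor(padded.unsqueeze(1))  # (batch, hidden, frames)
+        features = features.transpose(1, 2)             # (batch, frames, hidden)
+        # A single-layer list is the minimum; return one tensor per layer for weighted-sum.
+        return {"hidden_states": [features]}
+```
+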
+### 2. Create an account and organization on the Hugging Face Hub

First create an account on the Hugging Face Hub; you can sign up [here](https://huggingface.co/join) if you haven't already! Next, create a new organization and invite the SUPERB Hidden Set Committee to join. You will upload your model to a repository under this organization, so that the committee members can access the model even though it is not publicly available.

* [superb-hidden-set](https://huggingface.co/superb-hidden-set)

-###
+### 3. Create a template repository on your machine

The next step is to create a template repository on your local machine that contains various files and a CLI to help you validate and submit your pretrained models. The Hugging Face Hub uses [Git Large File Storage (LFS)](https://git-lfs.github.com) to manage large files, so first install it if you don't have it already. For example, on macOS you can run:
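
```bash
# The concrete command sits outside this hunk; on macOS with Homebrew it is typically:
brew install git-lfs
```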

@@ -72,7 +130,7 @@ my-superb-submission
└── model.pt <- Your model weights
```

-###
+### 4. Install the dependencies

The final step is to install the project's dependencies:
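
```bash
# The project's actual install command is beyond this excerpt; a typical
# Python workflow (hypothetical here) would be:
pip install -r requirements.txt
```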