Use Problem module to run customizable recipes
=======================================================

The :obj:`s3prl.problem` module provides customizable recipes in (almost) pure Python.
See :obj:`s3prl.problem` for all the recipes that are ready to run.


Usage 1. Import and run on Colab
--------------------------------

All problem classes follow the same usage:

>>> import torch
>>> from s3prl.problem import SuperbASR
...
>>> problem = SuperbASR()
>>> config = problem.default_config()
>>> print(config)
...
>>> # See the config for the '???' required fields and fill them
>>> config["target_dir"] = "result/asr_exp"
>>> config["prepare_data"]["dataset_root"] = "/corpus/LibriSpeech/"
...
>>> problem.run(**config)

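
Reading a large printed config by eye to spot every ``'???'`` entry can be tedious. The sketch below lists them automatically. It assumes the config behaves like a nested :code:`dict` whose unfilled fields hold the literal string ``"???"``; the helper :code:`find_required_fields` and the example config are hypothetical and not part of s3prl.

```python
def find_required_fields(config, prefix=""):
    """Return the dotted paths of all entries whose value is the '???' placeholder."""
    missing = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as config["prepare_data"]
            missing.extend(find_required_fields(value, path))
        elif value == "???":
            missing.append(path)
    return missing


# A made-up config fragment, for illustration only (not a real default config)
example_config = {
    "target_dir": "???",
    "prepare_data": {"dataset_root": "???"},
    "optimizer": {"lr": 1.0e-4},
}

print(find_required_fields(example_config))
# ['target_dir', 'prepare_data.dataset_root']
```

With a real problem object, the same helper could be applied to the result of :code:`problem.default_config()`, provided that result is a plain nested dictionary.
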

Usage 2. Run & configure from CLI
-----------------------------------

To run directly from the command line, write a Python script (:code:`asr.py`) as follows:

.. code-block:: python

    # This is asr.py

    from s3prl.problem import SuperbASR

    SuperbASR().main()

Then run the command below. The :code:`main` function supports overriding a field in the config with
:code:`--{field_name} {value}` for a top-level field, or
:code:`--{outer_field_name}.{inner_field_name} {value}` for a nested field:

.. code-block:: bash

    python3 asr.py --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

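
To make the dotted override syntax concrete, the following sketch shows one way such flags could be mapped onto a nested config dictionary. It is not s3prl's actual implementation: :code:`parse_overrides` and :code:`apply_override` are hypothetical helpers, and values are kept as plain strings here, whereas a real CLI may convert types.

```python
def parse_overrides(argv):
    """Turn ['--a', '1', '--b.c', '2'] into [('a', '1'), ('b.c', '2')]."""
    pairs = []
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected --field_name, got {flag!r}")
        pairs.append((flag[2:], value))
    return pairs


def apply_override(config, dotted_key, value):
    """Set the nested entry named by a key such as 'outer.inner' to value."""
    *parents, last = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down one level per dotted component, creating it if absent
        node = node.setdefault(name, {})
    node[last] = value
    return config


config = {"target_dir": "???", "prepare_data": {"dataset_root": "???"}}
argv = [
    "--target_dir", "result/asr_exp",
    "--prepare_data.dataset_root", "/corpus/LibriSpeech/",
]

for key, value in parse_overrides(argv):
    apply_override(config, key, value)

print(config)
# {'target_dir': 'result/asr_exp', 'prepare_data': {'dataset_root': '/corpus/LibriSpeech/'}}
```

The point of the dotted form is that each ``.`` descends one level into the config, so :code:`--prepare_data.dataset_root` has the same effect as setting :code:`config["prepare_data"]["dataset_root"]` in Python.
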

Usage 3. Run & configure with the unified :obj:`s3prl-main`
-----------------------------------------------------------

Usage 2 still requires you to create a script file for every problem.
To avoid this, we provide a helper that supports all the problems in :obj:`s3prl.problem`:

.. code-block:: bash

    python3 -m s3prl.main SuperbASR --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

Alternatively, use the CLI entry point :code:`s3prl-main`:

.. code-block:: bash

    s3prl-main SuperbASR --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

Customization
-------------

The core feature of the :obj:`s3prl.problem` module is customization.
You can change the corpus, the SSL upstream model, the downstream model,
the optimizer, the scheduler, and so on, all of which you can define freely.

We demonstrate how to change the corpus and the downstream model in the following :code:`new_asr.py`:

.. code-block:: python

    # This is new_asr.py

    import torch
    import pandas as pd
    from s3prl.problem import SuperbASR


    class LowResourceLinearSuperbASR(SuperbASR):
        def prepare_data(
            self, prepare_data: dict, target_dir: str, cache_dir: str, get_path_only=False
        ):
            train_path, valid_path, test_paths = super().prepare_data(
                prepare_data, target_dir, cache_dir, get_path_only
            )

            # Take only the first 100 utterances for training
            df = pd.read_csv(train_path)
            df = df.iloc[:100]
            df.to_csv(train_path, index=False)

            return train_path, valid_path, test_paths

        def build_downstream(
            self,
            build_downstream: dict,
            downstream_input_size: int,
            downstream_output_size: int,
            downstream_input_stride: int,
        ):
            class Model(torch.nn.Module):
                def __init__(self, input_size, output_size) -> None:
                    super().__init__()
                    self.linear = torch.nn.Linear(input_size, output_size)

                def forward(self, x, x_len):
                    return self.linear(x), x_len

            return Model(downstream_input_size, downstream_output_size)


    if __name__ == "__main__":
        LowResourceLinearSuperbASR().main()

By subclassing :obj:`SuperbASR` and overriding the :code:`prepare_data` and :code:`build_downstream` methods,
we create a new problem called :code:`LowResourceLinearSuperbASR`. After this simple modification,
:code:`LowResourceLinearSuperbASR` works exactly like :code:`SuperbASR` except for these two
changes, and you can launch the new class with either of the first two usages introduced above.

For example:

.. code-block:: bash

    python3 new_asr.py --target_dir result/new_asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

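
Before launching a full run, the custom downstream model can be checked in isolation on random input. This is a minimal sketch that assumes the downstream input is a float tensor shaped (batch, frames, features) and that :code:`x_len` holds the number of valid frames per utterance. The sizes below are hypothetical and are not values that s3prl would necessarily pass.

```python
import torch


class Model(torch.nn.Module):
    """Same linear downstream model as the one defined in new_asr.py."""

    def __init__(self, input_size, output_size) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(input_size, output_size)

    def forward(self, x, x_len):
        # The linear layer acts on the last (feature) dimension only,
        # so the number of frames and the lengths are passed through unchanged.
        return self.linear(x), x_len


# Hypothetical sizes, chosen only for this check
batch_size, num_frames, input_size, output_size = 2, 50, 768, 32

model = Model(input_size, output_size)
x = torch.randn(batch_size, num_frames, input_size)
x_len = torch.tensor([50, 37])

logits, logits_len = model(x, x_len)
print(logits.shape)  # torch.Size([2, 50, 32])
print(logits_len)    # tensor([50, 37])
```

Because the model does not change the time resolution, it returns the input lengths as they are. A downstream model that subsamples frames would need to return correspondingly shortened lengths.
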