Use Problem module to run customizable recipes
=======================================================

The :obj:`s3prl.problem` module provides customizable recipes written in (almost) pure Python.
See :obj:`s3prl.problem` for all the recipes that are ready to be run.

Usage 1. Import and run on Colab
--------------------------------

All problem classes follow the same usage:

>>> import torch
>>> from s3prl.problem import SuperbASR
...
>>> problem = SuperbASR()
>>> config = problem.default_config()
>>> print(config)
...
>>> # See the config for the '???' required fields and fill them
>>> config["target_dir"] = "result/asr_exp"
>>> config["prepare_data"]["dataset_root"] = "/corpus/LibriSpeech/"
...
>>> problem.run(**config)

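
Spotting every :code:`'???'` field by eye can be tedious in a large config. The helper below is a hypothetical
sketch, not part of s3prl: it assumes, as the comment in the example above suggests, that the config is a
nested dict in which required fields hold the literal string :code:`'???'`, and it reports them as dotted paths.

.. code-block:: python

    from typing import Any, List


    def find_required_fields(config: Any, prefix: str = "") -> List[str]:
        """Return dotted paths of every field whose value is the '???' placeholder.

        Hypothetical helper: assumes a nested dict config where required
        fields are marked with the string "???".
        """
        missing = []
        if isinstance(config, dict):
            for key, value in config.items():
                path = f"{prefix}.{key}" if prefix else str(key)
                if isinstance(value, dict):
                    # Recurse into nested sections such as "prepare_data"
                    missing.extend(find_required_fields(value, path))
                elif value == "???":
                    missing.append(path)
        return missing


    # Toy config for illustration only; the "seed" field is made up
    config = {
        "target_dir": "???",
        "prepare_data": {"dataset_root": "???"},
        "seed": 0,
    }
    print(find_required_fields(config))

With this toy config the helper prints :code:`['target_dir', 'prepare_data.dataset_root']`, the same two
fields that are filled in above.
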
Usage 2. Run & configure from CLI
-----------------------------------

If you want to run directly from the command line, write a Python script (:code:`asr.py`) as follows:

.. code-block:: python

    # This is asr.py
    from s3prl.problem import SuperbASR

    SuperbASR().main()

Then, run the command below. The :code:`main` function supports overriding a config field with
:code:`--{field_name} {value}`, or a nested field with :code:`--{outer_field_name}.{inner_field_name} {value}`:

.. code-block:: bash

    python3 asr.py --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

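
The override rule above can be pictured as walking a dotted flag name down the nested config dict.
The following is a hypothetical sketch of that mapping, not s3prl's actual argument parser: it assumes
flags always arrive as :code:`--key value` pairs and stores every value as a string, whereas the real
:code:`main` may convert types.

.. code-block:: python

    import copy
    from typing import Any, Dict, List


    def apply_overrides(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
        """Apply ``--a.b.c value`` style overrides to a copy of a nested dict.

        Hypothetical sketch: values are stored as plain strings, no type casting.
        """
        if len(argv) % 2 != 0:
            raise ValueError("expected --key value pairs")
        result = copy.deepcopy(config)  # leave the caller's config untouched
        for flag, value in zip(argv[0::2], argv[1::2]):
            if not flag.startswith("--"):
                raise ValueError(f"unexpected argument: {flag}")
            *parents, leaf = flag[2:].split(".")
            node = result
            for name in parents:
                # Walk (or create) each outer field, e.g. "prepare_data"
                node = node.setdefault(name, {})
            node[leaf] = value
        return result


    default = {"target_dir": "???", "prepare_data": {"dataset_root": "???"}}
    argv = [
        "--target_dir", "result/asr_exp",
        "--prepare_data.dataset_root", "/corpus/LibriSpeech/",
    ]
    print(apply_overrides(default, argv))

Running it prints the nested dict with both placeholders replaced by the values from the command above.
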
Usage 3. Run & configure with the unified :obj:`s3prl-main`
-----------------------------------------------------------

However, this approach still requires writing a script for every problem.
Hence, we provide a single helper that supports all the problems in :obj:`s3prl.problem`:

.. code-block:: bash

    python3 -m s3prl.main SuperbASR --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

Alternatively, use our CLI entry point :code:`s3prl-main`:

.. code-block:: bash

    s3prl-main SuperbASR --target_dir result/asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/

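
Conceptually, a unified entry point only has to look up the problem class named by the first argument and
hand the remaining flags to it. The sketch below illustrates that idea with hypothetical stand-in classes
and a hypothetical :code:`dispatch` function; it is not the code behind :code:`s3prl.main` or :code:`s3prl-main`.

.. code-block:: python

    from typing import Dict, List, Type


    class ToyProblem:
        """Hypothetical stand-in for a problem class; not an s3prl class."""

        def main(self, argv: List[str]) -> str:
            # A real problem would parse argv and run the recipe; here we just echo it
            return f"{type(self).__name__} received {argv}"


    class ToySuperbASR(ToyProblem):
        pass


    class ToyLowResourceASR(ToyProblem):
        pass


    # Registry of runnable problems, keyed by the class name typed on the command line
    PROBLEMS: Dict[str, Type[ToyProblem]] = {
        cls.__name__: cls for cls in (ToySuperbASR, ToyLowResourceASR)
    }


    def dispatch(argv: List[str]) -> str:
        """Run the problem named by argv[0], forwarding the remaining flags to it."""
        if not argv or argv[0] not in PROBLEMS:
            raise SystemExit(f"choose a problem from {sorted(PROBLEMS)}")
        return PROBLEMS[argv[0]]().main(argv[1:])


    print(dispatch(["ToySuperbASR", "--target_dir", "result/asr_exp"]))

In this sketch, supporting another problem only means adding its class to the registry; the command-line
flags are passed through unchanged.
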
Customization
-------------

The core feature of the :obj:`s3prl.problem` module is customization.
You can freely redefine the corpus, the SSL upstream model, the downstream model,
the optimizer, the scheduler, and so on.
We demonstrate how to change the corpus and the downstream model in the following :code:`new_asr.py`:

.. code-block:: python

    # This is new_asr.py
    import torch
    import pandas as pd
    from s3prl.problem import SuperbASR


    class LowResourceLinearSuperbASR(SuperbASR):
        def prepare_data(
            self, prepare_data: dict, target_dir: str, cache_dir: str, get_path_only=False
        ):
            train_path, valid_path, test_paths = super().prepare_data(
                prepare_data, target_dir, cache_dir, get_path_only
            )

            # Take only the first 100 utterances for training
            df = pd.read_csv(train_path)
            df = df.iloc[:100]
            df.to_csv(train_path, index=False)

            return train_path, valid_path, test_paths

        def build_downstream(
            self,
            build_downstream: dict,
            downstream_input_size: int,
            downstream_output_size: int,
            downstream_input_stride: int,
        ):
            # A single linear layer replaces the default downstream model
            class Model(torch.nn.Module):
                def __init__(self, input_size, output_size) -> None:
                    super().__init__()
                    self.linear = torch.nn.Linear(input_size, output_size)

                def forward(self, x, x_len):
                    # Project every frame; the sequence lengths are returned unchanged
                    return self.linear(x), x_len

            return Model(downstream_input_size, downstream_output_size)


    if __name__ == "__main__":
        LowResourceLinearSuperbASR().main()

By subclassing :obj:`SuperbASR` and overriding its :code:`prepare_data` and :code:`build_downstream` methods,
we create a new problem called :code:`LowResourceLinearSuperbASR`. It behaves exactly like :code:`SuperbASR`
except for two changes: training uses only the first 100 utterances, and the downstream model is a single
linear layer. You can launch this new class with either of the first two usages introduced above.
For example:

.. code-block:: bash

    python3 new_asr.py --target_dir result/new_asr_exp --prepare_data.dataset_root /corpus/LibriSpeech/
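
Overriding works here because, assuming the parent's :code:`run` invokes each step through :code:`self`
(e.g. :code:`self.prepare_data(...)`), Python dispatches those calls to the subclass versions. The toy sketch
below mimics that pattern outside s3prl: :code:`ToyProblem`, its :code:`run` method, and the CSV layout are all
hypothetical, and the truncation uses the standard-library :code:`csv` module instead of pandas so that the
example has no third-party dependencies.

.. code-block:: python

    import csv
    import os
    import tempfile


    class ToyProblem:
        """Hypothetical stand-in for a problem class; not s3prl's SuperbASR."""

        def prepare_data(self, target_dir: str) -> str:
            # Write a fake training CSV with a header and 250 utterances
            train_path = os.path.join(target_dir, "train.csv")
            with open(train_path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["id", "wav_path"])
                for i in range(250):
                    writer.writerow([f"utt{i}", f"/fake/utt{i}.wav"])
            return train_path

        def run(self, target_dir: str) -> int:
            # The parent calls the step through self, so a subclass can override it
            train_path = self.prepare_data(target_dir)
            with open(train_path, newline="") as f:
                return sum(1 for _ in csv.DictReader(f))  # number of data rows


    class LowResourceToyProblem(ToyProblem):
        def prepare_data(self, target_dir: str) -> str:
            train_path = super().prepare_data(target_dir)
            # Keep the header plus the first 100 utterances, like df.iloc[:100]
            with open(train_path, newline="") as f:
                rows = list(csv.reader(f))
            with open(train_path, "w", newline="") as f:
                csv.writer(f).writerows(rows[:101])
            return train_path


    with tempfile.TemporaryDirectory() as tmp:
        print(ToyProblem().run(tmp))             # 250
        print(LowResourceToyProblem().run(tmp))  # 100

The unchanged :code:`run` reports 250 rows for the base class and 100 for the subclass, mirroring how
:code:`LowResourceLinearSuperbASR` reuses every other step of :code:`SuperbASR`.
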