# Install dependencies
```bash
pip install -r requirement.txt
```
# Download the data set
```bash
export WORKDIR_ROOT=<a directory which will hold all working files>
```
The downloaded data will be placed in $WORKDIR_ROOT/ML50.
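As a minimal sketch of the export step above, the snippet below picks a working directory and creates it if it does not exist yet. The `ml50_work` directory name is a hypothetical placeholder, not something this README prescribes; any writable location with enough free space should do.

```bash
# Hypothetical location: falls back to ./ml50_work if WORKDIR_ROOT is not already set.
export WORKDIR_ROOT="${WORKDIR_ROOT:-$PWD/ml50_work}"

# Create the directory up front so later steps have somewhere to write.
mkdir -p "$WORKDIR_ROOT"

# The download step is expected to place its output here.
echo "ML50 data directory: $WORKDIR_ROOT/ML50"
```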
# Preprocess the data
Install SentencePiece (SPM) from [here](https://github.com/google/sentencepiece).
```bash
export WORKDIR_ROOT=<a directory which will hold all working files>
export SPM_PATH=<a path pointing to sentencepiece spm_encode.py>
```
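Both variables need to be set before preprocessing runs. The helper below is a small sketch for catching a missing variable early; `require_env` is a hypothetical name introduced here for illustration and is not part of this repository.

```bash
# Hypothetical helper: report every named environment variable that is unset or empty.
require_env() {
  status=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling `require_env WORKDIR_ROOT SPM_PATH` returns a non-zero status if either variable is missing, so it can guard the start of a wrapper script.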
* $WORKDIR_ROOT/ML50/raw: extracted raw data
* $WORKDIR_ROOT/ML50/dedup: deduplicated data
* $WORKDIR_ROOT/ML50/clean: the deduplicated data with valid and test sentences removed
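
To confirm that the directories listed above are in place, a check along these lines may help. It is only a sketch: `check_ml50_layout` is a hypothetical helper name, and it tests for the existence of the three directories without inspecting their contents.

```bash
# Hypothetical helper: verify that raw, dedup, and clean exist under <root>/ML50.
check_ml50_layout() {
  root="$1"
  missing=0
  for sub in raw dedup clean; do
    if [ -d "$root/ML50/$sub" ]; then
      echo "found:   $root/ML50/$sub"
    else
      echo "missing: $root/ML50/$sub"
      missing=1
    fi
  done
  # Non-zero status if at least one directory is absent.
  return "$missing"
}
```

Run it as `check_ml50_layout "$WORKDIR_ROOT"` once the download and preprocessing steps have finished.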