# Token classification
## PyTorch version, no Trainer
Fine-tuning (m)LUKE for token classification tasks such as Named Entity Recognition (NER), Part-of-speech
tagging (POS) or phrase extraction (CHUNKS). You can easily
customize it to your needs if you need extra processing on your datasets.
It will either run on a dataset hosted on our [hub](https://huggingface.co/datasets) or with your own text files for
training and validation; you might just need to add some tweaks in the data preprocessing.
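For example, to train on your own files instead of a Hub dataset, you could point the script at them directly (a sketch, assuming the script exposes the same `--train_file`/`--validation_file` arguments as the generic `run_ner_no_trainer.py` example; adjust to whatever your copy of the script actually accepts):
```bash
# Hypothetical invocation: --train_file/--validation_file are assumed here,
# mirroring the generic token-classification example scripts.
python run_luke_ner_no_trainer.py \
  --model_name_or_path studio-ousia/luke-base \
  --train_file path/to/train.json \
  --validation_file path/to/validation.json \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/ner/
```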
The script can be run in a distributed setup, on TPU, and supports mixed precision, all by means of the
[🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
after installing it:
```bash
pip install git+https://github.com/huggingface/accelerate
```
then to train English LUKE on CoNLL2003:
```bash
export TASK_NAME=ner
python run_luke_ner_no_trainer.py \
  --model_name_or_path studio-ousia/luke-base \
  --dataset_name conll2003 \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/
```
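The same recipe works for the multilingual variant; you can simply swap the checkpoint passed to `--model_name_or_path` (a sketch, assuming you want the `studio-ousia/mluke-base` checkpoint from the Hub):
```bash
# Same command as above, pointing at an mLUKE checkpoint instead of English LUKE.
export TASK_NAME=ner
python run_luke_ner_no_trainer.py \
  --model_name_or_path studio-ousia/mluke-base \
  --dataset_name conll2003 \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/
```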
You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
```bash
accelerate config
```
and reply to the questions asked. Then
```bash
accelerate test
```
which will check that everything is ready for training. Finally, you can launch training with
```bash
export TASK_NAME=ner
accelerate launch run_luke_ner_no_trainer.py \
  --model_name_or_path studio-ousia/luke-base \
  --dataset_name conll2003 \
  --task_name $TASK_NAME \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/
```
This command is the same and will work for:
- a CPU-only setup
- a setup with one GPU
- a distributed training with several GPUs (single or multi node)
- a training on TPUs
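Mixed precision can be enabled when answering the `accelerate config` questions, or requested directly at launch time (a sketch, assuming a version of 🤗 `Accelerate` recent enough to accept the `--mixed_precision` flag):
```bash
# Assumes your installed Accelerate version supports --mixed_precision;
# older releases exposed an --fp16 flag instead.
accelerate launch --mixed_precision fp16 run_luke_ner_no_trainer.py \
  --model_name_or_path studio-ousia/luke-base \
  --dataset_name conll2003 \
  --task_name ner \
  --max_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /tmp/ner/
```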
Note that this library is in alpha release, so your feedback is more than welcome if you encounter any problems using it.