MILU

MILU is a joint neural model that allows you to simultaneously predict multiple dialog act items (a dialog act item takes a form of domain-intent(slot, value). Since it is common that, in a multi-domain setting, an utterance has multiple dialog act items, MILU is likely to yield higher performance than conventional single-intent models.

Example usage

We based our implementation on the AllenNLP library. For an introduction to this library, you should check these tutorials.

To use this model, you need to additionally install overrides==4.1.2, allennlp==0.9.0 and use python>=3.6,<=3.8.

On MultiWOZ dataset

$ python train.py multiwoz/configs/[base|context3].jsonnet -s serialization_dir
$ python evaluate.py serialization_dir/model.tar.gz {test_file} --cuda-device {CUDA_DEVICE}

If you want to perform end-to-end evaluation, you can include the trained model by adding the model path (serialization_dir/model.tar.gz) to your ConvLab spec file.

Data

We use the multiwoz data (data/multiwoz/[train|val|test].json.zip).

MILU on datasets in unified format

We support training MILU on datasets that are in our unified format.

For non-categorical dialogue acts whose values are in the utterances, we use slot tagging to extract the values.
For categorical and binary dialogue acts whose values may not be presented in the utterances, we treat them as intents of the utterances.

Takes MultiWOZ 2.1 (unified format) as an example,

$ python train.py unified_datasets/configs/multiwoz21_user_context3.jsonnet -s serialization_dir
$ python evaluate.py serialization_dir/model.tar.gz test --cuda-device {CUDA_DEVICE} --output_file output/multiwoz21_user/output.json

# to generate output/multiwoz21_user/predictions.json that merges test data and model predictions.
$ python unified_datasets/merge_predict_res.py -d multiwoz21 -s user -p output/multiwoz21_user/output.json

Note that the config file is different from the above. You should set:

"use_unified_datasets": true in dataset_reader and model
"dataset_name": "multiwoz21" in dataset_reader
"train_data_path": "train"
"validation_data_path": "validation"
"test_data_path": "test"

Predict

See nlu.py under multiwoz and unified_datasets directories.

Performance on unified format datasets

To illustrate that it is easy to use the model for any dataset that in our unified format, we report the performance on several datasets in our unified format. We follow README.md and config files in unified_datasets/ to generate predictions.json, then evaluate it using ../evaluate_unified_datasets.py. Note that we use almost the same hyper-parameters for different datasets, which may not be optimal.

	MultiWOZ 2.1		Taskmaster-1		Taskmaster-2		Taskmaster-3
Model	Acc	F1	Acc	F1	Acc	F1	Acc	F1
MILU	72.9	85.2	72.9	49.2	79.1	68.7	85.4	80.3
MILU (context=3)	76.6	87.9	72.4	48.5	78.9	68.4	85.1	80.1

Acc: whether all dialogue acts of an utterance are correctly predicted
F1: F1 measure of the dialogue act predictions over the corpus.

References

@inproceedings{lee2019convlab,
  title={ConvLab: Multi-Domain End-to-End Dialog System Platform},
  author={Lee, Sungjin and Zhu, Qi and Takanobu, Ryuichi and Li, Xiang and Zhang, Yaoqin and Zhang, Zheng and Li, Jinchao and Peng, Baolin and Li, Xiujun and Huang, Minlie and Gao, Jianfeng},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year={2019}
}