---
language: ja
tags:
- t5
- text2text-generation
- pilota
license: apache-2.0
---

# Pilota model for dialogs

A model for [Pilota](https://github.com/megagonlabs/pilota) trained on the [Accommodation Search Dialog Corpus](https://github.com/megagonlabs/asdc) and additional examples.

- ``scud``
    - Fine-tuned from [t5-base-japanese-web (with Byte-fallback, 8K)](https://huggingface.co/megagonlabs/t5-base-japanese-web-8k)
    - The original model is distributed under [the Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- ``scorer``
    - Fine-tuned from [LINE DistilBERT Japanese](https://huggingface.co/line-corporation/line-distilbert-base-japanese)
    - The original model is distributed under [the Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

1. Install [Pilota](https://github.com/megagonlabs/pilota)
2. Prepare inputs
    - Command

        ```bash
        echo -e 'ご要望をお知らせください\tはい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。\nこんにちは\tこんにちは' | python -m pilota.convert.plain2request | tee input.jsonl
        ```

    - Output

        ```jsonl
        {"context": [{"name": "agent", "text": "ご要望をお知らせください"}], "utterance": "はい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。", "sentences": null, "meta": {}}
        {"context": [{"name": "agent", "text": "こんにちは"}], "utterance": "こんにちは", "sentences": null, "meta": {}}
        ```

3. Feed the inputs to Pilota
    - Command

        ```console
        pilota -m megagonlabs/pilota_dialog --batch_size 1 --outlen 60 --nbest 1 --beam 5 < input.jsonl
        ```

    - Output

        ```jsonl
        [{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9911208689212798], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "はい。"}, {"scuds_nbest": [["部屋から富士山が見えるホテルが良い。", "夜景を見ながら食事のできるホテルが良い。"]], "original_ranks": [0], "scores": [0.9952289938926696], "scores_detail": [{"OK": 0.9840966463088989, "incorrect_none": 0.010280555114150047, "lack": 0.0032871251460164785, "limited": 0.00041511686868034303, "non_fluent": 0.0002954243100248277, "untruth": 0.003289491171017289}], "sentence": "部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。"}]
        [{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9831213414669036], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "こんにちは"}]
        ```
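
The JSONL formats used in the steps above can also be handled from Python. The following is a minimal sketch, not part of Pilota itself: `make_request` and `best_scuds` are hypothetical helper names, and the record layout is assumed from the example input and output lines shown above (one agent turn as context, and the first entry of `scuds_nbest` and `scores` taken as the top candidate). The sample result line in the demo is abbreviated and illustrative only.

```python
import json


def make_request(agent_text, user_text):
    """Build one request record shaped like the lines of input.jsonl shown above.

    Assumption: a single preceding agent turn is the whole context.
    """
    return {
        "context": [{"name": "agent", "text": agent_text}],
        "utterance": user_text,
        "sentences": None,  # serialized as null, as in the example
        "meta": {},
    }


def best_scuds(result_line):
    """Parse one output line and return (sentence, top SCUDs, top score) tuples.

    Assumption: index 0 of "scuds_nbest" and "scores" is the best candidate.
    """
    rows = []
    for item in json.loads(result_line):
        rows.append((item["sentence"], item["scuds_nbest"][0], item["scores"][0]))
    return rows


if __name__ == "__main__":
    # ensure_ascii=False keeps Japanese text readable in the JSONL file.
    print(json.dumps(make_request("こんにちは", "こんにちは"), ensure_ascii=False))

    # Abbreviated, illustrative result line (not real model output).
    sample = '[{"scuds_nbest": [[]], "scores": [0.98], "sentence": "こんにちは"}]'
    for sentence, scuds, score in best_scuds(sample):
        print(sentence, scuds, score)
```

An empty SCUD list, as in the greeting example, means no request-like content was extracted from that sentence.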

## License

Apache License 2.0