from .base import SequentialDataPipe
from .common_pipes import (
    EncodeCategory,
    EncodeMultiLabel,
    EncodeMultipleCategory,
    LoadAudio,
    SetOutputKeys,
)


class UtteranceClassificationPipe(SequentialDataPipe):
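    """
    Pipe for single-label utterance classification. Applies LoadAudio,
    EncodeCategory, and SetOutputKeys in sequence.

    Each item in the input dataset should have:
        wav_path: str
        label: str
    """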

    def __init__(
        self,
        output_keys: dict = None,
        audio_sample_rate: int = 16000,
        audio_channel_reduction: str = "first",
        sox_effects: list = None,
        train_category_encoder: bool = False,
    ):
        output_keys = output_keys or dict(
            x="wav",
            x_len="wav_len",
            class_id="class_id",
            label="label",
            unique_name="id",
        )

        super().__init__(
            LoadAudio(
                audio_sample_rate=audio_sample_rate,
                audio_channel_reduction=audio_channel_reduction,
                sox_effects=sox_effects,
            ),
            EncodeCategory(train_category_encoder=train_category_encoder),
            SetOutputKeys(output_keys=output_keys),
        )


class UtteranceMultipleCategoryClassificationPipe(SequentialDataPipe):
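    """
    Pipe for utterance classification with a list of labels per item.
    Applies LoadAudio, EncodeMultipleCategory, and SetOutputKeys in sequence.

    Each item in the input dataset should have:
        wav_path: str
        labels: List[str]
    """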

    def __init__(
        self,
        output_keys: dict = None,
        audio_sample_rate: int = 16000,
        audio_channel_reduction: str = "first",
        sox_effects: list = None,
        train_category_encoder: bool = False,
    ):
        output_keys = output_keys or dict(
            x="wav",
            x_len="wav_len",
            class_ids="class_ids",
            labels="labels",
            unique_name="id",
        )

        super().__init__(
            LoadAudio(
                audio_sample_rate=audio_sample_rate,
                audio_channel_reduction=audio_channel_reduction,
                sox_effects=sox_effects,
            ),
            EncodeMultipleCategory(train_category_encoder=train_category_encoder),
            SetOutputKeys(output_keys=output_keys),
        )


class HearScenePipe(SequentialDataPipe):
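    """
    Pipe for multi-label classification. Applies LoadAudio, EncodeMultiLabel,
    and SetOutputKeys in sequence; the default output_keys map "y" to
    "binary_labels".

    Each item in the input dataset should have:
        wav_path: str
        labels: List[str]
    """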

    def __init__(
        self,
        output_keys: dict = None,
        audio_sample_rate: int = 16000,
        audio_channel_reduction: str = "first",
    ):
        output_keys = output_keys or dict(
            x="wav",
            x_len="wav_len",
            y="binary_labels",
            labels="labels",
            unique_name="id",
        )

        super().__init__(
            LoadAudio(
                audio_sample_rate=audio_sample_rate,
                audio_channel_reduction=audio_channel_reduction,
            ),
            EncodeMultiLabel(),
            SetOutputKeys(output_keys=output_keys),
        )