yangwang825 committed on
Commit 09285a2 · verified · 1 Parent(s): 7a03d8b

Upload config

Files changed (3)
  1. README.md +199 -0
  2. config.json +84 -0
  3. configuration_hubert_spkreg.py +257 -0
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
config.json ADDED
@@ -0,0 +1,84 @@
+ {
+   "_name_or_path": "facebook/hubert-base-ls960",
+   "activation_dropout": 0.1,
+   "apply_spec_augment": true,
+   "architectures": [
+     "HubertModel"
+   ],
+   "attention_dropout": 0.1,
+   "auto_map": {
+     "AutoConfig": "configuration_hubert_spkreg.HubertSpkRegConfig"
+   },
+   "bos_token_id": 1,
+   "classifier_proj_size": 256,
+   "conv_bias": false,
+   "conv_dim": [
+     512,
+     512,
+     512,
+     512,
+     512,
+     512,
+     512
+   ],
+   "conv_kernel": [
+     10,
+     3,
+     3,
+     3,
+     3,
+     2,
+     2
+   ],
+   "conv_stride": [
+     5,
+     2,
+     2,
+     2,
+     2,
+     2,
+     2
+   ],
+   "ctc_loss_reduction": "sum",
+   "ctc_zero_infinity": false,
+   "do_stable_layer_norm": false,
+   "easy_margin": false,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_dropout": 0.0,
+   "feat_extract_norm": "group",
+   "feat_proj_dropout": 0.1,
+   "feat_proj_layer_norm": true,
+   "final_dropout": 0.1,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.1,
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label_smoothing": 0.0,
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.1,
+   "loss_fct": "cross_entropy",
+   "margin": 0.35,
+   "mask_feature_length": 10,
+   "mask_feature_min_masks": 0,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_masks": 2,
+   "mask_time_prob": 0.05,
+   "model_type": "hubert_spkreg",
+   "num_attention_heads": 12,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "reduction": "mean",
+   "scale": 30.0,
+   "tokenizer_class": "Wav2Vec2CTCTokenizer",
+   "transformers_version": "4.46.2",
+   "use_weighted_layer_sum": false,
+   "vocab_size": 32
+ }
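The `conv_kernel` and `conv_stride` lists in this `config.json` determine how the convolutional feature encoder downsamples raw audio before the Transformer sees it. The block below is a minimal pure-Python sketch, not code from this repository. It applies the unpadded 1-D convolution length formula layer by layer. It also computes the total stride (the same product that `inputs_to_logits_ratio` returns) and the receptive field of one output frame. Finally, it evaluates the deterministic part of the SpecAugment time-mask count described in the config docstring. Three caveats apply. The 16 kHz sampling rate is an assumption, since this commit does not state it. The helper names are hypothetical. The mask count omits any random rounding the actual masking code may apply.

```python
# Values copied from config.json in this commit.
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]
MASK_TIME_PROB = 0.05
MASK_TIME_LENGTH = 10
MASK_TIME_MIN_MASKS = 2

SAMPLE_RATE = 16_000  # assumption: 16 kHz input audio (not stated in this commit)


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the unpadded conv stack: L_out = floor((L_in - k) / s) + 1 per layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def total_stride() -> int:
    """Input samples between consecutive output frames (product of all strides)."""
    result = 1
    for stride in CONV_STRIDE:
        result *= stride
    return result


def receptive_field() -> int:
    """Input samples seen by one output frame: 1 + sum((k_i - 1) * prod(s_j for j < i))."""
    field, jump = 1, 1
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        field += (kernel - 1) * jump
        jump *= stride
    return field


def num_time_masks(num_frames: int) -> int:
    """Deterministic reading of the docstring: prob * T / length masks, but never fewer than the minimum."""
    return max(int(MASK_TIME_PROB * num_frames / MASK_TIME_LENGTH), MASK_TIME_MIN_MASKS)


if __name__ == "__main__":
    frames = num_output_frames(SAMPLE_RATE)  # one second of audio
    print(total_stride(), receptive_field(), frames, num_time_masks(frames))
    # → 320 400 49 2
```

Under the 16 kHz assumption, the encoder emits one frame every 20 ms, and each frame covers 25 ms of audio, so a one-second clip yields 49 frames. For clips that short, the two masks required by `mask_time_min_masks` outweigh the nominal `mask_time_prob` of 0.05.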
configuration_hubert_spkreg.py ADDED
@@ -0,0 +1,257 @@
+ """Hubert model configuration"""
+
+ import operator
+ import functools
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+
+ class HubertSpkRegConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`HubertModel`]. It is used to instantiate a
+     Hubert model according to the specified arguments, defining the model architecture. Instantiating a configuration
+     with the defaults will yield a similar configuration to that of the Hubert
+     [facebook/hubert-base-ls960](https://huggingface.co/facebook/hubert-base-ls960) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+
+     Args:
+         vocab_size (`int`, *optional*, defaults to 32):
+             Vocabulary size of the Hubert model. Defines the number of different
+             tokens that can be represented by the `input_ids` passed when calling
+             [`HubertModel`].
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
+         hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"selu"` and `"gelu_new"` are supported.
+         hidden_dropout (`float`, *optional*, defaults to 0.1):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         activation_dropout (`float`, *optional*, defaults to 0.1):
+             The dropout ratio for activations inside the fully connected layer.
+         attention_dropout (`float`, *optional*, defaults to 0.1):
+             The dropout ratio for the attention probabilities.
+         final_dropout (`float`, *optional*, defaults to 0.1):
+             The dropout probability for the final projection layer of [`HubertForCTC`].
+         layerdrop (`float`, *optional*, defaults to 0.1):
+             The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+             details.
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-5):
+             The epsilon used by the layer normalization layers.
+         feat_extract_norm (`str`, *optional*, defaults to `"group"`):
+             The norm to be applied to 1D convolutional layers in the feature encoder. One of `"group"` for group
+             normalization of only the first 1D convolutional layer or `"layer"` for layer normalization of all 1D
+             convolutional layers.
+         feat_proj_dropout (`float`, *optional*, defaults to 0.0):
+             The dropout probability for the output of the feature encoder.
+         feat_proj_layer_norm (`bool`, *optional*, defaults to `True`):
+             Whether to apply LayerNorm to the output of the feature encoder.
+         feat_extract_activation (`str`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the 1D convolutional layers of the feature
+             extractor. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
+         conv_dim (`Tuple[int]`, *optional*, defaults to `(512, 512, 512, 512, 512, 512, 512)`):
+             A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the
+             feature encoder. The length of *conv_dim* defines the number of 1D convolutional layers.
+         conv_stride (`Tuple[int]`, *optional*, defaults to `(5, 2, 2, 2, 2, 2, 2)`):
+             A tuple of integers defining the stride of each 1D convolutional layer in the feature encoder. The length
+             of *conv_stride* defines the number of convolutional layers and has to match the length of *conv_dim*.
+         conv_kernel (`Tuple[int]`, *optional*, defaults to `(10, 3, 3, 3, 3, 2, 2)`):
+             A tuple of integers defining the kernel size of each 1D convolutional layer in the feature encoder. The
+             length of *conv_kernel* defines the number of convolutional layers and has to match the length of
+             *conv_dim*.
+         conv_bias (`bool`, *optional*, defaults to `False`):
+             Whether the 1D convolutional layers have a bias.
+         num_conv_pos_embeddings (`int`, *optional*, defaults to 128):
+             Number of convolutional positional embeddings. Defines the kernel size of the 1D convolutional positional
+             embeddings layer.
+         num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
+             Number of groups of the 1D convolutional positional embeddings layer.
+         do_stable_layer_norm (`bool`, *optional*, defaults to `False`):
+             Whether to apply the *stable* layer norm architecture of the Transformer encoder. `do_stable_layer_norm is True`
+             corresponds to applying layer norm before the attention layer, whereas `do_stable_layer_norm is False`
+             corresponds to applying layer norm after the attention layer.
+         apply_spec_augment (`bool`, *optional*, defaults to `True`):
+             Whether to apply *SpecAugment* data augmentation to the outputs of the feature encoder. For reference see
+             [SpecAugment: A Simple Data Augmentation Method for Automatic Speech
+             Recognition](https://arxiv.org/abs/1904.08779).
+         mask_time_prob (`float`, *optional*, defaults to 0.05):
+             Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking
+             procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If
+             reasoning from the probability of each feature vector to be chosen as the start of the vector span to be
+             masked, *mask_time_prob* should be `prob_vector_start*mask_time_length`. Note that overlap may decrease the
+             actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
+         mask_time_length (`int`, *optional*, defaults to 10):
+             Length of vector span along the time axis.
+         mask_time_min_masks (`int`, *optional*, defaults to 2):
+             The minimum number of masks of length `mask_time_length` generated along the time axis, irrespective of
+             `mask_time_prob`. Only relevant if
+             `mask_time_prob*len(time_axis)/mask_time_length < mask_time_min_masks`.
+         mask_feature_prob (`float`, *optional*, defaults to 0.0):
+             Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The
+             masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over
+             the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector
+             span to be masked, *mask_feature_prob* should be `prob_vector_start*mask_feature_length`. Note that overlap
+             may decrease the actual percentage of masked vectors. This is only relevant if
+             `apply_spec_augment is True`.
+         mask_feature_length (`int`, *optional*, defaults to 10):
+             Length of vector span along the feature axis.
+         mask_feature_min_masks (`int`, *optional*, defaults to 0):
+             The minimum number of masks of length `mask_feature_length` generated along the feature axis, irrespective of
+             `mask_feature_prob`. Only relevant if
+             `mask_feature_prob*len(feature_axis)/mask_feature_length < mask_feature_min_masks`.
+         ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
+             Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
+             instance of [`HubertForCTC`].
+         ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
+             Whether to zero infinite losses and the associated gradients of `torch.nn.CTCLoss`. Infinite losses mainly
+             occur when the inputs are too short to be aligned to the targets. Only relevant when training an instance
+             of [`HubertForCTC`].
+         use_weighted_layer_sum (`bool`, *optional*, defaults to `False`):
+             Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an
+             instance of [`HubertForSequenceClassification`].
+         classifier_proj_size (`int`, *optional*, defaults to 256):
+             Dimensionality of the projection before token mean-pooling for classification.
+
+     Example:
+
+     ```python
+     >>> from transformers import HubertModel, HubertConfig
+
+     >>> # Initializing a Hubert facebook/hubert-base-ls960 style configuration
+     >>> configuration = HubertConfig()
+
+     >>> # Initializing a model from the facebook/hubert-base-ls960 style configuration
+     >>> model = HubertModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "hubert_spkreg"
+
+     def __init__(
+         self,
+         vocab_size=32,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_act="gelu",
+         hidden_dropout=0.1,
+         activation_dropout=0.1,
+         attention_dropout=0.1,
+         feat_proj_layer_norm=True,
+         feat_proj_dropout=0.0,
+         final_dropout=0.1,
+         layerdrop=0.1,
+         initializer_range=0.02,
+         layer_norm_eps=1e-5,
+         feat_extract_norm="group",
+         feat_extract_activation="gelu",
+         conv_dim=(512, 512, 512, 512, 512, 512, 512),
+         conv_stride=(5, 2, 2, 2, 2, 2, 2),
+         conv_kernel=(10, 3, 3, 3, 3, 2, 2),
+         conv_bias=False,
+         num_conv_pos_embeddings=128,
+         num_conv_pos_embedding_groups=16,
+         do_stable_layer_norm=False,
+         apply_spec_augment=True,
+         mask_time_prob=0.05,
+         mask_time_length=10,
+         mask_time_min_masks=2,
+         mask_feature_prob=0.0,
+         mask_feature_length=10,
+         mask_feature_min_masks=0,
+         ctc_loss_reduction="sum",
+         ctc_zero_infinity=False,
+         use_weighted_layer_sum=False,
+         classifier_proj_size=256,
+         pad_token_id=0,
+         bos_token_id=1,
+         eos_token_id=2,
+         loss_fct: str = "cross_entropy",  # cross_entropy, additive_margin, additive_angular_margin
+         label_smoothing: float = 0.0,
+         scale: float = 30.0,
+         margin: float = 0.35,
+         easy_margin: bool = False,
+         reduction: str = "mean",
+         **kwargs,
+     ):
+         super().__init__(**kwargs, pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id)
+         self.hidden_size = hidden_size
+         self.feat_extract_norm = feat_extract_norm
+         self.feat_extract_activation = feat_extract_activation
+         self.conv_dim = list(conv_dim)
+         self.conv_stride = list(conv_stride)
+         self.conv_kernel = list(conv_kernel)
+         self.conv_bias = conv_bias
+         self.num_conv_pos_embeddings = num_conv_pos_embeddings
+         self.num_conv_pos_embedding_groups = num_conv_pos_embedding_groups
+         self.num_feat_extract_layers = len(self.conv_dim)
+         self.num_hidden_layers = num_hidden_layers
+         self.intermediate_size = intermediate_size
+         self.hidden_act = hidden_act
+         self.num_attention_heads = num_attention_heads
+         self.hidden_dropout = hidden_dropout
+         self.attention_dropout = attention_dropout
+         self.activation_dropout = activation_dropout
+         self.feat_proj_layer_norm = feat_proj_layer_norm
+         self.feat_proj_dropout = feat_proj_dropout
+         self.final_dropout = final_dropout
+         self.layerdrop = layerdrop
+         self.layer_norm_eps = layer_norm_eps
+         self.initializer_range = initializer_range
+         self.vocab_size = vocab_size
+         self.do_stable_layer_norm = do_stable_layer_norm
+         self.use_weighted_layer_sum = use_weighted_layer_sum
+         self.classifier_proj_size = classifier_proj_size
+
+         if (
+             (len(self.conv_stride) != self.num_feat_extract_layers)
+             or (len(self.conv_kernel) != self.num_feat_extract_layers)
+             or (len(self.conv_dim) != self.num_feat_extract_layers)
+         ):
+             raise ValueError(
+                 "Configuration for convolutional layers is incorrect. It is required that `len(config.conv_dim)` =="
+                 " `len(config.conv_stride)` == `len(config.conv_kernel)`, but is `len(config.conv_dim) ="
+                 f" {len(self.conv_dim)}`, `len(config.conv_stride) = {len(self.conv_stride)}`,"
+                 f" `len(config.conv_kernel) = {len(self.conv_kernel)}`."
+             )
+
+         # fine-tuning config parameters for SpecAugment: https://arxiv.org/abs/1904.08779
+         self.apply_spec_augment = apply_spec_augment
+         self.mask_time_prob = mask_time_prob
+         self.mask_time_length = mask_time_length
+         self.mask_time_min_masks = mask_time_min_masks
+         self.mask_feature_prob = mask_feature_prob
+         self.mask_feature_length = mask_feature_length
+         self.mask_feature_min_masks = mask_feature_min_masks
+
+         # ctc loss
+         self.ctc_loss_reduction = ctc_loss_reduction
+         self.ctc_zero_infinity = ctc_zero_infinity
+
+         # Loss function parameters. Feel free to ignore for other classes.
+         self.loss_fct = loss_fct
+         self.label_smoothing = label_smoothing
+         self.scale = scale
+         self.margin = margin
+         self.easy_margin = easy_margin
+         self.reduction = reduction
+
+     @property
+     def inputs_to_logits_ratio(self):
+         return functools.reduce(operator.mul, self.conv_stride, 1)
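This configuration stores the loss parameters (`loss_fct`, `label_smoothing`, `scale`, `margin`, `easy_margin`, `reduction`) but does not use them. The modeling code that consumes them is not part of this commit. The sketch below assumes that `loss_fct="additive_margin"` follows the common additive-margin softmax (AM-Softmax / CosFace) formulation, and shows how `scale=30.0` and `margin=0.35` would enter such a loss. It computes cosine similarities between an L2-normalized embedding and normalized class weight vectors, subtracts the margin from the target class only, scales all logits, and applies cross-entropy with optional label smoothing. This is an illustration under that assumption, not the repository's implementation. The function and argument names are hypothetical. Batching and `reduction` are omitted, and `easy_margin` (which belongs to the angular-margin variant) is not modelled.

```python
import math
from typing import List, Sequence


def _l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def additive_margin_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    target: int,
    scale: float = 30.0,   # mirrors the config default `scale`
    margin: float = 0.35,  # mirrors the config default `margin`
    label_smoothing: float = 0.0,
) -> float:
    """Hypothetical AM-Softmax loss for a single example (no batching, no reduction)."""
    emb = _l2_normalize(embedding)
    # Cosine similarity between the embedding and each normalized class weight vector.
    cosines = [sum(e * w for e, w in zip(emb, _l2_normalize(row))) for row in class_weights]
    # Subtract the margin from the target class only, then scale every logit.
    logits = [scale * (c - margin if i == target else c) for i, c in enumerate(cosines)]
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_norm for z in logits]
    # Cross-entropy against a one-hot target, optionally smoothed: (1 - eps) * onehot + eps / K.
    num_classes = len(logits)
    smooth = label_smoothing / num_classes
    targets = [smooth + (1.0 - label_smoothing if i == target else 0.0) for i in range(num_classes)]
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two toy classes
    embedding = [0.9, 0.1]
    print(additive_margin_loss(embedding, weights, target=0, margin=0.0))
    print(additive_margin_loss(embedding, weights, target=0))  # default margin gives a larger loss
```

With `margin=0.0` and `scale=1.0`, this reduces to ordinary softmax cross-entropy over cosine similarities. A positive margin keeps the loss high until the target cosine exceeds the others by roughly `margin`. This is the usual motivation for margin losses when training embeddings, for example in speaker verification.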