2023-11-13 11:54:38,240 - INFO - Vocabulary size: 64
2023-11-13 11:54:39,229 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:39,507 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:39,908-911 - INFO - Initialize all bias parameters to zeros:
    encoder.encoders.{0..3}.self_attn.{linear_q,linear_k,linear_v,linear_out}.bias
    encoder.encoders.{0..3}.{feed_forward.w_1,feed_forward.w_2,norm1,norm2}.bias
    encoder.after_norm.bias
    duration_predictor.conv.{0,1}.{0,2}.bias, duration_predictor.linear.bias
    pitch_predictor.conv.{0..4}.{0,2}.bias, pitch_predictor.linear.bias, pitch_embed.0.bias
    energy_predictor.conv.{0,1}.{0,2}.bias, energy_predictor.linear.bias, energy_embed.0.bias
    decoder.encoders.{0..3}.* (same parameter set as the encoder layers), decoder.after_norm.bias
    feat_out.bias
    postnet.postnet.{0..4}.1.bias
2023-11-13 11:54:42,016 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:54:42,016 - INFO - Normalizer: GlobalMVN(stats_file=models/hindi/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:42,024 - INFO - TTS: FastSpeech2(
  (encoder): Encoder(
    (embed): Sequential(
      (0): Embedding(64, 384, padding_idx=0)
      (1): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (encoders): MultiSequential(
      (0-3): 4 x EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=384, out_features=384, bias=True)
          (linear_k): Linear(in_features=384, out_features=384, bias=True)
          (linear_v): Linear(in_features=384, out_features=384, bias=True)
          (linear_out): Linear(in_features=384, out_features=384, bias=True)
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (feed_forward): MultiLayeredConv1d(
          (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,))
          (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,))
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
  )
  (duration_predictor): DurationPredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
      )
      (1): Sequential(
        (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (pitch_predictor): VariancePredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
      (1-4): 4 x Sequential(
        (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (pitch_embed): Sequential(
    (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,))
    (1): Dropout(p=0.0, inplace=False)
  )
  (energy_predictor): VariancePredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
      (1): Sequential(
        (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (energy_embed): Sequential(
    (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,))
    (1): Dropout(p=0.0, inplace=False)
  )
  (length_regulator): LengthRegulator()
  (decoder): Encoder(
    (embed): Sequential(
      (0): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (encoders): MultiSequential(
      (0-3): 4 x EncoderLayer( ... same layer layout as the encoder above ... )
    )
    (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
  )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet(
    (postnet): ModuleList(
      (0): Sequential(
        (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Tanh()
        (3): Dropout(p=0.5, inplace=False)
      )
      (1-3): 3 x Sequential(
        (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Tanh()
        (3): Dropout(p=0.5, inplace=False)
      )
      (4): Sequential(
        (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Dropout(p=0.5, inplace=False)
      )
    )
  )
  (criterion): FastSpeech2Loss(
    (l1_criterion): L1Loss()
    (mse_criterion): MSELoss()
    (duration_criterion): DurationPredictorLoss(
      (criterion): MSELoss()
    )
  )
)
2023-11-13 11:54:42,024 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
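These dumps match what ESPnet2 prints when a trained FastSpeech2 checkpoint is wrapped for inference. For reference, a minimal sketch of loading one of the voices through espnet2's Text2Speech wrapper; the config.yaml and model.pth paths are assumptions (only feats_stats.npz actually appears in the log), and the input text is a placeholder:

import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

# Hypothetical paths: the log only reveals models/hindi/male/model/feats_stats.npz.
tts = Text2Speech(
    train_config="models/hindi/male/model/config.yaml",
    model_file="models/hindi/male/model/model.pth",
)

out = tts("placeholder input text")                    # returns a dict; out["wav"] is the waveform
sf.write("hindi_male.wav", out["wav"].numpy(), 22050)  # fs=22050 per the vocoder line

With no neural vocoder configured, synthesis falls back to the Griffin-Lim Spectrogram2Waveform reported at the end of each block.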
2023-11-13 11:54:43,980 - INFO - Vocabulary size: 62
2023-11-13 11:54:43,986 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:44,082 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:44,282-285 - INFO - Initialize all bias parameters to zeros (same parameter list as above)
2023-11-13 11:54:44,413 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:44,413 - INFO - Normalizer: GlobalMVN(stats_file=models/malayalam/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:44,413 - INFO - TTS: FastSpeech2( ... identical to the architecture dump above, except (0): Embedding(62, 384, padding_idx=0) ... )
2023-11-13 11:54:44,414 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
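The Extractor lines fully specify the acoustic front-end: a 1024-point STFT with a 256-sample hop, mapped to an 80-bin log-mel spectrogram over 0-8000 Hz at fs=22050. A rough librosa equivalent for sanity-checking features against these settings (a sketch only: the input file is a placeholder, and ESPnet's amplitude-vs-power and log-floor conventions are assumptions here):

import librosa
import numpy as np

y, _ = librosa.load("sample.wav", sr=22050)   # placeholder input file
mel = librosa.feature.melspectrogram(
    y=y, sr=22050, n_fft=1024, hop_length=256, win_length=1024,
    center=True, power=1.0,                   # amplitude mel assumed
    n_mels=80, fmin=0, fmax=8000, htk=False,
)
logmel = np.log(np.maximum(mel, 1e-10))       # log floor chosen here, not taken from the log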
2023-11-13 11:54:44,738 - INFO - Vocabulary size: 38
2023-11-13 11:54:44,744 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:44,823 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:45,008-011 - INFO - Initialize all bias parameters to zeros (same parameter list as above)
2023-11-13 11:54:45,123 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:45,123 - INFO - Normalizer: GlobalMVN(stats_file=models/manipuri/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:45,124 - INFO - TTS: FastSpeech2( ... identical to the architecture dump above, except (0): Embedding(38, 384, padding_idx=0) ... )
2023-11-13 11:54:45,124 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
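Each Normalizer line applies global mean-variance normalization (GlobalMVN) over the 80 mel dimensions, with per-voice statistics loaded from feats_stats.npz. A numpy sketch of the operation; the count/sum/sum_square key names follow ESPnet's collect-stats convention but should be treated as an assumption:

import numpy as np

stats = np.load("models/manipuri/male/model/feats_stats.npz")
count = stats["count"]                  # assumed key names inside the npz
mean = stats["sum"] / count
var = stats["sum_square"] / count - mean ** 2
std = np.sqrt(np.maximum(var, 1e-20))

# norm_means=True, norm_vars=True, per the log; logmel is (n_mels, frames)
normalized = (logmel.T - mean) / std    # -> (frames, n_mels)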
2023-11-13 11:54:45,553 - INFO - Vocabulary size: 61
2023-11-13 11:54:45,558 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:45,643 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:45,840-843 - INFO - Initialize all bias parameters to zeros (same parameter list as above)
2023-11-13 11:54:45,937 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:45,937 - INFO - Normalizer: GlobalMVN(stats_file=models/marathi/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:45,938 - INFO - TTS: FastSpeech2( ... identical to the architecture dump above, except (0): Embedding(61, 384, padding_idx=0) ... ) [source log truncated partway through this dump]
kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), 
padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:54:45,938 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:54:46,249 - INFO - Vocabulary size: 55 2023-11-13 11:54:46,256 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:46,335 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:46,524 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:54:46,524 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:54:46,524 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:54:46,524 - INFO - Initialize 
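For reference, the LogMelFbank settings logged above (Stft with n_fft=1024, win_length=1024, hop_length=256; LogMel with 80 mel bins over 0-8000 Hz at fs=22050, htk=False) correspond to a conventional log-mel pipeline. A minimal, hypothetical sketch with librosa follows; it mirrors the parameters only, not ESPnet's own Stft/LogMel classes, whose padding and log-flooring details may differ:

# Hypothetical log-mel extraction mirroring the logged LogMelFbank parameters.
import librosa
import numpy as np

def log_mel(wav: np.ndarray, sr: int = 22050) -> np.ndarray:
    # Stft(n_fft=1024, win_length=1024, hop_length=256, center=True)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr,
        n_fft=1024, win_length=1024, hop_length=256,
        window="hann", center=True,
        # LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
        n_mels=80, fmin=0.0, fmax=8000.0, htk=False,
        power=1.0,  # assumption: amplitude (not power) spectrogram before the log
    )
    return np.log(np.maximum(mel, 1e-10))  # floor to avoid log(0)

# Usage: feats = log_mel(librosa.load("utt.wav", sr=22050)[0])  # -> (80, n_frames)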
2023-11-13 11:54:46,249 - INFO - Vocabulary size: 55
2023-11-13 11:54:46,256 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:46,335 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:46,524 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... further per-parameter "Initialize <name>.bias to zeros" lines (2023-11-13 11:54:46,524 to 11:54:46,528), same parameter list as above ...]
2023-11-13 11:54:46,620 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:54:46,621 - INFO - Normalizer: GlobalMVN(stats_file=models/kannada/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:46,621 - INFO - TTS: FastSpeech2(
  (encoder): Encoder(
    (embed): Sequential(
      (0): Embedding(55, 384, padding_idx=0)
      (1): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    [... remainder identical to the FastSpeech2 architecture printed above for the Marathi model ...]
)
2023-11-13 11:54:46,621 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
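The GlobalMVN normalizer above applies global mean-variance normalization per feature dimension using the statistics in each model's feats_stats.npz (norm_means=True, norm_vars=True). A sketch of the arithmetic, assuming mean and var arrays have already been recovered from the stats file (the npz key layout is ESPnet-internal):

# Sketch of global mean-variance normalization and its inverse.
# `mean` and `var` are assumed per-dimension statistics from feats_stats.npz.
import numpy as np

def global_mvn(feats: np.ndarray, mean: np.ndarray, var: np.ndarray) -> np.ndarray:
    # feats: (n_frames, 80); mean/var: (80,)
    std = np.sqrt(np.maximum(var, 1e-20))  # guard zero-variance dimensions
    return (feats - mean) / std

def inverse_global_mvn(feats: np.ndarray, mean: np.ndarray, var: np.ndarray) -> np.ndarray:
    # At synthesis time the model emits normalized log-mels, so the transform
    # is inverted before vocoding.
    return feats * np.sqrt(np.maximum(var, 1e-20)) + mean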
2023-11-13 11:54:47,186 - INFO - Vocabulary size: 57
2023-11-13 11:54:47,192 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:47,279 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:47,471 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... further per-parameter "Initialize <name>.bias to zeros" lines (2023-11-13 11:54:47,471 to 11:54:47,474), same parameter list as above ...]
2023-11-13 11:54:47,582 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:54:47,582 - INFO - Normalizer: GlobalMVN(stats_file=models/english/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:47,583 - INFO - TTS: FastSpeech2(
  (encoder): Encoder(
    (embed): Sequential(
      (0): Embedding(57, 384, padding_idx=0)
      (1): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    [... remainder identical to the FastSpeech2 architecture printed above for the Marathi model ...]
)
2023-11-13 11:54:47,583 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=None, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
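Spectrogram2Waveform with n_iter=8 indicates Griffin-Lim phase reconstruction rather than a neural vocoder (note the English model logs win_length=None, in which case the STFT window typically defaults to n_fft). A rough, hypothetical librosa equivalent of the logged parameters; ESPnet's implementation may differ in scaling details:

# Hypothetical Griffin-Lim vocoding mirroring the logged Spectrogram2Waveform
# parameters (n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8).
import librosa
import numpy as np

def mel_to_wav(log_mel: np.ndarray) -> np.ndarray:
    # log_mel: (80, n_frames), already de-normalized from GlobalMVN
    mel = np.exp(log_mel)  # undo the log taken at feature extraction
    return librosa.feature.inverse.mel_to_audio(
        M=mel, sr=22050,
        n_fft=1024, hop_length=256, win_length=1024,  # n_shift=256 in the log
        window="hann",
        n_iter=8,            # Griffin-Lim iterations, as logged
        power=1.0,           # assumption: amplitude-mel input
        fmin=0.0, fmax=8000.0,
    )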
2023-11-13 11:54:47,911 - INFO - Vocabulary size: 52
2023-11-13 11:54:47,919 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:48,000 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:48,183 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... further per-parameter "Initialize <name>.bias to zeros" lines (2023-11-13 11:54:48,183 to 11:54:48,186), same parameter list as above ...]
2023-11-13 11:54:48,312 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:54:48,312 - INFO - Normalizer: GlobalMVN(stats_file=models/assamese/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:48,313 - INFO - TTS: FastSpeech2(
  (encoder): Encoder(
    (embed): Sequential(
      (0): Embedding(52, 384, padding_idx=0)
      (1): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    [... encoder layers, variance predictors/embeddings, and length regulator identical to the architecture printed above ...]
  )
  (decoder): Encoder(
    (embed): Sequential(
      (0): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (encoders): MultiSequential(
      (0-2): 3 x EncoderLayer(
        [... identical to the decoder layers printed above ...]
      )
      (3): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=384, out_features=384, bias=True)
          (linear_k): Linear(in_features=384, out_features=384, bias=True)
          (linear_v): Linear(in_features=384, out_features=384, bias=True)
          (linear_out): Linear(in_features=384, out_features=384,
bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:54:48,313 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:54:48,635 - INFO - Vocabulary size: 52 2023-11-13 11:54:48,642 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:48,719 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.norm1.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.0.norm2.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.norm1.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.1.norm2.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros 2023-11-13 11:54:48,905 - INFO - Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros 2023-11-13 11:54:48,905 - 
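A model packaged this way (an ESPnet2 checkpoint alongside feats_stats.npz under models/<lang>/male/model/) can typically be loaded for inference through ESPnet2's Text2Speech wrapper. The sketch below is illustrative only: the config.yaml and model.pth file names are assumptions, since the log records just the stats-file path.

from espnet2.bin.tts_inference import Text2Speech

# Hypothetical file names: only feats_stats.npz is confirmed by the log;
# config.yaml / model.pth are assumed to sit in the same directory.
tts = Text2Speech(
    train_config="models/assamese/male/model/config.yaml",
    model_file="models/assamese/male/model/model.pth",
)

# Recent ESPnet2 versions return a dict; "wav" holds the waveform produced
# by the FastSpeech2 model plus the Griffin-Lim Spectrogram2Waveform
# vocoder reported above (fs=22050).
out = tts("Input text in the model's script")
wav = out["wav"]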
2023-11-13 11:54:48,635 - INFO - Vocabulary size: 52
2023-11-13 11:54:48,642 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:48,719 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:48,905 - INFO - Initialize encoder, duration_predictor, pitch_predictor, pitch_embed, energy_predictor, energy_embed, decoder, feat_out, and postnet bias parameters to zeros
2023-11-13 11:54:49,014 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:49,014 - INFO - Normalizer: GlobalMVN(stats_file=models/tamil/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:49,015 - INFO - TTS: FastSpeech2(...) with (embed): Embedding(52, 384, padding_idx=0); otherwise identical to the FastSpeech2 architecture above
2023-11-13 11:54:49,015 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
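The Extractor line pins down the front end used for every model in this log: a 1024-point STFT with a 256-sample hop, followed by an 80-bin mel filterbank over 0 to 8000 Hz at 22050 Hz sampling, then a log. A stand-alone approximation (a sketch, not ESPnet's LogMelFbank code; the 1e-10 log floor is an assumption) is:

import numpy as np
import librosa

def logmel(wav: np.ndarray, sr: int = 22050) -> np.ndarray:
    # Magnitude STFT matching Stft(n_fft=1024, win_length=1024, hop_length=256, center=True)
    spec = np.abs(librosa.stft(wav, n_fft=1024, hop_length=256,
                               win_length=1024, window="hann", center=True))
    # Mel filterbank matching LogMel(sr=22050, n_mels=80, fmin=0, fmax=8000, htk=False)
    fb = librosa.filters.mel(sr=sr, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
    # Natural log with a small floor; the exact floor value is an assumption.
    return np.log(np.maximum(fb @ spec, 1e-10)).T  # shape: (frames, 80)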
2023-11-13 11:54:49,377 - INFO - Vocabulary size: 56
2023-11-13 11:54:49,384 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:49,467 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:49,649 - INFO - Initialize encoder, duration_predictor, pitch_predictor, pitch_embed, energy_predictor, energy_embed, decoder, feat_out, and postnet bias parameters to zeros
2023-11-13 11:54:49,757 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:49,757 - INFO - Normalizer: GlobalMVN(stats_file=models/odia/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:49,758 - INFO - TTS: FastSpeech2(...) with (embed): Embedding(56, 384, padding_idx=0); otherwise identical to the FastSpeech2 architecture above
2023-11-13 11:54:49,758 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
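Each Normalizer line applies corpus-level mean-variance normalization (GlobalMVN with norm_means=True, norm_vars=True) using the per-language feats_stats.npz. A minimal sketch of that step follows; the count/sum/sum_square key names are an assumption about the stats-file layout, not something this log confirms.

import numpy as np

def global_mvn(feats: np.ndarray, stats_file: str) -> np.ndarray:
    # Assumed npz layout: accumulated frame count, per-bin sum, and
    # per-bin sum of squares gathered over the training corpus.
    stats = np.load(stats_file)
    count = stats["count"]
    mean = stats["sum"] / count
    var = stats["sum_square"] / count - mean ** 2
    std = np.sqrt(np.maximum(var, 1e-20))
    # norm_means=True and norm_vars=True, as reported in the log
    return (feats - mean) / std

feats = np.random.randn(120, 80).astype(np.float32)  # e.g. output of logmel()
normalized = global_mvn(feats, "models/odia/male/model/feats_stats.npz")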
2023-11-13 11:54:50,090 - INFO - Vocabulary size: 59
2023-11-13 11:54:50,097 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:50,175 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:50,356 - INFO - Initialize encoder, duration_predictor, pitch_predictor, pitch_embed, energy_predictor, energy_embed, decoder, feat_out, and postnet bias parameters to zeros
2023-11-13 11:54:50,481 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:50,481 - INFO - Normalizer: GlobalMVN(stats_file=models/rajasthani/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:50,482 - INFO - TTS: FastSpeech2(...) with (embed): Embedding(59, 384, padding_idx=0); otherwise identical to the FastSpeech2 architecture above
2023-11-13 11:54:50,482 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:54:50,482 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:54:50,812 - INFO - Vocabulary size: 56 2023-11-13 11:54:50,818 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:50,909 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.norm1.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.0.norm2.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.norm1.bias to zeros 2023-11-13 11:54:51,089 - INFO - Initialize encoder.encoders.1.norm2.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.self_attn.linear_v.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.self_attn.linear_out.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.feed_forward.w_1.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.feed_forward.w_2.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.norm1.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.2.norm2.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.feed_forward.w_1.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.feed_forward.w_2.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.norm1.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.encoders.3.norm2.bias to zeros 2023-11-13 11:54:51,090 - INFO - Initialize encoder.after_norm.bias to zeros 2023-11-13 11:54:51,090 - INFO - 
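[Editor's note] The blocks in this log are ESPnet2 model-loading output: for each language/voice the toolkit builds a FastSpeech2 text-to-mel model together with a LogMelFbank front-end, a GlobalMVN normalizer, and a Griffin-Lim Spectrogram2Waveform vocoder. A minimal sketch of how such a checkpoint is typically loaded for inference follows; the config.yaml/model.pth paths are assumptions inferred from the logged feats_stats.npz location, not paths confirmed by the log itself.

    # Hedged sketch: loading one of the checkpoints above for inference with ESPnet2.
    from espnet2.bin.tts_inference import Text2Speech

    tts = Text2Speech(
        train_config="models/rajasthani/male/model/config.yaml",  # assumed path
        model_file="models/rajasthani/male/model/model.pth",      # assumed path
    )
    out = tts("text in the target language")
    wav = out["wav"]  # 22050 Hz waveform; Griffin-Lim is used since no neural vocoder is configured

Constructing Text2Speech this way emits the same kind of INFO lines seen here (vocabulary size, bias initialization, extractor/normalizer/vocoder summaries).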
2023-11-13 11:54:50,812 - INFO - Vocabulary size: 56
2023-11-13 11:54:50,818 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:50,909 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:51,089 - INFO - [95 near-identical INFO lines collapsed: all encoder, duration/pitch/energy predictor, pitch/energy embed, decoder, feat_out, and postnet bias parameters initialized to zeros, as for the previous model]
2023-11-13 11:54:51,194 - INFO - Extractor: LogMelFbank( [same Stft/LogMel configuration as the first model above] )
2023-11-13 11:54:51,194 - INFO - Normalizer: GlobalMVN(stats_file=models/telugu/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:51,195 - INFO - TTS: FastSpeech2( [architecture identical to the dump above, except (embed): (0): Embedding(56, 384, padding_idx=0)] )
2023-11-13 11:54:51,195 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
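[Editor's note] In the architecture dump above, LengthRegulator() is the piece that bridges text and frame resolution: duration_predictor emits one duration per input token, and the regulator repeats each 384-dim encoder state that many times before the decoder runs. A minimal sketch with illustrative shapes (not ESPnet's exact API):

    import torch

    def length_regulate(hs: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
        # hs: (T_text, 384) encoder states; durations: (T_text,) integer frame counts
        return torch.repeat_interleave(hs, durations, dim=0)  # (sum(durations), 384)

    hs = torch.randn(5, 384)
    durations = torch.tensor([2, 3, 1, 4, 2])
    frames = length_regulate(hs, durations)  # -> (12, 384), then decoded into 80-dim mel frames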
2023-11-13 11:54:51,537 - INFO - Vocabulary size: 52
2023-11-13 11:54:51,544 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:51,618 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:51,803 - INFO - [95 near-identical INFO lines collapsed: all bias parameters initialized to zeros, as above]
2023-11-13 11:54:51,924 - INFO - Extractor: LogMelFbank( [same Stft/LogMel configuration as the first model above] )
2023-11-13 11:54:51,924 - INFO - Normalizer: GlobalMVN(stats_file=models/bengali/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:51,925 - INFO - TTS: FastSpeech2( [architecture identical to the dump above, except (embed): (0): Embedding(52, 384, padding_idx=0)] )
2023-11-13 11:54:51,925 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
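[Editor's note] The GlobalMVN lines record per-voice feature normalization: every 80-dim log-mel frame is z-normalized with corpus statistics from feats_stats.npz (norm_means=True, norm_vars=True). A sketch of that step is below; the npz key names (sum, sum_square, count) follow ESPnet's collect-stats layout and should be treated as an assumption.

    import numpy as np

    stats = np.load("models/bengali/male/model/feats_stats.npz")  # path from the log
    count = stats["count"]                                        # assumed key names
    mean = stats["sum"] / count
    std = np.sqrt(np.maximum(stats["sum_square"] / count - mean**2, 1e-20))

    def normalize(feats: np.ndarray) -> np.ndarray:
        # feats: (T, 80) log-mel features -> zero mean, unit variance per dimension
        return (feats - mean) / std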
2023-11-13 11:54:52,290 - INFO - Vocabulary size: 56
2023-11-13 11:54:52,297 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:52,371 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:52,553 - INFO - [95 near-identical INFO lines collapsed: all bias parameters initialized to zeros, as above]
2023-11-13 11:54:52,658 - INFO - Extractor: LogMelFbank( [same Stft/LogMel configuration as the first model above] )
2023-11-13 11:54:52,659 - INFO - Normalizer: GlobalMVN(stats_file=models/gujarati/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:52,659 - INFO - TTS: FastSpeech2( [architecture identical to the dump above, except (embed): (0): Embedding(56, 384, padding_idx=0)] )
2023-11-13 11:54:52,659 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
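[Editor's note] Every model in this log uses Spectrogram2Waveform with n_iter=8, i.e. Griffin-Lim phase reconstruction rather than a neural vocoder. A rough librosa equivalent of that inversion under the logged STFT/mel settings (the GlobalMVN de-normalization happens upstream and is omitted here):

    import numpy as np
    import librosa

    def mel_to_wav(logmel: np.ndarray) -> np.ndarray:
        # logmel: (T, 80) natural-log mel magnitudes at the logged settings
        mel = np.exp(logmel).T  # (80, T) linear-magnitude mel, librosa layout
        return librosa.feature.inverse.mel_to_audio(
            mel, sr=22050, n_fft=1024, hop_length=256, win_length=1024,
            window="hann", n_iter=8, fmin=0, fmax=8000, power=1.0,
        )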
2023-11-13 11:54:53,027 - INFO - Vocabulary size: 73
2023-11-13 11:54:53,033 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:53,117 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:53,297 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... remaining "Initialize <parameter> to zeros" lines elided: the same message is logged once for every bias in the encoder, duration/pitch/energy predictors, decoder, feat_out, and postnet ...]
2023-11-13 11:54:53,394 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:53,395 - INFO - Normalizer: GlobalMVN(stats_file=models/punjabi/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:53,395 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(73, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) ... ) [rest of the printout is identical to the first FastSpeech2 model above]
2023-11-13 11:54:53,395 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=None, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
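The Normalizer entry shows that the model's features are globally mean-variance normalized from precomputed statistics (models/punjabi/male/model/feats_stats.npz). A minimal sketch of what a GlobalMVN-style transform does follows; the npz key names ("count", "sum", "sum_square") are an assumption about the stats-file layout, not verified against this setup:

    # Hedged sketch of GlobalMVN-style normalization from accumulated stats.
    import numpy as np

    stats = np.load("models/punjabi/male/model/feats_stats.npz")  # path from the log
    count = stats["count"]                  # assumed key: number of accumulated frames
    mean = stats["sum"] / count             # per-dimension mean of the 80-dim log-mel
    var = stats["sum_square"] / count - mean ** 2
    std = np.sqrt(np.maximum(var, 1.0e-20))

    def normalize(feats):                   # feats: (frames, 80) log-mel array
        # norm_means=True, norm_vars=True, as logged
        return (feats - mean) / std

    def denormalize(feats):                 # inverse transform, e.g. before Griffin-Lim
        return feats * std + mean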
2023-11-13 11:54:53,802 - INFO - Vocabulary size: 88
2023-11-13 11:54:53,808 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:53,893 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:54,075 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... remaining "Initialize <parameter> to zeros" lines elided: the same message is logged once for every bias in the encoder, duration/pitch/energy predictors, decoder, feat_out, and postnet ...]
2023-11-13 11:54:54,213 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:54,213 - INFO - Normalizer: GlobalMVN(stats_file=models/urdu/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:54,214 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(88, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) ... ) [rest of the printout is identical to the first FastSpeech2 model above]
2023-11-13 11:54:54,214 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
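The long runs of "Initialize ... to zeros" entries (collapsed above) are one log line per bias parameter: attention projections, feed-forward convs, LayerNorms, BatchNorms, and predictor heads. A loop of roughly this shape would produce them; this is a hypothetical reconstruction, not ESPnet's actual initialize() utility, which also handles weight initialization:

    # Hedged sketch of the kind of loop behind the "Initialize ... to zeros" lines.
    import logging
    import torch.nn as nn

    def zero_all_biases(model: nn.Module) -> None:
        # Every parameter whose name ends in ".bias" is reset to zero and reported.
        for name, param in model.named_parameters():
            if name.endswith(".bias"):
                nn.init.zeros_(param)
                logging.info("Initialize %s to zeros", name)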
2023-11-13 11:54:54,554 - INFO - Vocabulary size: 64
2023-11-13 11:54:54,560 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:54,645 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:54,831 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... remaining "Initialize <parameter> to zeros" lines elided: the same message is logged once for every bias in the encoder, duration/pitch/energy predictors, decoder, feat_out, and postnet ...]
2023-11-13 11:54:54,955 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:54,955 - INFO - Normalizer: GlobalMVN(stats_file=models/hindi/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:54,956 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(64, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) ... ) [rest of the printout is identical to the first FastSpeech2 model above]
2023-11-13 11:54:54,956 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
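All of the models in this log are instantiated through the same loader, differing only in vocabulary size and stats file, which suggests a per-language inference setup along these lines. A hedged sketch using ESPnet2's Text2Speech wrapper; the config and checkpoint filenames are hypothetical, since only the models/<lang>/<voice>/model/ directories appear in the log:

    # Hedged sketch: loading one per-language FastSpeech2 model for inference.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    tts = Text2Speech(
        train_config="models/hindi/female/model/config.yaml",  # assumed filename
        model_file="models/hindi/female/model/model.pth",      # assumed filename
    )
    out = tts("namaste")                             # returns a dict; "wav" holds the waveform
    sf.write("out.wav", out["wav"].numpy(), 22050)   # fs=22050 per the Vocoder log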
2023-11-13 11:54:55,302 - INFO - Vocabulary size: 62
2023-11-13 11:54:55,308 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:55,390 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:55,578 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[... remaining "Initialize <parameter> to zeros" lines elided: the same message is logged once for every bias in the encoder, duration/pitch/energy predictors, decoder, feat_out, and postnet ...]
2023-11-13 11:54:55,688 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:55,688 - INFO - Normalizer: GlobalMVN(stats_file=models/malayalam/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:55,689 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(62, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) ... [printout matches the previous models as far as it goes; the log breaks off mid-entry here]
out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, 
elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): 
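The pitch and energy predictors in the FastSpeech2 repr above share one simple shape: stacks of Conv1d -> ReLU -> LayerNorm -> Dropout, closed by a Linear head that emits one scalar per frame. A minimal plain-PyTorch sketch of that stack, with the sizes copied from the repr (an illustration, not the ESPnet source; ESPnet applies LayerNorm on the channel axis via a wrapper, which the explicit transposes below stand in for):

import torch
import torch.nn as nn

class VariancePredictorSketch(nn.Module):
    def __init__(self, idim=384, n_chans=256, n_layers=5, kernel_size=5, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList()
        self.norms = nn.ModuleList()
        for i in range(n_layers):
            self.convs.append(nn.Conv1d(idim if i == 0 else n_chans, n_chans,
                                        kernel_size, stride=1,
                                        padding=(kernel_size - 1) // 2))
            self.norms.append(nn.LayerNorm(n_chans))
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(n_chans, 1)

    def forward(self, xs):                       # xs: (batch, time, idim)
        xs = xs.transpose(1, 2)                  # Conv1d expects (batch, chans, time)
        for conv, norm in zip(self.convs, self.norms):
            xs = torch.relu(conv(xs))
            # normalize over channels: move them to the last axis and back
            xs = norm(xs.transpose(1, 2)).transpose(1, 2)
            xs = self.dropout(xs)
        return self.linear(xs.transpose(1, 2))   # (batch, time, 1)

With idim=384, n_chans=256, n_layers=5, kernel_size=5 this matches the pitch predictor logged above; the energy and duration predictors follow the same pattern with n_layers=2 and kernel_size=3.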
2023-11-13 11:54:55,689 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
2023-11-13 11:54:56,026 - INFO - Vocabulary size: 42
2023-11-13 11:54:56,032 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:56,122 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:56,307 - INFO - Initialize *.bias to zeros [per-parameter lines elided; same list as above]
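The long runs of "Initialize <name>.bias to zeros" messages (summarized above) are the trace of an initialization pass over every named parameter. A sketch of the idea, not the ESPnet initializer itself:

import logging
import torch.nn as nn

def zero_all_biases(model: nn.Module) -> None:
    # Zero every parameter whose name ends in "bias" and log each one,
    # which reproduces the shape of the log lines above.
    for name, param in model.named_parameters():
        if name.endswith("bias"):
            nn.init.zeros_(param)
            logging.info("Initialize %s to zeros", name)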
2023-11-13 11:54:56,426 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:56,426 - INFO - Normalizer: GlobalMVN(stats_file=models/manipuri/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:56,427 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(42, 384, padding_idx=0) (1): ScaledPositionalEncoding(...) ) ) ... ) [module tree elided; identical to the FastSpeech2 printed above except for the embedding vocabulary size]
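GlobalMVN standardizes each of the 80 mel bins with corpus-level statistics loaded from feats_stats.npz (norm_means=True, norm_vars=True). A sketch of the computation; the key names assumed inside the .npz (count, sum, sum_square) are for illustration only:

import numpy as np

def global_mvn(feats, stats_file):
    # feats: (frames, n_mels) log-mel features
    stats = np.load(stats_file)
    count = stats["count"]                       # assumed key names
    mean = stats["sum"] / count
    var = stats["sum_square"] / count - mean ** 2
    return (feats - mean) / np.sqrt(np.maximum(var, 1e-20))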
2023-11-13 11:54:56,427 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
2023-11-13 11:54:56,811 - INFO - Vocabulary size: 63
2023-11-13 11:54:56,817 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:56,904 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:57,084 - INFO - Initialize *.bias to zeros [per-parameter lines elided; same list as above]
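Every model reports the same fallback vocoder: Spectrogram2Waveform with n_iter=8, i.e. Griffin-Lim phase reconstruction from the predicted log-mel spectrogram. A rough librosa-based equivalent under the logged settings (the natural-log mel scale is an assumption; this is not the ESPnet implementation):

import numpy as np
import librosa

def logmel_to_wav(logmel, fs=22050, n_fft=1024, n_shift=256,
                  win_length=1024, fmin=0, fmax=8000, n_iter=8):
    # logmel: (frames, n_mels) log-mel spectrogram from the TTS model
    mel = np.exp(logmel).T                       # undo the log; (n_mels, frames)
    spc = librosa.feature.inverse.mel_to_stft(
        mel, sr=fs, n_fft=n_fft, power=1.0, fmin=fmin, fmax=fmax)
    return librosa.griffinlim(spc, n_iter=n_iter, hop_length=n_shift,
                              win_length=win_length, window="hann")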
2023-11-13 11:54:57,221 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:57,222 - INFO - Normalizer: GlobalMVN(stats_file=models/marathi/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:57,222 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(63, 384, padding_idx=0) (1): ScaledPositionalEncoding(...) ) ) ... ) [module tree elided; identical to the FastSpeech2 printed above except for the embedding vocabulary size]
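The (length_regulator): LengthRegulator() entry in each repr is the step that bridges text and audio timing: each phone-level encoder state is repeated by its predicted integer duration so the sequence reaches mel-frame length. Functionally it reduces to repeat_interleave, as in this toy sketch:

import torch

def length_regulate(hs, durations):
    # hs: (time_in, dim) encoder states; durations: (time_in,) integer frame counts
    return torch.repeat_interleave(hs, durations, dim=0)

hs = torch.randn(4, 384)
durations = torch.tensor([1, 3, 2, 1])
print(length_regulate(hs, durations).shape)  # torch.Size([7, 384])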
2023-11-13 11:54:57,222 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
2023-11-13 11:54:57,573 - INFO - Vocabulary size: 54
2023-11-13 11:54:57,579 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:57,655 - INFO - encoder self-attention layer type = self-attention
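The log repeats one identical block per language and voice (malayalam, manipuri, marathi, and now kannada): vocabulary size, bias initialization, LogMelFbank extractor, GlobalMVN normalizer, FastSpeech2 model, Griffin-Lim vocoder. That pattern is what a loading loop over ESPnet2's Text2Speech wrapper produces; in the sketch below the config and checkpoint filenames under models/<lang>/female/model/ are assumptions:

from espnet2.bin.tts_inference import Text2Speech

models = {}
for lang in ["malayalam", "manipuri", "marathi", "kannada"]:
    # Each construction emits one "Vocabulary size ... Vocoder" block
    # like the ones in this log.
    models[lang] = Text2Speech(
        train_config=f"models/{lang}/female/model/config.yaml",  # assumed filename
        model_file=f"models/{lang}/female/model/model.pth",      # assumed filename
    )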
2023-11-13 11:54:57,838 - INFO - Initialize *.bias to zeros [per-parameter lines elided; same list as above]
2023-11-13 11:54:57,964 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:57,964 - INFO - Normalizer: GlobalMVN(stats_file=models/kannada/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:57,964 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(54, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384,
out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): 
Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, 
kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:54:57,964 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
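The FastSpeech2 printout above ends in a LengthRegulator() sitting between the variance adaptor and the mel decoder. As a minimal illustration of what length regulation does (a sketch, not ESPnet's implementation), each encoder state is simply repeated by its predicted integer duration before the decoder runs:

import torch

def length_regulate(h, durations):
    # h: (T_text, D) encoder states; durations: (T_text,) frames per token.
    # Repeating row i durations[i] times upsamples text-rate states to
    # frame rate, which is what makes FastSpeech2 non-autoregressive.
    return torch.repeat_interleave(h, durations, dim=0)

h = torch.randn(5, 384)            # 5 phonemes, attention dim 384 as in the dump
d = torch.tensor([3, 1, 4, 2, 5])  # hypothetical duration_predictor outputs
mel_in = length_regulate(h, d)     # (15, 384); then decoder -> feat_out -> 80 mels
print(mel_in.shape)                # torch.Size([15, 384])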
2023-11-13 11:54:58,298 - INFO - Vocabulary size: 58
2023-11-13 11:54:58,304 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:58,388 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:58,673 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:58,673 - INFO - Normalizer: GlobalMVN(stats_file=models/bodo/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:58,674 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(58, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k):
Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, 
inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, 
inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:54:58,674 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
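Every load also reports a GlobalMVN normalizer backed by a per-voice feats_stats.npz. Conceptually this is per-dimension mean/variance normalization of the 80-dim log-mel features; a minimal numpy sketch (the mean/std names here are illustrative, not the actual keys stored in the stats file):

import numpy as np

def global_mvn(feats, mean, std):
    # feats: (T, 80) log-mel frames; zero-mean, unit-variance per mel bin.
    return (feats - mean) / np.maximum(std, 1e-20)

feats = np.random.randn(100, 80).astype(np.float32)  # stand-in features
mean, std = feats.mean(axis=0), feats.std(axis=0)    # would come from feats_stats.npz
normed = global_mvn(feats, mean, std)
print(normed.mean(), normed.std())                   # ~0, ~1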
2023-11-13 11:54:59,016 - INFO - Vocabulary size: 58
2023-11-13 11:54:59,022 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:59,108 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:59,393 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:54:59,393 - INFO - Normalizer: GlobalMVN(stats_file=models/english/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:54:59,394 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(58, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k):
Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), 
stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, 
inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:54:59,394 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=None, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
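All the voices share the same LogMelFbank front end (fs=22050, n_fft=1024, hop 256, 80 mel bins, fmin=0, fmax=8000). A roughly equivalent feature extraction using librosa as a stand-in for the Stft/LogMel modules logged above:

import numpy as np
import librosa

y = np.random.randn(22050).astype(np.float32)  # 1 s of placeholder audio at fs=22050
S = librosa.feature.melspectrogram(y=y, sr=22050, n_fft=1024, hop_length=256,
                                   win_length=1024, n_mels=80, fmin=0, fmax=8000)
logmel = np.log(np.maximum(S, 1e-10))          # (80, T) log-mel features, floored before log
print(logmel.shape)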
2023-11-13 11:54:59,728 - INFO - Vocabulary size: 52
2023-11-13 11:54:59,734 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:54:59,819 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:00,108 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:00,108 - INFO - Normalizer: GlobalMVN(stats_file=models/assamese/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:00,108 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(52, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout):
Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): 
EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:55:00,108 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:55:00,440 - INFO - Vocabulary size: 46 2023-11-13 11:55:00,446 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:00,526 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros 2023-11-13 11:55:00,709 - INFO - Initialize encoder.encoders.0.norm1.bias to 
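The four entries above record one complete synthesis bundle: a LogMelFbank front end, a GlobalMVN feature normalizer, the FastSpeech2 acoustic model, and a Griffin-Lim Spectrogram2Waveform vocoder. A minimal sketch of driving such a bundle through ESPnet2's inference wrapper follows; the config.yaml and model.pth file names are assumptions (only the feats_stats.npz path is confirmed by the log), and the exact return type of the call varies between ESPnet versions.

```python
# Minimal sketch, assuming an ESPnet2 installation and the usual exported
# model layout. "config.yaml" and "model.pth" are hypothetical names; only
# the feats_stats.npz path actually appears in the log.
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

tts = Text2Speech(
    train_config="models/assamese/female/model/config.yaml",  # assumed name
    model_file="models/assamese/female/model/model.pth",      # assumed name
)
out = tts("Input text in Assamese script")  # recent versions return a dict
sf.write("synth.wav", out["wav"].numpy(), tts.fs)
```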
2023-11-13 11:55:00,440 - INFO - Vocabulary size: 46
2023-11-13 11:55:00,446 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:00,526 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:00,822 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:00,822 - INFO - Normalizer: GlobalMVN(stats_file=models/tamil/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:00,822 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(46, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:55:00,822 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
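Every bundle logs the identical LogMelFbank extractor: 1024-point FFT, 256-sample hop, and 80 mel bins spanning 0-8000 Hz at 22050 Hz. For reference, a rough librosa equivalent is sketched below; ESPnet applies the mel filterbank to the magnitude spectrum and takes a natural log, so power=1.0 and np.log are used here, though small numerical differences from the PyTorch implementation should be expected.

```python
# Approximate librosa re-implementation of the logged LogMelFbank settings.
# "sample.wav" is a placeholder input file, not something from the log.
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=22050)
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, win_length=1024,
    window="hann", center=True, power=1.0,   # magnitude, not power, spectrum
    n_mels=80, fmin=0, fmax=8000, htk=False,
)
logmel = np.log(np.maximum(mel, 1e-10))      # natural log with a small floor
```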
2023-11-13 11:55:01,145 - INFO - Vocabulary size: 55
2023-11-13 11:55:01,151 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:01,238 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:01,530 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:01,530 - INFO - Normalizer: GlobalMVN(stats_file=models/odia/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:01,531 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(55, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:55:01,531 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
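Each Normalizer line above names a per-voice feats_stats.npz, from which GlobalMVN standardizes every mel dimension with corpus-level mean and variance (norm_means=True, norm_vars=True). A short sketch of applying that layer directly is given below; the batch shapes are illustrative only.

```python
# Sketch: the GlobalMVN layer named in the log, applied to a dummy batch.
# The (1, 120, 80) shape is illustrative; real features would come from
# the LogMelFbank front end.
import torch
from espnet2.layers.global_mvn import GlobalMVN

mvn = GlobalMVN(
    stats_file="models/odia/female/model/feats_stats.npz",
    norm_means=True,
    norm_vars=True,
)
feats = torch.randn(1, 120, 80)        # (batch, frames, n_mels)
lengths = torch.tensor([120])
feats_norm, _ = mvn(feats, lengths)    # per-dimension (x - mean) / stddev
```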
2023-11-13 11:55:01,870 - INFO - Vocabulary size: 54
2023-11-13 11:55:01,877 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:01,963 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:02,254 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:02,254 - INFO - Normalizer: GlobalMVN(stats_file=models/rajasthani/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:02,255 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(54, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) )
2023-11-13 11:55:02,255 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
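No neural vocoder is attached to any of these bundles, so synthesis falls back to Spectrogram2Waveform: Griffin-Lim over the same 1024/256 STFT geometry with only 8 iterations, which trades audio quality for speed. A librosa approximation is sketched below, with a synthetic tone standing in for predicted features; exponentiation undoes the natural-log mel compression before inversion.

```python
# Approximate Griffin-Lim inversion matching the logged
# Spectrogram2Waveform(n_fft=1024, n_shift=256, ..., n_iter=8) settings.
# A synthetic tone stands in for the acoustic model's output.
import numpy as np
import librosa

y = librosa.tone(220.0, sr=22050, duration=1.0)
mel = librosa.feature.melspectrogram(
    y=y, sr=22050, n_fft=1024, hop_length=256, win_length=1024,
    window="hann", power=1.0, n_mels=80, fmin=0, fmax=8000,
)
logmel = np.log(np.maximum(mel, 1e-10))  # stands in for model output
wav = librosa.feature.inverse.mel_to_audio(
    np.exp(logmel), sr=22050, n_fft=1024, hop_length=256, win_length=1024,
    window="hann", power=1.0, n_iter=8, fmin=0, fmax=8000,
)
```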
2023-11-13 11:55:02,624 - INFO - Vocabulary size: 56
2023-11-13 11:55:02,630 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:02,715 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:02,899 - INFO - Initialize *.bias to zeros [per-parameter messages elided; same encoder/predictor/decoder/postnet parameter list as the first model]
2023-11-13 11:55:03,002 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:03,002 - INFO - Normalizer: GlobalMVN(stats_file=models/telugu/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:03,003 - INFO - TTS: FastSpeech2( (encoder.embed.0): Embedding(56, 384, padding_idx=0); all remaining modules identical to the architecture dump above )
2023-11-13 11:55:03,003 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
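
Note that the vocoder printed after each model is Spectrogram2Waveform with n_iter=8, i.e. Griffin-Lim phase reconstruction rather than a neural vocoder. A rough librosa equivalent under the logged STFT/mel settings (a sketch only: the exact log base and spectrogram power depend on the extractor, assumed natural log and magnitude here):

import numpy as np
import librosa

def griffin_lim_vocode(logmel: np.ndarray) -> np.ndarray:
    # logmel: (80, T) log-mel spectrogram, already de-normalized (GlobalMVN undone).
    mel = np.exp(logmel)  # assumes a natural-log mel frontend
    # Invert the mel filterbank, then run 8 Griffin-Lim iterations, matching
    # n_fft=1024, n_shift=256, win_length=1024, window=hann, fs=22050 above.
    return librosa.feature.inverse.mel_to_audio(
        mel, sr=22050, n_fft=1024, hop_length=256, win_length=1024,
        window="hann", power=1.0, n_iter=8, fmin=0, fmax=8000,
    )
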
2023-11-13 11:55:03,330 - INFO - Vocabulary size: 52
2023-11-13 11:55:03,336 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:03,421 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:03,606 - INFO - Initialize *.bias to zeros [per-parameter messages elided]
2023-11-13 11:55:03,710 - INFO - Extractor: LogMelFbank( same Stft/LogMel configuration as above )
2023-11-13 11:55:03,710 - INFO - Normalizer: GlobalMVN(stats_file=models/bengali/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:03,711 - INFO - TTS: FastSpeech2( (encoder.embed.0): Embedding(52, 384, padding_idx=0); all remaining modules identical to the architecture dump above )
2023-11-13 11:55:03,711 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
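
Each language pairs the same LogMelFbank extractor with its own GlobalMVN statistics file. Sketched below with librosa/numpy; the assumption that feats_stats.npz exposes "mean" and "std" arrays is mine, and ESPnet's actual stats layout may differ:

import numpy as np
import librosa

def extract_normalized_feats(wav: np.ndarray, stats_file: str) -> np.ndarray:
    # LogMelFbank as logged: Stft(n_fft=1024, win_length=1024, hop_length=256)
    # followed by an 80-bin LogMel(fmin=0, fmax=8000) at fs=22050.
    mel = librosa.feature.melspectrogram(
        y=wav, sr=22050, n_fft=1024, hop_length=256, win_length=1024,
        window="hann", n_mels=80, fmin=0, fmax=8000, power=1.0,
    )
    logmel = np.log(np.maximum(mel, 1e-10)).T          # (T, 80)
    # GlobalMVN(norm_means=True, norm_vars=True): subtract the global mean and
    # divide by the global std per mel bin. Key names below are assumed.
    stats = np.load(stats_file)   # e.g. models/bengali/female/model/feats_stats.npz
    return (logmel - stats["mean"]) / stats["std"]
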
2023-11-13 11:55:04,079 - INFO - Vocabulary size: 56
2023-11-13 11:55:04,085 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:04,165 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:04,347 - INFO - Initialize *.bias to zeros [per-parameter messages elided]
2023-11-13 11:55:04,452 - INFO - Extractor: LogMelFbank( same Stft/LogMel configuration as above )
2023-11-13 11:55:04,453 - INFO - Normalizer: GlobalMVN(stats_file=models/gujarati/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:04,453 - INFO - TTS: FastSpeech2( (encoder.embed.0): Embedding(56, 384, padding_idx=0); all remaining modules identical to the architecture dump above )
2023-11-13 11:55:04,453 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
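
The pitch_predictor printed in every dump is a five-layer VariancePredictor (Conv1d with kernel 5 and padding 2, 384->256 then 256->256, each block followed by ReLU, LayerNorm over channels, and Dropout 0.5) with a Linear(256, 1) head; the energy and duration predictors follow the same pattern with two kernel-3 layers. A from-scratch sketch of that module (not ESPnet's implementation):

import torch
from torch import nn

class VariancePredictorSketch(nn.Module):
    # Mirrors the printed pitch_predictor: n_layers=5, kernel=5, 384 -> 256.
    def __init__(self, idim=384, n_chans=256, n_layers=5, kernel=5, dropout=0.5):
        super().__init__()
        self.blocks = nn.ModuleList()
        for i in range(n_layers):
            in_ch = idim if i == 0 else n_chans
            self.blocks.append(nn.ModuleDict({
                "conv": nn.Conv1d(in_ch, n_chans, kernel, stride=1, padding=kernel // 2),
                "norm": nn.LayerNorm(n_chans),
            }))
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(n_chans, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, idim) encoder states; returns a (B, T) scalar track (e.g. logF0).
        x = x.transpose(1, 2)                                   # Conv1d wants (B, C, T)
        for blk in self.blocks:
            x = torch.relu(blk["conv"](x))
            x = blk["norm"](x.transpose(1, 2)).transpose(1, 2)  # LayerNorm over channels
            x = self.dropout(x)
        return self.linear(x.transpose(1, 2)).squeeze(-1)

vp = VariancePredictorSketch()
print(vp(torch.randn(2, 17, 384)).shape)                        # torch.Size([2, 17])
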
2023-11-13 11:55:04,778 - INFO - Vocabulary size: 73
2023-11-13 11:55:04,784 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:04,866 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:05,052 - INFO - Initialize *.bias to zeros [per-parameter messages elided]
2023-11-13 11:55:05,167 - INFO - Extractor: LogMelFbank( same Stft/LogMel configuration as above )
2023-11-13 11:55:05,167 - INFO - Normalizer: GlobalMVN(stats_file=models/punjabi/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:05,168 - INFO - TTS: FastSpeech2( (encoder.embed.0): Embedding(73, 384, padding_idx=0); modules shown match the architecture dump above [dump truncated in source] )
Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), 
bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:55:05,168 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=None, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:55:05,554 - INFO - Vocabulary size: 88 2023-11-13 11:55:05,560 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:05,649 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:05,838 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:55:05,838 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.norm1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.0.norm2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.norm1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.1.norm2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.self_attn.linear_v.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.self_attn.linear_out.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.feed_forward.w_1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.feed_forward.w_2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.norm1.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.2.norm2.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros 2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros 2023-11-13 11:55:05,839 - INFO - 
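Every voice in this log shares the front end just dumped: an STFT log-mel extractor (n_fft=1024, hop 256, win 1024, 80 mel bins at 22050 Hz, fmin=0, fmax=8000) followed by global mean-variance normalization from a feats_stats.npz file. A minimal numpy/librosa sketch of that computation follows; the log base, the power convention, and the .npz key names ("mean", "var") are assumptions for illustration, not ESPnet's actual internals.

import numpy as np
import librosa

def logmel_features(wav, sr=22050):
    # Mirrors the logged Stft/LogMel settings: n_fft=1024, hop 256, win 1024,
    # 80 mel bins, fmin=0, fmax=8000, Slaney-style (htk=False) filters.
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=1024, hop_length=256, win_length=1024,
        window="hann", center=True, n_mels=80, fmin=0, fmax=8000,
        htk=False, power=1.0)
    return np.log(np.maximum(mel, 1e-10)).T          # (frames, 80)

def global_mvn(feats, stats_file="feats_stats.npz"):
    # GlobalMVN with norm_means=True, norm_vars=True; the "mean"/"var" key
    # names are assumed -- ESPnet stores its own accumulator format.
    stats = np.load(stats_file)
    return (feats - stats["mean"]) / np.sqrt(np.maximum(stats["var"], 1e-20))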
2023-11-13 11:55:05,554 - INFO - Vocabulary size: 88
2023-11-13 11:55:05,560 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:05,649 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:05,838 - INFO - Initialize encoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:05,839 - INFO - Initialize encoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:05,839 - INFO - Initialize encoder.after_norm.bias to zeros
2023-11-13 11:55:05,839 - INFO - Initialize duration_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize duration_predictor.linear.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize pitch_predictor.conv.{0-4}.{0,2}.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize pitch_predictor.linear.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize pitch_embed.0.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize energy_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize energy_predictor.linear.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize energy_embed.0.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize decoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:05,840 - INFO - Initialize decoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:05,841 - INFO - Initialize decoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:05,841 - INFO - Initialize decoder.after_norm.bias to zeros
2023-11-13 11:55:05,841 - INFO - Initialize feat_out.bias to zeros
2023-11-13 11:55:05,841 - INFO - Initialize postnet.postnet.{0-4}.1.bias to zeros
2023-11-13 11:55:05,956 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:05,956 - INFO - Normalizer: GlobalMVN(stats_file=models/urdu/female/model/feats_stats.npz, norm_means=True, norm_vars=True)
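The pitch and energy predictors in the FastSpeech2 dump that follows are stacks of Conv1d -> ReLU -> LayerNorm -> Dropout blocks ending in a one-unit projection, emitting one scalar per frame. A self-contained PyTorch sketch of that shape, matching the five-layer, kernel-5 pitch predictor; a simplification, not ESPnet's VariancePredictor class.

import torch
from torch import nn

class ConvBlock(nn.Module):
    # One Conv1d -> ReLU -> LayerNorm -> Dropout unit, matching the
    # Sequential entries inside pitch_predictor/energy_predictor.
    def __init__(self, in_ch, out_ch, kernel, p):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2)
        self.norm = nn.LayerNorm(out_ch)
        self.drop = nn.Dropout(p)

    def forward(self, x):                  # x: (B, T, in_ch)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.drop(self.norm(torch.relu(h)))

class VariancePredictorSketch(nn.Module):
    # Five blocks (384 -> 256, then 256 -> 256) plus a scalar head.
    def __init__(self, idim=384, n_chans=256, kernel=5, n_layers=5, p=0.5):
        super().__init__()
        chans = [idim] + [n_chans] * n_layers
        self.conv = nn.ModuleList(
            [ConvBlock(i, o, kernel, p) for i, o in zip(chans, chans[1:])])
        self.linear = nn.Linear(n_chans, 1)

    def forward(self, x):                  # x: (B, T, idim)
        for blk in self.conv:
            x = blk(x)
        return self.linear(x).squeeze(-1)  # one pitch/energy value per frame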
2023-11-13 11:55:05,957 - INFO - TTS: FastSpeech2(
  (encoder): Encoder( (embed): Sequential( (0): Embedding(88, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (length_regulator): LengthRegulator()
  (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) )
  (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) )
)
2023-11-13 11:55:05,957 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
2023-11-13 11:55:06,076 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://10.9.240.220:5000
2023-11-13 11:55:06,076 - INFO - Press CTRL+C to quit
2023-11-13 11:55:06,076 - INFO - * Restarting with stat
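The Werkzeug warning above is Flask's standard notice: the bundled development server (note the stat-based auto-restart that follows it) should be replaced by a real WSGI server in production. One common way to do that, assuming the Flask object is called app in a module app.py; the log does not reveal the actual module or attribute names, only the host and port.

# Hypothetical module/attribute names; only host/port come from the log.
from waitress import serve   # pip install waitress
from app import app

serve(app, host="0.0.0.0", port=5000)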
2023-11-13 11:55:22,896 - INFO - Vocabulary size: 64
2023-11-13 11:55:47,201 - INFO - Vocabulary size: 64
2023-11-13 11:55:48,190 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:48,474 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:48,862 - INFO - Initialize encoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:48,863 - INFO - Initialize encoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:48,863 - INFO - Initialize encoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize encoder.after_norm.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize duration_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize duration_predictor.linear.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize pitch_predictor.conv.{0-4}.{0,2}.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize pitch_predictor.linear.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize pitch_embed.0.bias to zeros
2023-11-13 11:55:48,864 - INFO - Initialize energy_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:48,865 - INFO - Initialize energy_predictor.linear.bias to zeros
2023-11-13 11:55:48,865 - INFO - Initialize energy_embed.0.bias to zeros
2023-11-13 11:55:48,865 - INFO - Initialize decoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:48,865 - INFO - Initialize decoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:48,865 - INFO - Initialize decoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:48,866 - INFO - Initialize decoder.after_norm.bias to zeros
2023-11-13 11:55:48,866 - INFO - Initialize feat_out.bias to zeros
2023-11-13 11:55:48,866 - INFO - Initialize postnet.postnet.{0-4}.1.bias to zeros
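The long Initialize-...-to-zeros runs above record a sweep that zero-fills every bias parameter before the trained checkpoint is loaded. In PyTorch terms it amounts to something like the following sketch, covering only the bias part the log prints; ESPnet's initializer also handles weight matrices.

import torch

def zero_all_biases(model: torch.nn.Module) -> None:
    # Matches the logged pattern: every parameter whose name ends in ".bias"
    # (attention projections, convs, LayerNorms, ...) is set to zero.
    for name, param in model.named_parameters():
        if name.endswith(".bias"):
            torch.nn.init.zeros_(param)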
2023-11-13 11:55:50,965 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:50,966 - INFO - Normalizer: GlobalMVN(stats_file=models/hindi/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:50,967 - INFO - TTS: FastSpeech2(
  (encoder): Encoder( (embed): Sequential( (0): Embedding(64, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (length_regulator): LengthRegulator()
  (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) )
  (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) )
)
2023-11-13 11:55:50,967 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
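Spectrogram2Waveform(..., n_iter=8) in these dumps is a Griffin-Lim vocoder: it iteratively estimates phase from the mel spectrogram rather than using a neural vocoder. A rough librosa equivalent is sketched below; the de-normalization step and the natural-log/amplitude conventions are assumptions about the pipeline's internals, not ESPnet's exact code path.

import numpy as np
import librosa

def griffin_lim_vocoder(norm_feats, mean, var, sr=22050):
    logmel = norm_feats * np.sqrt(var) + mean     # undo GlobalMVN (assumed form)
    mel = np.exp(logmel).T                        # (80, frames), linear magnitude
    # Invert the mel filterbank, then Griffin-Lim with the logged settings.
    return librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=1024, hop_length=256, win_length=1024,
        window="hann", power=1.0, n_iter=8, fmin=0, fmax=8000)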
2023-11-13 11:55:52,941 - INFO - Vocabulary size: 62
2023-11-13 11:55:52,947 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:53,045 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:53,250 - INFO - Initialize encoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:53,250 - INFO - Initialize encoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:53,250 - INFO - Initialize encoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:53,251 - INFO - Initialize encoder.after_norm.bias to zeros
2023-11-13 11:55:53,251 - INFO - Initialize duration_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:53,251 - INFO - Initialize duration_predictor.linear.bias to zeros
2023-11-13 11:55:53,251 - INFO - Initialize pitch_predictor.conv.{0-4}.{0,2}.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize pitch_predictor.linear.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize pitch_embed.0.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize energy_predictor.conv.{0,1}.{0,2}.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize energy_predictor.linear.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize energy_embed.0.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize decoder.encoders.{0-3}.self_attn.linear_{q,k,v,out}.bias to zeros
2023-11-13 11:55:53,252 - INFO - Initialize decoder.encoders.{0-3}.feed_forward.w_{1,2}.bias to zeros
2023-11-13 11:55:53,253 - INFO - Initialize decoder.encoders.{0-3}.norm{1,2}.bias to zeros
2023-11-13 11:55:53,253 - INFO - Initialize decoder.after_norm.bias to zeros
2023-11-13 11:55:53,253 - INFO - Initialize feat_out.bias to zeros
2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.{0-4}.1.bias to zeros
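The (length_regulator): LengthRegulator() entry in each FastSpeech2 dump is what makes the model non-autoregressive: encoder states are repeated according to the predicted integer durations before the decoder runs, so the output length is fixed up front. Its core operation reduces to a one-liner; a sketch for a single unbatched utterance.

import torch

def length_regulate(hs: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    # hs: (T_text, D) encoder states; durations: (T_text,) mel frames per token.
    # Each state is copied once per frame its token is predicted to last.
    return torch.repeat_interleave(hs, durations, dim=0)   # (sum(durations), D)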
decoder.encoders.3.norm2.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize decoder.after_norm.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize feat_out.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.0.1.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.1.1.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.2.1.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.3.1.bias to zeros 2023-11-13 11:55:53,253 - INFO - Initialize postnet.postnet.4.1.bias to zeros 2023-11-13 11:55:53,383 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) ) 2023-11-13 11:55:53,383 - INFO - Normalizer: GlobalMVN(stats_file=models/malayalam/male/model/feats_stats.npz, norm_means=True, norm_vars=True) 2023-11-13 11:55:53,384 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(62, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, 
bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) ) (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) ) (length_regulator): LengthRegulator() (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): 
Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) ) (feat_out): Linear(in_features=384, out_features=80, bias=True) (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) ) (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) ) ) 2023-11-13 11:55:53,384 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, ) 2023-11-13 11:55:53,707 - INFO - Vocabulary size: 38 2023-11-13 11:55:53,713 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:53,795 - INFO - encoder self-attention layer type = self-attention 2023-11-13 11:55:53,986 - INFO 
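For readers mapping the printed module tree back to code: the pitch and energy predictors above are plain Conv1d -> ReLU -> LayerNorm -> Dropout stacks capped by a scalar projection per frame. A minimal PyTorch sketch follows; it mirrors the printed shapes, and the transposes around nn.LayerNorm are an assumption needed to normalize over the 256 channels of a (batch, channels, time) tensor, which ESPnet's custom LayerNorm handles internally.

```python
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Sketch of the pitch/energy predictor printed in the log:
    n_layers of Conv1d -> ReLU -> LayerNorm -> Dropout, then Linear -> one value per frame."""

    def __init__(self, idim=384, n_layers=5, n_chans=256, kernel_size=5, dropout=0.5):
        super().__init__()
        self.conv = nn.ModuleList()
        for i in range(n_layers):
            in_chans = idim if i == 0 else n_chans
            self.conv.append(nn.Sequential(
                nn.Conv1d(in_chans, n_chans, kernel_size, stride=1,
                          padding=(kernel_size - 1) // 2),
                nn.ReLU(),
                # Normalizes over channels; Conv1d outputs (B, C, T), so we transpose in forward.
                nn.LayerNorm(n_chans, eps=1e-12),
                nn.Dropout(dropout),
            ))
        self.linear = nn.Linear(n_chans, 1)

    def forward(self, x):  # x: (B, T, idim) encoder output
        x = x.transpose(1, 2)                                # (B, idim, T)
        for conv, relu, norm, drop in self.conv:             # unpack each Sequential block
            x = relu(conv(x))                                # (B, n_chans, T)
            x = norm(x.transpose(1, 2)).transpose(1, 2)      # LayerNorm over channels
            x = drop(x)
        return self.linear(x.transpose(1, 2)).squeeze(-1)    # (B, T) predicted value per frame
```

Per the dump, the energy predictor is the same module with n_layers=2 and kernel_size=3 (e.g. `VariancePredictor(n_layers=2, kernel_size=3)`), and the duration predictor shares the shape but trains in the log-duration domain.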
2023-11-13 11:55:53,707 - INFO - Vocabulary size: 38
2023-11-13 11:55:53,713 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:53,795 - INFO - encoder self-attention layer type = self-attention
[... per-parameter zero-initialization entries for all encoder, predictor, decoder, and postnet biases (timestamps 11:55:53,986-11:55:53,989), identical in form to the block above ...]
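Each "Initialize <name> to zeros" entry is emitted once per bias tensor when the model's initialization routine runs. A minimal sketch of the corresponding PyTorch idiom (this is an illustration, not ESPnet's actual initializer; `model` stands for any nn.Module such as the FastSpeech2 instances in this log):

```python
import torch.nn as nn

def zero_init_biases(model: nn.Module) -> None:
    """Zero every bias tensor, emitting one line per parameter,
    mirroring the 'Initialize <name> to zeros' entries above."""
    for name, param in model.named_parameters():
        if name.endswith(".bias"):
            nn.init.zeros_(param)
            print(f"Initialize {name} to zeros")
```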
2023-11-13 11:55:54,091 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:55:54,091 - INFO - Normalizer: GlobalMVN(stats_file=models/manipuri/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:54,092 - INFO - TTS: FastSpeech2( ... )  [module tree identical to the Malayalam model above, except (0): Embedding(38, 384, padding_idx=0)]
2023-11-13 11:55:54,092 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
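Every model in this log shares the same front-end: an STFT with the printed window and hop sizes, followed by an 80-bin log-mel projection. A rough librosa equivalent is sketched below; the function names are librosa's rather than ESPnet's, and the epsilon floor and natural-log convention are assumptions about the extractor's internals.

```python
import numpy as np
import librosa

def log_mel_fbank(wav: np.ndarray, sr: int = 22050) -> np.ndarray:
    """Approximate the LogMelFbank extractor printed in the log:
    Stft(n_fft=1024, win_length=1024, hop_length=256) + LogMel(n_mels=80, fmin=0, fmax=8000)."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=1024, win_length=1024, hop_length=256,
        window="hann", center=True, n_mels=80, fmin=0, fmax=8000,
        power=1.0,  # magnitude spectrogram, matching an amplitude-based mel bank
    )
    return np.log(np.maximum(mel, 1e-10)).T  # (frames, 80)
```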
2023-11-13 11:55:54,519 - INFO - Vocabulary size: 61
2023-11-13 11:55:54,525 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:54,618 - INFO - encoder self-attention layer type = self-attention
[... per-parameter zero-initialization entries (timestamps 11:55:54,804-11:55:54,807), identical in form to the blocks above ...]
2023-11-13 11:55:54,924 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:55:54,924 - INFO - Normalizer: GlobalMVN(stats_file=models/marathi/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:54,925 - INFO - TTS: FastSpeech2( ... )  [module tree identical to the Malayalam model above, except (0): Embedding(61, 384, padding_idx=0)]
2023-11-13 11:55:54,925 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000)
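The GlobalMVN normalizer applies corpus-level mean-variance normalization using each voice's feats_stats.npz printed above. A sketch of the idea follows; the npz key names `mean` and `var` are illustrative assumptions, since ESPnet's actual loader derives mean and variance from accumulated sums and counts stored in the stats file.

```python
import numpy as np

def global_mvn(feats: np.ndarray, stats_file: str) -> np.ndarray:
    """Sketch of GlobalMVN(norm_means=True, norm_vars=True):
    normalize each mel dimension by its corpus statistics."""
    stats = np.load(stats_file)
    mean, var = stats["mean"], stats["var"]  # hypothetical keys, for illustration
    return (feats - mean) / np.sqrt(np.maximum(var, 1e-20))
```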
encoder.encoders.2.norm1.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.2.norm2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.feed_forward.w_1.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.feed_forward.w_2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.norm1.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.encoders.3.norm2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize encoder.after_norm.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize duration_predictor.conv.0.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize duration_predictor.conv.0.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize duration_predictor.conv.1.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize duration_predictor.conv.1.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize duration_predictor.linear.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.0.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.0.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.1.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.1.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.2.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.2.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.3.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.3.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.4.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.conv.4.2.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_predictor.linear.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize pitch_embed.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize energy_predictor.conv.0.0.bias to zeros 2023-11-13 11:55:55,508 - INFO - Initialize energy_predictor.conv.0.2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize energy_predictor.conv.1.0.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize energy_predictor.conv.1.2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize energy_predictor.linear.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize energy_embed.0.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.self_attn.linear_q.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.self_attn.linear_k.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.self_attn.linear_v.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.self_attn.linear_out.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.feed_forward.w_1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.feed_forward.w_2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.0.norm1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize 
decoder.encoders.0.norm2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.self_attn.linear_q.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.self_attn.linear_k.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.self_attn.linear_v.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.self_attn.linear_out.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.feed_forward.w_1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.feed_forward.w_2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.norm1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.1.norm2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.self_attn.linear_q.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.self_attn.linear_k.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.self_attn.linear_v.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.self_attn.linear_out.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.feed_forward.w_1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.feed_forward.w_2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.norm1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.2.norm2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.self_attn.linear_q.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.self_attn.linear_k.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.self_attn.linear_v.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.self_attn.linear_out.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.feed_forward.w_1.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.feed_forward.w_2.bias to zeros 2023-11-13 11:55:55,509 - INFO - Initialize decoder.encoders.3.norm1.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize decoder.encoders.3.norm2.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize decoder.after_norm.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize feat_out.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize postnet.postnet.0.1.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize postnet.postnet.1.1.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize postnet.postnet.2.1.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize postnet.postnet.3.1.bias to zeros 2023-11-13 11:55:55,510 - INFO - Initialize postnet.postnet.4.1.bias to zeros 2023-11-13 11:55:55,611 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) ) 2023-11-13 11:55:55,611 - INFO - Normalizer: GlobalMVN(stats_file=models/kannada/male/model/feats_stats.npz, norm_means=True, norm_vars=True) 2023-11-13 11:55:55,612 - INFO - TTS: FastSpeech2( (encoder): Encoder( (embed): Sequential( (0): Embedding(55, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): 
Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) )
    (1-3): 3 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) )
  ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (length_regulator): LengthRegulator()
  (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) )
  (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) )
)
2023-11-13 11:55:55,612 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
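The Vocoder line above is worth a note: none of these models ships a neural vocoder, and the n_iter=8 parameter points to Griffin-Lim phase reconstruction from the 80-bin mel spectrogram (Spectrogram2Waveform appears to be ESPnet's Griffin-Lim fallback). The sketch below reproduces that inversion with librosa rather than ESPnet's implementation; treating the model output as a natural-log, magnitude-scale mel spectrogram is an assumption.

import numpy as np
import librosa

def logmel_to_wav(logmel: np.ndarray, sr: int = 22050) -> np.ndarray:
    # logmel: (n_frames, 80) log-mel features -> waveform at 22.05 kHz.
    mel = np.exp(logmel).T  # undo the log; librosa expects (n_mels, n_frames)
    return librosa.feature.inverse.mel_to_audio(
        mel,
        sr=sr,
        n_fft=1024,        # matches the logged vocoder settings
        hop_length=256,    # the logged n_shift
        win_length=1024,
        window="hann",
        power=1.0,         # assumption: magnitude (not power) mel spectrogram
        n_iter=8,          # Griffin-Lim iterations, as in the log
        fmin=0,
        fmax=8000,
    )

Eight iterations is very low for Griffin-Lim and trades audio quality for synthesis speed; raising n_iter generally softens the characteristic phasey artifacts.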
2023-11-13 11:55:56,119 - INFO - Vocabulary size: 57
2023-11-13 11:55:56,125 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:56,203 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:56,528 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:56,529 - INFO - Normalizer: GlobalMVN(stats_file=models/english/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
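The Extractor and Normalizer records pin down the target-feature pipeline each model was trained on: a 1024-point STFT with a 256-sample hop at 22.05 kHz, an 80-band mel filterbank spanning 0-8000 Hz (htk=False, i.e. the Slaney convention), then global mean-variance normalization with per-model statistics from feats_stats.npz. A rough librosa/numpy equivalent follows; the npz keys "mean" and "var" are hypothetical stand-ins, not ESPnet's documented stats layout.

import numpy as np
import librosa

def extract_logmel(wav: np.ndarray, sr: int = 22050) -> np.ndarray:
    # Mirror LogMelFbank: Stft(n_fft=1024, hop=256, win=1024) -> 80-band mel -> log.
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=1024, hop_length=256, win_length=1024,
        window="hann", n_mels=80, fmin=0, fmax=8000, htk=False, power=1.0,
    )
    return np.log(np.maximum(mel, 1e-10)).T  # (n_frames, 80)

def global_mvn(feats: np.ndarray, stats_file: str) -> np.ndarray:
    # GlobalMVN with norm_means=True, norm_vars=True; the npz layout is assumed.
    stats = np.load(stats_file)
    mean, var = stats["mean"], stats["var"]
    return (feats - mean) / np.sqrt(np.maximum(var, 1e-20))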
2023-11-13 11:55:56,529 - INFO - TTS: FastSpeech2(
  (encoder): Encoder( (embed): Sequential( (0): Embedding(57, 384, padding_idx=0) (1): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (duration_predictor): DurationPredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1-4): 4 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (pitch_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (energy_predictor): VariancePredictor( (conv): ModuleList( (0): Sequential( (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) (1): Sequential( (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,)) (1): ReLU() (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (3): Dropout(p=0.5, inplace=False) ) ) (linear): Linear(in_features=256, out_features=1, bias=True) )
  (energy_embed): Sequential( (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,)) (1): Dropout(p=0.0, inplace=False) )
  (length_regulator): LengthRegulator()
  (decoder): Encoder( (embed): Sequential( (0): ScaledPositionalEncoding( (dropout): Dropout(p=0.2, inplace=False) ) ) (encoders): MultiSequential( (0-3): 4 x EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=384, out_features=384, bias=True) (linear_k): Linear(in_features=384, out_features=384, bias=True) (linear_v): Linear(in_features=384, out_features=384, bias=True) (linear_out): Linear(in_features=384, out_features=384, bias=True) (dropout): Dropout(p=0.2, inplace=False) ) (feed_forward): MultiLayeredConv1d( (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,)) (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,)) (dropout): Dropout(p=0.2, inplace=False) ) (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.2, inplace=False) ) ) (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True) )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet( (postnet): ModuleList( (0): Sequential( (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (1-3): 3 x Sequential( (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Tanh() (3): Dropout(p=0.5, inplace=False) ) (4): Sequential( (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False) (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): Dropout(p=0.5, inplace=False) ) ) )
  (criterion): FastSpeech2Loss( (l1_criterion): L1Loss() (mse_criterion): MSELoss() (duration_criterion): DurationPredictorLoss( (criterion): MSELoss() ) )
)
2023-11-13 11:55:56,529 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=None, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
2023-11-13 11:55:56,847 - INFO - Vocabulary size: 52
2023-11-13 11:55:56,854 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:56,937 - INFO - encoder self-attention layer type = self-attention
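Reading the FastSpeech2 graph above end to end: the 384-d text embeddings pass through a four-layer self-attention encoder; the DurationPredictor's Linear(256, 1) head emits one scalar per token (trained against MSELoss via DurationPredictorLoss, on log durations in the original FastSpeech recipe); the LengthRegulator then stretches the token-rate states to frame rate for the four-layer decoder. The stretch itself is plain per-token repetition. A minimal sketch of that standard mechanism, not ESPnet's LengthRegulator:

import torch

def length_regulate(hs: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    # hs: (T_text, 384) encoder states; durations: (T_text,) integer frame counts.
    return torch.repeat_interleave(hs, durations, dim=0)  # (sum(durations), 384)

hs = torch.randn(7, 384)                 # e.g. 7 phoneme states
d = torch.tensor([3, 5, 2, 4, 6, 1, 4])  # predicted frames per phoneme
frames = length_regulate(hs, d)
assert frames.shape == (int(d.sum()), 384)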
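The pitch and energy branches of that graph share one block shape: stacks of Conv1d -> ReLU -> LayerNorm -> Dropout (five kernel-5 layers for pitch, two kernel-3 layers for energy), closed by Linear(256, 1); the predicted scalar track is lifted back to 384 channels by the kernel-1 Conv1d in pitch_embed/energy_embed and added to the encoder output. A plain-PyTorch re-implementation sketch of the predictor stack (not ESPnet's source; the transposes run the convolution over time while LayerNorm normalizes channels):

import torch
import torch.nn as nn

class VariancePredictorSketch(nn.Module):
    # Mirrors the logged pitch_predictor: 5 conv layers, kernel 5, dropout 0.5.
    def __init__(self, idim=384, odim=256, n_layers=5, kernel=5, p=0.5):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(idim if i == 0 else odim, odim, kernel, padding=kernel // 2)
            for i in range(n_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(odim) for _ in range(n_layers))
        self.dropout = nn.Dropout(p)
        self.linear = nn.Linear(odim, 1)

    def forward(self, x):  # x: (B, T, 384)
        for conv, norm in zip(self.convs, self.norms):
            x = conv(x.transpose(1, 2)).transpose(1, 2)  # convolve along time
            x = self.dropout(norm(torch.relu(x)))
        return self.linear(x).squeeze(-1)  # (B, T) one scalar per position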
2023-11-13 11:55:57,252 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:57,252 - INFO - Normalizer: GlobalMVN(stats_file=models/assamese/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:57,252 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
2023-11-13 11:55:57,589 - INFO - Vocabulary size: 52
2023-11-13 11:55:57,596 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:57,683 - INFO - encoder self-attention layer type = self-attention
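At the output end of the same graph, feat_out maps each 384-d decoder state to an 80-bin mel frame, and the Postnet refines it with five Conv1d + BatchNorm1d blocks (Tanh on all but the last); in FastSpeech2 the postnet output is conventionally added to the coarse spectrogram as a residual correction. A sketch matching the logged shapes, with the residual add as the assumed convention:

import torch
import torch.nn as nn

class PostnetSketch(nn.Module):
    # 80 -> 256 -> 256 -> 256 -> 256 -> 80 channels, kernel 5, dropout 0.5.
    def __init__(self, n_mels=80, chans=256, kernel=5, p=0.5):
        super().__init__()
        dims = [n_mels, chans, chans, chans, chans, n_mels]
        blocks = []
        for i in range(5):
            layers = [
                nn.Conv1d(dims[i], dims[i + 1], kernel, padding=kernel // 2, bias=False),
                nn.BatchNorm1d(dims[i + 1]),
            ]
            if i < 4:
                layers.append(nn.Tanh())  # last block stays linear, as logged
            layers.append(nn.Dropout(p))
            blocks.append(nn.Sequential(*layers))
        self.net = nn.Sequential(*blocks)

    def forward(self, mel):  # mel: (B, 80, T) coarse decoder output
        return mel + self.net(mel)  # residual refinement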
2023-11-13 11:55:57,967 - INFO - Extractor: LogMelFbank( (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True) (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False) )
2023-11-13 11:55:57,968 - INFO - Normalizer: GlobalMVN(stats_file=models/tamil/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
2023-11-13 11:55:57,968 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
2023-11-13 11:55:58,326 - INFO - Vocabulary size: 56
2023-11-13 11:55:58,332 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:58,415 - INFO - encoder self-attention layer type = self-attention
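For orientation on the time axes in these logs: with fs=22050 and n_shift=256, one mel frame covers 256/22050, roughly 11.6 ms, so the decoder, postnet, and variance predictors all operate at about 86.1 frames per second, and a predicted duration of d frames corresponds to d * 256 / 22050 seconds of speech.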
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.norm1.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.0.norm2.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,595 - INFO - Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.1.norm1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.1.norm2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.norm1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.2.norm2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.norm1.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.encoders.3.norm2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize encoder.after_norm.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize duration_predictor.conv.0.0.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize duration_predictor.conv.0.2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize duration_predictor.conv.1.0.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize duration_predictor.conv.1.2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize duration_predictor.linear.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.0.0.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.0.2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.1.0.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.1.2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.2.0.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.2.2.bias to zeros
2023-11-13 11:55:58,596 - INFO - Initialize pitch_predictor.conv.3.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize pitch_predictor.conv.3.2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize pitch_predictor.conv.4.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize pitch_predictor.conv.4.2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize pitch_predictor.linear.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize pitch_embed.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_predictor.conv.0.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_predictor.conv.0.2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_predictor.conv.1.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_predictor.conv.1.2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_predictor.linear.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize energy_embed.0.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.norm1.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.0.norm2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.norm1.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.1.norm2.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,597 - INFO - Initialize decoder.encoders.2.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.2.norm1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.2.norm2.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.self_attn.linear_q.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.self_attn.linear_k.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.self_attn.linear_v.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.self_attn.linear_out.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.feed_forward.w_1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.feed_forward.w_2.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.norm1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.encoders.3.norm2.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize decoder.after_norm.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize feat_out.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize postnet.postnet.0.1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize postnet.postnet.1.1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize postnet.postnet.2.1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize postnet.postnet.3.1.bias to zeros
2023-11-13 11:55:58,598 - INFO - Initialize postnet.postnet.4.1.bias to zeros
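The long run of "Initialize ... to zeros" lines above is the model initializer zeroing every bias term after initializing the weight matrices. A minimal sketch of that pattern, assuming ESPnet's default xavier_uniform scheme for the weights (only the bias zeroing is confirmed by these logs):

import torch.nn as nn

def initialize(model: nn.Module):
    """Sketch of the init pattern behind the log lines above: Xavier-uniform
    for weight matrices, zeros for biases. ESPnet's real initialize()
    supports several schemes; xavier_uniform is an assumption here."""
    for name, p in model.named_parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
        elif name.endswith(".bias"):
            nn.init.zeros_(p)  # produces a log line like "Initialize <name> to zeros"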
2023-11-13 11:55:58,707 - INFO - Extractor: LogMelFbank(
  (stft): Stft(n_fft=1024, win_length=1024, hop_length=256, center=True, normalized=False, onesided=True)
  (logmel): LogMel(sr=22050, n_fft=1024, n_mels=80, fmin=0, fmax=8000, htk=False)
)
2023-11-13 11:55:58,707 - INFO - Normalizer: GlobalMVN(stats_file=models/odia/male/model/feats_stats.npz, norm_means=True, norm_vars=True)
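The extractor/normalizer pair above defines the acoustic target space: a 1024-point STFT at hop 256, an 80-band log-mel projection between 0 and 8 kHz, then global mean-variance normalization from feats_stats.npz. A rough sketch of that pipeline follows; the npz key names are illustrative, since the real feats_stats.npz layout may differ:

import numpy as np

def logmel_extract(wav, sr=22050, n_fft=1024, hop=256, win=1024, n_mels=80,
                   fmin=0, fmax=8000):
    """Rough stand-in for the LogMelFbank extractor logged above:
    magnitude STFT -> mel filterbank -> natural log."""
    import librosa
    S = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop,
                            win_length=win, window="hann"))
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels,
                                 fmin=fmin, fmax=fmax, htk=False)
    return np.log(np.maximum(mel_fb @ S, 1e-10)).T  # (frames, n_mels)

class GlobalMVN:
    """Global mean/variance normalization from precomputed stats.
    The "mean"/"var" keys are an assumption for illustration."""
    def __init__(self, stats_file):
        stats = np.load(stats_file)
        self.mean = stats["mean"]
        self.std = np.sqrt(np.maximum(stats["var"], 1e-20))

    def __call__(self, feats):  # feats: (frames, n_mels)
        return (feats - self.mean) / self.std

feats = logmel_extract(np.random.randn(22050).astype(np.float32))
print(feats.shape)  # about (87, 80) frames for one second of audio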
2023-11-13 11:55:58,708 - INFO - TTS: FastSpeech2(
  (encoder): Encoder(
    (embed): Sequential(
      (0): Embedding(56, 384, padding_idx=0)
      (1): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (encoders): MultiSequential(
      (0-3): 4 x EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=384, out_features=384, bias=True)
          (linear_k): Linear(in_features=384, out_features=384, bias=True)
          (linear_v): Linear(in_features=384, out_features=384, bias=True)
          (linear_out): Linear(in_features=384, out_features=384, bias=True)
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (feed_forward): MultiLayeredConv1d(
          (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,))
          (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,))
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
  )
  (duration_predictor): DurationPredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
      )
      (1): Sequential(
        (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (pitch_predictor): VariancePredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
      (1-4): 4 x Sequential(
        (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (pitch_embed): Sequential(
    (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,))
    (1): Dropout(p=0.0, inplace=False)
  )
  (energy_predictor): VariancePredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): Conv1d(384, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
      (1): Sequential(
        (0): Conv1d(256, 256, kernel_size=(3,), stride=(1,), padding=(1,))
        (1): ReLU()
        (2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (3): Dropout(p=0.5, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (energy_embed): Sequential(
    (0): Conv1d(1, 384, kernel_size=(1,), stride=(1,))
    (1): Dropout(p=0.0, inplace=False)
  )
  (length_regulator): LengthRegulator()
  (decoder): Encoder(
    (embed): Sequential(
      (0): ScaledPositionalEncoding(
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (encoders): MultiSequential(
      (0-3): 4 x EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=384, out_features=384, bias=True)
          (linear_k): Linear(in_features=384, out_features=384, bias=True)
          (linear_v): Linear(in_features=384, out_features=384, bias=True)
          (linear_out): Linear(in_features=384, out_features=384, bias=True)
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (feed_forward): MultiLayeredConv1d(
          (w_1): Conv1d(384, 1536, kernel_size=(3,), stride=(1,), padding=(1,))
          (w_2): Conv1d(1536, 384, kernel_size=(3,), stride=(1,), padding=(1,))
          (dropout): Dropout(p=0.2, inplace=False)
        )
        (norm1): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (after_norm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
  )
  (feat_out): Linear(in_features=384, out_features=80, bias=True)
  (postnet): Postnet(
    (postnet): ModuleList(
      (0): Sequential(
        (0): Conv1d(80, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Tanh()
        (3): Dropout(p=0.5, inplace=False)
      )
      (1-3): 3 x Sequential(
        (0): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Tanh()
        (3): Dropout(p=0.5, inplace=False)
      )
      (4): Sequential(
        (0): Conv1d(256, 80, kernel_size=(5,), stride=(1,), padding=(2,), bias=False)
        (1): BatchNorm1d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Dropout(p=0.5, inplace=False)
      )
    )
  )
  (criterion): FastSpeech2Loss(
    (l1_criterion): L1Loss()
    (mse_criterion): MSELoss()
    (duration_criterion): DurationPredictorLoss(
      (criterion): MSELoss()
    )
  )
)
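The pitch and energy predictors in the dump above share one VariancePredictor shape: a stack of Conv1d + ReLU + LayerNorm + Dropout blocks followed by a linear projection to one scalar per frame (the DurationPredictor is the same idea with kernel 3, dropout 0.1, and a log-duration target). A minimal PyTorch sketch, assuming LayerNorm is applied over the channel dimension as the LayerNorm((256,)) entries suggest; ESPnet's exact module layout may differ slightly:

import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Conv stack + linear head matching the predictors printed above,
    e.g. n_layers=5, kernel=5 for pitch; n_layers=2, kernel=3 for energy."""
    def __init__(self, idim=384, channels=256, kernel=5, n_layers=5, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList()
        self.norms = nn.ModuleList()
        for i in range(n_layers):
            in_ch = idim if i == 0 else channels
            self.convs.append(nn.Conv1d(in_ch, channels, kernel,
                                        padding=(kernel - 1) // 2))
            self.norms.append(nn.LayerNorm(channels))
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(channels, 1)

    def forward(self, x):  # x: (batch, time, idim)
        x = x.transpose(1, 2)  # convolve over time: (batch, idim, time)
        for conv, norm in zip(self.convs, self.norms):
            x = torch.relu(conv(x))
            # normalize over channels, per the LayerNorm((256,)) entries above
            x = norm(x.transpose(1, 2)).transpose(1, 2)
            x = self.dropout(x)
        return self.linear(x.transpose(1, 2)).squeeze(-1)  # (batch, time)

xs = torch.randn(2, 37, 384)
print(VariancePredictor()(xs).shape)  # torch.Size([2, 37])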
2023-11-13 11:55:58,708 - INFO - Vocoder: Spectrogram2Waveform(n_fft=1024, n_shift=256, win_length=1024, window=hann, n_iter=8, fs=22050, n_mels=80, fmin=0, fmax=8000, )
2023-11-13 11:55:59,017 - INFO - Vocabulary size: 59
2023-11-13 11:55:59,024 - INFO - encoder self-attention layer type = self-attention
2023-11-13 11:55:59,272 - INFO - Vocabulary size: 56
2023-11-13 11:55:59,279 - INFO - encoder self-attention layer type = self-attention
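Taken together, each block of logs above loads one FastSpeech2 voice (the Odia male model here, plus the further 59- and 56-symbol vocabularies just logged) with a Griffin-Lim fallback vocoder. A sketch of synthesizing with one such model through espnet2's Text2Speech wrapper; the config.yaml and model.pth filenames under models/odia/male/model/ are assumptions, since only feats_stats.npz appears in the log:

import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

tts = Text2Speech(
    train_config="models/odia/male/model/config.yaml",  # assumed filename
    model_file="models/odia/male/model/model.pth",      # assumed filename
)
out = tts("ନମସ୍କାର")  # "namaskar" (hello) in Odia script
sf.write("out.wav", out["wav"].numpy(), tts.fs)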