prosody_gtsc_phi-3-mini-energy
Multi-label dialogue act classification (DAC) on ground-truth text, with prosody encodings fused into the backbone through residual cross-attention.
Model description
- Prosody encoder: 2-layer Transformer encoder with an initial dense projection
- Backbone: Phi-3 mini
- Pooling: self-attention
- Multi-label classification head: two dense layers with two dropout layers (p = 0.3) and a Tanh activation in between (see the sketch below)
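A minimal PyTorch sketch of how these components could fit together. Only the component list above comes from the card; the hidden size, number of attention heads, exact dropout placement in the head, and the point where prosody features are fused are assumptions.

```python
import torch
import torch.nn as nn

class ProsodyEncoder(nn.Module):
    """Dense projection followed by a 2-layer Transformer encoder (per the description above)."""
    def __init__(self, prosody_dim, hidden_dim, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(prosody_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, prosody):                      # (B, T_p, prosody_dim)
        return self.encoder(self.proj(prosody))      # (B, T_p, hidden_dim)

class SelfAttentionPooling(nn.Module):
    """Learned attention scores over the sequence, collapsed to a single vector."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, x):                            # (B, T, H)
        weights = torch.softmax(self.score(x), dim=1)
        return (weights * x).sum(dim=1)              # (B, H)

class MultiLabelDACModel(nn.Module):
    """Backbone hidden states fused with prosody features via residual cross-attention."""
    def __init__(self, backbone, hidden_dim, prosody_dim, num_labels, n_heads=8):
        super().__init__()
        self.backbone = backbone                     # e.g. Phi-3 mini, returning hidden states
        self.prosody_encoder = ProsodyEncoder(prosody_dim, hidden_dim, n_heads)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.pool = SelfAttentionPooling(hidden_dim)
        self.head = nn.Sequential(                   # two dense layers, two dropouts, Tanh in between
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_labels),       # multi-label logits
        )

    def forward(self, input_ids, attention_mask, prosody):
        text = self.backbone(input_ids=input_ids, attention_mask=attention_mask,
                             output_hidden_states=True).hidden_states[-1]
        pros = self.prosody_encoder(prosody)
        fused, _ = self.cross_attn(query=text, key=pros, value=pros)
        fused = text + fused                         # residual cross-attention
        return self.head(self.pool(fused))
```

Training the logits with `BCEWithLogitsLoss` would give the multi-label objective.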
Training and evaluation data
Trained on ground-truth transcripts from asapp/slue-phase-2.
Evaluated on ground-truth transcripts (GT) and on normalized Whisper small transcripts (E2E); both conditions are sketched below.
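A rough sketch of the two evaluation conditions. The dataset config, split, and column names are assumptions (not stated on the card), and the normalization shown is a simplified stand-in for whatever normalizer was actually applied to the Whisper output.

```python
from datasets import load_dataset
from transformers import pipeline

# Dialogue act subset of SLUE phase-2 (config/split/column names are assumptions).
dac = load_dataset("asapp/slue-phase-2", "hvb")
sample = dac["test"][0]

# GT condition: reference transcript shipped with the dataset.
gt_text = sample["text"]

# E2E condition: re-transcribe with Whisper small, then normalize
# (lowercasing/stripping here is only an illustrative stand-in).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
e2e_text = asr({"array": sample["audio"]["array"],
                "sampling_rate": sample["audio"]["sampling_rate"]})["text"].lower().strip()
```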
Training hyperparameters
The following hyperparameters were used during training (expressed as TrainingArguments in the sketch after the list):
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
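The values above map directly onto Hugging Face TrainingArguments; the sketch below is one way to reproduce them. The output_dir is a placeholder, and the Adam betas/epsilon listed above are the Transformers defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="prosody_gtsc_phi-3-mini-energy",   # placeholder output directory
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,                 # effective train batch size: 2 * 4 = 8
    num_train_epochs=20,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                                     # Native AMP mixed precision
)
```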
Framework versions
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
Evaluation results
- F1 macro (E2E) on asapp/slue-phase-2: 67.27 (self-reported)
- F1 macro (GT) on asapp/slue-phase-2: 72.73 (self-reported)
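These are macro-averaged F1 scores over the dialogue act labels. A small illustration of the metric follows; the label set, threshold, and predictions are made up for the example and do not come from the model.

```python
import numpy as np
from sklearn.metrics import f1_score

# Multi-label macro F1: compute F1 per label, then average with equal weight.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])                  # binary indicator matrix (examples x labels)
y_prob = np.array([[0.9, 0.2, 0.7],
                   [0.1, 0.8, 0.6]])            # sigmoid outputs from the classification head
y_pred = (y_prob >= 0.5).astype(int)            # illustrative 0.5 threshold per label

print(f1_score(y_true, y_pred, average="macro"))
```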