This model has 1 file scanned as unsafe.
- attn_layer_mapper=last, attn_loss_fn=mse, attn_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5
- attn_layer_mapper=layer-2, attn_loss_fn=mse, attn_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5
- dataset_sample_size=1000000, lr_scheduler_type=cosine, warmup_ratio=0.5
- dataset_sample_size=1000000
- dataset_subset=default, dataset_uri=distily_c4_multilingual_1M, lr_scheduler_type=cosine, warmup_ratio=0.5
- hs_layer_mapper=last, hs_loss_fn=mse, hs_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5
- hs_layer_mapper=layer-2, hs_loss_fn=mse, hs_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5
- lr_scheduler_type=cosine, warmup_ratio=0.5
- lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5
- lr_scheduler_type=linear, warmup_ratio=0.5
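
The entries above read as comma-separated `key=value` hyperparameter overrides, one run per line. Assuming that format holds, the following minimal sketch turns each entry into a dictionary so runs can be compared field by field. `parse_run_name` is a hypothetical helper written for illustration, not part of any library, and all values are kept as strings.

```python
def parse_run_name(name: str) -> dict[str, str]:
    """Split 'k1=v1, k2=v2' into {'k1': 'v1', 'k2': 'v2'}.

    Assumes commas separate settings and the first '=' separates key from value.
    Items without an '=' are skipped.
    """
    result = {}
    for item in name.split(","):
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        result[key.strip()] = value.strip()
    return result


# Three entries copied from the list above.
runs = [
    "attn_layer_mapper=last, attn_loss_fn=mse, attn_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5",
    "dataset_sample_size=1000000",
    "lr_scheduler_type=linear, warmup_ratio=0.5",
]
configs = [parse_run_name(run) for run in runs]

# Settings that differ between the first and last run.
differing = {
    key
    for key in configs[0].keys() | configs[2].keys()
    if configs[0].get(key) != configs[2].get(key)
}
```

Here `differing` collects every key whose value is absent or different between the two runs, which is one way to see what a given run changed relative to another.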