zohirjonsharipov/xls-r-uzbek-cv8

@@ -38,7 +38,7 @@ model-index:
 # XLS-R-300M Uzbek CV8
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UZ dataset.
 The model achieves the following results:
 - Loss: 0.3063
 - Wer: 0.3852
@@ -48,28 +48,29 @@ The model achieves the following results:
 For more information about the model architecture, see [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).
-The model vocabulary consists of the [Modern Latin alphabet for Uzbek](https://en.wikipedia.org/wiki/Uzbek_alphabet), with punctuation removed.
 Note that the characters <‘> and <’> do not count as punctuation, as <‘> modifies \<o\> and \<g\>, and <’> indicates the glottal stop or a long vowel.
-The decoder uses a kenlm language model built on common_voice text.
-## Intended uses & limitations
-This model is expected to be of some utility for low-fidelity use cases such as:
-- Draft video captions
-- Indexing of recorded broadcasts
-The model is not reliable enough to use as a substitute for live captions for accessibility purposes, and it should not be used in a manner that would infringe the privacy of any of the contributors to the Common Voice dataset nor any other speakers.
-## Training and evaluation data
 50% of the official Common Voice `train` split was used as training data, and 50% of the official `dev` split was used as validation data. The full `test` set was used for final evaluation of the model without LM, while the model with LM was evaluated on only 500 examples from the `test` set.
 The kenlm language model was compiled from the target sentences of the train + other dataset splits.
-### Training hyperparameters
-The following hyperparameters were used during training:
 - learning_rate: 3e-05
 - train_batch_size: 32
 - eval_batch_size: 8
@@ -82,7 +83,7 @@ The following hyperparameters were used during training:
 - num_epochs: 100.0
 - mixed_precision_training: Native AMP
-### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Wer    | Cer    |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|

 # XLS-R-300M Uzbek CV8
+This model is a version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) fine-tuned for Uzbek on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UZ dataset, using transfer learning and an n-gram model.
 The model achieves the following results:
 - Loss: 0.3063
 - Wer: 0.3852
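
As a rough illustration of what the `Wer` figure measures, here is a minimal sketch of word error rate for a single reference/hypothesis pair: the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical, and this is not necessarily the exact evaluation code behind the reported numbers (a corpus-level WER normally sums edits over all utterances before dividing).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# out of four reference words.
print(word_error_rate("men bugun maktabga boraman", "men bugun maktab"))
```

Under this definition, a WER of 0.3852 corresponds to roughly 38.5 word-level edits per 100 reference words.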
 For more information about the model architecture, see [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).
+The model vocabulary consists of the [Modern Latin alphabet for Uzbek](https://en.wikipedia.org/wiki/Uzbek_alphabet), with punctuation removed.
 Note that the characters <‘> and <’> do not count as punctuation, as <‘> modifies \<o\> and \<g\>, and <’> indicates the glottal stop or a long vowel.
+Note that the characters <‘> and <’> are not treated as punctuation: when such a character follows \<o\> or \<g\>, it is written as <‘>.
+The decoder uses a kenlm language model built on common_voice text.
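
The vocabulary rule described above (punctuation removed, <‘> and <’> kept) could be sketched as follows. This is an assumption about how such a normalization might look, not the author's actual preprocessing script; the names `KEPT_MARKS` and `normalize_transcript` are hypothetical. Both marks fall into Unicode punctuation categories, so they have to be exempted explicitly.

```python
import unicodedata

# U+2018 <‘> modifies o and g; U+2019 <’> marks the glottal stop or a
# long vowel. Both are letters of the Uzbek Latin alphabet in practice,
# even though Unicode classifies them as punctuation.
KEPT_MARKS = {"\u2018", "\u2019"}


def normalize_transcript(text: str) -> str:
    """Drop Unicode punctuation except the two Uzbek apostrophe marks."""
    kept = [
        ch
        for ch in text
        if ch in KEPT_MARKS or not unicodedata.category(ch).startswith("P")
    ]
    # Collapse any whitespace runs left behind by removed punctuation.
    return " ".join("".join(kept).split())


print(normalize_transcript("O\u2018zbekiston, go\u2018zal!"))
```

The same function could be applied to the sentences used to build the language model so that the acoustic model and the LM share one character set, though whether the original pipeline did so is not stated here.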
+## Intended uses & limitations
+This model is expected to be useful for the following use cases:
+- Video subtitles
+- Indexing of recorded broadcasts
+The model is not reliable enough to caption live meetings or broadcasts, and it should not be used in ways that would put at risk the privacy of contributors to the Common Voice dataset or of any other speakers.
+## Training and evaluation data
 50% of the official Common Voice `train` split was used as training data, and 50% of the official `dev` split was used as validation data. The full `test` set was used for final evaluation of the model without LM, while the model with LM was evaluated on only 500 examples from the `test` set.
 The kenlm language model was compiled from the target sentences of the train + other dataset splits.
+### Training hyperparameters
+The following hyperparameters were used during training:
 - learning_rate: 3e-05
 - train_batch_size: 32
 - eval_batch_size: 8
 - num_epochs: 100.0
 - mixed_precision_training: Native AMP
+### Training results
 | Training Loss | Epoch | Step  | Validation Loss | Wer    | Cer    |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|