metadata
base_model: bert-base-multilingual-uncased
model-index:
  - name: lang-recogn-model
    results:
      - task:
          type: text-classification
        dataset:
          name: language-detection
          type: language-detection
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.9836
        source:
          name: Language recognition using BERT
          url: >-
            https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert
language:
  - ar
  - da
  - nl
  - en
  - fr
  - de
  - el
  - hi
  - it
  - kn
  - ml
  - pt
  - ru
  - es
  - sv
  - ta
  - tr
pipeline_tag: text-classification
widget:
  - text: I have seen it somewhere...
    example_title: English
  - text: Ik heb het al gezien
    example_title: Dutch
  - text: Интересная идея
    example_title: Russian
  - text: Que vamos a hacer?
    example_title: Spanish
  - text: Hvor er der en pengeautomat?
    example_title: Danish
license: mit

Language Detection Model

The model in this repository is a BertForSequenceClassification model fine-tuned from bert-base-multilingual-uncased, a BERT checkpoint pretrained on multilingual text.

Training/fine-tuning

The model was fine-tuned on the Language Detection dataset from Kaggle. The dataset analysis and a complete description of the training procedure are in my Kaggle notebook "Language recognition using BERT", where the model was trained on a GPU for speed.

Supported languages

The model classifies input text as one of the following 17 languages:

  • Arabic
  • Danish
  • Dutch
  • English
  • French
  • German
  • Greek
  • Hindi
  • Italian
  • Kannada
  • Malayalam
  • Portuguese
  • Russian
  • Spanish
  • Swedish
  • Tamil
  • Turkish
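
To try the classifier on text in the languages above, something like the following should work. It is a minimal sketch, not the training or evaluation code from the notebook. It assumes the model is published on the Hugging Face Hub as spolivin/lang-recogn-model (inferred from the repository owner and name, so adjust it if the id differs) and that the transformers library and a backend such as PyTorch are installed. The helper names are illustrative, and the exact label strings returned depend on the model's configuration, which this README does not state.

```python
# Assumed Hub id (owner/name); change it if the repository lives elsewhere.
MODEL_ID = "spolivin/lang-recogn-model"


def detect_languages(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text."""
    # Imported lazily so the formatting helper below works without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(texts))


def format_prediction(text, prediction):
    """Render a single prediction as a readable line."""
    return f"{text!r} -> {prediction['label']} ({prediction['score']:.3f})"


def main():
    # Sample sentences taken from the widget examples in the metadata.
    samples = [
        "I have seen it somewhere...",
        "Ik heb het al gezien",
        "Интересная идея",
    ]
    for text, prediction in zip(samples, detect_languages(samples)):
        print(format_prediction(text, prediction))
```

Calling main() downloads the model on first use and prints the top predicted label with its score for each sample sentence.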

References

  1. BERT multilingual base model (uncased)
  2. Language Detection Dataset