metadata

base_model: bert-base-multilingual-uncased
model-index:
  - name: lang-recogn-model
    results:
      - task:
          type: text-classification
        dataset:
          name: language-detection
          type: language-detection
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.9836
        source:
          name: Language recognition using BERT
          url: >-
            https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert
language:
  - ar
  - da
  - nl
  - en
  - fr
  - de
  - el
  - hi
  - it
  - kn
  - ml
  - pt
  - ru
  - es
  - sv
  - ta
  - tr
pipeline_tag: text-classification
widget:
  - text: I have seen it somewhere...
    example_title: English
  - text: Ik heb het al gezien
    example_title: Dutch
  - text: Интересная идея
    example_title: Russian
  - text: Que vamos a hacer?
    example_title: Spanish
  - text: Hvor er der en pengeautomat?
    example_title: Danish
license: mit

Language Detection Model

This repository contains a language detection model: a BertForSequenceClassification model initialized from the multilingual checkpoint bert-base-multilingual-uncased and fine-tuned to classify the language of a text.

Training/fine-tuning

The model was fine-tuned on the Language Detection dataset from Kaggle. The dataset analysis and a complete description of the training procedure are in my Kaggle notebook "Language recognition using BERT" (https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert), where the model was trained on a GPU to speed up training.

Supported languages

The model classifies a text as one of the following 17 languages:

Arabic
Danish
Dutch
English
French
German
Greek
Hindi
Italian
Kannada
Malayalam
Portuguese
Russian
Spanish
Swedish
Tamil
Turkish
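
As an illustration of how the model might be queried, here is a minimal sketch using the text-classification pipeline from the transformers library. The Hub id spolivin/lang-recogn-model is an assumption taken from the repository name, and the helper names load_classifier and detect_languages are hypothetical. The exact label strings returned depend on the model configuration, so they are not shown here.

```python
from typing import Callable, Dict, List

# Assumed Hub id, derived from the repository name; adjust if it differs.
MODEL_ID = "spolivin/lang-recogn-model"


def load_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def detect_languages(texts: List[str], classifier: Callable) -> List[Dict]:
    """Pair every input text with the top label and score the classifier returns."""
    # For a list of strings, the pipeline returns one {"label", "score"} dict per text.
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]
```

Calling detect_languages(["I have seen it somewhere...", "Ik heb het al gezien"], load_classifier()) should return one dictionary per input with the predicted label and its score. Passing the classifier in as an argument means the pipeline is loaded only once and can be reused across calls.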
