metadata
base_model: bert-base-multilingual-uncased
language:
- ar
- da
- nl
- en
- fr
- de
- el
- hi
- it
- kn
- ml
- pt
- ru
- es
- sv
- ta
- tr
pipeline_tag: text-classification
widget:
- text: I have seen it somewhere...
example_title: English
- text: Ik heb het al gezien
example_title: Dutch
- text: Интересная идея
example_title: Russian
- text: Que vamos a hacer?
example_title: Spanish
- text: Hvor er der en pengeautomat?
example_title: Danish
- text: إنه مشوق جدا
example_title: Arabic
- text: Es ist sehr interessant
example_title: German
- text: c'est très intéressant
example_title: French
- text: Non ho mai visto una tale bellezza
example_title: Italian
- text: Jag har aldrig sett en sådan skönhet
example_title: Swedish
- text: Böyle bir güzellik görmedim
example_title: Turkish
- text: ಅದ್ಭುತ ಕಲ್ಪನೆ
example_title: Kannada
- text: அற்புதமான யோசனை
example_title: Tamil
- text: Υπέροχη ιδέα
example_title: Greek
- text: Eu nunca estive aqui
example_title: Portuguese
- text: मैं यहां कभी नहीं गया
example_title: Hindi
- text: ഞാൻ ഇവിടെ പോയിട്ടില്ല
example_title: Malayalam
license: mit
Language Detection Model
This repository contains a fine-tuned version of bert-base-multilingual-uncased, a BERT model pretrained on multilingual text, loaded as BertForSequenceClassification for language detection.
Training/fine-tuning
The model was fine-tuned on the Language Detection dataset from Kaggle. The dataset analysis and a complete description of the training procedure are available in one of my Kaggle notebooks, which was used to speed up training on a GPU.
Supported languages
The model has been fine-tuned to detect one of the following 17 languages:
- Arabic
- Danish
- Dutch
- English
- French
- German
- Greek
- Hindi
- Italian
- Kannada
- Malayalam
- Portuguese
- Russian
- Spanish
- Swedish
- Tamil
- Turkish
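Usage sketch
The following is a minimal sketch of how a model like this could be queried through the `transformers` text-classification pipeline. It is not taken from the original notebook. The helper names `load_classifier` and `detect_language` are hypothetical, the model ID must be supplied by the reader (it is not stated here), and the exact label strings returned depend on the model's `id2label` configuration, which is assumed rather than known.

```python
from typing import Callable, Dict, List, Tuple

# ISO 639-1 codes from the card metadata, paired with the language names listed above.
SUPPORTED_LANGUAGES: Dict[str, str] = {
    "ar": "Arabic", "da": "Danish", "nl": "Dutch", "en": "English",
    "fr": "French", "de": "German", "el": "Greek", "hi": "Hindi",
    "it": "Italian", "kn": "Kannada", "ml": "Malayalam", "pt": "Portuguese",
    "ru": "Russian", "es": "Spanish", "sv": "Swedish", "ta": "Tamil",
    "tr": "Turkish",
}

Prediction = Dict[str, object]


def load_classifier(model_id: str) -> Callable[[str], List[Prediction]]:
    """Build a text-classification pipeline for the given Hub model ID.

    The import is done lazily so the rest of this sketch can be used
    (and tested) without `transformers` installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def detect_language(
    text: str, classifier: Callable[[str], List[Prediction]]
) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring prediction.

    `classifier` is any callable that, given a string, returns a list of
    {"label": ..., "score": ...} dicts, as a text-classification pipeline does.
    """
    predictions = classifier(text)
    best = max(predictions, key=lambda p: p["score"])
    return str(best["label"]), float(best["score"])
```

To try it, pass this repository's model ID to `load_classifier` and call `detect_language("Ik heb het al gezien", classifier)`. Whether the returned label is a language name or a generic `LABEL_n` string depends on how the labels were saved with the model.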