This is a distilbert-base-multilingual-cased model fine-tuned with a named-entity-recognition (NER) objective to tag each token according to whether it belongs to a code block or to natural-language text. The dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets; some examples contain only code and some contain only regular text.
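The data-generation idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pipeline used for this model: the label names `CODE` and `TEXT`, the function `build_example`, the `max_blocks` parameter, the 50/50 mixing probability, and the whitespace tokenization are all hypothetical, since the card does not specify them.

```python
import random

# Hypothetical label names; the model card does not state the actual tag set.
CODE, TEXT = "CODE", "TEXT"


def build_example(code_blocks, text_blocks, rng, max_blocks=4):
    """Combine randomly chosen blocks into one token-labelled example."""
    # Draw a random number of blocks; each block is code or text at random,
    # so some examples end up code-only or text-only.
    n_blocks = rng.randint(1, max_blocks)
    tokens, labels = [], []
    for _ in range(n_blocks):
        if rng.random() < 0.5:
            block, label = rng.choice(code_blocks), CODE
        else:
            block, label = rng.choice(text_blocks), TEXT
        # Whitespace splitting stands in for real subword tokenization.
        words = block.split()
        tokens.extend(words)
        labels.extend([label] * len(words))
    return tokens, labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
code_blocks = ["for i in range(3): print(i)", "x = y + 1"]
text_blocks = ["This loop prints three numbers.", "Then we add one."]

tokens, labels = build_example(code_blocks, text_blocks, rng)
for token, label in zip(tokens, labels):
    print(f"{label}\t{token}")
```

In a real setup the word-level labels would still need to be aligned to the subword tokens produced by the DistilBERT tokenizer before training.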

The model achieves the following metrics on the validation set:

| Metric    | Value  |
|-----------|--------|
| Loss      | 0.0788 |
| F1 Score  | 0.8619 |
| Precision | 0.8362 |
| Recall    | 0.8893 |
| Accuracy  | 0.9792 |
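As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch that uses only the numbers listed above; the variable names are illustrative.

```python
# Reported validation metrics from the model card.
precision = 0.8362
recall = 0.8893

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.8619, matching the reported F1 score
```

Accuracy is noticeably higher than F1. One plausible reading, which the card does not confirm, is that accuracy is computed per token while precision, recall, and F1 are computed over tagged spans, as is common in NER evaluation.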
Model size: 135M parameters (Safetensors format, tensor type F32).