# Uzbek Dependency Parser

This model predicts Universal Dependencies (UD) dependency relation labels for Uzbek text.

## Model details

The model was fine-tuned on a Universal Dependencies treebank of roughly 600 annotated Uzbek sentences. It is based on the XLM-RoBERTa base model (about 277M parameters) and adapted for token classification, so each input token is assigned a dependency relation label.
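
Since the label inventory comes from the treebank the model was trained on, the easiest way to see which relations it can emit is to inspect the model configuration:

```python
from transformers import AutoConfig

# Fetch only the configuration file, not the model weights
config = AutoConfig.from_pretrained("Arofat/uzbek-dependency-parser")

# id2label maps class indices to UD relation names (e.g. "nsubj", "root")
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```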

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Arofat/uzbek-dependency-parser")
model = AutoModelForTokenClassification.from_pretrained("Arofat/uzbek-dependency-parser")

# Prepare pre-tokenized text ("I live in Uzbekistan.")
text = "Men O'zbekistonda yashayman."
tokens = text.split()

# Run the model without tracking gradients
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label for every sub-token
predictions = torch.argmax(outputs.logits, dim=2)
id2label = model.config.id2label

# Map sub-token predictions back to words, keeping only the
# prediction for the first sub-token of each word
dep_tags = []
word_ids = inputs.word_ids(batch_index=0)
prev_word_id = None
for idx, word_id in enumerate(word_ids):
    if word_id is None or word_id == prev_word_id:
        continue
    dep_tags.append(id2label[predictions[0, idx].item()])
    prev_word_id = word_id

# Print one relation label per word
for token, tag in zip(tokens, dep_tags):
    print(f"{token}: {tag}")
```

## Limitations

This model was trained on a relatively small dataset (roughly 600 sentences) and may not generalize well to all domains of Uzbek text. Note that the model only predicts dependency relation labels, not the tree structure (heads). To obtain a complete dependency parse, the labels must be combined with a separate head-prediction step, as sketched below.
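
As a rough illustration of the gap, the hypothetical helper below writes the `tokens` and `dep_tags` from the usage example above into CoNLL-U rows, leaving the HEAD column as `_`; a separate parser would have to fill that column in.

```python
# Minimal sketch (hypothetical helper, not part of this model):
# serialize predictions as CoNLL-U, with HEAD left unfilled.
def to_conllu(tokens, dep_tags):
    rows = []
    for i, (token, tag) in enumerate(zip(tokens, dep_tags), start=1):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append(f"{i}\t{token}\t_\t_\t_\t_\t_\t{tag}\t_\t_")
    return "\n".join(rows)

print(to_conllu(tokens, dep_tags))
```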
