---
language: uz
license: apache-2.0
tags:
- uzbek
- dependency-parsing
- universal-dependencies
- nlp
datasets:
- universal_dependencies
metrics:
- accuracy
- f1
---

# Uzbek Dependency Parser

This model predicts Universal Dependencies dependency relations for Uzbek text.

## Model details

The model was fine-tuned on a Universal Dependencies treebank containing approximately 600 annotated sentences. It is based on the [XLM-RoBERTa base model](https://huggingface.co/xlm-roberta-base) and adapted for token classification.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Arofat/uzbek-dependency-parser")
model = AutoModelForTokenClassification.from_pretrained("Arofat/uzbek-dependency-parser")

# Prepare text
text = "Men O'zbekistonda yashayman."
tokens = text.split()

# Get predictions
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs
predictions = torch.argmax(outputs.logits, dim=2)
id2label = model.config.id2label

# Get dependency relations (one label per word, taken from its first sub-token)
dep_tags = []
word_ids = inputs.word_ids(batch_index=0)
prev_word_id = None
for idx, word_id in enumerate(word_ids):
    # Skip special tokens and non-initial sub-tokens of a word
    if word_id is None or word_id == prev_word_id:
        continue
    dep_tags.append(id2label[predictions[0, idx].item()])
    prev_word_id = word_id

# Print results
for token, tag in zip(tokens, dep_tags):
    print(f"{token}: {tag}")
```

## Limitations

This model was trained on a relatively small dataset and may not generalize well to all domains of Uzbek text.

Note that this model only predicts dependency relations (labels) and not the dependency tree structure (heads). For a complete dependency parse, additional processing is needed.
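
Because only relation labels are predicted, one way to hand the output to downstream tools is to write it in the 10-column CoNLL-U layout with the HEAD column left unspecified (`_`). The following is a minimal sketch, not part of the model itself: the helper name `to_conllu_rows` is hypothetical, and the example labels are illustrative placeholders rather than actual model output. It assumes `tokens` and `dep_tags` are the parallel lists built in the usage example.

```python
def to_conllu_rows(tokens, dep_tags):
    """Format words and predicted relations as CoNLL-U lines.

    Hypothetical helper: HEAD and every other unpredicted column are
    written as "_" because the model does not predict them.
    """
    if len(tokens) != len(dep_tags):
        raise ValueError("tokens and dep_tags must have the same length")
    rows = []
    for i, (token, tag) in enumerate(zip(tokens, dep_tags), start=1):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        columns = [str(i), token, "_", "_", "_", "_", "_", tag, "_", "_"]
        rows.append("\t".join(columns))
    return rows


# Illustrative labels only, not actual model output.
example_tokens = ["Men", "O'zbekistonda", "yashayman."]
example_tags = ["nsubj", "obl", "root"]

for row in to_conllu_rows(example_tokens, example_tags):
    print(row)
```

A separate head-prediction step would still be needed to fill the HEAD column before the rows form a valid dependency tree.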