📝 Description

MBart for Russian summarization, fine-tuned for dialogue summarization.

This model was first fine-tuned by Ilya Gusev on the Gazeta dataset. We then fine-tuned it further on the SAMSum dataset, translated into Russian with the Google Translate API.

🤗 Moreover, we have built a Telegram bot, @summarization_bot, that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
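SAMSum dialogues, which this model was fine-tuned on, are written one utterance per line in the form `Name: message`. If you feed the model chat logs yourself rather than through the bot, a minimal sketch of converting messages into that layout might look like the following. It assumes the Russian translation kept the same layout; `Message` and `format_dialogue` are hypothetical helpers, not part of the bot or of `transformers`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Message:
    """Hypothetical container for one chat message."""
    author: str
    text: str


def format_dialogue(messages: Iterable[Message]) -> str:
    """Join messages into SAMSum-style text: one 'Author: text' line per message.

    Assumes the Russian training data kept SAMSum's layout.
    Messages that are empty after whitespace normalization are skipped.
    """
    lines = []
    for msg in messages:
        # Collapse internal newlines and repeated spaces into single spaces
        text = " ".join(msg.text.split())
        if text:
            lines.append(f"{msg.author}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    chat = [
        Message("Аня", "Привет! Идём в кино вечером?"),
        Message("Олег", "Да, давай в 19:00"),
        Message("Аня", "   "),
    ]
    print(format_dialogue(chat))
```

The returned string can then be used as `article_text` in the inference code.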

❓ How to use with code

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Download model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Dialogue to summarize
article_text = "..."

# Tokenize, truncating/padding to 600 tokens; keep the attention mask
# so that padding tokens are ignored during generation
inputs = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

# Beam search with 3 beams, never repeating the same trigram
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    top_k=0,
    num_beams=3,
    no_repeat_ngram_size=3,
)[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
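The `no_repeat_ngram_size=3` argument forbids the decoder from producing any trigram (three consecutive tokens) more than once in a summary. A minimal pure-Python sketch of that rule, for illustration only: `banned_next_tokens` is a hypothetical helper, whereas `transformers` applies the same idea inside its generation loop by setting the scores of banned tokens to minus infinity for each beam.

```python
from typing import List, Set


def banned_next_tokens(generated: List[int], n: int) -> Set[int]:
    """Return token ids that would complete an n-gram already present in `generated`.

    The last n-1 generated tokens form a prefix; any token that previously
    followed that same prefix is banned at the next step.
    """
    # Not enough tokens yet to form a complete n-gram
    if n <= 0 or len(generated) + 1 < n:
        return set()
    prefix = tuple(generated[len(generated) - n + 1:])
    banned = set()
    # Scan every earlier n-gram and compare its first n-1 tokens with the prefix
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


if __name__ == "__main__":
    # (5, 7, 9) already occurred, so after "... 5, 7" the token 9 is banned
    print(banned_next_tokens([5, 7, 9, 5, 7], n=3))  # {9}
```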

Evaluation results

  • Validation ROUGE-1 on SAMSum Corpus (translated to Russian): 34.5 (self-reported)
  • Validation ROUGE-L on SAMSum Corpus (translated to Russian): 33.0 (self-reported)
  • Test ROUGE-1 on SAMSum Corpus (translated to Russian): 31.0 (self-reported)
  • Test ROUGE-L on SAMSum Corpus (translated to Russian): 28.0 (self-reported)
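ROUGE-1 counts overlapping unigrams between a generated summary and a reference, while ROUGE-L uses their longest common subsequence (LCS); both are commonly reported as an F1 score, and the numbers above appear to be on a 0-100 scale. The sketch below assumes simple whitespace tokenization and a sentence-level ROUGE-L. The card does not say which tokenizer, stemming, or ROUGE package produced the reported scores, so this is not guaranteed to reproduce them.

```python
from collections import Counter
from typing import List


def _f1(overlap: int, cand_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap / cand_len) and recall (overlap / ref_len)."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-1 F1: unigram overlap, each word counted at most as often as in the other text."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    return _f1(overlap, len(candidate), len(reference))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: List[str], reference: List[str]) -> float:
    """Sentence-level ROUGE-L F1 based on the LCS of the two token sequences."""
    return _f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    cand = "аня и олег идут в кино".split()
    ref = "аня и олег пойдут в кино вечером".split()
    print(round(100 * rouge1_f1(cand, ref), 1))   # 76.9
    print(round(100 * rouge_l_f1(cand, ref), 1))  # 76.9
```

Here 5 of 6 candidate words match the 7-word reference, so precision is 5/6, recall is 5/7, and F1 is 10/13. ROUGE-L differs from ROUGE-1 only when matching words appear in a different order.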