---
license: mit
language:
- ne
metrics:
- rouge
tags:
- Nepali summary
- Nepali bart
- Nepali
- summary
- text
- nepali text summary
pipeline_tag: text2text-generation
widget:
- text: "अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए । नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ । नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ ।"
  example_title: "Example 1"
---
|
# Nep_Summ_BART
|
|
|
|
|
|
This model was pre-trained with the BART objective on a Nepali corpus and then fine-tuned on Nepali summarization data.
<br>Given a Nepali text input, it generates a summary.

The model has roughly 101M parameters.
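If you want to verify the parameter count yourself, a quick sketch (using the same model class and repository name as in the usage example below):

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # should report roughly 101M
```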
|
|
|
## Model Details
|
|
|
### Model Description
|
|
|
|
|
|
The model was pre-trained using BART noising techniques: sentence permutation, token deletion, and random token masking.
<br>The corrupted text is fed into the transformer encoder, and the decoder fulfils the denoising objective by reconstructing the original text.
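As a rough illustration of these corruptions, here is a minimal sketch (not the authors' actual preprocessing; the probabilities and the `<mask>` string are assumptions):

```python
import random

MASK_TOKEN = "<mask>"  # assumed placeholder; the real tokenizer defines its own mask token

def add_noise(text, delete_prob=0.1, mask_prob=0.15):
    # 1. Sentence permutation: shuffle sentences, splitting on the Devanagari danda "।".
    sentences = [s.strip() for s in text.split("।") if s.strip()]
    random.shuffle(sentences)
    tokens = " । ".join(sentences).split() + ["।"]

    # 2. Token deletion and 3. random token masking.
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < delete_prob:
            continue                      # delete the token outright
        elif r < delete_prob + mask_prob:
            noisy.append(MASK_TOKEN)      # replace the token with a mask
        else:
            noisy.append(tok)
    return " ".join(noisy)

# The encoder receives the noisy text; the decoder is trained to reproduce the original.
```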
|
|
|
Cross-entropy loss is used for both the pre-training and fine-tuning of the model.
|
|
|
The pre-training loss per epoch is as follows:
|
|
|
| Epoch | Training Loss | Val Loss | |
|
|----------|:-------------:|------:| |
|
| 1 | 0.8137 | 0.8010 | |
|
| 2 | 0.7861 | 0.7524 | |
|
| 3 | 0.7495 | 0.7290 | |
|
|
|
The ROUGE scores after fine-tuning, on the BBC XLSum Nepali test set, are:
|
|
|
| Metric  | Score |
|---------|------:|
| ROUGE-1 | 0.177 |
| ROUGE-2 | 0.059 |
| ROUGE-L | 0.154 |
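For reference, this is roughly how such scores can be computed with the Hugging Face `evaluate` library (a sketch, not necessarily the exact evaluation script used here; argument names may differ slightly across library versions, and the placeholder strings are illustrative):

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["<model-generated summary>"]   # placeholder: summaries produced by the model
references  = ["<XLSum reference summary>"]   # placeholder: gold summaries from the test set

# The default ROUGE tokenizer targets English text, so a simple whitespace
# tokenizer is passed here to avoid dropping Devanagari characters.
scores = rouge.compute(predictions=predictions,
                       references=references,
                       tokenizer=lambda text: text.split())
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```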
|
|
|
## Uses
|
|
|
|
You can use this model for Nepali text summarization.
<br>It can also be used for sequence classification via BartForSequenceClassification.
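For example, a minimal sketch of loading the checkpoint for classification (the label count is an arbitrary placeholder; the classification head is newly initialized and must be fine-tuned on labeled Nepali data before its predictions mean anything):

```python
from transformers import AutoTokenizer, BartForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = BartForSequenceClassification.from_pretrained("pascalrai/nep_summ_BART", num_labels=5)

inputs = tokenizer("नेपालमा आज मौसम सफा छ ।", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, num_labels); meaningful only after fine-tuning
```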
|
## How to Get Started with the Model
|
|
|
Use the code below to get started with the model.
|
```python
# Make sure to install the dependencies below (or from requirements.txt):
# pip install transformers==4.35
# pip install huggingface_hub==0.23.0

import torch

# Load the model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

sentence = """अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए ।

नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ ।

नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ"""

# Tokenize the article, truncating it to the model's maximum input length.
inputs = tokenizer(sentence, max_length=1000, truncation=True, return_tensors="pt")

# Generate the summary token IDs on the selected device.
summary_ids = model.generate(inputs["input_ids"].to(device))

# Decode the generated IDs back into Nepali text.
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
# Expected output:
# 'दशैंको मुखमा चिनीको चरम अभाव भएको भन्दै नेपाल प्रहरीले सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने र बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गर्ने जनाएको छ।'
```
|
#### Hardware
|
|
|
The model was pre-trained on a single A10G GPU on an AWS instance for roughly 133 hours in total, with each epoch taking about 45 hours, using bf16 precision.
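A minimal sketch of what a bf16 training step looks like with `torch.autocast` (illustrative only; the actual pre-training script is not published here, and the single toy batch stands in for a real dataloader):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART").to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A single toy batch standing in for a real pre-training / fine-tuning dataloader.
batch = tokenizer(["नेपालमा आज मौसम सफा छ ।"], text_target=["मौसम सफा छ ।"],
                  return_tensors="pt", padding=True).to("cuda")

optimizer.zero_grad()
# bf16 autocast: the forward pass runs in bfloat16; no GradScaler is needed for bf16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(**batch).loss  # cross-entropy over the target tokens
loss.backward()
optimizer.step()
```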
|
|
|
#### Possible Future Directions:
|
|
|
1. Use a decoder-only model for pre-training and summarization.
<br>When the deleted token spans are not very large, the encoder-decoder model appears to learn to copy tokens from the encoder context through cross-attention during decoding.
<br>This hurts performance on the abstractive summarization task.
<br>A decoder-only model does not have this shortcut, since the tokens it must predict are never visible to it as input.
|
|
|
2. We pre-trained our model on approximately 16 GB of data and compared its classification performance on the <a href='https://www.kaggle.com/datasets/ashokpant/nepali-news-dataset-large/data'>Nepali News Dataset (Large)</a> against a couple of other Nepali transformer-based models available on Hugging Face.
<br>Our model seems to do better than the others, with a validation accuracy of 0.58, but there could be two reasons why the score is not higher:

- There is still room to improve the quality of the data; estimate this by measuring human-level performance (HLP), and if HLP is well above 0.58, try the point below.
- We still do not have enough data for generalization, as transformer models only perform well with large amounts of pre-training data compared with classical sequential models.
|
|
|
#### Authors:
|
|
|
<a href="https://www.linkedin.com/in/bijaya-bhatta-69536018a/">Vijaya Bhatta</a> |
|
<br><a href="https://www.linkedin.com/in/pascal-rai/">Pascal Rai</a> |
|
<br><a href="https://www.linkedin.com/in/niranjan-shrestha-gem/">Niranjan Shrestha</a> |
|
<br><a href="https://www.linkedin.com/in/dristi-sigdel-3120131b1/">Dristi Sigdel</a> |
|
<br><a href="https://www.linkedin.com/in/sujan-neupane-596964211/">Sujan Neupane</a> |
|
<br><a href="https://www.linkedin.com/in/sagar-kafle-a1b84b185/">Sagar Kafle</a> |