---
license: mit
datasets:
- ntphuc149/ViBidLQA
language:
- vi
metrics:
- bleu
- rouge
- meteor
- bertscore
base_model:
- VietAI/vit5-base
pipeline_tag: question-answering
library_name: transformers
tags:
- legal
- vietnamese
- question-answering
- text-generation
---

# ViBidLAQA_base: A Vietnamese Bidding Legal Abstractive Question Answering Model

## Overview
ViBidLAQA_base is an abstractive question-answering (AQA) model developed specifically for the Vietnamese bidding law domain. Built on the VietAI/vit5-base architecture and fine-tuned on the ViBidLQA bidding-law dataset, the model generates natural, accurate answers to Vietnamese bidding-law questions.

## Model Description

- **Downstream task**: Abstractive Question Answering
- **Domain**: Vietnamese Bidding Law
- **Base Model**: VietAI/vit5-base
- **Approach**: Fine-tuning (see the sketch after this list)
- **Language**: Vietnamese
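
The exact training recipe is not published in this card; the following is a minimal sketch, assuming the standard Hugging Face `Seq2SeqTrainer`, of how `VietAI/vit5-base` could be fine-tuned on ViBidLQA. The column names (`question`, `context`, `answer`) and all hyperparameters are illustrative assumptions, not the settings used to train this checkpoint.

```python
# Illustrative fine-tuning sketch only. Column names and hyperparameters
# are assumptions; they are NOT the published training configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_model = "VietAI/vit5-base"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

dataset = load_dataset("ntphuc149/ViBidLQA")  # split/column names may differ

def preprocess(batch):
    # Concatenate question and context the same way the Usage section does.
    sources = [f"question: {q} context: {c}"
               for q, c in zip(batch["question"], batch["context"])]
    enc = tokenizer(sources, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["answer"], max_length=128, truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="vibidlaqa-base",
    learning_rate=3e-5,             # assumed value
    per_device_train_batch_size=8,  # assumed value
    num_train_epochs=5,             # assumed value
    predict_with_generate=True,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```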

## Dataset

The ViBidLQA dataset features the following (a loading snippet follows this list):
- **Training set**: 5,300 samples
- **Test set**: 1,000 samples
- **Data creation process**:
  - The training data was automatically generated by Claude 3.5 Sonnet and validated by two legal experts
  - The test set was manually created by two Vietnamese legal experts
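
The dataset is available on the Hugging Face Hub as `ntphuc149/ViBidLQA` and can be loaded with the `datasets` library. Split and field names in this snippet are assumptions; check the dataset card for the exact schema.

```python
from datasets import load_dataset

# Load the ViBidLQA dataset from the Hugging Face Hub.
# Split names ("train"/"test") and field names are assumptions;
# see the dataset card for the exact schema.
ds = load_dataset("ntphuc149/ViBidLQA")

print(ds)              # available splits and their sizes
print(ds["train"][0])  # inspect one training example
```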

## Performance

| Metric | Score |
|--------|-------|
| ROUGE-1 | 75.09 |
| ROUGE-2 | 63.43 |
| ROUGE-L | 65.72 |
| ROUGE-L-SUM | 65.79 |
| BLEU-1 | 53.61 |
| BLEU-2 | 47.51 |
| BLEU-3 | 43.40 |
| BLEU-4 | 39.54 |
| METEOR | 64.38 |
| BERT-Score | 86.65 |
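
The scores appear to be reported on a 0–100 scale. The snippet below is a minimal sketch of how comparable metrics can be computed with the Hugging Face `evaluate` library; it is not the original evaluation script, and the example prediction/reference pair is a placeholder.

```python
import evaluate

# Placeholder data; in practice these would be the model's generated answers
# and the gold answers from the ViBidLQA test set.
predictions = ["Đấu thầu hạn chế là phương thức chỉ mời một số nhà thầu đủ năng lực tham gia."]
references = ["Đấu thầu hạn chế là phương thức lựa chọn nhà thầu trong đó chỉ một số nhà thầu đáp ứng yêu cầu được mời tham gia."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=references, max_order=4))
print(meteor.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="vi"))
```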

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ntphuc149/ViBidLAQA_base")
model = AutoModelForSeq2SeqLM.from_pretrained("ntphuc149/ViBidLAQA_base")

# Example usage
question = "Thế nào là đấu thầu hạn chế?"
context = "Đấu thầu hạn chế là phương thức lựa chọn nhà thầu trong đó chỉ một số nhà thầu đáp ứng yêu cầu về năng lực và kinh nghiệm được bên mời thầu mời tham gia."

# Prepare input
inputs = tokenizer(f"question: {question} context: {context}", return_tensors="pt", max_length=512, truncation=True)

# Generate answer
outputs = model.generate(inputs.input_ids, max_length=128, min_length=10, num_beams=4)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```

## Applications

This model is well suited for:
- Bidding law information retrieval systems
- Legal advisory chatbots in the bidding domain
- Automated question-answering systems for bidding law queries

## Limitations

- The model is specifically trained for the Vietnamese bidding law domain and may not perform well on other legal domains
- Performance may vary depending on the complexity and specificity of the questions
- The model should be used as a reference tool and not as a replacement for professional legal advice

## Citation

If you use this model in your research, please cite:
```
Coming soon...
```

## Contact

For questions, feedback, or collaborations:
- Email: nguyentruongphuc[email protected]
- GitHub Issues: [@ntphuc149](https://github.com/ntphuc149)
- HuggingFace: [@ntphuc149](https://huggingface.co/ntphuc149)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.