SemViQA-TC: Vietnamese Three-class Classification for Claim Verification

Model Description

SemViQA-TC is one of the key components of the SemViQA system, designed for three-class classification in Vietnamese fact-checking. Given a claim and its retrieved evidence, the model assigns the claim one of three labels: SUPPORTED, REFUTED, or NOT ENOUGH INFORMATION (NEI).

Model Information

Developed by: SemViQA Research Team
Fine-tuned model: XLM-R
Supported Language: Vietnamese
Task: Three-Class Classification (Fact Verification)
Dataset: ViWikiFC

SemViQA-TC is the first step of the SemViQA system's two-step classification process. It assigns each claim to one of three classes: SUPPORTED, REFUTED, or NEI. Claims classified as SUPPORTED or REFUTED are then passed to a secondary binary classification model (SemViQA-BC), which refines the prediction. This hierarchical strategy is designed to improve the accuracy of fact verification.
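
The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the SemViQA implementation: `two_step_verdict`, `tc_probs`, and `bc_probs` are hypothetical names, the three-class label order is the one used in this card's usage example, and both the binary label order and the rule that the SemViQA-BC prediction simply replaces the three-class one are assumptions.

```python
from typing import Sequence

# Three-class label order, as used in this card's usage example.
TC_LABELS = ["NEI", "SUPPORTED", "REFUTED"]
# Assumed order for the binary model; check the SemViQA-BC card before relying on it.
BC_LABELS = ["SUPPORTED", "REFUTED"]


def two_step_verdict(tc_probs: Sequence[float], bc_probs: Sequence[float]) -> str:
    """Combine three-class and binary probabilities into one verdict (hypothetical helper)."""
    # Step 1: take the most probable three-class label.
    tc_label = TC_LABELS[max(range(len(TC_LABELS)), key=lambda i: tc_probs[i])]
    if tc_label == "NEI":
        # NEI claims are not passed on to the binary model.
        return "NEI"
    # Step 2: for SUPPORTED/REFUTED, let the binary model decide (assumed rule).
    return BC_LABELS[max(range(len(BC_LABELS)), key=lambda i: bc_probs[i])]


print(two_step_verdict([0.01, 0.00, 0.99], [0.05, 0.95]))  # REFUTED
print(two_step_verdict([0.90, 0.05, 0.05], [0.60, 0.40]))  # NEI
```

In practice `tc_probs` and `bc_probs` would come from the softmax outputs of the two models; the actual combination rule in SemViQA may differ from this simplified one.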

Usage Example

Direct Model Usage

# Install semviqa
!pip install semviqa

# Initialize the tokenizer and model
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer
from semviqa.tvc.model import ClaimModelForClassification

tokenizer = AutoTokenizer.from_pretrained("SemViQA/tc-xlmr-viwikifc")
model = ClaimModelForClassification.from_pretrained("SemViQA/tc-xlmr-viwikifc")
claim = "Chiến tranh với Campuchia đã kết thúc trước khi Việt Nam thống nhất."
evidence = "Sau khi thống nhất, Việt Nam tiếp tục gặp khó khăn do sự sụp đổ và tan rã của đồng minh Liên Xô cùng Khối phía Đông, các lệnh cấm vận của Hoa Kỳ, chiến tranh với Campuchia, biên giới giáp Trung Quốc và hậu quả của chính sách bao cấp sau nhiều năm áp dụng."

# Encode the claim and evidence as a pair; if too long, truncate only the evidence
inputs = tokenizer(
    claim,
    evidence,
    truncation="only_second",
    add_special_tokens=True,
    max_length=256,
    padding='max_length',
    return_attention_mask=True,
    return_token_type_ids=False,
    return_tensors='pt',
)

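# Label names, in the order used to read the model's output
labels = ["NEI", "SUPPORTED", "REFUTED"]

# Run inference in evaluation mode, without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities
logits = outputs["logits"]
probabilities = F.softmax(logits, dim=1).squeeze()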

for i, (label, prob) in enumerate(zip(labels, probabilities.tolist()), start=1):
    print(f"{i}) {label} {prob:.4f}")
# 1) NEI 0.0091
# 2) SUPPORTED 0.0014
# 3) REFUTED 0.9894
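
If a single verdict is needed rather than the full distribution, the highest-probability label can be selected. A minimal sketch, assuming the probabilities are available as a plain list (for example via `probabilities.tolist()`); `top_label` is a hypothetical helper and not part of `semviqa`:

```python
from typing import Sequence, Tuple


def top_label(labels: Sequence[str], probs: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability (hypothetical helper)."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    # Index of the largest probability
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Using the example output printed above
label, prob = top_label(["NEI", "SUPPORTED", "REFUTED"], [0.0091, 0.0014, 0.9894])
print(f"{label} ({prob:.4f})")  # REFUTED (0.9894)
```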

Citation

If you use SemViQA-TC in your research, please cite:

@misc{nguyen2025semviqasemanticquestionanswering,
      title={SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking},
      author={Nam V. Nguyen and Dien X. Tran and Thanh T. Tran and Anh T. Hoang and Tai V. Duong and Di T. Le and Phuc-Lu Le},
      year={2025},
      eprint={2503.00955},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.00955},
}

🔗 Paper Link: SemViQA on arXiv (https://arxiv.org/abs/2503.00955)
🔗 Source Code: GitHub - SemViQA
