Implementation of the ACL 2024 Findings paper "Improving Grammatical Error Correction via Contextual Data Augmentation".

GitHub link

Model Weights

We release the model weights for each training stage. Our models are trained with the Fairseq framework; details of each checkpoint and its download link are listed below, followed by a minimal loading sketch.

| Name | Data Info | Download Link |
| --- | --- | --- |
| Stage1 | Pre-training on 200M-scale C4 synthetic data | CDA4GEC/tree/main/stage1_checkpoint_best.pt |
| Stage2+ | Fine-tuning on the augmented Lang-8, NUCLE, FCE, and W&I+L datasets | CDA4GEC/tree/main/stage2_checkpoint_best.pt |
| Stage3+ | Continued fine-tuning on the augmented W&I+L dataset | CDA4GEC/tree/main/stage3_checkpoint_best.pt |
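
Since the checkpoints are Fairseq-based, they should be loadable through Fairseq's standard hub interface. The sketch below is an illustration, not part of the release: the checkpoint directory, `data-bin/` path, and tokenization settings are assumptions to adjust for the actual repository layout.

```python
# Minimal sketch of loading a released checkpoint with Fairseq's hub interface.
# "checkpoints/" and "data-bin/" are assumed paths; point them at the directory
# holding the downloaded .pt file and the matching binarized dictionary/data.
from fairseq.models.transformer import TransformerModel

gec = TransformerModel.from_pretrained(
    "checkpoints/",                           # directory containing the checkpoint (assumed)
    checkpoint_file="stage2_checkpoint_best.pt",
    data_name_or_path="data-bin/",            # binarized data/dictionary dir (assumed)
)
gec.eval()

# Correct a single sentence with the default beam search.
print(gec.translate("She see Tom is catched by policeman in park at last night."))
```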

Synthetic Data

We release only the synthetic pseudo-data; please follow the official process to apply for the original annotated datasets. A small loading sketch follows the table below.

| Data Info | Amount | Source | Path |
| --- | --- | --- | --- |
| stage2+ | 2M | Lang-8 & NUCLE & FCE & W&I+L | CDA4GEC/tree/main/pseudo/stage2 |
| stage3+ | 200K | W&I+L | CDA4GEC/tree/main/pseudo/stage3 |
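
As a rough sketch, pseudo-data for GEC is typically stored as aligned parallel text files (one erroneous source sentence per line, with the corrected target on the same line of a sibling file). The file names below are hypothetical; check the actual files under `pseudo/stage2` and `pseudo/stage3` before use.

```python
# Minimal sketch, assuming parallel *.src / *.tgt text files with one
# sentence per line. File names are hypothetical placeholders.
from pathlib import Path

def load_parallel(src_path: str, tgt_path: str):
    """Yield (erroneous source, corrected target) pairs from aligned files."""
    with Path(src_path).open(encoding="utf-8") as src, \
         Path(tgt_path).open(encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            yield s.strip(), t.strip()

pairs = list(load_parallel("pseudo/stage2/train.src", "pseudo/stage2/train.tgt"))
print(f"loaded {len(pairs)} sentence pairs")
```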

Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@inproceedings{wang-etal-2024-improving-grammatical,
    title = "Improving Grammatical Error Correction via Contextual Data Augmentation",
    author = "Wang, Yixuan  and
      Wang, Baoxin  and
      Liu, Yijun  and
      Zhu, Qingfu  and
      Wu, Dayong  and
      Che, Wanxiang",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.647",
    pages = "10898--10910",
}
```