Commit b0240c0 by DecoderImmortal · verified · 1 Parent(s): 752c8de

Update README.md

  1. README.md +50 -3
README.md CHANGED
@@ -1,3 +1,50 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # Implementation of the Findings of ACL 2024 paper "Improving Grammatical Error Correction via Contextual Data Augmentation"
+
+ [GitHub link](https://github.com/wyxstriker/CDA4GEC)
+
+ # Model Weights
+ We release the model weights from each training stage.
+ Our models are trained with the Fairseq framework; details of the weights and their download links are below.
+
+ |Name|Data Info|Download Link|
+ |:--:|--|--|
+ |Stage1|Pre-training on the 200M-scale [C4 synthetic data](https://github.com/google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction)|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage1_checkpoint_best.pt|
+ |Stage2+|Fine-tuning on the augmented Lang-8, NUCLE, FCE and W&I+L datasets|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage2_checkpoint_best.pt|
+ |Stage3+|Continued fine-tuning on the augmented W&I+L dataset|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage3_checkpoint_best.pt|
+
+ # Synthetic Data
+ > We release only the synthetic pseudo-data; to obtain the original annotated data, please follow each dataset's official application process.
+
+
+ |Data Info|Amount|Source|Path|
+ |:--:|:--:|:--:|:--:|
+ |stage2+|2M|Lang-8 & NUCLE & FCE & W&I+L|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/pseudo/stage2|
+ |stage3+|200K|W&I+L|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/pseudo/stage3|
+
+ # Citation
+ If you find this work useful for your research, please cite our paper:
+
+ ```
+ @inproceedings{wang-etal-2024-improving-grammatical,
+     title = "Improving Grammatical Error Correction via Contextual Data Augmentation",
+     author = "Wang, Yixuan and
+       Wang, Baoxin and
+       Liu, Yijun and
+       Zhu, Qingfu and
+       Wu, Dayong and
+       Che, Wanxiang",
+     editor = "Ku, Lun-Wei and
+       Martins, Andre and
+       Srikumar, Vivek",
+     booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
+     month = aug,
+     year = "2024",
+     address = "Bangkok, Thailand and virtual meeting",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2024.findings-acl.647",
+     pages = "10898--10910",
+ }
+ ```
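
The README tables give each artifact as a repository link plus a path under `tree/main`. As a minimal sketch (not part of the commit), those paths can be turned into URLs with the standard library alone. It assumes the files sit at exactly the paths listed in the tables and relies on the Hub's `/resolve/<revision>/<path>` route for direct file downloads; the helper names `resolve_url` and `tree_url` are hypothetical and introduced only for illustration.

```python
from urllib.parse import quote

# Repository id as it appears in the README links.
REPO_ID = "DecoderImmortal/CDA4GEC"

# Checkpoint file names copied from the "Model Weights" table.
CHECKPOINTS = {
    "Stage1": "stage1_checkpoint_best.pt",
    "Stage2+": "stage2_checkpoint_best.pt",
    "Stage3+": "stage3_checkpoint_best.pt",
}

# Pseudo-data directories copied from the "Synthetic Data" table.
PSEUDO_DIRS = {
    "stage2+": "pseudo/stage2",
    "stage3+": "pseudo/stage3",
}


def resolve_url(path: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Direct-download URL for a single file (hypothetical helper)."""
    return f"https://huggingface.co/{repo_id}/resolve/{quote(revision, safe='')}/{quote(path)}"


def tree_url(path: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Browser URL for a directory listing (hypothetical helper)."""
    return f"https://huggingface.co/{repo_id}/tree/{quote(revision, safe='')}/{quote(path)}"


if __name__ == "__main__":
    for stage, filename in CHECKPOINTS.items():
        print(stage, resolve_url(filename))
    for stage, directory in PSEUDO_DIRS.items():
        print(stage, tree_url(directory))
```

A URL from `resolve_url` can be passed to `urllib.request.urlretrieve(url, local_path)` to fetch a checkpoint. The pseudo-data entries are directories, so `tree_url` only points at the listing, and the individual files inside would still need to be identified there.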