jiangjiechen committed
Commit d6b79a4 · 1 Parent(s): 0452866

models for loren-aaai22

.gitignore ADDED
@@ -0,0 +1,208 @@
+ # Created by .ignore support plugin (hsz.mobi)
+ ### macOS template
+ # General
+ .DS_Store
+ .AppleDouble
+ .LSOverride
+
+ # Icon must end with two \r
+ Icon
+
+ # Thumbnails
+ ._*
+
+ # Files that might appear in the root of a volume
+ .DocumentRevisions-V100
+ .fseventsd
+ .Spotlight-V100
+ .TemporaryItems
+ .Trashes
+ .VolumeIcon.icns
+ .com.apple.timemachine.donotpresent
+
+ # Directories potentially created on remote AFP share
+ .AppleDB
+ .AppleDesktop
+ Network Trash Folder
+ Temporary Items
+ .apdisk
+ ### Python template
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # pyenv
+ .python-version
+
+ # celery beat schedule file
+ celerybeat-schedule
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ ### JetBrains template
+ # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+ # Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+ # User-specific stuff
+ .idea/**/workspace.xml
+ .idea/**/tasks.xml
+ .idea/**/usage.statistics.xml
+ .idea/**/dictionaries
+ .idea/**/shelf
+
+ # Sensitive or high-churn files
+ .idea/**/dataSources/
+ .idea/**/dataSources.ids
+ .idea/**/dataSources.local.xml
+ .idea/**/sqlDataSources.xml
+ .idea/**/dynamic.xml
+ .idea/**/uiDesigner.xml
+ .idea/**/dbnavigator.xml
+
+ # Gradle
+ .idea/**/gradle.xml
+ .idea/**/libraries
+
+ # Gradle and Maven with auto-import
+ # When using Gradle or Maven with auto-import, you should exclude module files,
+ # since they will be recreated, and may cause churn. Uncomment if using
+ # auto-import.
+ # .idea/modules.xml
+ # .idea/*.iml
+ # .idea/modules
+
+ # CMake
+ cmake-build-*/
+
+ # Mongo Explorer plugin
+ .idea/**/mongoSettings.xml
+
+ # File-based project format
+ *.iws
+
+ # IntelliJ
+ out/
+
+ # mpeltonen/sbt-idea plugin
+ .idea_modules/
+
+ # JIRA plugin
+ atlassian-ide-plugin.xml
+
+ # Cursive Clojure plugin
+ .idea/replstate.xml
+
+ # Crashlytics plugin (for Android Studio and IntelliJ)
+ com_crashlytics_export_strings.xml
+ crashlytics.properties
+ crashlytics-build.properties
+ fabric.properties
+
+ # Editor-based Rest Client
+ .idea/httpRequests
+ ### VirtualEnv template
+ # Virtualenv
+ # http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
+ .Python
+ [Bb]in
+ [Ii]nclude
+ [Ll]ib
+ [Ll]ib64
+ [Ll]ocal
+ pyvenv.cfg
+ .venv
+ pip-selfcheck.json
+
+ .idea/
evidence_retrieval/bert_base/README.md ADDED
@@ -0,0 +1,6 @@
+ # BERT pretrained checkpoints
+ * The BERT pretrained checkpoints can be found at [Google Drive](https://drive.google.com/open?id=1cv9dfYN_dF8GyILFbON6IUB-iU3nsNLp).
+ * The BERT code and checkpoints are inherited from Hugging Face's BERT implementation.
+ ```
+ https://github.com/huggingface/transformers
+ ```
evidence_retrieval/bert_base/bert_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 512,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "type_vocab_size": 2,
+   "vocab_size": 28996
+ }
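The fields in this config are enough to estimate how many weights the encoder holds. The following is a minimal sketch, assuming the standard BERT layout (token, position and segment embeddings, self-attention plus feed-forward blocks with layer norms, and a pooler); the helper name `bert_encoder_param_count` is hypothetical and not part of this repository.

```python
# Values copied from bert_config.json in this commit.
config = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 28996,
}


def bert_encoder_param_count(cfg):
    """Count weights in a standard BERT encoder with pooler (no task head).

    Assumes the usual BERT layout; this is an estimate, not a read of the
    actual checkpoint.
    """
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias

    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + layer_norm
    )
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> ffn -> h), each with a bias.
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + layer_norm
    pooler = h * h + h

    per_layer = attention + feed_forward
    return embeddings + cfg["num_hidden_layers"] * per_layer + pooler


params = bert_encoder_param_count(config)
print(params, "parameters,", params * 4, "bytes as float32")
```

Under these assumptions the float32 weights come to about 433 million bytes, which is in the same range as the checkpoint sizes recorded for `evidence_retrieval/` in this commit (435779157 and 433299120 bytes). The remaining difference plausibly comes from extra heads and serialization overhead, but that is not confirmed by the commit itself.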
evidence_retrieval/bert_base/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6992b8cd27d7a132eafce6a8210272329a371b1c762d453588795dd3835593e
+ size 435779157
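The three lines above are a Git LFS pointer rather than the model weights: the repository stores only the spec version, a SHA-256 object id, and the byte size, while the binary lives in LFS storage. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a Git LFS API.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields.

    Each line is "<key> <value>"; the oid value is "<hash-algo>:<hex-digest>".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value

    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from evidence_retrieval/bert_base/pytorch_model.bin.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d6992b8cd27d7a132eafce6a8210272329a371b1c762d453588795dd3835593e
size 435779157
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After downloading the real file, its SHA-256 digest and byte length can be compared against `info["digest"]` and `info["size"]` to check that the download is complete.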
evidence_retrieval/bert_base/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
evidence_retrieval/retrieval_model/model.best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13650f0f9a4143573a28097a58863fc74c08399aaf3b26d4492a5ef043ebb26c
+ size 433299120
fact_checking/bert-large/config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "bert-large-cased",
+   "architectures": [
+     "BertChecker"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "directionality": "bidi",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "pooler_fc_size": 768,
+   "pooler_num_attention_heads": 12,
+   "pooler_num_fc_layers": 3,
+   "pooler_size_per_head": 128,
+   "pooler_type": "first_token_transform",
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.6.0.dev0",
+   "type_vocab_size": 2,
+   "vocab_size": 28996
+ }
fact_checking/bert-large/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f62f2e57d3a8619c85bf688804597a41b47df8f643c040c3e7dfdbab82e26c59
+ size 1338984725
fact_checking/bert-large/special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
fact_checking/bert-large/tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "name_or_path": "bert-large-cased"}
fact_checking/bert-large/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec118ac159598b4d53751e10497485f7ebc295c6a6689ab98048a40f5b7733ba
+ size 1775
fact_checking/bert-large/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
fact_checking/roberta-large/config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "roberta-large",
+   "architectures": [
+     "RobertaChecker"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "type_vocab_size": 1,
+   "vocab_size": 50265
+ }
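The `max_position_embeddings` value of 514 here can look inconsistent with the `model_max_length` of 512 in the accompanying tokenizer config. In the Hugging Face RoBERTa implementation, position ids are offset by the padding index, so the first `pad_token_id + 1` rows of the position table are not used for real tokens. The sketch below works through that arithmetic; `usable_positions` is a hypothetical helper for illustration only.

```python
def usable_positions(max_position_embeddings, pad_token_id, offset_by_padding):
    """Number of position slots available for real input tokens."""
    if offset_by_padding:
        # Position ids start at pad_token_id + 1, so the rows up to and
        # including pad_token_id are reserved.
        return max_position_embeddings - (pad_token_id + 1)
    return max_position_embeddings


# Values taken from the roberta-large and bert-large configs in this commit.
roberta_len = usable_positions(514, pad_token_id=1, offset_by_padding=True)
bert_len = usable_positions(512, pad_token_id=0, offset_by_padding=False)
print(roberta_len, bert_len)
```

Both checkers therefore accept inputs of up to 512 tokens, even though the two configs list different `max_position_embeddings`.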
fact_checking/roberta-large/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
fact_checking/roberta-large/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6294d8a5926582303beb298ababa540524061e46e0564c8a0e2ea69ba8cd612
+ size 1426109081
fact_checking/roberta-large/special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}}
fact_checking/roberta-large/tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"errors": "replace", "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "model_max_length": 512, "name_or_path": "roberta-large"}
fact_checking/roberta-large/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57c9f8f7c1b48eeef2a2396e3dc1a910e0f822cf5837c072370e6cbef9506528
+ size 1775
fact_checking/roberta-large/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
mrc_seq2seq/bart-base/config.json ADDED
@@ -0,0 +1,76 @@
+ {
+   "_name_or_path": "facebook/bart-base",
+   "activation_dropout": 0.1,
+   "activation_function": "gelu",
+   "add_bias_logits": false,
+   "add_final_layer_norm": false,
+   "architectures": [
+     "BartForConditionalGeneration"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 0,
+   "classif_dropout": 0.1,
+   "classifier_dropout": 0.0,
+   "d_model": 768,
+   "decoder_attention_heads": 12,
+   "decoder_ffn_dim": 3072,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 6,
+   "decoder_start_token_id": 2,
+   "do_blenderbot_90_layernorm": false,
+   "dropout": 0.1,
+   "early_stopping": true,
+   "encoder_attention_heads": 12,
+   "encoder_ffn_dim": 3072,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 6,
+   "eos_token_id": 2,
+   "extra_pos_embeddings": 2,
+   "force_bos_token_to_be_generated": false,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "max_length": 15,
+   "max_position_embeddings": 1024,
+   "min_length": 1,
+   "model_type": "bart",
+   "no_repeat_ngram_size": 3,
+   "normalize_before": false,
+   "normalize_embedding": true,
+   "num_beams": 4,
+   "num_hidden_layers": 6,
+   "pad_token_id": 1,
+   "save_step": 20,
+   "scale_embedding": false,
+   "static_position_embeddings": false,
+   "task_specific_params": {
+     "summarization": {
+       "length_penalty": 1.0,
+       "max_length": 128,
+       "min_length": 12,
+       "num_beams": 4
+     },
+     "summarization_cnn": {
+       "length_penalty": 2.0,
+       "max_length": 142,
+       "min_length": 56,
+       "num_beams": 4
+     },
+     "summarization_xsum": {
+       "length_penalty": 1.0,
+       "max_length": 62,
+       "min_length": 11,
+       "num_beams": 6
+     }
+   },
+   "vocab_size": 50265
+ }
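This config mixes architecture fields with decoding settings: the top-level `max_length` of 15, `num_beams` of 4 and `no_repeat_ngram_size` of 3 suggest short generated outputs, while `task_specific_params` carries the summarization presets from the base `facebook/bart-base` config. The sketch below shows one way to pull the decoding settings out of the JSON. `GENERATION_KEYS` and `generation_defaults` are hypothetical names, and the override logic is an illustration, not necessarily how this project or `transformers` applies these values.

```python
import json

# Subset of mrc_seq2seq/bart-base/config.json, copied from this commit.
config_text = """
{
  "early_stopping": true,
  "max_length": 15,
  "min_length": 1,
  "no_repeat_ngram_size": 3,
  "num_beams": 4,
  "task_specific_params": {
    "summarization_xsum": {
      "length_penalty": 1.0,
      "max_length": 62,
      "min_length": 11,
      "num_beams": 6
    }
  }
}
"""

# Keys treated as decoding settings in this sketch (an assumption).
GENERATION_KEYS = (
    "max_length",
    "min_length",
    "num_beams",
    "no_repeat_ngram_size",
    "early_stopping",
    "length_penalty",
)


def generation_defaults(config, task=None):
    """Collect decoding settings, optionally overlaid with a task preset."""
    defaults = {key: config[key] for key in GENERATION_KEYS if key in config}
    if task is not None:
        defaults.update(config.get("task_specific_params", {}).get(task, {}))
    return defaults


config = json.loads(config_text)
print(generation_defaults(config))
print(generation_defaults(config, task="summarization_xsum"))
```

With no task given, the sketch returns the top-level values; with a task name, the preset overrides only the keys it defines and leaves the rest (for example `no_repeat_ngram_size`) unchanged.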
mrc_seq2seq/bart-base/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mrc_seq2seq/bart-base/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37d0722b0a88057d35ad7b43506370bb58f17d3889ff472cf403ce598f482806
+ size 557982872
mrc_seq2seq/bart-base/special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true}}
mrc_seq2seq/bart-base/tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"errors": "replace", "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "model_max_length": 1024, "name_or_path": "facebook/bart-base"}
mrc_seq2seq/bart-base/vocab.json ADDED
The diff for this file is too large to render. See raw diff