Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Jellyfish-8B - GGUF
- Model creator: https://huggingface.co/NECOUDBFM/
- Original model: https://huggingface.co/NECOUDBFM/Jellyfish-8B/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Jellyfish-8B.Q2_K.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q2_K.gguf) | Q2_K | 2.96GB |
| [Jellyfish-8B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.IQ3_XS.gguf) | IQ3_XS | 3.28GB |
| [Jellyfish-8B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.IQ3_S.gguf) | IQ3_S | 3.43GB |
| [Jellyfish-8B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q3_K_S.gguf) | Q3_K_S | 3.41GB |
| [Jellyfish-8B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.IQ3_M.gguf) | IQ3_M | 3.52GB |
| [Jellyfish-8B.Q3_K.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q3_K.gguf) | Q3_K | 3.74GB |
| [Jellyfish-8B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q3_K_M.gguf) | Q3_K_M | 3.74GB |
| [Jellyfish-8B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q3_K_L.gguf) | Q3_K_L | 4.03GB |
| [Jellyfish-8B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.IQ4_XS.gguf) | IQ4_XS | 4.18GB |
| [Jellyfish-8B.Q4_0.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q4_0.gguf) | Q4_0 | 4.34GB |
| [Jellyfish-8B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.IQ4_NL.gguf) | IQ4_NL | 4.38GB |
| [Jellyfish-8B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q4_K_S.gguf) | Q4_K_S | 4.37GB |
| [Jellyfish-8B.Q4_K.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q4_K.gguf) | Q4_K | 4.58GB |
| [Jellyfish-8B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q4_K_M.gguf) | Q4_K_M | 4.58GB |
| [Jellyfish-8B.Q4_1.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q4_1.gguf) | Q4_1 | 4.78GB |
| [Jellyfish-8B.Q5_0.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q5_0.gguf) | Q5_0 | 5.21GB |
| [Jellyfish-8B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q5_K_S.gguf) | Q5_K_S | 5.21GB |
| [Jellyfish-8B.Q5_K.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q5_K.gguf) | Q5_K | 5.34GB |
| [Jellyfish-8B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q5_K_M.gguf) | Q5_K_M | 5.34GB |
| [Jellyfish-8B.Q5_1.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q5_1.gguf) | Q5_1 | 5.65GB |
| [Jellyfish-8B.Q6_K.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q6_K.gguf) | Q6_K | 6.14GB |
| [Jellyfish-8B.Q8_0.gguf](https://huggingface.co/RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf/blob/main/Jellyfish-8B.Q8_0.gguf) | Q8_0 | 7.95GB |
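
The links in the table point at each file's web page. A direct-download URL can be formed by swapping the `blob` path segment for `resolve`; a minimal sketch (the `gguf_url` helper is illustrative, not part of any tooling):

```python
# Build a direct-download URL for one of the quantized files listed above.
# "resolve" replaces the "blob" segment that the Hugging Face web viewer uses.
REPO = "RichardErkhov/NECOUDBFM_-_Jellyfish-8B-gguf"

def gguf_url(filename: str) -> str:
    return f"https://huggingface.co/{REPO}/resolve/main/{filename}"

url = gguf_url("Jellyfish-8B.Q4_K_M.gguf")
print(url)
```

The resulting URL can be fetched with any HTTP client (`curl -L -O <url>`, for example).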



Original model description:
---
license: cc-by-nc-4.0
language:
- en
---
# Jellyfish-8B
<!-- Provide a quick summary of what the model is/does. -->
<!--
<img src="https://i.imgur.com/d8Bl04i.png" alt="PicToModel" width="330"/>
-->
<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>

Jellyfish models with other sizes are available here:
[Jellyfish-7B](https://huggingface.co/NECOUDBFM/Jellyfish-7B)
[Jellyfish-13B](https://huggingface.co/NECOUDBFM/Jellyfish-13B)

## Model Details
Jellyfish-8B is a large language model with 8 billion parameters.
We fine-tuned the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model using a subset of the [Jellyfish-Instruct](https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct) dataset.

<!-- Jellyfish-7B vs GPT-3.5-turbo winning rate by GPT-4 evaluation is 56.36%. -->

More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

- **Developed by:** Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
- **Contact:** [email protected]
- **Funded by:** NEC Corporation, Osaka University
- **Language(s) (NLP):** English
- **License:** Non-commercial Creative Commons license (CC BY-NC 4.0)
- **Finetuned from model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)

## Citation

If you find our work useful, please give us credit by citing:

```
@article{zhang2023jellyfish,
  title={Jellyfish: A Large Language Model for Data Preprocessing},
  author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
  journal={arXiv preprint arXiv:2312.01678},
  year={2023}
}
```

## Performance on seen tasks

| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|-----------------|--------|-------------------|-----------------|--------|--------|--------|-----------|--------------|--------------|---------------|
| Error Detection | Seen | Adult | *99.10* | 99.10 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
| Error Detection | Seen | Hospital | 94.40 | **97.80** | 90.74 | 44.76 | -- | 94.51 | 93.40 | *95.59* |
| Error Detection | Unseen | Flights | 81.00 | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | *82.52* |
| Error Detection | Unseen | Rayyan | 79.00 | -- | *81.95* | 68.53 | -- | 75.07 | 81.06 | **90.65** |
| Data Imputation | Seen | Buy | 96.50 | 98.50 | **100** | **100** | -- | 98.46 | 98.46 | **100** |
| Data Imputation | Seen | Restaurant | 77.20 | 88.40 | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
| Data Imputation | Unseen | Flipkart | 68.00 | -- | **89.94** | 83.20 | -- | 87.14 | *87.48* | 81.68 |
| Data Imputation | Unseen | Phone | 86.70 | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | *87.21* |
| Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | -- | **53.33** | *45.45* | 40.00 |
| Schema Matching | Seen | Synthea | 38.50 | 45.20 | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
| Schema Matching | Unseen | CMS | *50.00* | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 74.21 | 70.91 | 70.10 | **81.69** | *81.42* | 81.34 |
| Entity Matching | Seen | Beer | 94.37 | **100** | **100** | 90.32 | 96.30 | **100** | **100** | 96.77 |
| Entity Matching | Seen | DBLP-ACM | **98.99** | 96.60 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | *98.98* |
| Entity Matching | Seen | DBLP-GoogleScholar| *95.70* | 83.80 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
| Entity Matching | Seen | Fodors-Zagats | **100** | **100** | **100** | 93.62 | **100** | **100** | **100** | **100** |
| Entity Matching | Seen | iTunes-Amazon | 97.06 | *98.20* | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | *89.58* |
| Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | *89.42* |
| Avg | | | 80.44 | -- | *84.17* | 72.58 | -- | 82.74 | 81.55 | **86.02** |

_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. For Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
_Accuracy is the metric for data imputation; the F1 score is used for the other tasks._

1. [HoloDetect](https://arxiv.org/abs/1904.02285) for Error Detection seen datasets, [RAHA](https://dl.acm.org/doi/10.1145/3299869.3324956) for Error Detection unseen datasets, [IPM](https://ieeexplore.ieee.org/document/9458712) for Data Imputation, [SMAT](https://www.researchgate.net/publication/353920530_SMAT_An_Attention-Based_Deep_Learning_Solution_to_the_Automation_of_Schema_Matching) for Schema Matching, and [Ditto](https://arxiv.org/abs/2004.00584) for Entity Matching
2. [Large Language Models as Data Preprocessors](https://arxiv.org/abs/2308.16361)

## Performance on unseen tasks

### Column Type Annotation

| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|--------|-----------------|--------|--------|--------|--------------|--------------|---------------|
| SOTAB | 79.20 | 89.47 | 91.55 | 65.05 | 83 | 76.33 | 82 |

_Few-shot is disabled for Jellyfish models._

1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

### Attribute Value Extraction

| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 55.77 | 56.09 | 59.55 | 58.12 |
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 60.20 | 51.98 | 59.22 | 55.96 |

_Few-shot is disabled for Jellyfish models._

1. Results from [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537)

## Prompt Template
```
<|start_header_id|>system<|end_header_id|>{system message}<|eot_id|>
<|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
```
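
The template can be filled programmatically; a minimal sketch (`build_prompt` is a hypothetical helper, not part of the model's tooling):

```python
def build_prompt(system_message: str, user_prompt: str) -> str:
    # Fill the Llama-3-style template shown above; the model continues
    # generating after the final assistant header.
    return (
        f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n"
        f"<|start_header_id|>user<|end_header_id|>{user_prompt}<|eot_id|>\n"
        f"<|start_header_id|>assistant<|end_header_id|>"
    )

print(build_prompt("You are a helpful assistant.", "Hello."))
```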

## Training Details

### Training Method

We used LoRA to speed up the training process, targeting the q_proj, k_proj, v_proj, and o_proj modules.
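
With Hugging Face PEFT, such a setup might look like the following. This is a hedged sketch, not the authors' published training configuration; the rank, alpha, and dropout values are illustrative assumptions.

```python
from peft import LoraConfig

# Illustrative LoRA configuration targeting the attention projections named
# above. r, lora_alpha, and lora_dropout are assumed values, not the
# paper's exact settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```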

## Uses

To accelerate inference, we strongly recommend running Jellyfish using [vLLM](https://github.com/vllm-project/vllm).
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Python Script
We provide two simple Python code examples for inference using the Jellyfish model.

#### Using Transformers and Torch Modules
<div style="height: auto; max-height: 400px; overflow-y: scroll;">

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

# The model is automatically downloaded from the Hugging Face model hub if not cached.
# Model files are cached in "~/.cache/huggingface/hub/models--NECOUDBFM--Jellyfish/" by default.
# You can also download the model manually and replace the model name with the path to the model files.
model = AutoModelForCausalLM.from_pretrained(
    "NECOUDBFM/Jellyfish",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NECOUDBFM/Jellyfish")

system_message = "You are an AI assistant that follows instruction extremely well. Help as much as you can."

# Define the user_message variable based on the task and the data you want to test on.
user_message = "Hello, world."

prompt = f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n<|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>"
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(device)

# You can modify the sampling parameters according to your needs.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.35,
    top_p=0.9,
)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=1024,
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.15,
    )

# Decode only the newly generated tokens, skipping the prompt.
output = generation_output.sequences
response = tokenizer.decode(
    output[:, input_ids.shape[-1]:][0], skip_special_tokens=True
).strip()

print(response)
```
</div>

#### Using vLLM
<div style="height: auto; max-height: 400px; overflow-y: scroll;">

```python
from vllm import LLM, SamplingParams

# To use vLLM for inference, you need to download the model files either from the Hugging Face model hub or manually.
# Modify the path to the model according to your local environment.
path_to_model = "/workspace/models/Jellyfish"

model = LLM(model=path_to_model)

# You can modify the sampling parameters according to your needs.
# Caution: the stop parameter should not be changed.
sampling_params = SamplingParams(
    temperature=0.35,
    top_p=0.9,
    max_tokens=1024,
    stop=["<|eot_id|>"],
)

system_message = "You are an AI assistant that follows instruction extremely well. Help as much as you can."

# Define the user_message variable based on the task and the data you want to test on.
user_message = "Hello, world."

prompt = f"<|start_header_id|>system<|end_header_id|>{system_message}<|eot_id|>\n<|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>"
outputs = model.generate(prompt, sampling_params)
response = outputs[0].outputs[0].text.strip()
print(response)
```
</div>

## Prompts

We provide the prompts used for both fine-tuning and inference.
You can structure your data according to these prompts.

### System Message
```
You are an AI assistant that follows instruction extremely well.
User will give you a question. Your task is to answer as faithfully as you can.
```

### For Error Detection
_There are two forms of the error detection task.
In the first form, a complete record row is provided, and the task is to determine if a specific value is erroneous.
In the second form, only the value of a specific attribute is given, and the decision about its correctness is based solely on the attribute's name and value.
The following prompt examples pertain to these two forms, respectively._
```
Your task is to determine if there is an error in the value of a specific attribute within the whole record provided.
The attributes may include {attribute 1}, {attribute 2}, ...
Errors may include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense given the context of the whole record.
Record [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Attribute for Verification: [{attribute X}: {attribute X value}]
Question: Is there an error in the value of {attribute X}? Choose your answer from: [Yes, No].
```
```
Your task is to determine if there is an error in the value of a specific attribute.
The attributes may belong to a {keyword} record and could be one of the following: {attribute 1}, {attribute 2}, ...
Errors can include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense for that attribute.
Note: Missing values (N/A or "nan") are not considered errors.
Attribute for Verification: [{attribute X}: {attribute X value}]
Question: Is there an error in the value of {attribute X}? Choose your answer from: [Yes, No].
```
### For Data Imputation
```
You are presented with a {keyword} record that is missing a specific attribute: {attribute X}.
Your task is to deduce or infer the value of {attribute X} using the available information in the record.
You may be provided with fields like {attribute 1}, {attribute 2}, ... to help you in the inference.
Record: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Based on the provided record, what would you infer is the value for the missing attribute {attribute X}?
Answer only the value of {attribute X}.
```

### For Schema Matching
```
Your task is to determine if the two attributes (columns) are semantically equivalent in the context of merging two tables.
Each attribute will be provided by its name and a brief description.
Your goal is to assess if they refer to the same information based on these names and descriptions provided.
Attribute A is [name: {value of name}, description: {value of description}].
Attribute B is [name: {value of name}, description: {value of description}].
Are Attribute A and Attribute B semantically equivalent? Choose your answer from: [Yes, No].
```

### For Entity Matching
```
You are tasked with determining whether two records listed below are the same based on the information provided.
Carefully compare the {attribute 1}, {attribute 2}... for each record before making your decision.
Note that missing values (N/A or "nan") should not be used as a basis for your decision.
Record A: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Record B: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
Are record A and record B the same entity? Choose your answer from: [Yes, No].
```
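
The bracketed record serialization shared by these prompts can be produced mechanically from a row of tabular data; a minimal sketch (`format_record` is a hypothetical helper, not part of the released tooling):

```python
def format_record(record: dict) -> str:
    # Render a row as "[attr 1: value 1, attr 2: value 2, ...]",
    # matching the bracketed form used in the prompts above.
    return "[" + ", ".join(f"{k}: {v}" for k, v in record.items()) + "]"

row = {"name": "Fodor's", "city": "Los Angeles"}
print("Record A: " + format_record(row))
# → Record A: [name: Fodor's, city: Los Angeles]
```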

### For Column Type Annotation

We follow the prompt in [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745) (text+inst+2-step).

### For Attribute Value Extraction

We follow the prompt in [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537) (textual, w/o examples).