---
license: cc-by-nc-4.0
language:
- en
---
# Jellyfish-8B
<!--
<img src="https://i.imgur.com/d8Bl04i.png" alt="PicToModel" width="330"/>
-->
<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>


## Model Details
Jellyfish-8B is a large language model with 8 billion parameters.   
We fine-tuned the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model on datasets for data preprocessing tasks.
The training data consists of two parts:
* Jellyfish-13B training data
* GPT-4-generated reasoning data for data preprocessing tasks

<!-- Jellyfish-7B vs GPT-3.5-turbo wining rate by GPT4 evaluation is 56.36%. -->

More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

- **Developed by:** Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada  
- **Contact: [email protected]**  
- **Funded by:** NEC Corporation, Osaka University  
- **Language(s) (NLP):** English  
- **License:** Non-Commercial Creative Commons license (CC BY-NC-4.0)  
- **Finetuned from model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) 

## Citation

If you find our work useful, please give us credit by citing:

```
@article{zhang2023jellyfish,
  title={Jellyfish: A Large Language Model for Data Preprocessing},
  author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
  journal={arXiv preprint arXiv:2312.01678},
  year={2023}
}
```

## Performance on seen tasks

|  Task  | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Jellyfish-13B | Jellyfish-7B |  Jellyfish-8B | 
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |  ---- |  ---- |  
| Entity Matching  | Seen | Fodors-Zagats  | 100  | 100 | 100 |  | 100 | 100 | 92.68 |
| Entity Matching  | Seen | Beer           | 94.37| 96.30 | 100 | | 96.77 | 96.55| 96.30 |
| Entity Matching  | Seen | iTunes-Amazon  | 97.06| 96.43 | 100 | | 98.11 | 96.30| 92.00 |
| Entity Matching  | Seen | DBLP-ACM       | 98.99| 96.99 | 97.44 | | 98.98 | 98.88| 98.76 |
| Entity Matching  | Seen | DBLP-GoogleScholar | 95.60| 76.12 | 91.87 | | 98.51 | 95.15| 93.20 | 
| Entity Matching  | Seen | Amazon-Google  | 75.58| 66.53 | 74.21 | 70.91 | 81.34 | 80.83 |  74.49 |
| Entity Matching  | Unseen | Walmart-Amazon | 86.76| 86.17 | 90.27 | | 89.42 | 85.64 | 89.97 |
| Entity Matching  | Unseen | Abt-Buy | 89.33 | -- | 92.77 | | 89.58 | 82.38 |  92.54 |
| Data Imputation  | Seen |  Restaurant    | 77.20| 94.19 | 97.67 | | 94.19 | 88.37 |  87.21 |
| Data Imputation  | Seen |  Buy           | 96.50| 98.46 | 100 | | 100 | 96.62 |  92.31 |
| Data Imputation  | Unseen |  Flipkart    | 68.00 | -- | 89.94 | | 81.68 | 79.44|  90.17 |
| Data Imputation  | Unseen |  Phone       | 86.70| -- | 90.79 | | 87.21 | 85.00|  83.92 |
| Error Detection  | Seen |  Hospital      | 94.40| 90.74 | 90.74 | 44.76 | 95.59 | 96.27 |  80.72|
| Error Detection  | Seen |  Adult         | 99.10| 92.01 | 92.01 | 83.58 | 99.33 | 91.96 |  81.72|
| Error Detection  | Unseen |  Flights     | 81.00 | -- | 83.48  | 66.01 | 82.52 | 66.92 | 75.18 |
| Error Detection  | Unseen |  Rayyan      | 79.00| -- | 81.95 | 68.53 | 90.65 | 69.82 | 91.54 |
| Schema Matching  | Seen |  Synthea        | 38.50| 57.14 | 66.67 | 6.56 | 36.36 | 44.44 | 27.27 |
| Schema Matching  | Seen |  MIMIC        | 20.00| -- | 40.00 | 29.41 | 40.00 | 40.00 | 34.04|
| Schema Matching  | Unseen |  CMS        | 50.00| -- | 19.35 | 22.22 | 59.29 | 13.79 |  56.72|

_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. However, for Jellyfish-13B and Jellyfish-Interpreter, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._   
_Accuracy is the metric for data imputation; the F1 score is used for the other tasks._ 
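
For readers who want to reproduce numbers on the same scale as the tables, the two metrics named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes binary labels with `1` as the positive class for F1 and reports both metrics as percentages. The function names and the `positive` parameter are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Percentage of predictions that exactly match the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def f1_score(y_true: Sequence, y_pred: Sequence, positive=1) -> float:
    """Binary F1 for the `positive` class, as a percentage (assumed setup)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy labels for illustration only, not taken from any benchmark dataset.
labels = [1, 0, 1, 1]
predictions = [1, 1, 0, 1]
print(f"accuracy = {accuracy(labels, predictions):.2f}")
print(f"F1       = {f1_score(labels, predictions):.2f}")
```

The source does not state how F1 is averaged, so the positive-class form here is an assumption; a library such as scikit-learn would be the usual choice in practice.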

| Task | Type   | Dataset              | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|------|--------|----------------------|-----------------|-------|---------|-------|--------|-----------|--------------|--------------|---------------|
| Error Detection   | Seen   | Adult                | *99.10          | 99.10 | 92.01   | 92.01 | 83.58  | --        | 77.40        | 73.74        | **99.33       |
|      |        | Hospital             | 94.40           | **97.80 | 90.74  | 90.74 | 44.76  | --        | 94.51        | 93.40        | *95.59        |
|      | Unseen | Flights              | 81.00           | --     | --     | **83.48 | 66.01 | --        | 69.15        | 66.21        | *82.52        |
|      |        | Rayyan               | 79.00           | --     | --     | *81.95 | 68.53 | --        | 75.07        | 81.06        | **90.65       |
| Data Imputation   | Seen   | Buy                  | 96.50           | 98.50  | 98.46   | **100  | **100 | --        | 98.46        | 98.46        | **100         |
|      |        | Restaurant           | 77.20           | 88.40  | *94.19  | **97.67 | 90.70 | --        | 89.53        | 87.21        | 89.53         |
|      | Unseen | Flipkart             | 68.00           | --     | --     | **89.94 | 83.20 | --        | 87.14        | *87.48       | 81.68         |
|      |        | Phone                | 86.70           | --     | --     | **90.79 | 86.78 | --        | 86.52        | 85.68        | *87.21        |
| Schema Matching   | Seen   | MIMIC-III            | 20.00           | --     | --     | 40.00  | 29.41 | --        | **53.33      | *45.45       | 40.00         |
|      |        | Synthea              | 38.50           | 45.20  | *57.14 | **66.67 | 6.56  | --        | 55.56        | 47.06        | 56.00         |
|      | Unseen | CMS                  | *50.00          | --     | --     | 19.35  | 22.22 | --        | 42.86        | 38.10        | **59.29       |
| Entity Matching   | Seen   | Amazon-Google        | 75.58           | 63.50  | 66.50  | 74.21  | 70.91 | 70.10     | **81.69      | *81.42       | 81.34         |
|      |        | Beer                 | 94.37           | **100  | 96.30  | **100  | 90.32 | 96.30     | **100.00     | **100.00     | 96.77         |
|      |        | DBLP-ACM             | **98.99         | 96.60  | 96.99  | 97.44  | 95.87 | 93.80     | 98.65        | 98.77        | *98.98        |
|      |        | DBLP-GoogleScholar   | *95.70          | 83.80  | 76.12  | 91.87  | 90.45 | 92.40     | 94.88        | 95.03        | **98.51       |
|      |        | Fodors-Zagats        | **100           | **100  | **100  | **100  | 93.62 | **100     | **100        | **100        | **100         |
|      |        | iTunes-Amazon        | 97.06           | *98.20 | 96.40  | **100  | 98.18 | 94.30     | 96.30        | 96.30        | 98.11         |
|      | Unseen | Abt-Buy              | 89.33           | --     | --     | **92.77 | 78.73 | --        | 86.06        | 88.84        | *89.58        |
|      |        | Walmart-Amazon       | 86.89           | 87.00  | 86.17  | **90.27 | 79.19 | 82.40     | 84.91        | 85.24        | *89.42        |
| Avg  |        |                      | 80.44           | -      | -      | *84.17 | 72.58 | -         | 82.74        | 81.55        | **86.02       |

_In this table, a `**` prefix marks the best score in each row (including ties) and a `*` prefix marks the second best._

## Performance on unseen tasks

### Column Type Annotation

| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | Jellyfish-13B| Jellyfish-7B |   Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ----|----|
| SOTAB | 79.20 | 89.47 | 91.55 | 82.00 | 80.89 | 67.21|

_Few-shot is disabled for Jellyfish-13B._   

1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

### Attribute Value Extraction

| Dataset |Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 <sup>1</sup>| Jellyfish-13B | Jellyfish-7B|   Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ----| ----|
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 58.12 | 76.85| 69.78|
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 55.96 | 76.04| 78.83|


## Prompt Template
```
[INST]:

<prompt> (without the <>)

[\INST]]
```
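
As a rough illustration of how the template might be filled in programmatically, the sketch below wraps a task prompt in the markers shown above. The template string is copied verbatim from this section, including the closing marker exactly as written. The helper name `build_prompt` and the example task text are hypothetical and not part of the model's documented interface.

```python
# Template copied verbatim from the block above. The closing marker is kept
# exactly as written there, backslash and double bracket included.
PROMPT_TEMPLATE = "[INST]:\n\n{prompt}\n\n[\\INST]]"


def build_prompt(prompt: str) -> str:
    """Wrap a task prompt in the template (hypothetical helper)."""
    # Strip surrounding whitespace so the blank lines come only from the template.
    return PROMPT_TEMPLATE.format(prompt=prompt.strip())


# Illustrative task text only; the exact task wording is an assumption.
example = build_prompt(
    "Record A: name: 'Blue Moon'. "
    "Record B: name: 'Blue Moon Belgian White'. "
    "Do the two records refer to the same entity? Answer Yes or No."
)
print(example)
```

The resulting string would then be passed to whichever inference library is used to run the model.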