---
license: cc-by-nc-4.0
language:
- en
---

# Jellyfish-8B

<!--
<img src="https://i.imgur.com/d8Bl04i.png" alt="PicToModel" width="330"/>
-->

<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>

## Model Details

Jellyfish-8B is a large language model with 8 billion parameters.

We fine-tuned the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model on datasets for data preprocessing tasks.

The training data consist of two parts:

* Jellyfish-13B training data
* GPT-4-generated reasoning data for data preprocessing tasks

<!-- Jellyfish-7B vs GPT-3.5-turbo winning rate by GPT-4 evaluation is 56.36%. -->

More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

- **Developed by:** Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
- **Contact:** [email protected]
- **Funded by:** NEC Corporation, Osaka University
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- **Finetuned from model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)

## Citation

If you find our work useful, please give us credit by citing:

```
@article{zhang2023jellyfish,
  title={Jellyfish: A Large Language Model for Data Preprocessing},
  author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
  journal={arXiv preprint arXiv:2312.01678},
  year={2023}
}
```

## Performance on seen tasks

| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | -- | 100 | 100 | 92.68 |
| Entity Matching | Seen | Beer | 94.37 | 96.30 | 100 | -- | 96.77 | 96.55 | 96.30 |
| Entity Matching | Seen | iTunes-Amazon | 97.06 | 96.43 | 100 | -- | 98.11 | 96.30 | 92.00 |
| Entity Matching | Seen | DBLP-ACM | 98.99 | 96.99 | 97.44 | -- | 98.98 | 98.88 | 98.76 |
| Entity Matching | Seen | DBLP-GoogleScholar | 95.60 | 76.12 | 91.87 | -- | 98.51 | 95.15 | 93.20 |
| Entity Matching | Seen | Amazon-Google | 75.58 | 66.53 | 74.21 | 70.91 | 81.34 | 80.83 | 74.49 |
| Entity Matching | Unseen | Walmart-Amazon | 86.76 | 86.17 | 90.27 | -- | 89.42 | 85.64 | 89.97 |
| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | -- | 89.58 | 82.38 | 92.54 |
| Data Imputation | Seen | Restaurant | 77.20 | 94.19 | 97.67 | -- | 94.19 | 88.37 | 87.21 |
| Data Imputation | Seen | Buy | 96.50 | 98.46 | 100 | -- | 100 | 96.62 | 92.31 |
| Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | -- | 81.68 | 79.44 | 90.17 |
| Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | -- | 87.21 | 85.00 | 83.92 |
| Error Detection | Seen | Hospital | 94.40 | 90.74 | 90.74 | 44.76 | 95.59 | 96.27 | 80.72 |
| Error Detection | Seen | Adult | 99.10 | 92.01 | 92.01 | 83.58 | 99.33 | 91.96 | 81.72 |
| Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 66.01 | 82.52 | 66.92 | 75.18 |
| Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 68.53 | 90.65 | 69.82 | 91.54 |
| Schema Matching | Seen | Synthea | 38.50 | 57.14 | 66.67 | 6.56 | 36.36 | 44.44 | 27.27 |
| Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | 40.00 | 40.00 | 34.04 |
| Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 22.22 | 59.29 | 13.79 | 56.72 |

_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. For Jellyfish-13B and Jellyfish-Interpreter, however, few-shot prompting is disabled on seen datasets and enabled on unseen datasets._

_We use accuracy as the metric for data imputation and F1 score for the other tasks._
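
To make the two metrics concrete, here is a minimal sketch that computes accuracy (data imputation) and binary F1 (the other tasks) as percentages, on the same scale as the tables. The function names and example data are hypothetical, and treating each non-imputation task as a yes/no decision with `True` as the positive class is an assumption; this is not the paper's evaluation script.

```python
from typing import Hashable, Sequence


def accuracy(predictions: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Percentage of predictions that exactly equal the gold value."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def f1_score(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Binary F1 in percent, assuming True is the positive class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical entity-matching decisions: True means "the two records match".
preds = [True, True, False, True, False]
gold = [True, False, False, True, True]
print(f"F1 = {f1_score(preds, gold):.2f}")  # F1 = 66.67

# Hypothetical imputed values versus the ground truth.
print(f"accuracy = {accuracy(['nyc', 'la', 'sf'], ['nyc', 'la', 'boston']):.2f}")  # accuracy = 66.67
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so F1 is also 2/3.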

| Task | Type | Dataset | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|------|--------|----------------------|-----------------|-------|---------|-------|--------|-----------|--------------|--------------|---------------|
| Error Detection | Seen | Adult | *99.10* | 99.10 | 92.01 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
| | | Hospital | 94.40 | **97.80** | 90.74 | 90.74 | 44.76 | -- | 94.51 | 93.40 | *95.59* |
| | Unseen | Flights | 81.00 | -- | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | *82.52* |
| | | Rayyan | 79.00 | -- | -- | *81.95* | 68.53 | -- | 75.07 | 81.06 | **90.65** |
| Data Imputation | Seen | Buy | 96.50 | 98.50 | 98.46 | **100** | **100** | -- | 98.46 | 98.46 | **100** |
| | | Restaurant | 77.20 | 88.40 | *94.19* | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
| | Unseen | Flipkart | 68.00 | -- | -- | **89.94** | 83.20 | -- | 87.14 | *87.48* | 81.68 |
| | | Phone | 86.70 | -- | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | *87.21* |
| Schema Matching | Seen | MIMIC-III | 20.00 | -- | -- | 40.00 | 29.41 | -- | **53.33** | *45.45* | 40.00 |
| | | Synthea | 38.50 | 45.20 | *57.14* | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
| | Unseen | CMS | *50.00* | -- | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 66.50 | 74.21 | 70.91 | 70.10 | **81.69** | *81.42* | 81.34 |
| | | Beer | 94.37 | **100** | 96.30 | **100** | 90.32 | 96.30 | **100.00** | **100.00** | 96.77 |
| | | DBLP-ACM | **98.99** | 96.60 | 96.99 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | *98.98* |
| | | DBLP-GoogleScholar | *95.70* | 83.80 | 76.12 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
| | | Fodors-Zagats | **100** | **100** | **100** | **100** | 93.62 | **100** | **100** | **100** | **100** |
| | | iTunes-Amazon | 97.06 | *98.20* | 96.40 | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
| | Unseen | Abt-Buy | 89.33 | -- | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | *89.58* |
| | | Walmart-Amazon | 86.89 | 87.00 | 86.17 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | *89.42* |
| Avg | | | 80.44 | -- | -- | *84.17* | 72.58 | -- | 82.74 | 81.55 | **86.02** |

_Bold marks the best result on each dataset and italics the second best._
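
The **Avg** row appears to be the unweighted mean over the 19 datasets in each column. The sketch below recomputes it for two columns under that assumption; it prints 80.44 for the best non-LLM methods and 81.55 for Jellyfish-8B, which agree with the Avg row. It is a consistency check, not the authors' evaluation code.

```python
from statistics import mean

# Per-dataset scores copied from the table, in row order:
# Adult, Hospital, Flights, Rayyan, Buy, Restaurant, Flipkart, Phone,
# MIMIC-III, Synthea, CMS, Amazon-Google, Beer, DBLP-ACM,
# DBLP-GoogleScholar, Fodors-Zagats, iTunes-Amazon, Abt-Buy, Walmart-Amazon.
BEST_NON_LLM = [
    99.10, 94.40, 81.00, 79.00, 96.50, 77.20, 68.00, 86.70,
    20.00, 38.50, 50.00, 75.58, 94.37, 98.99, 95.70, 100.00,
    97.06, 89.33, 86.89,
]
JELLYFISH_8B = [
    73.74, 93.40, 66.21, 81.06, 98.46, 87.21, 87.48, 85.68,
    45.45, 47.06, 38.10, 81.42, 100.00, 98.77, 95.03, 100.00,
    96.30, 88.84, 85.24,
]

for name, scores in [("Best of non-LLM", BEST_NON_LLM), ("Jellyfish-8B", JELLYFISH_8B)]:
    assert len(scores) == 19  # one score per dataset row
    # Unweighted mean: every dataset counts equally, regardless of its size.
    print(f"{name}: {mean(scores):.2f}")
```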

## Performance on unseen tasks

### Column Type Annotation

| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| SOTAB | 79.20 | 89.47 | 91.55 | 82.00 | 80.89 | 67.21 |

_Few-shot is disabled for Jellyfish-13B._

1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

### Attribute Value Extraction

| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 58.12 | 76.85 | 69.78 |
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 55.96 | 76.04 | 78.83 |

## Prompt Template

```
[INST]:

<prompt> (without the <>)

[\INST]]
```
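
A minimal sketch of filling this template and, optionally, generating a response with Hugging Face `transformers`. The Hub id `NECOUDBFM/Jellyfish-8B`, the helper names `build_prompt` and `generate`, the example entity-matching instruction, and the greedy decoding settings are all assumptions for illustration. The template string is copied from the card exactly as written, including its closing tag.

```python
"""Fill the Jellyfish prompt template and (optionally) run generation.

Assumptions: the Hub id and the example task text are hypothetical;
the template string is copied verbatim from the model card.
"""

# The card's template, split around the <prompt> placeholder.
TEMPLATE_PREFIX = "[INST]:\n\n"
TEMPLATE_SUFFIX = "\n\n[\\INST]]"


def build_prompt(task_prompt: str) -> str:
    """Wrap a task instruction in the card's prompt template."""
    return TEMPLATE_PREFIX + task_prompt + TEMPLATE_SUFFIX


def generate(
    task_prompt: str,
    model_id: str = "NECOUDBFM/Jellyfish-8B",  # assumed Hub id
    max_new_tokens: int = 64,
) -> str:
    """Greedy-decode a response (requires torch and transformers)."""
    # Imported lazily so build_prompt() works without these dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(task_prompt), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical entity-matching instruction, for illustration only.
EXAMPLE_TASK = (
    'Record A: [name: "Apple iPhone 13 128GB"]\n'
    'Record B: [name: "iPhone 13 (128 GB)"]\n'
    "Do Record A and Record B refer to the same product? Answer Yes or No."
)

print(build_prompt(EXAMPLE_TASK))
```

Calling `generate(EXAMPLE_TASK)` would download the weights on first use and return only the decoded continuation. Greedy decoding and the 64-token limit are arbitrary choices for the sketch, not settings documented for this model.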