	---
	license: cc-by-nc-4.0
	language:
	- en
	---
	# Jellyfish-8B
	<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>


## Model Details
Jellyfish-8B is a large language model with 8 billion parameters.
We fine-tuned [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on datasets for data preprocessing tasks.
The training data consist of two parts:
* the Jellyfish-13B training data
* reasoning data for data preprocessing tasks, generated by GPT-4

<!-- Jellyfish-7B vs. GPT-3.5-turbo winning rate by GPT-4 evaluation is 56.36%. -->

	More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

	- Developed by: Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
	- Contact: [email protected]
	- Funded by: NEC Corporation, Osaka University
	- Language(s) (NLP): English
- License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
- Fine-tuned from model: [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)

## Citation

If you find our work useful, please cite our paper:

```bibtex
	@article{zhang2023jellyfish,
	title={Jellyfish: A Large Language Model for Data Preprocessing},
	author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
	journal={arXiv preprint arXiv:2312.01678},
	year={2023}
	}
	```

	## Performance on seen tasks

| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | -- | 100 | 100 | 92.68 |
| Entity Matching | Seen | Beer | 94.37 | 96.30 | 100 | -- | 96.77 | 96.55 | 96.30 |
| Entity Matching | Seen | iTunes-Amazon | 97.06 | 96.43 | 100 | -- | 98.11 | 96.30 | 92.00 |
| Entity Matching | Seen | DBLP-ACM | 98.99 | 96.99 | 97.44 | -- | 98.98 | 98.88 | 98.76 |
| Entity Matching | Seen | DBLP-GoogleScholar | 95.60 | 76.12 | 91.87 | -- | 98.51 | 95.15 | 93.20 |
| Entity Matching | Seen | Amazon-Google | 75.58 | 66.53 | 74.21 | 70.91 | 81.34 | 80.83 | 74.49 |
| Entity Matching | Unseen | Walmart-Amazon | 86.76 | 86.17 | 90.27 | -- | 89.42 | 85.64 | 89.97 |
| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | -- | 89.58 | 82.38 | 92.54 |
| Data Imputation | Seen | Restaurant | 77.20 | 94.19 | 97.67 | -- | 94.19 | 88.37 | 87.21 |
| Data Imputation | Seen | Buy | 96.50 | 98.46 | 100 | -- | 100 | 96.62 | 92.31 |
| Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | -- | 81.68 | 79.44 | 90.17 |
| Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | -- | 87.21 | 85.00 | 83.92 |
| Error Detection | Seen | Hospital | 94.40 | 90.74 | 90.74 | 44.76 | 95.59 | 96.27 | 80.72 |
| Error Detection | Seen | Adult | 99.10 | 92.01 | 92.01 | 83.58 | 99.33 | 91.96 | 81.72 |
| Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 66.01 | 82.52 | 66.92 | 75.18 |
| Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 68.53 | 90.65 | 69.82 | 91.54 |
| Schema Matching | Seen | Synthea | 38.50 | 57.14 | 66.67 | 6.56 | 36.36 | 44.44 | 27.27 |
| Schema Matching | Seen | MIMIC | 20.00 | -- | 40.00 | 29.41 | 40.00 | 40.00 | 34.04 |
| Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 22.22 | 59.29 | 13.79 | 56.72 |

_For GPT-3.5 and GPT-4, we used few-shot prompting on all datasets. For Jellyfish-13B and Jellyfish-Interpreter, few-shot prompting is disabled on seen datasets and enabled on unseen datasets._
_Accuracy is the metric for data imputation; F1 score is used for all other tasks._
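
As a reference for reading the tables, here is a minimal sketch of the two metrics. It assumes that entity matching, error detection, and schema matching are scored as binary decisions (F1 on the positive class, e.g. "the two records match") and that imputation is scored as exact match after trimming and case-folding. The normalization and the function names are assumptions for illustration, not the evaluation code used in the paper. Scores in the tables are these values multiplied by 100.

```python
from typing import Sequence


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 on the positive class for binary decisions (e.g. match / no match)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # predicted positive, truly positive
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # predicted positive, truly negative
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # missed positives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Exact-match accuracy for imputed values (assumed normalization: strip + casefold)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally sized, non-empty inputs")
    correct = sum(t.strip().casefold() == p.strip().casefold() for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


labels = [True, True, False, False]
preds = [True, False, True, False]
print(round(100 * f1_score(labels, preds), 2))  # 50.0
print(round(100 * accuracy(["new york", "tokyo", "paris"], ["New York ", "Osaka", "paris"]), 2))  # 66.67
```

F1 is used for the matching and detection tasks because positives are usually rare there, so plain accuracy would reward a model that always answers "no".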

| Task | Type | Dataset | Best of non-LLM | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Error Detection | Seen | Adult | 99.10 | 99.10 | 92.01 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
| | | Hospital | 94.40 | **97.80** | 90.74 | 90.74 | 44.76 | -- | 94.51 | 93.40 | 95.59 |
| | Unseen | Flights | 81.00 | -- | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | 82.52 |
| | | Rayyan | 79.00 | -- | -- | 81.95 | 68.53 | -- | 75.07 | 81.06 | **90.65** |
| Data Imputation | Seen | Buy | 96.50 | 98.50 | 98.46 | 100 | 100 | -- | 98.46 | 98.46 | **100** |
| | | Restaurant | 77.20 | 88.40 | 94.19 | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
| | Unseen | Flipkart | 68.00 | -- | -- | **89.94** | 83.20 | -- | 87.14 | 87.48 | 81.68 |
| | | Phone | 86.70 | -- | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | 87.21 |
| Schema Matching | Seen | MIMIC-III | 20.00 | -- | -- | 40.00 | 29.41 | -- | **53.33** | 45.45 | 40.00 |
| | | Synthea | 38.50 | 45.20 | 57.14 | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
| | Unseen | CMS | 50.00 | -- | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 66.50 | 74.21 | 70.91 | 70.10 | **81.69** | 81.42 | 81.34 |
| | | Beer | 94.37 | 100 | 96.30 | 100 | 90.32 | 96.30 | 100.00 | 100.00 | 96.77 |
| | | DBLP-ACM | **98.99** | 96.60 | 96.99 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | 98.98 |
| | | DBLP-GoogleScholar | 95.70 | 83.80 | 76.12 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
| | | Fodors-Zagats | 100 | 100 | 100 | 100 | 93.62 | 100 | 100 | 100 | 100 |
| | | iTunes-Amazon | 97.06 | 98.20 | 96.40 | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
| | Unseen | Abt-Buy | 89.33 | -- | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | 89.58 |
| | | Walmart-Amazon | 86.89 | 87.00 | 86.17 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | 89.42 |
| Avg | | | 80.44 | -- | -- | 84.17 | 72.58 | -- | 82.74 | 81.55 | **86.02** |

_Bold marks the best score in each row._
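
The Avg row appears to be the unweighted mean over all 19 datasets. As a quick check, the sketch below recomputes the Jellyfish-8B average from the per-dataset scores in the table; the same calculation gives values within 0.01 of the reported averages for the other fully populated columns (Best of non-LLM, GPT-4, GPT-4o, Jellyfish-7B, Jellyfish-13B). Columns with missing entries (`--`) have no average.

```python
from statistics import mean

# Jellyfish-8B scores copied from the table, one entry per dataset
# (error detection, data imputation, schema matching, entity matching).
jellyfish_8b = {
    "Adult": 73.74, "Hospital": 93.40, "Flights": 66.21, "Rayyan": 81.06,
    "Buy": 98.46, "Restaurant": 87.21, "Flipkart": 87.48, "Phone": 85.68,
    "MIMIC-III": 45.45, "Synthea": 47.06, "CMS": 38.10,
    "Amazon-Google": 81.42, "Beer": 100.00, "DBLP-ACM": 98.77,
    "DBLP-GoogleScholar": 95.03, "Fodors-Zagats": 100.00, "iTunes-Amazon": 96.30,
    "Abt-Buy": 88.84, "Walmart-Amazon": 85.24,
}

# Unweighted mean: every dataset counts equally, regardless of its size or task.
avg = mean(jellyfish_8b.values())
print(f"{len(jellyfish_8b)} datasets, average = {avg:.2f}")  # 19 datasets, average = 81.55
```

Because the mean is unweighted, the three schema matching datasets, where all models score low, pull every column's average down by the same per-dataset weight as the near-saturated entity matching benchmarks.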

	## Performance on unseen tasks

	### Column Type Annotation

| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| SOTAB | 79.20 | 89.47 | 91.55 | 82.00 | 80.89 | 67.21 |

	_Few-shot is disabled for Jellyfish-13B._

	1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

	### Attribute Value Extraction

| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 58.12 | 76.85 | 69.78 |
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 55.96 | 76.04 | 78.83 |


## Prompt Template

Replace `<prompt>` with your task instruction (without the angle brackets):

```
[INST]:

<prompt>

[/INST]
```
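
A minimal sketch of filling in the template programmatically, assuming the closing tag is the conventional `[/INST]` and that the blank lines shown in the template are part of it. `build_prompt` and the entity-matching instruction below are hypothetical illustrations; see the paper for the task-specific instructions the model was trained on.

```python
# Template as shown above (assumption: closing tag is "[/INST]").
PROMPT_TEMPLATE = "[INST]:\n\n{prompt}\n\n[/INST]"


def build_prompt(prompt: str) -> str:
    """Wrap a task instruction in the prompt template (hypothetical helper)."""
    # str.format only parses the template; braces inside `prompt` are inserted verbatim.
    return PROMPT_TEMPLATE.format(prompt=prompt)


# Hypothetical entity-matching instruction, for illustration only.
example = build_prompt(
    'Record A is [name: "joe\'s stone crab", city: "miami beach"]. '
    'Record B is [name: "joe\'s stone crab", city: "miami"]. '
    "Do Record A and Record B refer to the same entity? Answer Yes or No."
)
print(example)
```

The resulting string can then be tokenized and passed to the model like any other causal-LM prompt.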
