JordiBayarri committed b2befb1 (parent: ff83dac)
Update README.md
README.md CHANGED
@@ -44,11 +44,13 @@ Aloe is trained in 20 medical tasks, resulting in a robust and versatile healthc
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/VUYw4IdANKGrH2VOedwH0.png)
 
-Aloe-8B-Beta is the latest iteration in the Aloe family
-Beta more than triples the training data used by Alpha, for a total of 1.8B tokens
+**Aloe-8B-Beta** is the latest iteration in the **Aloe family**, building on and improving the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha).
+Beta more than triples the training data used by Alpha, for a total of **1.8B tokens**, covering a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)
 
+To mitigate catastrophic forgetting and enable the model to effectively learn new capabilities such as **function calling**, we incorporated a diverse set of high-quality general-purpose data constituting 20% of the total training set. The curated data includes some of the highest-quality content available across a range of topics, including mathematics, programming, STEM, and very long instructions (> 8k tokens), to enrich the model's adaptability and comprehension across diverse domains.
+
 Beta also strengthens the alignment and safety stages relative to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).
 
 Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate the safe use and deployment of such systems.
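For readers of the card, a minimal inference sketch with `transformers` is shown below. It assumes the Beta checkpoint is published under a repository id similar to the Alpha one linked above; that id, the system prompt, and the generation settings are assumptions, not details taken from this diff.

```python
# Minimal sketch: loading an Aloe Beta checkpoint and querying it via its chat template.
# The repository id below is an assumption; substitute the id of the model this card describes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HPAI-BSC/Llama3.1-Aloe-Beta-8B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a careful medical assistant."},
    {"role": "user", "content": "What are common first-line treatments for mild hypertension?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used only to keep the sketch deterministic; any generation settings recommended elsewhere in the card take precedence.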
@@ -77,10 +79,9 @@ Aloe Beta has been tested on the most popular healthcare QA datasets, with and w
 
 The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated it across a wide range of medical tasks:
 
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/2NW3im0aH2u6RKp969sjx.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/FyHZXoXCbc7AzXeCwqS9_.png)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/uS3qddvQ5iwbI0WZGVFDF.png)
 
 We also compared the performance of the model in the general domain, using the OpenLLM Leaderboard benchmark. Aloe-Beta achieves results competitive with the current SOTA general models on the most widely used general benchmarks and outperforms the medical models:
 
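The medical QA benchmarks referenced in this hunk are multiple-choice tasks, which are commonly scored by comparing the log-likelihood of each answer option. The sketch below is illustrative only and is not the evaluation harness used for Aloe; the repository id and the example item are assumptions.

```python
# Illustrative sketch: score a multiple-choice QA item by picking the option whose
# tokens receive the highest summed log-probability when appended to the question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HPAI-BSC/Llama3.1-Aloe-Beta-8B"  # assumed repository id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "Question: Which vitamin deficiency causes scurvy?\nAnswer:"
options = [" Vitamin A", " Vitamin B12", " Vitamin C", " Vitamin D"]

def option_logprob(prompt: str, option: str) -> float:
    """Sum of log-probabilities of the option tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logprobs = model(full_ids).logits.log_softmax(dim=-1)
    n_prompt = prompt_ids.shape[-1]
    target = full_ids[0, n_prompt:]                      # tokens of the option continuation
    scores = logprobs[0, n_prompt - 1:-1].gather(1, target.unsqueeze(-1))
    return scores.sum().item()

best = max(options, key=lambda o: option_logprob(question, o))
print("Predicted:", best)
```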
@@ -236,12 +237,12 @@ We used Deepspeed's Zero-3 distributed training using the following hardware:
 
 The training set consists of around 1.8B tokens, comprising three different types of data:
 
-- Medical domain datasets
+- Medical domain datasets. Includes data from 20 different medical tasks.
   - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
   - [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
   - [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
   - [HPAI-BSC/ultramedical](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
-- Synthetic data
+- Synthetic data. We expanded our training data by generating high-quality answers using Llama3.1-70B.
   - [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
   - [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
   - [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
@@ -249,7 +250,7 @@ The training set consists of around 1.8B tokens, comprising three different types of data:
   - [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
   - [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
   - Genstruct data (coming soon)
-- General data
+- General data. It includes maths, STEM, code, function calling, and very long instructions.
   - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
 
 #### Training parameters
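The sketch below shows how the three data groups listed in the hunks above might be combined into a single fine-tuning mixture with the `datasets` library. Split names, schema handling, and mixing probabilities are assumptions, not the authors' recipe; the card states only that general data makes up about 20% of the total.

```python
# Hedged sketch: mixing one dataset from each group named in the card.
# Split names and probabilities are illustrative assumptions.
from datasets import load_dataset, interleave_datasets

medical = load_dataset("HPAI-BSC/MedS-Ins", split="train")
synthetic = load_dataset("HPAI-BSC/medqa-cot-llama31", split="train")
general = load_dataset("HPAI-BSC/Aloe-Beta-General-Collection", split="train")

# interleave_datasets requires a shared schema; in practice the sources would first
# be mapped to a common chat/instruction format before mixing.
mixture = interleave_datasets(
    [medical, synthetic, general],
    probabilities=[0.5, 0.3, 0.2],  # illustrative; the card only states ~20% general data
    seed=42,
)
print(mixture)
```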