---
library_name: transformers
tags: []
---
# FuseLLM for Korean
## Model Description
FuseLLM is a project that aims to combine the knowledge and strengths of large language models built on different transformer architectures into a single model.
* The original idea was proposed in https://github.com/fanqiwan/FuseLLM

To adapt this idea to Korean, this repository uses Orion as the base (target) model, with OPEN-SOLAR-KO-10.7B and Yi-Ko-6B as the source models.
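Since the target model is Llama-based and this card uses the `transformers` library, it should load with the standard causal-LM classes. Below is a minimal usage sketch; the repository id is a placeholder, not this model's actual id.

```python
# Minimal loading sketch. "your-org/fusellm-ko" is a placeholder,
# not the actual repository id of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/fusellm-ko"  # placeholder: replace with this repo's id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "한국어로 자기소개를 해줘."  # "Introduce yourself in Korean."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```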
### Model Architecture
The target model architecture is based on the Llama model. The fusion process involves extracting representations from the source models, aligning the token embeddings of the source models with the target model's token space, and training the target model using the aligned representations.
### Training Data
The model was trained on a dataset of various text samples. For each sample, inference was performed with each source model, and the top-k logits produced for every token were collected and stored.
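This collection step can be pictured as follows. The sketch assumes a `transformers` causal LM; the value of k and the on-disk storage format used in the actual pipeline are not stated in this card.

```python
# Sketch of collecting per-token top-k logits from one source model.
# k=10 and the record format are illustrative assumptions.
import torch

def collect_topk_logits(model, tokenizer, texts, k=10):
    records = []
    model.eval()
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt").to(model.device)
            logits = model(**inputs).logits[0]        # (seq_len, vocab_size)
            values, indices = torch.topk(logits, k, dim=-1)
            records.append({
                "input_ids": inputs["input_ids"][0].cpu(),
                "topk_values": values.cpu(),          # (seq_len, k)
                "topk_indices": indices.cpu(),        # (seq_len, k)
            })
    return records
```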
### Training Procedure
1. **Representation Extraction**: Each text sample in the dataset is run through the source models, and the top-k logits for each token are collected and stored (sketched above under Training Data).
2. **Token Alignment**: The token embeddings of the source models and the target model (Llama) are compared, and the most similar tokens are mapped to each other, normalizing the token representations of the different models into the Llama token space (see the sketch after this list).
3. **Model Training**: The aligned representations are then used as supervision to train the target model (also sketched below).
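Steps 2 and 3 can be pictured together. The sketch below maps each source token to its nearest target token by embedding similarity, then trains the target with a weighted sum of the usual causal-LM loss and a KL term toward the aligned source distribution. The shared embedding space, the `lam` weighting, and the function names are illustrative assumptions, not the exact recipe used for this model.

```python
# Illustrative sketch of token alignment (step 2) and fused training loss (step 3).
import torch
import torch.nn.functional as F

def align_vocab(source_emb, target_emb):
    """Map each source-vocab token id to the most similar target-vocab id.

    Assumes source_emb (src_vocab, d) has already been projected into the
    same d-dimensional space as target_emb (tgt_vocab, d).
    """
    src = F.normalize(source_emb, dim=-1)
    tgt = F.normalize(target_emb, dim=-1)
    sim = src @ tgt.T                 # (src_vocab, tgt_vocab) cosine similarities
    return sim.argmax(dim=-1)         # (src_vocab,) target token ids

def fusion_loss(target_logits, aligned_source_probs, labels, lam=0.9):
    """Weighted sum of the causal-LM loss and a KL term that pulls the
    target distribution toward the aligned source distribution."""
    ce = F.cross_entropy(
        target_logits.view(-1, target_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    kl = F.kl_div(
        F.log_softmax(target_logits, dim=-1),
        aligned_source_probs,         # probabilities over the target vocab
        reduction="batchmean",
    )
    return lam * ce + (1.0 - lam) * kl
```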
### Training Infrastructure
The model was trained using 8 A100 GPUs provided by Sionic AI.
## Evaluation
### Evaluation Datasets
The performance of the base Orion model and the fused model (Orion + OPEN-SOLAR-KO-10.7B + Yi-Ko-6B) was evaluated on the following datasets:
- kobest_boolq
- kobest_hellaswag
- korunsmile
- nsmc
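These task names match those used by EleutherAI's lm-evaluation-harness (the Korean tasks come from its polyglot fork). Assuming a harness version that ships these tasks, a run could look like the sketch below; the model id is again a placeholder.

```python
# Hypothetical evaluation sketch using lm-evaluation-harness's Python API.
# Task availability depends on the harness version/fork used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/fusellm-ko",  # placeholder model id
    tasks=["kobest_boolq", "kobest_hellaswag", "korunsmile", "nsmc"],
    num_fewshot=5,  # set to 0 for the zero-shot runs
)
print(results["results"])
```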
### Evaluation Results
The evaluation was conducted in both zero-shot and five-shot settings. The results are as follows:
| Dataset | Orion Base<br>(0-shot) | Fused<br>(0-shot) | Orion Base<br>(5-shot) | Fused<br>(5-shot) |
|------------------|------------------------|-------------------|------------------------|-------------------|
| kobest_boolq | 0.7642 | 0.7022 | 0.9017 | 0.8924 |
| kobest_hellaswag | 0.4840 | 0.5080 | 0.5060 | 0.5080 |
| korunsmile | 0.3694 | 0.3941 | 0.3562 | 0.3570 |
| nsmc | 0.5574 | 0.5803 | 0.8692 | 0.8690 |
In the zero-shot setting, the fused model outperforms the base Orion model on three of the four tasks (kobest_hellaswag, korunsmile, nsmc), while in the five-shot setting the two models perform roughly on par. This suggests the fusion approach is most beneficial when no in-context examples are available.
## Intended Use
FuseLLM is intended as a research model for exploring the fusion of different language models and investigating the potential benefits of combining their knowledge and strengths.
## Limitations and Bias
As with any language model, FuseLLM may exhibit biases present in the training data. The model's performance may also be limited by the quality and diversity of the training data used. Further analysis is required to understand and mitigate potential biases and limitations.
## Acknowledgments
We would like to thank Sionic AI (https://sionic.ai) for providing the 8 A100 GPUs used to train the FuseLLM model.