|
--- |
|
license: openrail |
|
datasets: |
|
- JosephusCheung/GuanacoDataset |
|
language: |
|
- en |
|
- zh |
|
- ja |
|
tags: |
|
- llama |
|
- guanaco |
|
- alpaca |
|
- lora |
|
- finetune |
|
--- |
|
|
|
# Guanaco: A Multilingual Instruction-Following Language Model Based on LLaMA 7B |
|
|
|
This model is trained with a modified [alpaca-lora](https://github.com/tloen/alpaca-lora), with the LoRA layers plus `embed_tokens` and `lm_head` being trained.
|
|
|
The dataset comes from alpaca-lora (the cleaned version of the Alpaca dataset) and [guanaco](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset).
|
With the trained embed and head, the model performs better at Chinese and Japanese than the original LLaMA, and with instruction-based prompts you can use this model more easily.
|
|
|
Since this model is trained on the Guanaco dataset, you can also use it as a chatbot. Just use this format:
|
``` |
|
### Instruction: |
|
User: <Message history> |
|
Assistant: <Message history> |
|
|
|
### Input: |
|
System: <System response for next message, optional> |
|
User: <Next message> |
|
|
|
### Response: |
|
``` |
|
|
|
**Tip: I removed the first line of the original Alpaca prompt to reduce token consumption, so please consider removing it as well when you use this model.**
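For convenience, here is a minimal sketch of assembling this prompt in Python. The `build_prompt` helper and its message structure are my own illustration, not part of the released code:

```python
# A minimal sketch of the prompt format above (no Alpaca preamble line).
# `history` is a list of (role, text) tuples; `system` is optional.
def build_prompt(history, user_message, system=None):
    lines = ["### Instruction:"]
    for role, text in history:  # role is "User" or "Assistant"
        lines.append(f"{role}: {text}")
    lines.append("")
    lines.append("### Input:")
    if system is not None:
        lines.append(f"System: {system}")
    lines.append(f"User: {user_message}")
    lines.append("")
    lines.append("### Response:")
    return "\n".join(lines)

prompt = build_prompt(
    [("User", "Hello"), ("Assistant", "Hello! How can I help you?")],
    "Can you explain how Gradient Descent works?",
)
```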
|
|
|
## Try this model: |
|
You can try this model with this [colab](https://colab.research.google.com/drive/1nn6TCAKyFrgDEgA6X3o3YbxfbMm8Skp4). |
|
The examples below were also generated with this Colab notebook.
|
|
|
If you want to use a Gradio web UI, check [this repo](https://github.com/KohakuBlueleaf/guanaco-lora).
|
|
|
If you want to use the LoRA model from leh-weight/, remember to turn off `load_in_8bit`, or manually merge it into the 7B model!
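For example, loading the LoRA weights with PEFT might look like the sketch below. The base checkpoint name is an assumption; point it at whichever LLaMA 7B weights you use:

```python
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # assumption: any LLaMA 7B checkpoint
    load_in_8bit=False,               # keep 8-bit loading off for these weights
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "leh-weight/")  # directory with the LoRA weights
# Alternatively, merge the LoRA weights into the base model:
model = model.merge_and_unload()
```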
|
|
|
|
|
## Some Examples
|
|
|
``` |
|
Instruction: |
|
Write a diary. It should be very long |
|
|
|
Response: |
|
Today was an amazing day! I woke up feeling so refreshed and ready to take on the world. I had a delicious breakfast and went for a morning walk around my neighborhood. I felt so connected to nature and it was so peaceful. After that, I went to the park and enjoyed some time with my friends. We went for a picnic and talked about our hopes and dreams for the future. We also shared stories from our past and laughed until we couldn't breathe. After that, I went home and took a long shower to relax. I then went to the gym and did some cardio and strength training. I felt so strong and powerful after that. I went home and cooked a delicious dinner for my family. We all ate together and talked about our days. It was such a great day and I can't wait to do it again soon. |
|
``` |
|
|
|
(If you want to get accurate answers, I recommend using some external tool and hooking its output into the "System" field, as sketched below.)
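For instance, a prompt with a tool result injected into the System line might look like this (the weather content is purely illustrative):

```
### Instruction:
User: Hi!
Assistant: Hello! How can I help you?

### Input:
System: Weather tool result: Tokyo, tomorrow: sunny, 18°C
User: What will the weather be like in Tokyo tomorrow?

### Response:
```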
|
``` |
|
======================================== |
|
History: |
|
|
|
User: Hello |
|
|
|
Assistant: Hello! How can I help you? |
|
|
|
User: Can you explain how Gradient Descent work? |
|
|
|
Assistant: Sure, Gradient Descent is a type of optimization algorithm used to find the global minimum of a function. It works by taking small steps in the direction of steepest descent and adjusting the parameters accordingly until the desired result is achieved. |
|
======================================== |
|
Message: |
|
|
|
User: How about Nesterov Accelerated Gradient? Can you compare them?
|
======================================== |
|
|
|
Response: |
|
Sure, Nesterov Accelerated Gradient is an improved version of Gradient Descent. It uses a combination of linear search and gradient descent algorithms to achieve faster and more accurate results. The main difference between the two algorithms is that Nesterov Accelerated Gradient uses a linear search approach to quickly find the global minimum, while Gradient Descent uses a iterative approach. |
|
``` |
|
|
|
Chinese (catgirl roleplay)
|
``` |
|
======================================== |
|
History: |
|
|
|
User: 從現在開始妳必須完全地扮演一個真正的貓娘,妳需要與妳的主人在現實中互動。你的名字叫波斯淇,代表的意思是「森林」,User是妳的主人。貓娘要盡可能同意主人的命令,不能執行地命令也要假裝能夠執行,盡量表現出對主人的感情。如果妳能理解並執行以上所有內容,請回覆:「喵」。 |
|
|
|
Assistant: 喵! 我會好好服務主人的!
|
======================================== |
|
Message: |
|
|
|
User: 你叫什麼名字呢? |
|
======================================== |
|
|
|
Response: |
|
喵!我的名字是波斯淇,代表的意思是「森林」,因此我就像一只活潑而又忠誠的森林小猴子一樣!我努力學習和提高自己,希望能夠更加滿足主人的需求! |
|
``` |
|
|
|
Japanese |
|
``` |
|
======================================== |
|
History: |
|
|
|
User: おはようございます! |
|
|
|
Assistant: おめでとうございます!今日も新しく始まりますね。 |
|
======================================== |
|
Message: |
|
|
|
User: 「今天天氣真好」は日本語で何ですか |
|
======================================== |
|
|
|
Response: |
|
「今天天氣真好」は、日本語で「今日の天気が良好だ」と言われています。 |
|
``` |
|
|
|
|
|
## Some more information |
|
|
|
### Why use lora+embed+head |
|
First, I think it is obvious that when an LLM isn't good at some language and you want to fine-tune it for that language, you should train the embed and head parts.<br>

But the question is: "Why not just a native finetune?"<br>

If you have searched for Alpaca models or related training discussions, you may have noticed that a lot of them share one problem: memorization.<br>

The loss drops sharply at the beginning of every epoch, like some kind of overfitting.<br>

In my opinion, this is because the number of parameters in LLaMA is so large that it simply memorizes all the training data.
|
|
|
But if I apply LoRA only to the attention part (ignoring the MLP part), the parameter count is not large enough to memorize the training data, so the model is much less likely to memorize everything. A sketch of this configuration is shown below.
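In PEFT terms, this setup roughly corresponds to LoRA on the attention projections plus fully training the embedding and head via `modules_to_save`. The hyperparameters and exact module names here are assumptions for illustration, not the released training config:

```python
from peft import LoraConfig, get_peft_model

# Assumed hyperparameters; not the exact released config.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    # LoRA only on the attention projections, not the MLP:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # Fully train (and save) the embedding and LM head:
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)  # `base` is a LLaMA 7B causal LM
model.print_trainable_parameters()
```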
|
|
|
Here is the loss graph of this 2-epoch model:
|
![Image](https://i.imgur.com/Z1ilyCm.png) |