Update lora_clm_with_additional_tokens.ipynb (#7)
- Update lora_clm_with_additional_tokens.ipynb (fedc56bcbdf2e1afa6750f530fb7f899a1dba71d)
Co-authored-by: Jainish Patel <[email protected]>
lora_clm_with_additional_tokens.ipynb
CHANGED
@@ -9,7 +9,7 @@
 "\n",
 "In this example, we will learn how to train a LoRA model when adding new tokens to the tokenizer and model. \n",
 "This is a common usecase when doing the following:\n",
-"1. Instruction finetuning with new tokens
+"1. Instruction finetuning with new tokens being added such as `<|user|>`, `<|assistant|>`, `<|system|>`, `</s>`, `<s>` to properly format the conversations\n",
 "2. Finetuning on a specific language wherein language specific tokens are added, e.g., korean tokens being added to vocabulary for finetuning LLM on Korean datasets.\n",
 "3. Instruction finetuning to return outputs in certain format to enable agent behaviour new tokens such as `<|FUNCTIONS|>`, `<|BROWSE|>`, `<|TEXT2IMAGE|>`, `<|ASR|>`, `<|TTS|>`, `<|GENERATECODE|>`, `<|RAG|>`.\n",
 "\n",
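
The updated item 1 says the role and boundary tokens are added "to properly format the conversations". The sketch below illustrates one way such formatting might look. Only the token strings come from the notebook text; the function name `format_conversation`, the message layout, and the placement of newlines are assumptions made for illustration, not the notebook's actual template.

```python
# Hypothetical chat template. The token strings are taken from the notebook
# text; how they are arranged around each message is an assumption.
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}
BOS, EOS = "<s>", "</s>"


def format_conversation(messages):
    """Render a list of {"role", "content"} dicts as a single training string."""
    parts = [BOS]
    for message in messages:
        # An unknown role raises KeyError instead of being silently dropped.
        role_token = ROLE_TOKENS[message["role"]]
        parts.append(f"{role_token}\n{message['content']}{EOS}\n")
    return "".join(parts)


example = format_conversation([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
])
print(example)
```

For a model to treat these markers as single units, they would typically also be registered with the tokenizer as new tokens and the model's embedding matrix resized to match, which is the situation the notebook addresses.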