LLaMA [[llama]]

Overview [[overview]]

LLaMA ๋ชจ๋ธ์€ Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothรฉe Lacroix, Baptiste Roziรจre, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample์— ์˜ํ•ด ์ œ์•ˆ๋œ LLaMA: Open and Efficient Foundation Language Models์—์„œ ์†Œ๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ 7B์—์„œ 65B๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๊นŒ์ง€ ๋‹ค์–‘ํ•œ ํฌ๊ธฐ์˜ ๊ธฐ์ดˆ ์–ธ์–ด ๋ชจ๋ธ์„ ๋ชจ์•„๋†“์€ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๋…ผ๋ฌธ์˜ ์ดˆ๋ก์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

"We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We trained our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community."

ํŒ:

  • LLaMA ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋Š” ์ด ์–‘์‹์„ ์ž‘์„ฑํ•˜์—ฌ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๊ฐ€์ค‘์น˜๋ฅผ ๋‹ค์šด๋กœ๋“œํ•œ ํ›„์—๋Š” ์ด๋ฅผ ๋ณ€ํ™˜ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ Hugging Face Transformers ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ๋ณ€ํ™˜ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜๋ ค๋ฉด ์•„๋ž˜์˜ ์˜ˆ์‹œ ๋ช…๋ น์–ด๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”:
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
  • ๋ณ€ํ™˜์„ ํ•˜์˜€๋‹ค๋ฉด ๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ €๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/output/path")
model = LlamaForCausalLM.from_pretrained("/output/path")

์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ชจ๋ธ์„ float16 ์ •๋ฐ€๋„๋กœ ์ „๋ถ€ ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์„ ๋งŒํผ์˜ ์ถฉ๋ถ„ํ•œ CPU RAM์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. (๊ฐ€์žฅ ํฐ ๋ฒ„์ „์˜ ๋ชจ๋ธ์ด ์—ฌ๋Ÿฌ ์ฒดํฌํฌ์ธํŠธ๋กœ ๋‚˜๋‰˜์–ด ์žˆ๋”๋ผ๋„, ๊ฐ ์ฒดํฌํฌ์ธํŠธ๋Š” ๋ชจ๋ธ์˜ ๊ฐ ๊ฐ€์ค‘์น˜์˜ ์ผ๋ถ€๋ฅผ ํฌํ•จํ•˜๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋“  ์ฒดํฌํฌ์ธํŠธ๋ฅผ RAM์— ๋กœ๋“œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค) 65B ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ, ์ด 130GB์˜ RAM์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

  • LLaMA ํ† ํฌ๋‚˜์ด์ €๋Š” sentencepiece๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” BPE ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. sentencepiece์˜ ํŠน์ง• ์ค‘ ํ•˜๋‚˜๋Š” ์‹œํ€€์Šค๋ฅผ ๋””์ฝ”๋”ฉํ•  ๋•Œ ์ฒซ ํ† ํฐ์ด ๋‹จ์–ด์˜ ์‹œ์ž‘์ด๋ผ๋ฉด (์˜ˆ๋ฅผ ๋“ค์–ด "Banana"), ํ† ํฌ๋‚˜์ด์ €๋Š” ๋ฌธ์ž์—ด ์•ž์— ๊ณต๋ฐฑ์„ ์ถ”๊ฐ€ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

์ด ๋ชจ๋ธ์€ BlackSamorez์˜ ๊ธฐ์—ฌ์™€ ํ•จ๊ป˜, zphang์— ์˜ํ•ด ์ œ๊ณต๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Hugging Face์—์„œ์˜ ๊ตฌํ˜„ ์ฝ”๋“œ๋Š” GPT-NeoX๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋ฉฐ ์—ฌ๊ธฐ์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ๊ณ , ์ €์ž์˜ ์ฝ”๋“œ ์›๋ณธ์€ ์—ฌ๊ธฐ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์›๋ž˜ LLaMA ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ Meta AI์—์„œ ๋ช‡ ๊ฐ€์ง€ ํ›„์† ์ž‘์—…์„ ๋ฐœํ‘œํ–ˆ์Šต๋‹ˆ๋‹ค:

  • Llama2: Llama2 is an improved version with some architectural modifications (Grouped Query Attention), pretrained on 2 trillion tokens. Refer to this document for details on Llama2.
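To illustrate the Grouped Query Attention modification mentioned above: several query heads share a single key/value head, which shrinks the KV projections (and the KV cache at inference time). A toy NumPy sketch of the idea, assuming illustrative shapes and names rather than Llama2's actual implementation:

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).

    Each KV head is shared by n_q_heads // n_kv_heads query heads, so
    the KV tensors are n_q_heads / n_kv_heads times smaller than in
    standard multi-head attention.
    """
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over key positions
    return w @ v

# 8 query heads attending over projections from only 2 KV heads.
q = np.random.randn(8, 4, 16)
k = np.random.randn(2, 4, 16)
v = np.random.randn(2, 4, 16)
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```

When n_kv_heads == n_q_heads this reduces to standard multi-head attention; Llama2 picks a smaller n_kv_heads for its larger variants to cut memory traffic.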

๋ฆฌ์†Œ์Šค [[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LLaMA. If you're interested in submitting a resource to be included here, please open a Pull Request! Ideally, a resource should demonstrate something new instead of duplicating an existing one.

  • LLaMA ๋ชจ๋ธ์„ ํ…์ŠคํŠธ ๋ถ„๋ฅ˜ ์ž‘์—…์— ์ ์šฉํ•˜๊ธฐ ์œ„ํ•œ ํ”„๋กฌํ”„ํŠธ ํŠœ๋‹ ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋…ธํŠธ๋ถ ๐ŸŒŽ

โš—๏ธ ์ตœ์ ํ™”

  • ์ œํ•œ๋œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๊ฐ€์ง„ GPU์—์„œ xturing ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ LLaMA ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋…ธํŠธ๋ถ ๐ŸŒŽ

โšก๏ธ ์ถ”๋ก 

  • A notebook on how to run the LLaMA model using PeftModel from the 🤗 PEFT library 🌎
  • A notebook on how to load a PEFT adapter LLaMA model with LangChain 🌎

๐Ÿš€ ๋ฐฐํฌ

  • A notebook on how to fine-tune a LLaMA model with the 🤗 PEFT library and a user-friendly UI 🌎
  • A notebook on how to deploy the Open-LLaMA model for text generation on Amazon SageMaker 🌎

LlamaConfig [[llamaconfig]]

[[autodoc]] LlamaConfig

LlamaTokenizer [[llamatokenizer]]

[[autodoc]] LlamaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

LlamaTokenizerFast [[llamatokenizerfast]]

[[autodoc]] LlamaTokenizerFast
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - update_post_processor
    - save_vocabulary

LlamaModel [[llamamodel]]

[[autodoc]] LlamaModel
    - forward

LlamaForCausalLM [[llamaforcausallm]]

[[autodoc]] LlamaForCausalLM
    - forward

LlamaForSequenceClassification [[llamaforsequenceclassification]]

[[autodoc]] LlamaForSequenceClassification
    - forward