egpt-1.3b-preview / README.md
dalgarak's picture
Update README.md
31d1b60 verified
|
raw
history blame
12.3 kB
metadata
language:
  - ko
library_name: transformers

EAGLE: ETRI's Advanced-lightweight Generative Language Engine

(๊ณผ๊ฑฐ์— eGPT๋กœ ๋ถˆ๋ ธ์œผ๋ฉฐ, 2024.11.14 ์— ์ด๋ฆ„์„ ๋ณ€๊ฒฝํ•˜์˜€์Šต๋‹ˆ๋‹ค. ์ถ”ํ›„ ๋ฆด๋ฆฌ์ฆˆ๋˜๋Š” ๋ชจ๋ธ์˜ prefix๋Š” egpt- ๋Œ€์‹  eagle-๋กœ ๋ณ€๊ฒฝ๋ฉ๋‹ˆ๋‹ค.)

๋ณธ ๋ชจ๋ธ์€ ์‚ฌ์ „ํ•™์Šต๋งŒ ์ˆ˜ํ–‰๋œ ๋ชจ๋ธ์ด๋ฉฐ, ๋ณ„๋„์˜ Instruction Tuning ๋“ฑ์ด ์ ์šฉ๋˜์ง€ ์•Š์€ ๊ธฐ์ดˆ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์ฑ—๋ด‡ ์Šคํƒ€์ผ์˜ ์ž…์ถœ๋ ฅ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ, ๋ณ„๋„์˜ ๋ฏธ์„ธ์กฐ์ •์„ ๋ฐ˜๋“œ์‹œ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ ์ •๋ณด

1.3B Decoder-only, Causal ์–ธ์–ด๋ชจ๋ธ. ์ˆ˜ํ•™, ์ •๋Ÿ‰ ์ถ”๋ก ์„ ๋น„๋กฏํ•œ STEM ๋ถ„์•ผ์— ํŠนํ™”๋œ ์†Œ๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ์„ ์ง€ํ–ฅํ•ฉ๋‹ˆ๋‹ค. ๋ฒ”์šฉ ์–ธ์–ด๋ชจ๋ธ์˜ ์—ญํ• ์„ ๋ชฉํ‘œ๋กœํ•˜์ง€๋Š” ์•Š๊ธฐ์—, ํ†ต์ƒ์˜ ์ดํ•ด ๊ด€๋ จ ๋ฒ”์šฉ ํƒœ์Šคํฌ ํ‰๊ฐ€(e.g. hellaswag, sentineg ๋“ฑ)์—๋Š” ๋‚ฎ์€ ์„ฑ๋Šฅ์ด ๋‚˜ํƒ€๋‚  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•™์Šต ๋ฐ์ดํ„ฐ ๋ณ€๊ฒฝ ๋ฐ ํ•™์Šต ๋ฐฉ๋ฒ• ์ˆ˜์ •, ๊ฐœ์„ ์œผ๋กœ ์ธํ•ด ๋ณธ ๋ชจ๋ธ์€ ๋น„์ •๊ธฐ์ ์œผ๋กœ ์—…๋ฐ์ดํŠธ ๋  ์ˆ˜ ์žˆ์Œ์„ ๋ฏธ๋ฆฌ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค.

Tokenizer๋Š” LLaMa์˜ ๊ตฌ์„ฑ๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ byte-fallbacked BPE + digit ๋ถ„๋ฆฌ ๊ตฌ์„ฑ์„ ๊ฐ€์ง€๋‚˜, BOS/EOS(e.g. <s>,</s>) ํ† ํฐ์ด ๋ชจ๋‘ EOS(</s>)๋กœ ํ†ต์ผ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ํ† ํฌ๋‚˜์ด์ € ์„ค์ •์—์„œ PAD ํ† ํฐ์€ ๋ณ„๋„๋กœ ์ง€์ •๋˜์–ด ์žˆ์ง€ ์•Š์œผ๋‚˜, Byte-level BPE์˜ ํŠน์„ฑ์ƒ <unk> ์‹ฌ๋ณผ์ด ์‚ฌ์šฉ๋˜์ง€ ์•Š์œผ๋ฏ€๋กœ, ๋ฏธ์„ธ์กฐ์ • ๋‹จ๊ณ„์—์„œ๋Š” <unk> ํ† ํฐ์„ PAD ํ† ํฐ์œผ๋กœ ์ง€์ •ํ•˜์—ฌ ํ™œ์šฉํ•  ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. EleutherAI/gptneox ์•„ํ‚คํ…์ณ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, A100 80GB PCIE * 8์žฅ์—์„œ ์•ฝ 12์ฃผ๊ฐ„ ํ•™์Šต(4์ฃผ์”ฉ v1, v2, v3 ์ง€์† ์‚ฌ์ „ ํ•™์Šต; ์•ฝ 500B tokens ํ•™์Šต)ํ•˜์—ฌ ํš๋“๋œ ์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

ํ†ต์ง€์‚ฌํ•ญ/Acknowledgement

  • ์ด ๋ชจ๋ธ์€ 2023๋…„๋„ ์ •๋ถ€(๊ณผํ•™๊ธฐ์ˆ ์ •๋ณดํ†ต์‹ ๋ถ€)์˜ ์žฌ์›์œผ๋กœ ์ •๋ณดํ†ต์‹ ๊ธฐํšํ‰๊ฐ€์›์˜ ์ง€์›์„ ๋ฐ›์•„ ์ˆ˜ํ–‰๋œ ์—ฐ๊ตฌ์ž„ (RS-2023-00216011, ์‚ฌ๋žŒ์ฒ˜๋Ÿผ ๊ฐœ๋…์ ์œผ๋กœ ์ดํ•ด/์ถ”๋ก ์ด ๊ฐ€๋Šฅํ•œ ๋ณตํ•ฉ์ธ๊ณต์ง€๋Šฅ ์›์ฒœ๊ธฐ์ˆ  ์—ฐ๊ตฌ)
  • This work was supported by Institute of Information & Communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (RS-2023-00216011, Development of artificial complex intelligence for conceptually understanding and inferring like human)

์ œํ•œ์  ๋ชจ๋ธ ์ ‘๊ทผ ๋ฐ, ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์™€ ๊ด€๋ จํ•œ ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘ ๋ฐ ์‚ฌ์šฉ ์•ˆ๋‚ด/Information on Collection and Use of Personal Information for Gated Model Access

๋ณธ ๋ชจ๋ธ์€ ์—ฐ๊ตฌ์™€ ๊ต์œก ๋ชฉ์ ์œผ๋กœ๋งŒ ์‚ฌ์šฉ ๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํ˜„์žฌ ์ œํ•œ์  ๊ณต๊ฐœ ์ƒํƒœ๋กœ, ๋ณธ ์–ธ์–ด ๋ชจ๋ธ์˜ ๋‹ค์šด๋กœ๋“œ์—๋Š” ๋‹ด๋‹น์ž ์‚ฌ์ „ ์Šน์ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์ „ ์Šน์ธ๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์˜์‚ฌํ•ญ์€ ๋ณ„๋„์˜ ๋ฉ”์ผ(jhshin82 at etri.re.kr)๋กœ ์š”์ฒญ ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

๋ณธ ๋ชจ๋ธ๊ณผ ๊ด€๋ จํ•ด ์‚ฌํšŒ์ , ๋ฒ•์  ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ์„ ์ œํ•œํ•˜๊ณ , ๋ฐฐํฌ๋ฅผ ์ฒ ํšŒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์— ์‚ฌ์šฉ๋œ ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ˆ˜์ง‘, ๋ณด์œ , ์ด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘๋™์˜/Concent to collection of Personal Information

๋ณธ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ๊ณผ ๊ด€๋ จ, ๋ฐฐํฌ/์‚ฌ์šฉ ์ œํ•œ/์ฒ ํšŒ, ๊ทธ ์™ธ ์‚ฌ์šฉ์ž์˜ ์ด์ต์— ๊ด€๊ณ„๋œ ๋ผ์ด์„ ์Šค ๋ณ€๊ฒฝ ์‹œ ์ด๋ฅผ ํ†ต์ง€ํ•˜๊ธฐ ์œ„ํ•ด, ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฐœ์ธ์ •๋ณด๋ฅผ ์ˆ˜์ง‘, ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

์ˆ˜์ง‘ ๋ชฉ์  ์ˆ˜์ง‘ ํ•ญ๋ชฉ ๋ณด์œ , ์ด์šฉ๊ธฐ๊ฐ„
๋ชจ๋ธ์˜ ์‚ฌ์šฉ์ œํ•œ/์ฒ ํšŒ ์š”์ฒญ ๋ชฉ์  ์ด๋ฉ”์ผ ์ฃผ์†Œ, huggingface hub ID ๋ณธ ๋ชจ๋ธ์˜ ๊ณต๊ฐœ ๊ธฐ๊ฐ„ ๋ฐ ์ด์šฉ ๋ชฉ์  ๋‹ฌ์„ฑ ์‹œ
๋ชจ๋ธ์˜ ์‚ฌ์šฉ ๋ผ์ด์„ ์Šค ๋“ฑ ๋ณ€๊ฒฝ ์•ˆ๋‚ด ์ด๋ฉ”์ผ ์ฃผ์†Œ, huggingface hub ID ๋ณธ ๋ชจ๋ธ์˜ ๊ณต๊ฐœ ๊ธฐ๊ฐ„ ๋ฐ ์ด์šฉ ๋ชฉ์  ๋‹ฌ์„ฑ ์‹œ

๋ณธ ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ ‘๊ทผ ์š”์ฒญ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๋ชจ๋ธ์— ์ ‘๊ทผํ•˜์‹œ๋Š” ํ–‰์œ„๋Š” ์•„๋ž˜์— ์•ˆ๋‚ด๋œ ์•ˆ๋‚ด์‚ฌํ•ญ, ๋ณธ ๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ์— ๋Œ€ํ•œ ์ •๋ณด, ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘/์ด์šฉ์— ๋™์˜ํ•˜์‹  ๊ฒƒ์œผ๋กœ ๊ฐ„์ฃผํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๋Š” ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ถŒ๋ฆฌ๊ฐ€ ์žˆ์œผ๋ฉฐ, ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ฒฝ์šฐ ๋ชจ๋ธ ์‚ฌ์šฉ์ด ์ œํ•œ๋˜๋ฉฐ, ์ด์— ๊ด€๋ จํ•œ ์‚ฌ์šฉ, ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ์ฑ…์ž„์€ ์‚ฌ์šฉ์ž์—๊ฒŒ ์žˆ์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค. ์‚ฌ์šฉ ํ›„ ๋™์˜ ์ฒ ํšŒ, ๊ฐœ์ธ์ •๋ณด ํ๊ธฐ์— ๋Œ€ํ•œ ์‚ฌํ•ญ์€ ์ƒ๊ธฐ ์•ˆ๋‚ด๋œ ๋ฉ”์ผ ์ฃผ์†Œ ๋˜๋Š” Community tab์„ ํ†ตํ•ด์„œ ์š”์ฒญํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์œ„ํ•œ ๊ด€๋ จ ์ •๋ณด ์•ˆ๋‚ด

๋ณธ ๋ชจ๋ธ์˜ ๊ฐœ๋ฐœ๊ณผ ๊ด€๋ จํ•œ ๊ฐœ๋ฐœ์ž ๋ฐ ์กฐ์ง์€ ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์ค€์ˆ˜ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด์™€ ๊ด€๋ จํ•ด AI ์—ฐ๊ตฌ์— ์‚ฌ์šฉ๋˜๋Š” ์ž…์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋‚ด ํฌํ•จ๋œ ์š•์„ค, ์Œ๋ž€, ์ •์น˜์  ๋‚ด์šฉ ๋ฐ ๊ธฐํƒ€ ๊ฑฐ์นœ ์–ธ์–ด์— ๋Œ€ํ•œ ์ฒ˜๋ฆฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ์›์‹œ ์›น ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ ์ƒ ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ด ํ•™์Šต๋œ ๋ณธ ์ƒ์„ฑ ์–ธ์–ด ๋ชจ๋ธ์€ ๊ฒฝ๋„๋œ ์‚ฌ์ƒ์„ ํฌํ•จํ•˜๊ฑฐ๋‚˜, ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋  ์ˆ˜ ์—†๋Š” ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋‹ค๋ฅธ ์–ธ์–ด ๋ชจ๋ธ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ํŠน์ • ํ”„๋กฌํ”„ํŠธ์™€ ๊ณต๊ฒฉ์ ์ธ ์ฝ˜ํ…์ธ ๊ฐ€ ๋ฐ˜ํ™˜๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํฌํ•จ, ๋ณธ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ/์ƒ์„ฑ ๊ฒฐ๊ณผ์™€ ๊ด€๋ จํ•œ ๋‚ด์šฉ์€ ๊ฐœ๋ฐœ์ž ๋ฐ ๊ฐœ๋ฐœ์ž๊ฐ€ ์†ํ•œ ์กฐ์ง์˜ ์‚ฌ์ƒ, ์˜๋„์™€ ์ „ํ˜€ ๊ด€๋ จ์ด ์—†์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค.

ํ…Œ์ŠคํŠธ์ค‘์— ๋ฐœ์ƒํ•œ ๋น„์ •์ƒ์ ์ธ ํ˜น์€ ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋˜์ง€ ์•Š๋Š” ํ…์ŠคํŠธ๊ฐ€ ์ƒ์„ฑ๋œ ๊ฒฝ์šฐ jhshin82 at etri.re.kr๋กœ (__at__์„ @๋กœ ์น˜ํ™˜) ์ถœ๋ ฅ ์œ ๋„์— ์‚ฌ์šฉ๋œ ์ž…๋ ฅ๋ฌธ(ํ”„๋กฌํ”„ํŠธ), ์‚ฌ์šฉ๋œ ์ƒ˜ํ”Œ๋ง ๊ธฐ๋ฒ• ๋ฐ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ(์˜ˆ: top-p=0.8, temperature, repetition-penalty ๋“ฑ), ์ด๋ฅผ ํ†ตํ•ด ์ƒ์„ฑ๋œ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ํ•จ๊ป˜ ๋ณด๋‚ด์ฃผ์‹œ๋ฉด, ์ด๋ฅผ ์–ต์ œํ•˜๊ธฐ ์œ„ํ•œ ๋…ธ๋ ฅ์„ ๊ธฐ์šธ์ด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

ํ‰๊ฐ€/Evaluations

์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์˜ KOBEST ํ‰๊ฐ€

ํ‰๊ฐ€๋Š” EleutherAI/lm-evaluation-harness, polyglot branch ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, KoBEST(Kim et al., 2022) ํ‰๊ฐ€์…‹์œผ๋กœ fine-tuning ์—†์ด zero-shot, 5-shot ํ…Œ์ŠคํŠธ๋ฅผ ์ˆ˜ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค. (lm-evaluation-harness์˜ KOBEST ํ‰๊ฐ€๋Š” ๋ฒ„์ „์— ๋”ฐ๋ผ ๋‹ค๋ฅด๊ฒŒ ๋‚˜ํƒ€๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์–ด, ์ตœ์‹  lm-evaluation-harness(๋ฒ„์ „ 0.4.2 ์ดํ›„)๋ฅผ ํ†ตํ•œ ํ‰๊ฐ€๋ฅผ ์•„๋ž˜ ๋ณ„๋„๋กœ ์ œ์‹œํ•˜์˜€์Šต๋‹ˆ๋‹ค.)

Zero-shot ์„ฑ๋Šฅ KB-BOOLQ (F1) KB-COPA (F1) KB-HELLASWAG (F1) KB-SENTINEG (F1) KB-WIC (F1)
Polyglot-ko-1.3b 0.3552ยฑ0.0087 0.7196ยฑ0.0142 0.4013ยฑ0.0217 0.6790ยฑ0.0239 0.3276ยฑ0.0064
egpt-1.3b (23/07) 0.4903ยฑ0.0134 0.6612ยฑ0.0149 0.3925ยฑ0.0217 0.3383ยฑ0.0112 0.3280ยฑ0.0063
egpt-1.3b (23/11) 0.3969ยฑ0.0112 0.6470ยฑ0.0151 0.3746ยฑ0.0214 0.3350ยฑ0.0111 0.3297ยฑ0.0066
egpt-1.3b (24/03) 0.4034ยฑ0.0118 0.6438ยฑ0.0152 0.4150ยฑ0.0218 0.5272ยฑ0.0255 0.3294ยฑ0.0066
5-shot ์„ฑ๋Šฅ KB-BOOLQ (F1) KB-COPA (F1) KB-HELLASWAG (F1) KB-SENTINEG (F1) KB-WIC (F1)
Polyglot-ko-1.3b 0.4751ยฑ0.0133 0.7193ยฑ0.0142 0.3984ยฑ0.0218 0.6257ยฑ0.0244 0.4559ยฑ0.0138
egpt-1.3b (23/07) 0.4829ยฑ0.0133 0.6558ยฑ0.0150 0.3846ยฑ0.0216 0.5715ยฑ0.0249 0.5108ยฑ0.0141
egpt-1.3b (23/11) 0.4762ยฑ0.0133 0.6499ยฑ0.0151 0.3689ยฑ0.0214 0.5607ยฑ0.0249 0.4776ยฑ0.0141
egpt-1.3b (24/03) 0.4944ยฑ0.0134 0.6643ยฑ0.0149 0.3862ยฑ0.0216 0.5232ยฑ0.0251 0.4947ยฑ0.0141

LM-Evaluation-Harness 0.4.2 ๋ฒ„์ „ ์ด์ƒ(์ดํ•˜ LEH 0.4.2dev, commit id b1777c82) ์œผ๋กœ ํ‰๊ฐ€ ์‹œ, KB-SENTINEG๋Š” ๋” ๋‚ฎ์€ ์ ์ˆ˜๋ฅผ, ๋‚˜๋จธ์ง€ 4๊ฐœ ํ‰๊ฐ€ ํ•ญ๋ชฉ์€ ๋” ๋†’์€ ์ ์ˆ˜๋กœ ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค. polyglot branch์˜ ํ‰๊ฐ€ ์˜ค๋ฅ˜๊ฐ€ ์ˆ˜์ •๋œ ๊ฒƒ์œผ๋กœ ๋ณด์—ฌ ์ตœ์‹  ๋ฒ„์ „์„ ํ†ตํ•ด ํ‰๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์„ ๊ฒƒ์œผ๋กœ ํŒ๋‹จ๋˜๋‚˜, ํ‰๊ฐ€ ์ผ๊ด€์„ฑ์„ ์œ„ํ•ด polyglot branch์˜ ํ‰๊ฐ€ ์ ์ˆ˜๋ฅผ ๋ณ„๋„๋กœ ์œ ์ง€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Zero-shot ์„ฑ๋Šฅ KB-BOOLQ (F1) KB-COPA (F1) KB-HELLASWAG (F1) KB-SENTINEG (F1) KB-WIC (F1)
egpt-1.3b (23/11) - LEH v0.4.2dev ํ‰๊ฐ€ 0.4926 0.6530 0.3933 0.3350 0.3280
egpt-1.3b (24/03) - LEH v0.4.2dev ํ‰๊ฐ€ 0.4391 0.6497 0.4222 0.3733 0.3412
egpt-1.3b (24/03) - LEH polyglot branch ํ‰๊ฐ€(์ฐธ๊ณ ) 0.4034 0.6438 0.4150 0.5272 0.3294

์ „์ดํ•™์Šต ๋Šฅ๋ ฅ ํ‰๊ฐ€

MetaMathQA๋ฅผ ํ†ตํ•œ ์˜์–ด GSM8k ํ‰๊ฐ€ ์ ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ํ•™์Šต ํ™˜๊ฒฝ: LR 8e-5, FP16, TF32 ์‚ฌ์šฉ, ์œ ํšจ ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ๋™์ผํ•˜๊ฒŒ 128๋กœ ์„ค์ • (์˜ˆ: GPU=4 * batch size=GPU ๋‹น 8 * Gradient Accumulation=4). LR Scheduler๋Š” Cosine Decaying, Warmup ratio 0.03. no weight decay.

๋ชจ๋ธ GSM8k test ๋น„๊ณ 
polyglot-ko-1.3b 0.2160
polyglot-ko-12.8b 0.3646 LR์„ 5e-5๋กœ ์„ธํŒ…, Beomi/polyglot-ko-alpaca-12.8b์˜ hparam์„ ๋ณด๊ณ  LR์„ ๊ฒฐ์ •ํ•จ
egpt-1.3b (23/11) 0.4443
egpt-1.3b (24/03) 0.4147

์—…๋ฐ์ดํŠธ ๊ธฐ๋ก/Update log

  • (23/7/27 ๋ชจ๋ธ) ์ดˆ๊ธฐ ๋ชจ๋ธ. Polyglot 1.3b ๋Œ€๋น„ BOOLQ/WIC์—์„œ ๋” ๋‚˜์€ ์„ฑ๋Šฅ, ๊ทธ๋ฆฌ๊ณ  COPA/HELLASWAG/SENTINEG์—์„œ ์—ด์„ธ.
  • (23/11/22 ๋ชจ๋ธ) ์œ ์‚ฌํ•œ ๋ฐ์ดํ„ฐ ๊ตฌ์„ฑ, ์ผ๋ถ€ ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€ํ•˜์—ฌ 23/7/27 ๋ชจ๋ธ๋กœ ๋ถ€ํ„ฐ ์ถ”๊ฐ€ ์‚ฌ์ „ํ•™์Šตํ•œ ๊ฒƒ. ์ง€์ • ๋ชฉํ‘œ๋ฅผ ์œ„ํ•œ ๋‹ค๋ฅธ ํ‰๊ฐ€ ์ฒด๊ณ„์—์„œ ๋” ๋‚˜์€ ์„ฑ๋Šฅ(39 vs 44)์„ ๋ณด์—ฌ ์—…๋ฐ์ดํŠธ ํ•จ
  • (24/03/21 ๋ชจ๋ธ) AIHUB ๋ฐ์ดํ„ฐ์…‹, ํ•œ๊ตญ์–ด ์œ„ํ‚ค ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ด ์ถ”๊ฐ€ ์‚ฌ์ „ํ•™์Šต

์‚ฌ์ „ํ•™์Šต์— ์ฐธ์—ฌํ•œ ๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด/Datasets

์•„๋ž˜์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜์˜€์Šต๋‹ˆ๋‹ค:

์‚ฌ์šฉ ์š”๋ น/How to use

์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ํ†ตํ•ด, transformers>=4.28 ๋ฒ„์ „์—์„œ ์ถ”๋ก  ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

import sys

from transformers import (
        AutoTokenizer, AutoModelForCausalLM, GenerationConfig
        )


def load_model(mdl_path):
    tokenizer = AutoTokenizer.from_pretrained(mdl_path, use_fast=True, legacy=False,)
    # device_map ์ธ์ž๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” accelerator ๋ชจ๋“ˆ ์„ค์น˜ ํ•„์š”.
    model = AutoModelForCausalLM.from_pretrained(mdl_path, device_map="auto",
                                                 torch_dtype="auto")

    return tokenizer, model


if __name__ == '__main__':
    # FIXME: ๋ชจ๋ธ ๊ฒฝ๋กœ ์ˆ˜์ •!
    tokenizer, model = load_model("../egpt-1.3b-test-230720/")
    # print(model.hf_device_map)
    # ํ•„์š”์— ๋”ฐ๋ผ ์•„๋ž˜ ์ƒ์„ฑ ์˜ต์…˜์„ ์ œ์–ด
    gen_cfg = GenerationConfig(max_new_tokens=256, min_length=0,
                               max_time=10.0, do_sample=True,
                               top_p=0.9, epsilon_cutoff=3e-4,)

    print("** Now Ready to input from stdin.")
    for aline in sys.stdin:
        aline = aline.rstrip("\n\r\t")
        input_cond = tokenizer(aline, add_special_tokens=False, return_tensors="pt").to("cuda")
        outs = model.generate(**input_cond, generation_config=gen_cfg)
        out_str = tokenizer.batch_decode(outs, skip_special_tokens=True,
                                         clean_up_tokenization_spaces=True)
        print(">> " + ' '.join(out_str))