# i18n-huggingface / prompt.py
import string
PROMPT_WITH_GLOSSARY = """
You have a glossary of terms with their Korean translations. When translating a sentence, check whether any words in the sentence appear in the glossary and, if so, translate them using the Korean terms provided. Here is the glossary:
- revision: 개정
- method: 메소드
- secrets: 비밀값
- search helper: 검색 헬퍼
- logging level: 로그 레벨
- workflow: 워크플로우
- corner case: 코너 케이스
- tokenization: 토큰화
- architecture: 아키텍처
- attention mask: 어텐션 마스크
- backbone: 백본
- argmax: argmax
- beam search: 빔 서치
- clustering: 군집화
- configuration: 구성
- context: 문맥
- cross entropy: 교차 엔트로피
- cross-attention: 크로스 어텐션
- dictionary: 딕셔너리
- entry: 엔트리
- few shot: 퓨샷
- flatten: flatten
- ground truth: 정답
- head: 헤드
- helper function: 헬퍼 함수
- image captioning: 이미지 캡셔닝
- image patch: 이미지 패치
- inference: 추론
- instance: 인스턴스
- Instantiate: 인스턴스화
- knowledge distillation: 지식 증류
- labels: 레이블
- large language models (LLM): 대규모 언어 모델
- layer: 레이어
- learning rate scheduler: Learning Rate Scheduler
- localization: 로컬리제이션
- log mel-filter bank: 로그 멜 필터 뱅크
- look-up table: 룩업 테이블
- loss function: 손실 함수
- machine learning: 머신 러닝
- mapping: 매핑
- masked language modeling (MLM): 마스크드 언어 모델
- malware: 악성코드
- metric: 지표
- mixed precision: 혼합 정밀도
- modality: 모달리티
- monolingual model: 단일 언어 모델
- multi gpu: 다중 GPU
- multilingual model: 다국어 모델
- parsing: 파싱
- perplexity (PPL): 펄플렉서티(Perplexity)
- pipeline: 파이프라인
- pixel values: 픽셀 값
- pooling: 풀링
- position IDs: 위치 ID
- preprocessing: 전처리
- prompt: 프롬프트
- pythonic: 파이써닉
- query: 쿼리
- question answering: 질의 응답
- raw audio waveform: 원시 오디오 파형
- recurrent neural network (RNN): 순환 신경망
- accelerator: 가속기
- Accelerate: Accelerate
- arguments: 인수
- augmentation: 증강
- autoencoding models: 오토인코딩 모델
- autoregressive models: 자기회귀 모델
- backward: 역방향
- bounding box: 바운딩 박스
- causal language modeling: 인과적 언어 모델링(causal language modeling)
- channel: 채널
- checkpoint: 체크포인트(checkpoint)
- chunk: 묶음
- computer vision: 컴퓨터 비전
- convolution: 합성곱
- crop: 자르기
- custom: 사용자 정의
- customize: 맞춤 설정하다
- data collator: 데이터 콜레이터
- dataset: 데이터 세트
- decoder input IDs: 디코더 입력 ID
- decoder models: 디코더 모델
- deep learning (DL): 딥러닝
- directory: 디렉터리
- distributed training: 분산 학습
- downstream: 다운스트림
- encoder models: 인코더 모델
- entity: 개체
- epoch: 에폭
- evaluation method: 평가 방법
- feature extraction: 특성 추출
- feature matrix: 특성 행렬(feature matrix)
- fine-tuning: 미세 조정
- finetuned models: 미세 조정 모델
- hidden state: 은닉 상태
- hyperparameter: 하이퍼파라미터
- learning: 학습
- load: 가져오다
- optimizer: 옵티마이저
- pad (padding): 패드 (패딩)
- parameter: 매개변수
- pretrained model: 사전훈련된 모델
- separator (the name for the [SEP] token): 분할 토큰
- sequence: 시퀀스
- silent error: 조용한 오류
- token: 토큰
- tokenizer: 토크나이저
- training: 훈련
Please revise the translated sentences accordingly using the terms provided in this glossary.
"""
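# A side note (not part of the original file): `string.Template.safe_substitute`
# fills `$name` placeholders and, unlike `substitute`, leaves any unknown
# placeholder intact instead of raising KeyError. That is why the function
# below, which calls it with no arguments, simply returns the current
# placeholder-free prompt unchanged. A minimal illustration:

```python
import string

# safe_substitute fills the placeholders it is given and leaves the rest
# as-is, so it never raises KeyError for missing values.
template = string.Template("Translate: $text (glossary term: $term)")
filled = template.safe_substitute(text="Hello")
# `$term` survives untouched because no value was supplied for it:
# filled == "Translate: Hello (glossary term: $term)"
```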
def get_prompt_with_glossary() -> str:
    """Return the glossary prompt. `safe_substitute` is a no-op until
    `$`-placeholders are added to PROMPT_WITH_GLOSSARY."""
    return string.Template(PROMPT_WITH_GLOSSARY).safe_substitute()
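# If callers also need programmatic access to the term pairs (for example, to
# post-check a model's output), a small helper along these lines could parse
# them out of the prompt. This is a sketch only; `parse_glossary` is a
# hypothetical name, not part of the original file:

```python
def parse_glossary(prompt: str) -> dict:
    """Collect '- term: translation' lines from a glossary prompt
    into an {english_term: korean_translation} dict."""
    glossary = {}
    for line in prompt.splitlines():
        line = line.strip()
        # Glossary entries look like "- english term: Korean translation";
        # split on the first ": " so colons in the translation survive.
        if line.startswith("- ") and ": " in line:
            term, _, translation = line[2:].partition(": ")
            glossary[term.strip()] = translation.strip()
    return glossary
```

# e.g. parse_glossary(PROMPT_WITH_GLOSSARY)["tokenizer"] would yield "토크나이저".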