---
pipeline_tag: summarization
language:
- ko
tags:
- T5
---
# t5-base-korean-summarization
This is a [T5](https://huggingface.co/docs/transformers/model_doc/t5) model for Korean text summarization.
It was fine-tuned from the ['paust/pko-t5-base'](https://huggingface.co/paust/pko-t5-base) model on the three datasets listed below (a preprocessing sketch follows the list).
- [Korean Paper Summarization Dataset (논문자료 요약)](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=90)
- [Korean Book Summarization Dataset (도서자료 요약)](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=93)
- [Korean Summary statement and Report Generation Dataset (요약문 및 레포트 생성 데이터)](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=90)
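The card does not publish the preprocessing code. The sketch below shows one plausible way examples from these corpora could be tokenized with the `summarize: ` prefix the model expects at inference time; the field names `passage` and `summary` are hypothetical placeholders, not the actual AI Hub schema.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('paust/pko-t5-base')
prefix = "summarize: "

def preprocess(example):
    # 'passage' and 'summary' are placeholder field names; each AI Hub
    # corpus ships its own JSON schema that must be mapped onto them.
    model_inputs = tokenizer(prefix + example["passage"],
                             max_length=512, truncation=True)
    labels = tokenizer(text_target=example["summary"],
                       max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```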
# Usage (HuggingFace Transformers)
```python
import nltk
nltk.download('punkt')
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-base-korean-summarization')
tokenizer = AutoTokenizer.from_pretrained('eenzeenee/t5-base-korean-summarization')
prefix = "summarize: "
sample = """
μλ
νμΈμ? μ°λ¦¬ (2νλ
)/(μ΄ νλ
) μΉκ΅¬λ€ μ°λ¦¬ μΉκ΅¬λ€ νκ΅μ κ°μ μ§μ§ (2νλ
)/(μ΄ νλ
) μ΄ λκ³ μΆμλλ° νκ΅μ λͺ» κ°κ³ μμ΄μ λ΅λ΅νμ£ ?
κ·Έλλ μ°λ¦¬ μΉκ΅¬λ€μ μμ κ³Ό 건κ°μ΄ μ΅μ°μ μ΄λκΉμ μ€λλΆν° μ μλμ΄λ λ§€μΌ λ§€μΌ κ΅μ΄ μ¬νμ λ λ보λλ‘ ν΄μ.
μ΄/ μκ°μ΄ λ²μ¨ μ΄λ κ² λλμ? λ¦μμ΄μ. λ¦μμ΄μ. 빨리 κ΅μ΄ μ¬νμ λ λμΌ λΌμ.
κ·Έλ°λ° μ΄/ κ΅μ΄μ¬νμ λ λκΈ° μ μ μ°λ¦¬κ° μ€λΉλ¬Όμ μ±κ²¨μΌ λκ² μ£ ? κ΅μ΄ μ¬νμ λ λ μ€λΉλ¬Ό, κ΅μμ μ΄λ»κ² λ°μ μ μλμ§ μ μλμ΄ μ€λͺ
μ ν΄μ€κ²μ.
(EBS)/(μ΄λΉμμ€) μ΄λ±μ κ²μν΄μ λ€μ΄κ°λ©΄μ 첫νλ©΄μ΄ μ΄λ κ² λμμ.
μ/ κ·Έλ¬λ©΄μ μ¬κΈ° (X)/(μμ€) λλ¬μ£Ό(κ³ μ)/(ꡬμ). μ κΈ° (λκ·ΈλΌλ―Έ)/(λ₯κ·ΈλΌλ―Έ) (EBS)/(μ΄λΉμμ€) (2μ£Ό)/(μ΄ μ£Ό) λΌμ΄λΈνΉκ°μ΄λΌκ³ λμ΄μμ£ ?
κ±°κΈ°λ₯Ό λ°λ‘ κ°κΈ°λ₯Ό λλ¦
λλ€. μ/ (λλ₯΄λ©΄μ)/(λλ₯΄λ©΄μ). μ΄λ»κ² λλ? b/ λ°μΌλ‘ λ΄λ €μ λ΄λ €μ λ΄λ €μ μ λ΄λ €μ.
μ°λ¦¬ λͺ νλ
μ΄μ£ ? μ/ (2νλ
)/(μ΄ νλ
) μ΄μ£ (2νλ
)/(μ΄ νλ
)μ λ¬΄μ¨ κ³Όλͺ©? κ΅μ΄.
μ΄λ²μ£Όλ (1μ£Ό)/(μΌ μ£Ό) μ°¨λκΉμ μ¬κΈ° κ΅μ. λ€μμ£Όλ μ¬κΈ°μ λ€μ΄μ λ°μΌλ©΄ λΌμ.
μ΄ κ΅μμ ν΄λ¦μ νλ©΄, μ§μ/. μ΄λ κ² κ΅μ¬κ° λμ΅λλ€ .μ΄ κ΅μμ (λ€μ΄)/(λ°μ΄)λ°μμ μ°λ¦¬ κ΅μ΄μ¬νμ λ λ μκ° μμ΄μ.
κ·ΈλΌ μ°λ¦¬ μ§μ§λ‘ κ΅μ΄ μ¬νμ νλ² λ λ보λλ‘ ν΄μ? κ΅μ΄μ¬ν μΆλ°. μ/ (1λ¨μ)/(μΌ λ¨μ) μ λͺ©μ΄ λκ°μ? νλ² μ°Ύμλ΄μ.
μλ₯Ό μ¦κ²¨μ μμ. κ·Έλ₯ μλ₯Ό μ½μ΄μ κ° μλμμ. μλ₯Ό μ¦κ²¨μΌ λΌμ μ¦κ²¨μΌ λΌ. μ΄λ»κ² μ¦κΈΈκΉ? μΌλ¨μ λ΄λ΄ μλ₯Ό μ¦κΈ°λ λ°©λ²μ λν΄μ 곡λΆλ₯Ό ν 건λ°μ.
κ·ΈλΌ μ€λμμ μ΄λ»κ² μ¦κΈΈκΉμ? μ€λ 곡λΆν λ΄μ©μμ μλ₯Ό μ¬λ¬ κ°μ§ λ°©λ²μΌλ‘ μ½κΈ°λ₯Ό 곡λΆν κ²λλ€.
μ΄λ»κ² μ¬λ¬κ°μ§ λ°©λ²μΌλ‘ μ½μκΉ μ°λ¦¬ 곡λΆν΄ 보λλ‘ ν΄μ. μ€λμ μ λμλΌ μ§μ/! μκ° λμμ΅λλ€ μμ μ λͺ©μ΄ λκ°μ? λ€ν° λ μ΄μμ λ€ν° λ .
λꡬλ λ€νλ λμμ΄λ λ€νλ μΈλλ μΉκ΅¬λ? λꡬλ λ€νλμ§ μ μλμ΄ μλ₯Ό μ½μ΄ μ€ ν
λκΉ νλ² μκ°μ ν΄λ³΄λλ‘ ν΄μ."""
inputs = [prefix + sample]
inputs = tokenizer(inputs, max_length=512, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
result = nltk.sent_tokenize(decoded_output.strip())[0]
print('RESULT >>', result)
# RESULT >> 국어 여행을 떠나기 전에 국어 여행을 떠날 준비물과 교안을 어떻게 받을 수 있는지 선생님이 설명해 준다.
```
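The same call also works on a batch of documents. Below is a minimal sketch that reuses the `model`, `tokenizer`, and `prefix` from the example above; the input strings and generation settings are illustrative, not from the card.

```python
docs = ["요약할 첫 번째 문서 ...", "요약할 두 번째 문서 ..."]  # illustrative inputs
batch = tokenizer([prefix + d for d in docs], max_length=512,
                  truncation=True, padding=True, return_tensors="pt")
outputs = model.generate(**batch, num_beams=3, min_length=10, max_length=64)
summaries = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(summaries)
```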
# Evaluation Results
- Korean Paper Summarization Dataset (논문자료 요약)
```
ROUGE-2-R 0.09868624890432466
ROUGE-2-P 0.9666714545849712
ROUGE-2-F 0.17250881441169427
```
- Korean Book Summarization Dataset (도서자료 요약)
```
ROUGE-2-R 0.1575686156943213
ROUGE-2-P 0.9718318136896944
ROUGE-2-F 0.26548116834852586
```
- Korean Summary statement and Report Generation Dataset (요약문 및 레포트 생성 데이터)
```
ROUGE-2-R 0.0987891733555808
ROUGE-2-P 0.9276946867981899
ROUGE-2-F 0.17726493110448185
```
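The card does not state which ROUGE implementation or tokenization produced these scores. As a reference for what the numbers mean, the sketch below computes ROUGE-2 recall/precision/F1 by hand over whitespace-token bigrams; whitespace tokenization is an assumption here (Korean evaluation is often done on morphemes or subwords), so it will not reproduce the figures above exactly.

```python
from collections import Counter

def rouge2(reference: str, prediction: str):
    """ROUGE-2 recall / precision / F1 over whitespace-token bigrams."""
    def bigrams(text):
        tokens = text.split()
        return Counter(zip(tokens, tokens[1:]))
    ref, pred = bigrams(reference), bigrams(prediction)
    overlap = sum((ref & pred).values())       # matching bigram count
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(pred.values()), 1)
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1

# Illustrative strings, not taken from the evaluation sets.
reference = "선생님이 국어 여행 준비물과 교안 받는 방법을 설명한다."
prediction = "국어 여행을 떠나기 전에 준비물과 교안을 어떻게 받을 수 있는지 선생님이 설명해 준다."
print(rouge2(reference, prediction))
```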
# Training
The model was trained with the following parameters:
### training arguments
```
Seq2SeqTrainingArguments(
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
auto_find_batch_size=False,
weight_decay=0.01,
learning_rate=4e-05,
lr_scheduler_type=linear,
num_train_epochs=3,
fp16=True)
```
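These arguments plug into `Seq2SeqTrainer` in the usual way. The sketch below shows one plausible wiring; the output directory, the data collator choice, and the `train_dataset`/`eval_dataset` variables (tokenized as in the preprocessing sketch earlier) are assumptions, not details from the card.

```python
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('paust/pko-t5-base')
model = AutoModelForSeq2SeqLM.from_pretrained('paust/pko-t5-base')

args = Seq2SeqTrainingArguments(
    output_dir="t5-base-korean-summarization",  # placeholder path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    learning_rate=4e-05,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # tokenized datasets, prepared as in the
    eval_dataset=eval_dataset,    # preprocessing sketch above
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```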
# Model Architecture
```
T5ForConditionalGeneration(
(shared): Embedding(50358, 768)
(encoder): T5Stack(
(embed_tokens): Embedding(50358, 768)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
(relative_attention_bias): Embedding(32, 12)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1~11): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(decoder): T5Stack(
(embed_tokens): Embedding(50358, 768)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
(relative_attention_bias): Embedding(32, 12)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1~11): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(lm_head): Linear(in_features=768, out_features=50358, bias=False)
)
```
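This layer listing is the standard PyTorch module repr and can be reproduced from the released checkpoint (the grouping of repeated blocks such as `(1~11)` may render slightly differently depending on the PyTorch version):

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-base-korean-summarization')
print(model)  # prints the module tree shown above
```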
## Citation
- Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020): 1-67.