---
pipeline_tag: summarization
language:
- ko
tags:
- T5
---

# t5-base-korean-summarization

This is a [T5](https://huggingface.co/docs/transformers/model_doc/t5) model for Korean text summarization. T5 is an encoder-decoder model that casts every NLP problem into a text-to-text format.

# Usage (HuggingFace Transformers)

```python
import nltk
nltk.download('punkt')
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model (adjust the repo id to the Hub path where this checkpoint is hosted)
tokenizer = AutoTokenizer.from_pretrained('t5-base-korean-summarization')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base-korean-summarization')

# T5-style task prefix prepended to the source text
prefix = "summarize: "

sample = """
    안녕하세요? 우리 (2학년)/(이 학년) 친구들 우리 친구들 학교에 가서 진짜 (2학년)/(이 학년) 이 되고 싶었는데 학교에 못 가고 있어서 답답하죠?
    그래도 우리 친구들의 안전과 건강이 최우선이니까요 오늘부터 선생님이랑 매일 매일 국어 여행을 떠나보도록 해요.
    어/ 시간이 벌써 이렇게 됐나요? 늦었어요. 늦었어요. 빨리 국어 여행을 떠나야 돼요.
    그런데 어/ 국어여행을 떠나기 전에 우리가 준비물을 챙겨야 되겠죠? 국어 여행을 떠날 준비물, 교안을 어떻게 받을 수 있는지 선생님이 설명을 해줄게요.
    (EBS)/(이비에스) 초등을 검색해서 들어가면요 첫화면이 이렇게 나와요.
    자/ 그러면요 여기 (X)/(엑스) 눌러주(고요)/(구요). 저기 (동그라미)/(똥그라미) (EBS)/(이비에스) (2주)/(이 주) 라이브특강이라고 되어있죠?
    거기를 바로 가기를 누릅니다. 자/ (누르면은)/(눌르면은). 어떻게 되냐? b/ 밑으로 내려와요 내려와요 내려와요 쭉 내려와요.
    우리 몇 학년이죠? 아/ (2학년)/(이 학년) 이죠 (2학년)/(이 학년)의 무슨 과목? 국어.
    이번주는 (1주)/(일 주) 차니까요 여기 교안. 다음주는 여기서 다운을 받으면 돼요.
    이 교안을 클릭을 하면, 짜잔/. 이렇게 교재가 나옵니다 .이 교안을 (다운)/(따운)받아서 우리 국어여행을 떠날 수가 있어요.
    그럼 우리 진짜로 국어 여행을 한번 떠나보도록 해요? 국어여행 출발. 자/ (1단원)/(일 단원) 제목이 뭔가요? 한번 찾아봐요.
    시를 즐겨요 에요. 그냥 시를 읽어요 가 아니에요. 시를 즐겨야 돼요 즐겨야 돼. 어떻게 즐길까? 일단은 내내 시를 즐기는 방법에 대해서 공부를 할 건데요.
    그럼 오늘은요 어떻게 즐길까요? 오늘 공부할 내용에서는 시를 여러 가지 방법으로 읽기를 공부할겁니다.
    어떻게 여러가지 방법으로 읽을까 우리 공부해 보도록 해요. 오늘의 시 나와라 짜잔/! 시가 나왔습니다 시의 제목이 뭔가요? 다툰 날 이에요 다툰 날.
    누구랑 다퉜나 동생이랑 다퉜나 언니랑 다퉜나 친구랑? 누구랑 다퉜는지 선생님이 시를 읽어 줄 테니까 한번 생각을 해보도록 해요."""

inputs = [prefix + sample]

# max_length=512 is an assumed input limit; set it to the value used when the model was trained
inputs = tokenizer(inputs, max_length=512, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
# keep only the first sentence of the generated summary
result = nltk.sent_tokenize(decoded_output.strip())[0]

print('RESULT >>', result)
```

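The same model can also be called through the high-level `pipeline` API. The snippet below is a minimal sketch, not the card's own example: the repo id is assumed to match this card's title, and `sample` refers to the transcript string defined above.

```python
from transformers import pipeline

# Assumed repo id; replace it with the actual Hub path of this checkpoint
summarizer = pipeline("summarization", model="t5-base-korean-summarization")

# Reuse the `sample` transcript from the example above, with the same task prefix
print(summarizer("summarize: " + sample, min_length=10, max_length=64)[0]["summary_text"])
```
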
# Evaluation Result

# Training

# Model Architecture

```
T5ForConditionalGeneration(
  (shared): Embedding(50358, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(50358, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1~11): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (decoder): T5Stack(
    (embed_tokens): Embedding(50358, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerCrossAttention(
            (EncDecAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (2): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1~11): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerCrossAttention(
            (EncDecAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (2): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (lm_head): Linear(in_features=768, out_features=50358, bias=False)
)
```
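The printout above can be reproduced directly from the loaded checkpoint. The short sketch below (repo id assumed, as in the Usage section) prints the module structure and the total parameter count as a quick sanity check.

```python
from transformers import AutoModelForSeq2SeqLM

# Assumed repo id; replace with the actual Hub path of this checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base-korean-summarization")

print(model)  # prints the T5ForConditionalGeneration structure shown above
print(f"total parameters: {sum(p.numel() for p in model.parameters()):,}")
```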

## Citation

- Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020): 1-67.