Korean Question Generation Model

GitHub

https://github.com/Seoneun/KoBART-Question-Generation

Fine-tuning Dataset

KorQuAD 1.0

Demo

https://huggingface.co/Sehong/kobart-QuestionGeneration

How to use

import torch
from transformers import PreTrainedTokenizerFast
from transformers import BartForConditionalGeneration

tokenizer = PreTrainedTokenizerFast.from_pretrained('Sehong/kobart-QuestionGeneration')
model = BartForConditionalGeneration.from_pretrained('Sehong/kobart-QuestionGeneration')

text = "1989년 2월 15일 여의도 농민 폭력 시위를 주도한 혐의(폭력행위등처벌에관한법률위반)으로 지명수배되었다. 1989년 3월 12일 서울지방검찰청 공안부는 임종석의 사전구속영장을 발부받았다. 같은 해 6월 30일 평양축전에 임수경을 대표로 파견하여 국가보안법위반 혐의가 추가되었다. 경찰은 12월 18일~20일 사이 서울 경희대학교에서 임종석이 성명 발표를 추진하고 있다는 첩보를 입수했고, 12월 18일 오전 7시 40분 경 가스총과 전자봉으로 무장한 특공조 및 대공과 직원 12명 등 22명의 사복 경찰을 승용차 8대에 나누어 경희대학교에 투입했다. 1989년 12월 18일 오전 8시 15분 경 서울청량리경찰서는 호위 학생 5명과 함께 경희대학교 학생회관 건물 계단을 내려오는 임종석을 발견, 검거해 구속을 집행했다. 임종석은 청량리경찰서에서 약 1시간 동안 조사를 받은 뒤 오전 9시 50분 경 서울 장안동의 서울지방경찰청 공안분실로 인계되었다. <unused0> 1989년 2월 15일"

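# Tokenize the text, then add the BOS and EOS ids around the sequence by hand.
raw_input_ids = tokenizer.encode(text)
input_ids = [tokenizer.bos_token_id] + raw_input_ids + [tokenizer.eos_token_id]

# Generate the question token ids and decode them back into text.
question_ids = model.generate(torch.tensor([input_ids]))
print(tokenizer.decode(question_ids.squeeze().tolist(), skip_special_tokens=True))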

# <unused0> is the sep_token; it separates the content (the passage) from the answer.
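
The input format used above (content, then <unused0>, then the answer) can be built with a small helper. This is only a minimal sketch: `build_qg_input` and `SEP_TOKEN` are hypothetical names that are not part of this repository, and the single space on each side of the separator is assumed from the example string above.

```python
# Hypothetical helper, not part of the model repository.
SEP_TOKEN = "<unused0>"  # separator token named in the comment above


def build_qg_input(content: str, answer: str, sep_token: str = SEP_TOKEN) -> str:
    """Join a passage and an answer span into one model input string."""
    content = content.strip()
    answer = answer.strip()
    if not content or not answer:
        raise ValueError("content and answer must both be non-empty")
    # Assumed layout, copied from the example: "<content> <unused0> <answer>"
    return f"{content} {sep_token} {answer}"


example = build_qg_input("세종대왕은 1443년에 훈민정음을 창제했다.", "1443년")
print(example)
```

The resulting string can be passed to `tokenizer.encode` in place of `text` in the example above.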