A Longformer-encoder KoBART model trained on summarization data built from AIHUB finance and call-center consultation dialogues, with the reference summaries generated via ChatGPT.
input = """Customer: Hello, I have a question about the credit card I use here.
Agent: Hello! Sure, what would you like to know?
Customer: I'd like to check how many reward points I earned using my card this month.
Agent: Of course, I can check your reward point balance. Let me enter your card number and look it up. Could you give me the number?
Customer: Yes, my card number is 1234-5678-9012-3456.
Agent: Thank you. One moment, please. Checking... Yes, your current reward point balance is 3,250 points.
Customer: Got it, thank you! Could I also get information about additional benefits or discounts?
Agent: Of course! Our card company offers a variety of benefits. For example, you can receive discounts on travel, shopping, dining, and more, or exchange your reward points for merchandise or gift cards. Which benefits are you interested in?
Customer: I'm interested in travel discounts and mileage accrual.
Agent: In that case, I can recommend a card with travel benefits that suits you. Travel cards let you earn airline miles and can also include hotel discounts. Shall I suggest a few options?
Customer: Yes, that would be great. Thank you!
Agent: Thank you. I'll put together a few recommendations now. Which airline do you usually fly with?"""
output = """
- Consultation about the customer's questions regarding their credit card
- Request to check reward points
- The agent verifies the card number and balance, then explains additional benefits
- The customer expresses interest in various benefits such as travel discounts, mileage, and hotel discounts
"""
To use this model, the following class definitions are required:
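The class definitions below assume roughly these imports (a minimal sketch; exact module paths can vary across transformers versions):

from typing import List, Optional, Tuple

import torch
import torch.nn as nn
from transformers import AutoTokenizer, BartConfig, BartForConditionalGeneration
from transformers.models.bart.modeling_bart import BartLearnedPositionalEmbedding
from transformers.models.longformer.modeling_longformer import LongformerSelfAttention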
class LongformerSelfAttentionForBart(nn.Module):
    def __init__(self, config, layer_id):
        super().__init__()
        self.embed_dim = config.d_model
        self.longformer_self_attn = LongformerSelfAttention(config, layer_id=layer_id)
        self.output = nn.Linear(self.embed_dim, self.embed_dim)

    def forward(
        self,
        hidden_states: torch.Tensor,
        key_value_states: Optional[torch.Tensor] = None,
        past_key_value: Optional[Tuple[torch.Tensor]] = None,
        attention_mask: Optional[torch.Tensor] = None,
        layer_head_mask: Optional[torch.Tensor] = None,
        output_attentions: bool = False,
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
        is_cross_attention = key_value_states is not None
        bsz, tgt_len, embed_dim = hidden_states.size()

        # Reshape the mask from (bs, 1, seq_len, seq_len) to (bs, seq_len),
        # which is the shape LongformerSelfAttention expects.
        attention_mask = attention_mask.squeeze(dim=1)
        attention_mask = attention_mask[:, 0]
        is_index_masked = attention_mask < 0
        is_index_global_attn = attention_mask > 0
        is_global_attn = is_index_global_attn.flatten().any().item()

        outputs = self.longformer_self_attn(
            hidden_states,
            attention_mask=attention_mask,
            layer_head_mask=None,
            is_index_masked=is_index_masked,
            is_index_global_attn=is_index_global_attn,
            is_global_attn=is_global_attn,
            output_attentions=output_attentions,
        )

        attn_output = self.output(outputs[0])

        return ((attn_output,) + outputs[1:]) if len(outputs) == 2 else (attn_output, None, None)
class LongformerEncoderDecoderForConditionalGeneration(BartForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        if config.attention_mode == 'n2':
            pass  # do nothing, keep the default BART self-attention
        else:
            # Extend the learned positional embeddings to the longer
            # encoder/decoder sequence lengths.
            self.model.encoder.embed_positions = BartLearnedPositionalEmbedding(
                config.max_encoder_position_embeddings,
                config.d_model)
            self.model.decoder.embed_positions = BartLearnedPositionalEmbedding(
                config.max_decoder_position_embeddings,
                config.d_model)
            # Replace each encoder self-attention layer with Longformer attention.
            for i, layer in enumerate(self.model.encoder.layers):
                layer.self_attn = LongformerSelfAttentionForBart(config, layer_id=i)
class LongformerEncoderDecoderConfig(BartConfig):
    def __init__(self, attention_window: List[int] = None, attention_dilation: List[int] = None,
                 autoregressive: bool = False, attention_mode: str = 'sliding_chunks',
                 gradient_checkpointing: bool = False, **kwargs):
        """
        Args:
            attention_window: list of attention window sizes of length = number of layers.
                window size = number of attention locations on each side.
                For an effective window size of 512, use `attention_window=[256]*num_layers`,
                which is 256 on each side.
            attention_dilation: list of attention dilations of length = number of layers.
                attention dilation of `1` means no dilation.
            autoregressive: do autoregressive attention or attend to both sides
            attention_mode: 'n2' for regular n^2 self-attention, 'tvm' for the TVM implementation of
                Longformer self-attention, 'sliding_chunks' for another implementation of Longformer
                self-attention
        """
        super().__init__(**kwargs)
        self.attention_window = attention_window
        self.attention_dilation = attention_dilation
        self.autoregressive = autoregressive
        self.attention_mode = attention_mode
        self.gradient_checkpointing = gradient_checkpointing
        assert self.attention_mode in ['tvm', 'sliding_chunks', 'n2']
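As an illustration of the docstring above, a config with an effective attention window of 512 tokens per layer could be built as follows (the layer count and values here are hypothetical, not the settings of this checkpoint):

num_layers = 6  # hypothetical layer count, not necessarily this checkpoint's value
config = LongformerEncoderDecoderConfig(
    attention_window=[256] * num_layers,  # 256 tokens on each side -> effective window of 512
    attention_dilation=[1] * num_layers,  # dilation of 1 means no dilation
    attention_mode='sliding_chunks',
)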
After loading the model object, download the weight file separately and load the weights with load_state_dict.
tokenizer = AutoTokenizer.from_pretrained("cocoirun/longforemr-kobart-summary-v1")
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained("cocoirun/longforemr-kobart-summary-v1")
device = torch.device('cuda')
model.load_state_dict(torch.load("summary weight.ckpt"))
model.to(device)
Model summarization function:
def summarize(text, max_len):
    max_seq_len = 4096
    context_tokens = ['<s>'] + tokenizer.tokenize(text) + ['</s>']
    input_ids = tokenizer.convert_tokens_to_ids(context_tokens)

    # Pad up to the encoder length, or truncate and close with </s>.
    if len(input_ids) < max_seq_len:
        while len(input_ids) < max_seq_len:
            input_ids += [tokenizer.pad_token_id]
    else:
        input_ids = input_ids[:max_seq_len - 1] + [tokenizer.eos_token_id]

    res_ids = model.generate(torch.tensor([input_ids]).to(device),
                             max_length=max_len,
                             num_beams=5,
                             no_repeat_ngram_size=3,
                             eos_token_id=tokenizer.eos_token_id,
                             bad_words_ids=[[tokenizer.unk_token_id]])

    res = tokenizer.batch_decode(res_ids.tolist(), skip_special_tokens=True)[0]
    res = res.replace("\n\n", "\n")
    return res
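As a usage sketch (the max_len value of 64 is an arbitrary choice, not a recommended setting), the example dialogue from the top of this card can then be summarized with:

summary = summarize(input, 64)  # 'input' is the example dialogue string defined above
print(summary)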