Text Summarization

This is an assignment for Applied Deep Learning, a course at National Taiwan University (NTU).

Task Description：Chinese News Summarization (Title Generation)

input(news content)：

Anker在此次CES 2021中，宣布以旗下Soundcore品牌推出新款真無線藍牙耳機Liberty Air 2 Pro，同時也確定引進台灣市場。\nLiberty Air 2 Pro本身採入耳式耳塞設計，並且透過耳機外側觸控手勢進行操作，同時使用者也能配合App設定手勢對應功能。通話部分則可透過6組麥克風加強收音，以及降噪效果，使得藉由耳機通話也不會被環境噪音干擾。\n，\n至於耳機內部則採用11mm發聲單體，在主動式降噪功能對應交通、室內、室外三種模式，甚至可透過「完全通透」模式，讓使用者聆聽音樂之餘，仍可聽見環境周圍聲音，而「人聲增強通透模式」則可針對附近人聲部分提高音量，並且降低背景噪音，讓使用者在配戴耳機情況下仍可聽見他人說話，或是附近廣播內容。\n跟之前的真無線藍牙耳機一樣，Liberty Air 2 Pro也能透過專屬App的HearID 2.0工具分析耳朵聆聽偏好，讓耳機能配合個人需求發揮更貼切的聲音表現，或是由使用者自行調整音場效果。\n至於電力表現部分，Liberty Air 2 Pro在開啟主動式降噪時的電池續航表現為6小時，關閉主動式降噪功能則可達7小時左右，搭配充電盒使用的話，則最長使用時間可達21小時與26小時，充電盒本身也支援Qi無線充電。\n目前Liberty Air 2 Pro總計提供粉色、黑色、白色與藍色四款設計，目前已經開放Soundcore官網等通路銷售，至於台灣市場則預計透過智選家、三創友均選物ANKER旗艦店、台中My Ear 耳機專門店、法雅客、良興、三井3C等通路預購，建議售價為新台幣4280元。\n《原文刊登於合作媒體mashdigi，聯合新聞網獲授權轉載。》

output(news title)：

Anker新款真無線藍牙耳機Liberty Air 2 Pro 引進台灣市場

Objective

Fine-tune the pre-trained model google/mt5-small to pass the baseline.
Compare the decoding strategies beam search, top-k sampling, top-p sampling, and temperature.
```
Baseline(f1-score)：rouge-1: 22.0, rouge-2: 8.5, rouge-L: 20.5
```
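The baseline is reported as ROUGE F1 scores. As a rough illustration of what rouge-1 F1 measures, here is a minimal sketch that counts overlapping unigrams between a generated title and a reference title. It assumes both are already split into tokens; the function name `rouge1_f1` and the toy tokens are hypothetical, and the assignment's actual scoring script may tokenize and aggregate differently.

```python
from collections import Counter


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between a candidate and a reference (illustrative only)."""
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 shared tokens, precision 2/3, recall 2/4.
score = rouge1_f1(["a", "b", "c"], ["a", "b", "d", "e"])
print(round(score * 100, 1))
```

The scores in this document are on a 0 to 100 scale, which is why the sketch multiplies by 100 before printing.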

Experiments

Greedy
After the model produces a probability for every token, greedy decoding is the simplest strategy: at each step it picks the single most probable word (argmax). The drawback is that greedy decoding easily ends up choosing the same words over and over.
```
Greedy Result(f1-score)：rouge-1: 1.5, rouge-2: 0.9, rouge-L: 1.4  
```
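The repetition problem of greedy decoding can be reproduced on a tiny example. The sketch below is not the mT5 model: it uses a hypothetical hand-written bigram table (`TOY_BIGRAM`) as a stand-in for the model's next-token probabilities, and `greedy_decode` is an illustrative helper name.

```python
# Hypothetical toy model: previous token -> next-token probabilities.
TOY_BIGRAM = {
    None: {"the": 0.6, "a": 0.4},
    "the": {"new": 0.5, "the": 0.1, "</s>": 0.4},
    "new": {"the": 0.7, "</s>": 0.3},
    "a": {"new": 0.8, "</s>": 0.2},
}


def toy_next_token_probs(prefix):
    """Look up the distribution conditioned on the last generated token."""
    last = prefix[-1] if prefix else None
    return TOY_BIGRAM[last]


def greedy_decode(next_token_probs, max_len=6, eos="</s>"):
    """Pick the argmax token at every step until EOS or max_len."""
    prefix = []
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        token = max(probs, key=probs.get)  # argmax over the vocabulary
        if token == eos:
            break
        prefix.append(token)
    return prefix


print(greedy_decode(toy_next_token_probs))
# -> ['the', 'new', 'the', 'new', 'the', 'new']
```

Because "new" is the locally best word after "the" and "the" is the locally best word after "new", the argmax never reaches the end-of-sentence token and the output cycles until the length limit.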
Beam Search
Beam search keeps track of the k most probable partial sentences at each step and returns the best one at the end. With a beam size of 1, it reduces to greedy decoding. Beam search mitigates the problem of greedy decoding to some extent. However, if the beam size is too large, the output becomes generic and less relevant, even though it is safe and "correct".
For example:
```
input：  
I love listening to Taylor Swift's songs, so I decided to go to her concert.
output：  
What do you like to listen to？
```
```
beam size = 5
Beam Search Result(f1-score)：rouge-1: 7.4, rouge-2: 1.9, rouge-L: 6.9  
```
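A minimal sketch of the beam search procedure described above, on a hypothetical two-step toy model (`TOY_MODEL`) rather than mT5. To keep it short it decodes a fixed number of steps and ignores end-of-sentence handling and length normalization, which real implementations usually include.

```python
import math

# Hypothetical toy model: token prefix (tuple) -> next-token probabilities.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"z": 0.9, "w": 0.1},
}


def toy_next_token_probs(prefix):
    return TOY_MODEL[prefix]


def beam_search(next_token_probs, beam_size, length):
    """Keep the beam_size highest-scoring prefixes at every step."""
    beams = [((), 0.0)]  # (prefix, summed log-probability)
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            for token, prob in next_token_probs(prefix).items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Sort by score and prune to the beam size.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return list(beams[0][0])


print(beam_search(toy_next_token_probs, beam_size=1, length=2))  # same as greedy
print(beam_search(toy_next_token_probs, beam_size=2, length=2))
```

With a beam size of 1 the search commits to "A" and ends with a sequence of probability 0.33. With a beam size of 2 it keeps "B" alive and finds "B z" with probability 0.36, which greedy decoding misses.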
Top k Sampling
Sampling chooses the next word randomly according to the probability distribution instead of taking the argmax. Top-k sampling restricts that sampling to the k most probable words. However, when a rarely used word is sampled, the sentence loses fluency.
```
k = 5
Top k Result(f1-score)：rouge-1: 4.0, rouge-2: 0.5, rouge-L: 3.7  
```
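Top-k sampling for a single decoding step can be sketched as follows. The distribution `probs` is a made-up example, and `top_k_sample` is an illustrative helper, not the function used in the experiments.

```python
import random


def top_k_sample(probs, k, rng):
    """Sample one token from the k most probable tokens of a distribution."""
    # Keep only the k highest-probability tokens.
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [prob for _, prob in top]
    # random.choices treats weights as relative, so no explicit renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
rng = random.Random(0)
samples = {top_k_sample(probs, k=2, rng=rng) for _ in range(200)}
print(samples <= {"a", "b"})  # tokens outside the top 2 are never drawn
```

With k = 1 the function always returns the argmax, so top-k sampling with k = 1 behaves like greedy decoding.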
Nucleus (Top p) Sampling
Nucleus sampling samples from the smallest set of most probable words whose cumulative probability reaches p. Unlike top-k, the size of this set shrinks and expands dynamically with the shape of the distribution.
```
p = 5
Top p Result(f1-score)：rouge-1: 3.0, rouge-2: 0.2, rouge-L: 2.9
```
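The dynamic behaviour of nucleus sampling can be seen by computing the kept set for two different distributions. This sketch assumes p is a cumulative-probability threshold between 0 and 1; the value 0.7, the two toy distributions, and the helper name `nucleus` are chosen only for illustration.

```python
import random


def nucleus(probs, p):
    """Return the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= p:
            break
    return kept


def top_p_sample(probs, p, rng):
    """Sample one token from the nucleus, weighted by the original probabilities."""
    kept = nucleus(probs, p)
    tokens = [token for token, _ in kept]
    weights = [prob for _, prob in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


peaked = {"a": 0.9, "b": 0.05, "c": 0.05}
flat = {"a": 0.3, "b": 0.3, "c": 0.2, "d": 0.2}
print(len(nucleus(peaked, 0.7)))  # one token already covers 0.7
print(len(nucleus(flat, 0.7)))    # three tokens are needed to cover 0.7
print(top_p_sample(flat, 0.7, random.Random(0)) in {"a", "b", "c"})
```

A confident (peaked) distribution yields a small candidate set, while an uncertain (flat) one yields a larger set, which a fixed k cannot do.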
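Temperature
Softmax temperature applies a temperature hyperparameter to the softmax: the logits are divided by the temperature before normalization.
With high temperature：the distribution becomes more uniform, giving more diversity.
With low temperature：the distribution becomes more spiky, giving less diversity.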
```
temperature = 5
Temperature Result(f1-score)：rouge-1: 2.1, rouge-2: 0.04, rouge-L: 1.9
```
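The effect of temperature on the softmax can be checked numerically. The logits below are made-up values, and `softmax_with_temperature` is an illustrative helper written for this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
for temperature in (0.5, 1.0, 5.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature grows, the probability of the top token drops toward that of the others, so sampling draws unlikely tokens more often. As it shrinks, the distribution concentrates on the argmax.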
From these results, beam search outperforms the other decoding strategies on this task.