---
language:
- zh
license: apache-2.0
tags:
- chinese poem
- 中文
- 对联
widget:
- text: "对联:北国风光,千里冰封,万里雪飘"
---

# A Fun Chinese AI Couplet Model
- Input format
  - `对联:<your first line>`, e.g. `对联:北国风光,千里冰封,万里雪飘`
- If you want to try it
  - If you have your own GPU environment, see the [example code](https://huggingface.co/hululuzhu/chinese-couplet-t5-mengzi-finetune#%E8%BF%90%E8%A1%8C%E4%BB%A3%E7%A0%81%E7%A4%BA%E4%BE%8B) I put on Hugging Face
  - For the training code, see [my GitHub repo](https://github.com/hululuzhu/chinese-ai-writing-share)
  - For background and discussion, see my [slides](https://github.com/hululuzhu/chinese-ai-writing-share/tree/main/slides)
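The input format boils down to prepending the `对联:` prefix to your first line. Here is a minimal sketch, assuming you want the same 32-character truncation as the example code; `make_couplet_input` is a hypothetical helper name, not part of this repo, and the whitespace stripping is an extra convenience of this sketch:

```python
# Hypothetical helper: builds the model input described under "Input format"
PREFIX = "对联:"  # task prefix the model was fine-tuned with
MAX_SEQ_LEN = 32  # same first-line limit as the example code


def make_couplet_input(first_line: str) -> str:
    """Return the prefixed model input for one first line."""
    first_line = first_line.strip()  # drop stray whitespace/newlines
    # Truncate the first line (not the prefix), as the example code does
    return f"{PREFIX}{first_line[:MAX_SEQ_LEN]}"


print(make_couplet_input("北国风光,千里冰封,万里雪飘"))
# → 对联:北国风光,千里冰封,万里雪飘
```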

## Architecture
- Pretrained base model: [Mengzi T5 from Langboat Technology](https://huggingface.co/Langboat/mengzi-t5-base)

## Data source
- Couplet dataset: https://github.com/wb14123/couplet-dataset
- Standard seq2seq input/output; the T5 input uses the `对联:` prefix, with a length limit of 32 characters
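One way to turn the dataset into seq2seq pairs is sketched below. It assumes each line of the dataset's `in.txt`/`out.txt` files holds one half of a couplet with characters separated by spaces; check the dataset repo before relying on that. `to_seq2seq_pairs` is a hypothetical name, the sample lines are toy in-memory data (not dataset lines), and the author's actual training code lives in the GitHub repo linked above.

```python
from typing import Iterable, List, Tuple

PREFIX = "对联:"  # T5 task prefix
MAX_SEQ_LEN = 32  # character limit described above


def to_seq2seq_pairs(in_lines: Iterable[str],
                     out_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair first/second lines into (source, target) strings (hypothetical helper)."""
    pairs = []
    for src, tgt in zip(in_lines, out_lines):
        # Assumed format: characters separated by spaces, e.g. "烟 锁 池 塘 柳"
        src = "".join(src.split())
        tgt = "".join(tgt.split())
        if not src or not tgt:
            continue  # skip blank lines
        # Prefix the source and cap both sides at MAX_SEQ_LEN characters
        pairs.append((f"{PREFIX}{src[:MAX_SEQ_LEN]}", tgt[:MAX_SEQ_LEN]))
    return pairs


# Toy data for illustration only
in_lines = ["烟 锁 池 塘 柳\n", "欢 天 喜 地 度 佳 节\n"]
out_lines = ["云 封 岭 上 松\n", "笑 语 欢 歌 迎 新 春\n"]
for source, target in to_seq2seq_pairs(in_lines, out_lines):
    print(source, "->", target)
```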

## Language support
- Simplified Chinese by default
- Traditional Chinese is also supported: set `is_input_traditional_chinese=True` in the code below
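Traditional Chinese support works by converting the input to Simplified Chinese, running the model, and converting the output back. The sketch below shows that wrapping pattern with the converters and model passed in as plain functions. `with_traditional_support` is a hypothetical name, and the character table and `fake_model` are toy stand-ins; the example code uses `chinese_converter.to_simplified` and `chinese_converter.to_traditional`, which cover far more characters and context-dependent mappings than this table.

```python
from typing import Callable, Dict

Convert = Callable[[str], str]


def with_traditional_support(model_fn: Convert, to_simplified: Convert,
                             to_traditional: Convert) -> Convert:
    """Wrap a Simplified-Chinese-only model so it accepts Traditional input."""
    def run(text: str) -> str:
        simplified = to_simplified(text)  # the model only knows Simplified Chinese
        out = model_fn(simplified)
        return to_traditional(out)  # convert the answer back
    return run


# Toy stand-ins for demonstration only (not a real converter or model)
T2S: Dict[str, str] = {"飛": "飞", "龍": "龙", "鳳": "凤"}
S2T: Dict[str, str] = {v: k for k, v in T2S.items()}


def toy_to_simplified(s: str) -> str:
    return "".join(T2S.get(ch, ch) for ch in s)


def toy_to_traditional(s: str) -> str:
    return "".join(S2T.get(ch, ch) for ch in s)


def fake_model(s: str) -> str:
    return s.replace("龙", "凤")  # pretend "generation"


couplet_tc = with_traditional_support(fake_model, toy_to_simplified, toy_to_traditional)
print(couplet_tc("飛龍在天"))  # → 飛鳳在天
```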

## Training
- I trained with Google Colab Pro (recommended: a 16 GB GPU for just $9.99 a month!)

## Example code
```python
# Install these 2 packages for text conversion and model generation
# !pip install -q simplet5
# !pip install -q chinese-converter

# Code
import torch
from simplet5 import SimpleT5
from transformers import T5Tokenizer, T5ForConditionalGeneration
import chinese_converter

MODEL_PATH = "hululuzhu/chinese-couplet-t5-mengzi-finetune"


class PoemModel(SimpleT5):
    def __init__(self) -> None:
        super().__init__()
        self.device = torch.device("cuda")

    def load_my_model(self):
        # Load the fine-tuned tokenizer and model from the Hugging Face Hub
        self.tokenizer = T5Tokenizer.from_pretrained(MODEL_PATH)
        self.model = T5ForConditionalGeneration.from_pretrained(MODEL_PATH)


# Load the model once; couplet() below uses it as its default model
couplet_model = PoemModel()
couplet_model.load_my_model()

COUPLET_PROMPT = '对联:'
MAX_SEQ_LEN = 32
MAX_OUT_TOKENS = MAX_SEQ_LEN


def couplet(in_str, model=couplet_model,
            is_input_traditional_chinese=False,
            num_beams=10):
    # Move the model to the GPU (requires CUDA)
    model.model = model.model.to('cuda')
    in_request = f"{COUPLET_PROMPT}{in_str[:MAX_SEQ_LEN]}"
    if is_input_traditional_chinese:
        # The model only knows Simplified Chinese
        in_request = chinese_converter.to_simplified(in_request)
    # Note: sampling is off by default for consistent results
    out = model.predict(in_request,
                        max_length=MAX_OUT_TOKENS,
                        num_beams=num_beams)[0]
    # Replace ASCII commas in the output with full-width Chinese commas
    out = out.replace(",", ",")
    if is_input_traditional_chinese:
        out = chinese_converter.to_traditional(out)
    # 上 = first line (input), 下 = second line (model output)
    print(f"上: {in_str}\n下: {out}")
```

## Simplified Chinese examples
```python
for pre in ['欢天喜地度佳节',
            '不待鸣钟已汗颜,重来试手竟何艰',
            '当年欲跃龙门去,今日真披马革还',
            '北国风光,千里冰封,万里雪飘',
            '寂寞寒窗空守寡',
            '烟锁池塘柳',
            '五科五状元,金木水火土',
            '望江楼,望江流,望江楼上望江流,江楼千古,江流千古']:
    couplet(pre)

# Output:
# 上: 欢天喜地度佳节
# 下: 笑语欢歌迎新春
# 上: 不待鸣钟已汗颜,重来试手竟何艰
# 下: 何堪击鼓频催泪?一别伤心更枉然
# 上: 当年欲跃龙门去,今日真披马革还
# 下: 此日当登虎榜来,他年又见龙图新
# 上: 北国风光,千里冰封,万里雪飘
# 下: 南疆气象,五湖浪涌,三江潮来
# 上: 寂寞寒窗空守寡
# 下: 逍遥野渡醉吟诗
# 上: 烟锁池塘柳
# 下: 云封岭上松
# 上: 五科五状元,金木水火土
# 下: 三才三进士,诗书礼乐诗
# 上: 望江楼,望江流,望江楼上望江流,江楼千古,江流千古
# 下: 听雨阁,听雨落,听雨阁中听雨落,雨阁万重,雨落万重
```

## Traditional Chinese examples
```python
for pre in ['飛龍在天', '臺北風光好']:
    couplet(pre, is_input_traditional_chinese=True, num_beams=10)

# Output:
# 上: 飛龍在天
# 下: 飛鳳於天
# 上: 臺北風光好
# 下: 神州氣象新
```