cbdb committed on
Commit
8828dd7
1 Parent(s): 6dbe706

Create README.md


Add a readme file

Files changed (1)
  1. README.md +70 -0
README.md ADDED
---
language:
- zh
tags:
- Seq2SeqLM
- 古文
- 文言文
- 中国古代官职翻译
- ancient
- classical
license: cc-by-nc-sa-4.0
metrics:
- sacrebleu
---

# <font color="IndianRed"> TITO (Classical Chinese Office Title Translation)</font>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1UoG3QebyBlK6diiYckiQv-5dRB9dA4iv?usp=sharing/)

Our model <font color="cornflowerblue">TITO (Classical Chinese Office Title Translation)</font> is a sequence-to-sequence Classical Chinese language model intended to <font color="IndianRed">translate Classical Chinese office titles into English</font>. The model inherits from MarianMTModel and is fine-tuned on 6,208 high-quality translation pairs collected by the CBDB group (China Biographical Database).
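
For a quick sanity check, the high-level `pipeline` API should also be able to load this checkpoint. The snippet below is a minimal sketch and assumes the hosted repository ships a standard Marian config and tokenizer; the step-by-step instructions below remain the reference usage.

```python
# Quick-start sketch (assumes a standard Marian config on the Hub repository)
from transformers import pipeline

translator = pipeline("translation", model="cbdb/ClassicalChineseOfficeTitleTranslation")
print(translator("散騎常侍")[0]["translation_text"])
```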
### <font color="IndianRed"> How to use </font>

Here is how to use this model to translate Classical Chinese office titles into English in PyTorch:

<font color="cornflowerblue"> 1. Import model and packages </font>
```python
import torch
from transformers import MarianMTModel, MarianTokenizer

# Fall back to CPU when no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = 'cbdb/ClassicalChineseOfficeTitleTranslation'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name).to(device)
```

<font color="cornflowerblue"> 2. Load Data </font>
```python
# Load your data here
tobe_translated = ['講筵官','判司簿尉','散騎常侍','殿中省尚輦奉御']
```

<font color="cornflowerblue"> 3. Make a prediction </font>
```python
trans_list = []
for chinese_sentence in tobe_translated:
    inputs = tokenizer([chinese_sentence], return_tensors="pt", padding=True).to(device)
    translated = model.generate(**inputs, max_length=128)
    # Decode the generated token ids back into an English title
    tran = [tokenizer.decode(t, skip_special_tokens=True) for t in translated][0]
    print(f'{chinese_sentence}: {tran}')
    trans_list.append(tran)
```
講筵官: Lecturer<br>
判司簿尉: Supervisor of the Commandant of Records<br>
散騎常侍: Policy Advisor<br>
殿中省尚輦奉御: Chief Steward of the Palace Administration<br>
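
The per-title loop above is easy to follow, but the titles can also be translated in one padded batch, which is usually faster. The sketch below reuses the `tokenizer`, `model`, and `device` objects from the steps above; `batch_decode` is the standard tokenizer helper for decoding a whole batch of outputs.

```python
# Translate all titles in a single generate() call instead of looping
inputs = tokenizer(tobe_translated, return_tensors="pt", padding=True).to(device)
translated = model.generate(**inputs, max_length=128)
trans_list = tokenizer.batch_decode(translated, skip_special_tokens=True)
for src, tgt in zip(tobe_translated, trans_list):
    print(f'{src}: {tgt}')
```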
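
The card lists sacreBLEU as the evaluation metric. The sketch below shows one way to score the model's output with the `sacrebleu` package; the reference list here simply reuses the example outputs above as placeholders, so substitute your own gold translations when evaluating for real.

```python
# Corpus-level BLEU with sacreBLEU; `trans_list` comes from the prediction step
import sacrebleu

# Placeholder references (replace with gold English translations)
references = [[
    'Lecturer',
    'Supervisor of the Commandant of Records',
    'Policy Advisor',
    'Chief Steward of the Palace Administration',
]]
score = sacrebleu.corpus_bleu(trans_list, references)
print(f'BLEU: {score.score:.2f}')
```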

### <font color="IndianRed">Authors </font>
Queenie Luo (queenieluo[at]g.harvard.edu)
<br>
Hongsu Wang
<br>
Peter Bol
<br>
CBDB Group

### <font color="IndianRed">License </font>
Copyright (c) 2023 CBDB

Except where otherwise noted, content on this repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or
send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.