lordChipotle committed a47273d (verified; parent: 790e3af): Update README.md

Files changed (1): README.md (+17 -0)
Ancient Chinese Translator + Phonology Model (SimaQian)

Name Origin:

The model is named after the famous ancient Chinese historian Sima Qian (司馬遷), known for his Records of the Grand Historian, a general history of China covering more than two thousand years.

This model combines two key functionalities for Ancient Chinese texts:

1. Translation: Converts Ancient Chinese passages into modern Chinese.

2. Phonological Reconstruction: Provides historical pronunciations for characters or entire sentences across multiple eras (e.g., Middle Tang, Song, Yuan, Ming/Qing); an illustrative prompt sketch follows this list.
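As a sketch of the phonological-reconstruction mode, a request can be phrased in the same turn format used in the Usage section below; the instruction wording here is illustrative only and is not a documented prompt the model was trained on:

phonology_prompt = """
<start_of_turn>user
Given the ancient text: 「學而時習之」
Please give the reconstructed pronunciation of each character for the Middle Tang, Song, Yuan, and Ming/Qing eras.
<end_of_turn>
<start_of_turn>model
"""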
 
 
Model Description

• Architecture: Fine-tuned on top of Google’s Gemma 2 model using LoRA (a hedged adapter-loading sketch follows this list).

• Input Format: Special tokens <start_of_turn> / <end_of_turn> define user vs. model turns.

• Output: Era identification (optional), phonetic renderings, and modern Chinese translations.
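If the fine-tuned weights are published as a separate PEFT/LoRA adapter rather than merged into the base checkpoint, loading might look like the sketch below. The base model id google/gemma-2-9b-it is an assumption (the card only says "Gemma 2"), and username/ancient-chinese-phonology is the placeholder repo id used elsewhere on this card:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "google/gemma-2-9b-it"                    # assumed Gemma 2 base checkpoint
adapter_id = "username/ancient-chinese-phonology"   # placeholder repo id from this card

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter

If the repository instead hosts merged weights, the plain AutoModelForCausalLM call shown in the Usage section is all that is needed.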
Training Data

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("username/ancient-chinese-phonology")
model = AutoModelForCausalLM.from_pretrained("username/ancient-chinese-phonology")

prompt = """
<start_of_turn>user
Given the ancient text: 「子曰:學而時習之,不亦說乎?」
<end_of_turn>
<start_of_turn>model
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0]))
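Gemma-family tokenizers typically ship a chat template that produces exactly this <start_of_turn> / <end_of_turn> layout, so the prompt above can also be built programmatically. Whether the fine-tuned tokenizer keeps the base Gemma template is an assumption; a minimal sketch:

messages = [{"role": "user", "content": "Given the ancient text: 「子曰:學而時習之,不亦說乎?」"}]
# apply_chat_template wraps the message in the turn tokens and appends the model-turn prefix
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))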
 
 
Limitations and Biases

• Era Estimation: The model may not always identify the historical era correctly.

• Pronunciations: Reconstructions are approximate and vary across scholarly traditions.

• Contextual Accuracy: For highly contextual Ancient Chinese passages, translations may need further review by domain experts.
 