leafspark committed on
Commit 6c0c4b1 · verified · 1 Parent(s): b997ebb

readme: add model card

  1. README.md +107 -3
README.md CHANGED
@@ -1,3 +1,107 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ Anthropic's client-side tokenizer.
+
+ Accuracy compared to the actual Claude 3 Haiku tokenizer (all Claude 3 family models share the same tokenizer):
+
+ ```python
+ Tokenization results saved to __temp.txt.tokens
+ Text: Hello, world! This is a simple...
+ Actual tokens: 17
+ Predicted tokens: 10
+ Accuracy: 58.82%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: The quick brown fox jumps over...
+ Actual tokens: 19
+ Predicted tokens: 10
+ Accuracy: 52.63%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: In computer programming, a hel...
+ Actual tokens: 29
+ Predicted tokens: 21
+ Accuracy: 72.41%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: Artificial intelligence (AI) i...
+ Actual tokens: 30
+ Predicted tokens: 24
+ Accuracy: 80.00%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: The Eiffel Tower is a wrought-...
+ Actual tokens: 56
+ Predicted tokens: 48
+ Accuracy: 85.71%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: To be, or not to be, that is t...
+ Actual tokens: 60
+ Predicted tokens: 50
+ Accuracy: 83.33%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: In the beginning God created t...
+ Actual tokens: 38
+ Predicted tokens: 31
+ Accuracy: 81.58%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: Four score and seven years ago...
+ Actual tokens: 41
+ Predicted tokens: 34
+ Accuracy: 82.93%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: I have a dream that one day th...
+ Actual tokens: 51
+ Predicted tokens: 43
+ Accuracy: 84.31%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: That's one small step for man,...
+ Actual tokens: 22
+ Predicted tokens: 14
+ Accuracy: 63.64%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: Here are the key points about ...
+ Actual tokens: 203
+ Predicted tokens: 195
+ Accuracy: 96.06%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: This appears to be an excerpt ...
+ Actual tokens: 179
+ Predicted tokens: 180
+ Accuracy: 99.44%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: This is the beginning of the b...
+ Actual tokens: 194
+ Predicted tokens: 191
+ Accuracy: 98.45%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: That is the opening lines of t...
+ Actual tokens: 177
+ Predicted tokens: 163
+ Accuracy: 92.09%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: That's a powerful and inspirin...
+ Actual tokens: 193
+ Predicted tokens: 190
+ Accuracy: 98.45%
+ --------------------------------------------------
+ Tokenization results saved to __temp.txt.tokens
+ Text: That famous quote is from Neil...
+ Actual tokens: 131
+ Predicted tokens: 122
+ Accuracy: 93.13%
+ --------------------------------------------------
+ Average accuracy: 82.69%
+ ```
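
The log above does not state how each accuracy figure is derived. The reported percentages are consistent with dividing the smaller token count by the larger one, so the sketch below assumes that definition. `count_accuracy`, `PAIRS`, `scores`, and `average` are hypothetical names chosen for illustration; they are not part of the tokenizer or of the original evaluation script.

```python
# Assumed metric: ratio of the smaller token count to the larger one.
# This definition is inferred from the reported numbers, not taken from
# the original evaluation script.

def count_accuracy(actual: int, predicted: int) -> float:
    """Return min/max of the two token counts as a fraction in (0, 1]."""
    if actual <= 0 or predicted <= 0:
        raise ValueError("token counts must be positive")
    return min(actual, predicted) / max(actual, predicted)

# (actual, predicted) token counts copied from the log above.
PAIRS = [
    (17, 10), (19, 10), (29, 21), (30, 24),
    (56, 48), (60, 50), (38, 31), (41, 34),
    (51, 43), (22, 14), (203, 195), (179, 180),
    (194, 191), (177, 163), (193, 190), (131, 122),
]

scores = [count_accuracy(a, p) for a, p in PAIRS]
average = sum(scores) / len(scores)  # unweighted mean over the 16 samples

for (a, p), s in zip(PAIRS, scores):
    print(f"actual={a:>3} predicted={p:>3} accuracy={s:.2%}")
print(f"Average accuracy: {average:.2%}")  # prints "Average accuracy: 82.69%"
```

Under this assumption the figure compares token counts only. Two tokenizations with the same count but different token boundaries would still score 100%, so the numbers are best read as an estimate of how well the tokenizer predicts length, not of token-level agreement.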