readme: add model card
README.md
@@ -1,3 +1,107 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+---
+
+Anthropic's client-side tokenizer.
+
+Accuracy compared to the actual Claude 3 Haiku tokenizer (the Claude 3 family shares the same tokenizer):
+
+```python
+Tokenization results saved to __temp.txt.tokens
+Text: Hello, world! This is a simple...
+Actual tokens: 17
+Predicted tokens: 10
+Accuracy: 58.82%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: The quick brown fox jumps over...
+Actual tokens: 19
+Predicted tokens: 10
+Accuracy: 52.63%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: In computer programming, a hel...
+Actual tokens: 29
+Predicted tokens: 21
+Accuracy: 72.41%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: Artificial intelligence (AI) i...
+Actual tokens: 30
+Predicted tokens: 24
+Accuracy: 80.00%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: The Eiffel Tower is a wrought-...
+Actual tokens: 56
+Predicted tokens: 48
+Accuracy: 85.71%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: To be, or not to be, that is t...
+Actual tokens: 60
+Predicted tokens: 50
+Accuracy: 83.33%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: In the beginning God created t...
+Actual tokens: 38
+Predicted tokens: 31
+Accuracy: 81.58%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: Four score and seven years ago...
+Actual tokens: 41
+Predicted tokens: 34
+Accuracy: 82.93%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: I have a dream that one day th...
+Actual tokens: 51
+Predicted tokens: 43
+Accuracy: 84.31%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: That's one small step for man,...
+Actual tokens: 22
+Predicted tokens: 14
+Accuracy: 63.64%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: Here are the key points about ...
+Actual tokens: 203
+Predicted tokens: 195
+Accuracy: 96.06%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: This appears to be an excerpt ...
+Actual tokens: 179
+Predicted tokens: 180
+Accuracy: 99.44%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: This is the beginning of the b...
+Actual tokens: 194
+Predicted tokens: 191
+Accuracy: 98.45%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: That is the opening lines of t...
+Actual tokens: 177
+Predicted tokens: 163
+Accuracy: 92.09%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: That's a powerful and inspirin...
+Actual tokens: 193
+Predicted tokens: 190
+Accuracy: 98.45%
+--------------------------------------------------
+Tokenization results saved to __temp.txt.tokens
+Text: That famous quote is from Neil...
+Actual tokens: 131
+Predicted tokens: 122
+Accuracy: 93.13%
+--------------------------------------------------
+Average accuracy: 82.69%
+```
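The README does not say how the per-sample accuracy is computed. The sketch below is a guess that happens to reproduce the logged figures: it assumes accuracy is the ratio of the smaller token count to the larger, and that the final figure is the unweighted mean over the 16 samples. The names `COUNTS`, `accuracy`, and `average_accuracy` are hypothetical and do not come from the repository.

```python
# (actual, predicted) token counts copied from the log above, in order.
COUNTS = [
    (17, 10), (19, 10), (29, 21), (30, 24),
    (56, 48), (60, 50), (38, 31), (41, 34),
    (51, 43), (22, 14), (203, 195), (179, 180),
    (194, 191), (177, 163), (193, 190), (131, 122),
]


def accuracy(actual: int, predicted: int) -> float:
    """Assumed metric: smaller count divided by larger count, as a percentage.

    Taking min/max keeps the score at or below 100% when the local
    tokenizer overcounts (as in the 179 vs. 180 sample).
    """
    return 100.0 * min(actual, predicted) / max(actual, predicted)


def average_accuracy(counts: list[tuple[int, int]]) -> float:
    """Unweighted mean of the per-sample accuracies."""
    return sum(accuracy(a, p) for a, p in counts) / len(counts)


if __name__ == "__main__":
    for actual, predicted in COUNTS:
        print(f"Actual tokens: {actual}")
        print(f"Predicted tokens: {predicted}")
        print(f"Accuracy: {accuracy(actual, predicted):.2f}%")
        print("-" * 50)
    print(f"Average accuracy: {average_accuracy(COUNTS):.2f}%")
```

Under these assumptions the script matches every logged percentage, including the 82.69% average. Because the mean is unweighted, the short samples (under about 30 tokens), where the local tokenizer undercounts by 7 to 9 tokens, pull the average down far more than the long samples, which are mostly within a few percent.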