felixb85 committed on
Commit
00e6621
•
1 Parent(s): 10bdd73

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -4,13 +4,14 @@ datasets:
  - lc_quad
  ---
 
- This repo contains a custom tokenizer for SPARQL. Here is an example.
+ This repo contains a custom tokenizer for SPARQL. Here is an example. It is a SentencePieceBPE tokenizer trained on lc_quad.
 
+ Original query:
  ```
- Query: SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . ?X wdt:P2048 ?answer}
+ SELECT ?answer WHERE { wd:Q825946 wdt:P371 ?X . ?X wdt:P2048 ?answer}
  ```
 
- Result from default T5 tokenizer:
+ Result from default T5 tokenizer (just as an example):
  ```
  ['▁', 'SEL', 'ECT', '▁', '?', 'ans', 'wer', '▁W', 'HER', 'E', '▁', '{', '▁', 'w', 'd', ':', 'Q', '82', '59', '46', '▁',
  'w', 'd', 't', ':', 'P', '37', '1', '▁', '?', 'X', '▁', '.', '▁', '?', 'X', '▁', 'w', 'd', 't', ':', 'P', '20', '48',