kajyuuen committed
Commit 1c89263 · 1 Parent(s): 1220bcd

Allow only unidic_lite

Files changed (2)
  1. README.md +9 -0
  2. distilbert_japanese_tokenizer.py +0 -16
README.md CHANGED
@@ -17,6 +17,7 @@ The model was trained by [LINE Corporation](https://linecorp.com/).
 
 https://github.com/line/LINE-DistilBERT-Japanese/blob/main/README_ja.md is written in Japanese.
 
+
 ## How to use
 
 ```python
@@ -28,6 +29,14 @@ sentence = "LINE株式会社で[MASK]の研究・開発をしている。"
 print(model(**tokenizer(sentence, return_tensors="pt")))
 ```
 
+### Requirements
+
+```txt
+fugashi
+sentencepiece
+unidic-lite
+```
+
 ## Model architecture
 
 The model architecture is the DitilBERT base model; 6 layers, 768 dimensions of hidden states, 12 attention heads, 66M parameters.
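
The Requirements block added to the README lists three PyPI packages. A quick way to confirm they are importable before loading the tokenizer might look like the sketch below. The helper `missing_requirements` is hypothetical and not part of this repository, and the mapping assumes the `unidic-lite` distribution is imported as `unidic_lite`, which is the module name the tokenizer code uses.

```python
from importlib.util import find_spec

# PyPI distribution name -> import name.
# Only unidic-lite differs (hyphen on PyPI, underscore on import).
REQUIREMENTS = {
    "fugashi": "fugashi",
    "sentencepiece": "sentencepiece",
    "unidic-lite": "unidic_lite",
}


def missing_requirements(requirements=REQUIREMENTS):
    """Return the PyPI names whose top-level module cannot be found."""
    return [pkg for pkg, module in requirements.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All tokenizer requirements are importable.")
```

Running the script prints either a `pip install` hint for the missing packages or a confirmation that all three can be imported.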
distilbert_japanese_tokenizer.py CHANGED
@@ -485,22 +485,6 @@ class MecabTokenizer:
                     )
 
                 dic_dir = unidic_lite.DICDIR
-            elif mecab_dic == "unidic":
-                try:
-                    import unidic
-                except ModuleNotFoundError as error:
-                    raise error.__class__(
-                        "The unidic dictionary is not installed. "
-                        "See https://github.com/polm/unidic-py for installation."
-                    )
-
-                dic_dir = unidic.DICDIR
-                if not os.path.isdir(dic_dir):
-                    raise RuntimeError(
-                        "The unidic dictionary itself is not found. "
-                        "See https://github.com/polm/unidic-py for installation."
-                    )
-
             else:
                 raise ValueError("Invalid mecab_dic is specified.")
 
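
The effect of this hunk on dictionary selection can be illustrated with a standalone sketch. The function `resolve_mecab_dic_dir` below is hypothetical: it is not the code in `distilbert_japanese_tokenizer.py`, it models only the `unidic_lite` branch and the final `else` visible in the diff, and the wording of the import error is an assumption. Branches outside the hunk are not represented.

```python
import importlib


def resolve_mecab_dic_dir(mecab_dic: str = "unidic_lite") -> str:
    """Return the MeCab dictionary directory for the given dictionary name.

    Hypothetical helper mirroring the post-commit behaviour shown in the diff:
    "unidic_lite" is handled, and "unidic" no longer has a branch of its own.
    """
    if mecab_dic == "unidic_lite":
        try:
            # Imported lazily so the error only appears when the dictionary is requested.
            unidic_lite = importlib.import_module("unidic_lite")
        except ModuleNotFoundError as error:
            # Message text is illustrative, not copied from the repository.
            raise ModuleNotFoundError(
                "The unidic_lite dictionary is not installed (pip install unidic-lite)."
            ) from error
        return unidic_lite.DICDIR

    # With the "unidic" branch removed, that value falls through to here.
    raise ValueError("Invalid mecab_dic is specified.")
```

Under these assumptions, a caller that previously passed `mecab_dic="unidic"` would now get `ValueError("Invalid mecab_dic is specified.")` instead of a lookup of the full UniDic dictionary.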