Commit 1aae1ae by eolang · Parent(s): fec8fa9

Update README.md

Files changed (1): README.md (+22 -5)

README.md CHANGED
@@ -14,8 +14,6 @@ widget:
 
 # SW
 
-* Pre-trained model on Swahili language using a masked language modeling (MLM) objective.
-
 ## Model description
 
 This is a transformers model pre-trained on a large corpus of Swahili data in a self-supervised fashion. This means it
@@ -42,10 +40,11 @@ The model is based on the Original BERT UNCASED which can be found on [google-res
 You can use the raw model for masked language modeling, but it's primarily intended to be fine-tuned on a downstream task.
 
 ### How to use
-
 You can use this model directly with a pipeline for masked language modeling:
 
 
+#### Tokenizer
+
 ```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM
 
@@ -55,8 +54,26 @@ model = AutoModelForMaskedLM.from_pretrained("eolang/SW-v1")
 text = "Hii ni tovuti ya idhaa ya Kiswahili ya BBC ambayo hukuletea habari na makala kutoka Afrika na kote duniani kwa lugha ya Kiswahili."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
+print(output)
+```
+
+#### Fill Mask Model
+
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+from transformers import pipeline
+
+tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")
+model = AutoModelForMaskedLM.from_pretrained("eolang/SW-v1")
+
+fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
+sample_text = "Tumefanya mabadiliko muhimu [MASK] sera zetu za faragha na vidakuzi"
+
+for prediction in fill_mask(sample_text):
+    print(f"{prediction['sequence']}, confidence: {prediction['score']}")
 ```
+
 ### Limitations and Bias
 
-Even if the training data used for this model could be reasonably neutral, this model can have biased
-predictions. This is something we are still working on improving.
+Even if the training data used for this model could be reasonably neutral, this model can have biased predictions.
+This is something I'm still working on improving. Feel free to share suggestions/comments via Discussion or [Email Me 😀](mailto:[email protected]?subject=HF%20Model%20Suggestions)
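The raw `output` printed in the Tokenizer example is a `MaskedLMOutput` whose `logits` score every vocabulary token at every position; the fill-mask pipeline simply reads off the highest-scoring tokens at the `[MASK]` position. A minimal sketch of doing that decoding by hand, assuming the `eolang/SW-v1` checkpoint and tokenizer load as in the examples above:

```python
# Decode top predictions for the [MASK] token directly from the logits,
# mirroring what the fill-mask pipeline returns.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")
model = AutoModelForMaskedLM.from_pretrained("eolang/SW-v1")

text = "Tumefanya mabadiliko muhimu [MASK] sera zetu za faragha na vidakuzi"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Locate the [MASK] position and take the five highest-scoring token ids.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```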
 
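On the pre-training setup mentioned in the model description (a Swahili corpus used in a self-supervised fashion with a masked language modeling objective): the sketch below reproduces that kind of masking with the stock transformers data collator. It illustrates the objective only and is not necessarily the script used to pre-train SW-v1; the 15% masking rate is the common BERT default, assumed here.

```python
# Illustrate the MLM objective: random tokens are replaced with [MASK]
# and the model is trained to predict the originals.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15  # assumed BERT-style rate
)

text = "Hii ni tovuti ya idhaa ya Kiswahili ya BBC ambayo hukuletea habari na makala kutoka Afrika na kote duniani kwa lugha ya Kiswahili."
batch = collator([tokenizer(text)["input_ids"]])

# input_ids now contain [MASK] at sampled positions; labels hold the original
# ids there and -100 (ignored by the loss) everywhere else.
print(tokenizer.decode(batch["input_ids"][0]))
print(batch["labels"][0])
```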
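Since the card notes the raw model is primarily intended to be fine-tuned on a downstream task, here is a minimal single-step fine-tuning sketch for sequence classification. The task, the two labeled sentences, and the hyperparameters are hypothetical placeholders, not part of the model card:

```python
# One hypothetical fine-tuning step: a classification head is attached to
# the pre-trained encoder and trained on labeled examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")
model = AutoModelForSequenceClassification.from_pretrained("eolang/SW-v1", num_labels=2)

texts = ["Habari njema sana", "Hali ni mbaya"]  # hypothetical examples
labels = torch.tensor([1, 0])                   # hypothetical labels

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the labels
loss.backward()
optimizer.step()
```

In practice you would loop this over a labeled dataset for several epochs; only a single step is sketched here.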