---
library_name: transformers
tags: []
widget:
- text: 'Please correct the following sentence: ndaids kurnda kumba kwaco'
  example_title: Spelling Correction
---

# Model Card for T5-Shona-SC

<!-- Provide a quick summary of what the model is/does. -->
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg"
alt="drawing" width="600"/>


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Thabolezwe Mabandla](http://www.linkedin.com/in/thabolezwe-mabandla-81a62a22b)
- **Model type:** Language Model
- **Language(s) (NLP):** Shona
- **Finetuned from model:** [FLAN-T5 Small](https://huggingface.co/google/flan-t5-small)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
> Correction of spelling errors in Shona sentences or phrases.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
> Spelling correction

## Bias, Risks, and Limitations

The information in this section is copied from the [official model card](https://arxiv.org/pdf/2210.11416.pdf) of the base model, FLAN-T5:

> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

### Ethical considerations and risks

> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

## How to Get Started with the Model

Use the code below to get started with the model.

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
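from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned tokenizer and model from the Hub.
tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector")

# Prepend the instruction prefix shown in the widget example.
input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
# skip_special_tokens=True drops markers such as <pad> and </s> from the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))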
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto")

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
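# skip_special_tokens=True drops markers such as <pad> and </s> from the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))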
```

</details>

### Running the model on a GPU using different precisions

#### FP16

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto", torch_dtype=torch.float16)

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
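# skip_special_tokens=True drops markers such as <pad> and </s> from the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))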
```

</details>

#### INT8

<details>
<summary> Click to expand </summary>

```python
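# pip install bitsandbytes accelerate
from transformers import BitsAndBytesConfig, T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
# Recent transformers releases expect 8-bit loading to be requested through a
# BitsAndBytesConfig rather than the older load_in_8bit=True keyword argument.
model = T5ForConditionalGeneration.from_pretrained(
    "thaboe01/t5-spelling-corrector",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
# skip_special_tokens=True drops markers such as <pad> and </s> from the output.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))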
```

</details>
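
Every snippet above sends a single sentence with the same instruction prefix. The sketch below shows one way to wrap that pattern for several sentences at once. The helper names `build_prompt` and `correct_batch` are hypothetical, the `max_new_tokens=64` limit is an arbitrary illustrative value, and the function assumes a tokenizer and model loaded on the CPU as in the first example.

```python
from typing import List

# Instruction prefix copied from the examples in this card.
PROMPT_PREFIX = "Please correct the following sentence: "


def build_prompt(sentence: str) -> str:
    """Collapse repeated whitespace and prepend the instruction prefix."""
    return PROMPT_PREFIX + " ".join(sentence.split())


def correct_batch(sentences: List[str], tokenizer, model,
                  max_new_tokens: int = 64) -> List[str]:
    """Correct several sentences in one generate() call (CPU sketch).

    `tokenizer` and `model` are the objects loaded in the snippets above.
    `max_new_tokens=64` is an assumed limit, not a tuned value.
    """
    prompts = [build_prompt(s) for s in sentences]
    # padding=True pads the shorter prompts so they fit in one tensor.
    encoded = tokenizer(prompts, return_tensors="pt", padding=True)
    outputs = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # skip_special_tokens=True drops markers such as <pad> and </s>.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the CPU objects in scope, a call might look like `correct_batch(["ndaids kurnda kumba kwaco"], tokenizer, model)`. For a model placed on a GPU, the encoded tensors would also need to be moved to the model's device before calling `generate`.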


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Metrics

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
<img src="https://huggingface.co/thaboe01/t5-spelling-corrector/resolve/main/Screenshot%202024-05-21%20121138.png"
alt="metrics" width="600"/>
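
The reported metrics are only available in the screenshot above, and this card does not say how they were computed. As an illustration only, the sketch below shows two measures commonly used for spelling correction: exact-match accuracy and character error rate (CER) based on Levenshtein distance. The function names are hypothetical, and nothing here is claimed to match the author's evaluation script.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference."""
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)
```

Both functions take the model outputs and the gold corrections as parallel lists, so they can be applied directly to decoded predictions on a held-out test set.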

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 2 x T4 GPU
- **Hours used:** 8
- **Cloud Provider:** Kaggle
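
The figures above can be turned into a rough upper-bound estimate as energy (kWh) multiplied by grid carbon intensity. The sketch below assumes each T4 draws its rated 70 W for the whole run and uses a placeholder carbon intensity of 0.4 kg CO2eq/kWh; that intensity is a hypothetical value, not a measured figure for this training run, and datacenter overhead is ignored.

```python
def estimate_emissions_kg(num_gpus: int, gpu_power_watts: float, hours: float,
                          carbon_intensity_kg_per_kwh: float) -> float:
    """Upper-bound CO2eq estimate: GPU energy times grid carbon intensity."""
    energy_kwh = num_gpus * gpu_power_watts / 1000 * hours
    return energy_kwh * carbon_intensity_kg_per_kwh


# 2 x T4 for 8 hours, as listed in this card.
# 70 W is the T4's rated board power; 0.4 kg CO2eq/kWh is an assumed placeholder.
estimate = estimate_emissions_kg(num_gpus=2, gpu_power_watts=70, hours=8,
                                 carbon_intensity_kg_per_kwh=0.4)
print(f"{estimate:.3f} kg CO2eq")
```

Replacing the placeholder intensity with the value for the actual compute region, as the Machine Learning Impact calculator does, gives a more meaningful number.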

## Model Card Authors

Thabolezwe Mabandla

## Model Card Contact

[email protected]