File size: 5,559 Bytes

---
library_name: transformers
tags: []
widget:
- text: 'Please correct the following sentence: ndaids kurnda kumba kwaco'
  example_title: Spelling Correction
---

# Model Card for T5-Shona-SC

<!-- Provide a quick summary of what the model is/does. -->
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg"
alt="drawing" width="600"/>


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Thabolezwe Mabandla](http://www.linkedin.com/in/thabolezwe-mabandla-81a62a22b)
- **Model type:** Language Model
- **Language(s) (NLP):** Shona
- **Finetuned from model:** [FLAN-T5](https://huggingface.co/google/flan-t5-small)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
> Correction of spelling errors in shona sentences or phrases.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
> Spelling correction

# Bias, Risks, and Limitations

The information below in this section are copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):

> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

## Ethical considerations and risks

> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

## How to Get Started with the Model

Use the code below to get started with the model.

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector")

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto")

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using different precisions

#### FP16

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto", torch_dtype=torch.float16)

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

#### INT8

<details>
<summary> Click to expand </summary>

```python
# pip install bitsandbytes accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto", load_in_8bit=True)

input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Metrics

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
<img src="https://huggingface.co/thaboe01/t5-spelling-corrector/blob/main/Screenshot%202024-05-21%20121138.png"
alt="metrics" width="600"/>

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [T4 GPU x 2]
- **Hours used:** [8]
- **Cloud Provider:** [Kaggle]

## Model Card Authors

Thabolezwe Mabandla

## Model Card Contact

[email protected]