thaboe01's picture
Update README.md
61c200f verified
---
library_name: transformers
tags: []
widget:
- text: 'Please correct the following sentence: ndaids kurnda kumba kwaco'
example_title: Spelling Correction
---
# Model Card for T5-Shona-SC
<!-- Provide a quick summary of what the model is/does. -->
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg"
alt="drawing" width="600"/>
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [Thabolezwe Mabandla](http://www.linkedin.com/in/thabolezwe-mabandla-81a62a22b)
- **Model type:** Language Model
- **Language(s) (NLP):** Shona
- **Finetuned from model:** [FLAN-T5](https://huggingface.co/google/flan-t5-small)
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
> Correction of spelling errors in shona sentences or phrases.
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
> Spelling correction
# Bias, Risks, and Limitations
The information below in this section are copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):
> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.
## Ethical considerations and risks
> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
## How to Get Started with the Model
Use the code below to get started with the model.
### Running the model on a CPU
<details>
<summary> Click to expand </summary>
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector")
input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
</details>
### Running the model on a GPU
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto")
input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
</details>
### Running the model on a GPU using different precisions
#### FP16
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto", torch_dtype=torch.float16)
input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
</details>
#### INT8
<details>
<summary> Click to expand </summary>
```python
# pip install bitsandbytes accelerate
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("thaboe01/t5-spelling-corrector")
model = T5ForConditionalGeneration.from_pretrained("thaboe01/t5-spelling-corrector", device_map="auto", load_in_8bit=True)
input_text = "Please correct the following sentence: ndaids kurnda kumba kwaco"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
</details>
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Metrics
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
<img src="https://huggingface.co/thaboe01/t5-spelling-corrector/blob/main/Screenshot%202024-05-21%20121138.png"
alt="metrics" width="600"/>
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [T4 GPU x 2]
- **Hours used:** [8]
- **Cloud Provider:** [Kaggle]
## Model Card Authors
Thabolezwe Mabandla
## Model Card Contact
[email protected]