Gerson Fabian Buenahora Ormaza
committed on
Update README.md
README.md
CHANGED
@@ -9,6 +9,8 @@ base_model:
|
|
9 |
pipeline_tag: text-generation
|
10 |
---
|
11 |
|
|
|
|
|
12 |
# ST3: Simple Transformer 3
|
13 |
|
14 |
## Model description
|
@@ -22,7 +24,7 @@ ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from
|
|
22 |
- **Parameters:** 4 million FP32 parameters.
|
23 |
- **Batch size:** 32.
|
24 |
- **Training environment:** 1 epoch on a Kaggle P100 GPU.
|
25 |
-
- **Tokenizer:** Custom WordPiece tokenizer "ST3" with
|
26 |
|
27 |
## Intended use
|
28 |
ST3 is not a highly powerful or fully functional model compared to larger transformer models but can be used for:
|
@@ -32,6 +34,32 @@ ST3 is not a highly powerful or fully functional model compared to larger transf
|
|
32 |
|
33 |
This model has not been fine-tuned or evaluated with performance metrics as it’s not designed for state-of-the-art tasks.
|
34 |
|
35 |
## Limitations
|
36 |
- **Performance:** ST3 lacks the power of larger models and may not perform well on complex language tasks.
|
37 |
- **No evaluation:** The model hasn’t been benchmarked with metrics.
|
@@ -60,4 +88,3 @@ If you find this model useful and would like to support further development, ple
|
|
60 |
---
|
61 |
|
62 |
*Contributions to this project are always welcome!*
|
63 |
-
|
|
|
9 |
pipeline_tag: text-generation
|
10 |
---
|
11 |
|
12 |
+
|
13 |
+
|
14 |
# ST3: Simple Transformer 3
|
15 |
|
16 |
## Model description
|
|
|
24 |
- **Parameters:** 4 million FP32 parameters.
|
25 |
- **Batch size:** 32.
|
26 |
- **Training environment:** 1 epoch on a Kaggle P100 GPU.
|
27 |
+- **Tokenizer:** Custom WordPiece tokenizer "ST3" that generates tokens with "##" as a prefix for subword units.
|
28 |
|
29 |
## Intended use
|
30 |
ST3 is not a highly powerful or fully functional model compared to larger transformer models but can be used for:
|
|
|
34 |
|
35 |
This model has not been fine-tuned or evaluated with performance metrics as it’s not designed for state-of-the-art tasks.
|
36 |
|
37 |
+### Usage
+To use the ST3 model, you can follow this example:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")
+model = AutoModelForCausalLM.from_pretrained("BueormLLC/ST3")
+
+def clean_wordpiece_tokens(text):
+    return text.replace(" ##", "").replace("##", "")
+
+input_text = "Esto es un ejemplo"
+inputs = tokenizer(input_text, return_tensors="pt")
+
+outputs = model.generate(inputs.input_ids, max_length=2048, num_return_sequences=1)
+
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+cleaned_text = clean_wordpiece_tokens(generated_text)
+
+print(cleaned_text)
+```
+
+### Explanation
+The ST3 tokenizer uses the WordPiece algorithm, which generates tokens prefixed with "##" to indicate subword units. The provided `clean_wordpiece_tokens` function removes these prefixes, allowing for cleaner output text.
|
62 |
+
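The prefix-stripping step described in the Explanation can be tried in isolation, without downloading the model. The sketch below reuses the `clean_wordpiece_tokens` function from this commit. The sample string is a hypothetical WordPiece-style decode chosen for illustration, not actual ST3 output.

```python
def clean_wordpiece_tokens(text):
    # " ##" marks a subword that continues the previous token, so dropping
    # it (including the space) glues the pieces back together.
    # The second replace catches a "##" that has no preceding space,
    # for example at the very start of the string.
    return text.replace(" ##", "").replace("##", "")

# Hypothetical decoded string in WordPiece style (assumption, not real model output).
sample = "est ##o es un ejem ##plo"
cleaned = clean_wordpiece_tokens(sample)
print(cleaned)  # esto es un ejemplo
```

Because the second `replace` removes every literal `##`, text that legitimately contains that sequence (a Markdown heading, for instance) would also be altered, so this cleanup is best applied only to decoded model output.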
|
63 |
## Limitations
|
64 |
- **Performance:** ST3 lacks the power of larger models and may not perform well on complex language tasks.
|
65 |
- **No evaluation:** The model hasn’t been benchmarked with metrics.
|
|
|
88 |
---
|
89 |
|
90 |
*Contributions to this project are always welcome!*
|
|