ST3: Simple Transformer 3
Model description
ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from OpenAI's GPT-2 architecture. It was specifically designed to enable quick fine-tuning and experimentation, making it a great choice for researchers and developers seeking an efficient model for downstream tasks.
Key features:
- Architecture: GPT-2-based model with 3 attention heads and 3 layers.
- Embedding size: 288 dimensions.
- Context size: 2048 tokens, allowing for extended input/output sequences.
- Pretrained on: Wikimedia/Wikipedia subset "20231101.es" (Spanish text corpus).
- Parameters: approximately 4 million, stored in FP32.
- Batch size: 32.
- Training environment: 1 epoch on a Kaggle P100 GPU.
- Tokenizer: Custom WordPiece tokenizer "ST3" that marks subword pieces continuing a word with a "##" prefix.
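The figures above can be cross-checked with a little arithmetic. The sketch below assumes the standard GPT-2 block layout (fused QKV projection, a 4x-wide MLP, two layer norms per block, and an output head tied to the token embeddings); the card does not state these details, and it does not state the vocabulary size either, so `vocab_size` is left as an input rather than guessed. The function name `gpt2_param_count` is illustrative only.

```python
def gpt2_param_count(n_layer, n_embd, n_positions, vocab_size):
    """Estimate trainable parameters for a standard GPT-2 style model.

    Assumes a 4x MLP expansion and a tied output head (assumptions,
    not confirmed by the model card).
    """
    d = n_embd
    attention = (d * 3 * d + 3 * d) + (d * d + d)   # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)     # up-projection + down-projection
    layer_norms = 2 * (2 * d)                       # two LayerNorms per block
    per_block = attention + mlp + layer_norms
    embeddings = vocab_size * d + n_positions * d   # token + position embeddings
    final_norm = 2 * d
    return n_layer * per_block + embeddings + final_norm


# Values taken from the card; vocab_size=0 isolates everything except
# the token embedding table, whose size the card does not give.
without_vocab = gpt2_param_count(n_layer=3, n_embd=288, n_positions=2048, vocab_size=0)
head_dim = 288 // 3                    # per-head width with 3 attention heads
fp32_megabytes = 4_000_000 * 4 / 1e6   # 4 bytes per FP32 parameter

print(without_vocab, head_dim, fp32_megabytes)
```

Under these assumptions, the blocks and position embeddings alone account for about 3.59 million parameters, and every 1,000 vocabulary entries would add another 288,000. At roughly 4 million FP32 parameters, the weights occupy about 16 MB.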
Intended use
ST3 is far less capable than larger transformer models, but it can be used for:
- Quick fine-tuning on small datasets.
- Research purposes to test new ideas.
- Educational and experimentation purposes.
This model has not been fine-tuned or evaluated with performance metrics as it’s not designed for state-of-the-art tasks.
Usage
To use the ST3 model, you can follow this example:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/ST3")

def clean_wordpiece_tokens(text):
    # Merge WordPiece continuation pieces ("##") back into whole words.
    return text.replace(" ##", "").replace("##", "")

input_text = "Esto es un ejemplo"
inputs = tokenizer(input_text, return_tensors="pt")
# max_length counts the prompt plus the generated tokens; 2048 is the model's context size.
outputs = model.generate(inputs.input_ids, max_length=2048, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
cleaned_text = clean_wordpiece_tokens(generated_text)
print(cleaned_text)
Explanation
The ST3 tokenizer uses the WordPiece algorithm, which marks subword pieces that continue a word with a "##" prefix. The provided clean_wordpiece_tokens function strips these prefixes, along with the space before each one, so the pieces are rejoined into whole words in the output text.
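To make the cleanup step concrete without downloading the model, here is a minimal sketch that runs on a hand-written token list. The split of "ejemplo" into "ejem" and "##plo" is a made-up illustration, not actual ST3 tokenizer output, and `join_wordpiece` is a hypothetical helper showing the same idea at the token level.

```python
def clean_wordpiece_tokens(text):
    # Same string-level cleanup as in the usage example.
    return text.replace(" ##", "").replace("##", "")


def join_wordpiece(tokens):
    # Token-level variant: attach each "##" piece to the word before it.
    words = []
    for token in tokens:
        if token.startswith("##"):
            piece = token[2:]
            if words:
                words[-1] += piece
            else:
                words.append(piece)
        else:
            words.append(token)
    return " ".join(words)


# Hypothetical tokenization, for illustration only.
tokens = ["esto", "es", "un", "ejem", "##plo"]
decoded = " ".join(tokens)  # what a naive decode might look like

print(clean_wordpiece_tokens(decoded))
print(join_wordpiece(tokens))
```

Both calls rebuild the same sentence here. The string-level version is simpler, but it would also strip a literal "##" that appears in ordinary text, which the token-level version avoids.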
Limitations
- Performance: ST3 lacks the power of larger models and may not perform well on complex language tasks.
- No evaluation: The model hasn’t been benchmarked with metrics.
- Not suitable for production use without further fine-tuning.
Training details
- Dataset: Wikimedia/Wikipedia subset "20231101.es".
- Number of layers: 3.
- Number of attention heads: 3.
- Embedding size: 288.
- Parameters: 4 million.
- Training: The model was trained for one epoch with a batch size of 32 on a P100 GPU provided by Kaggle.
Developer and publisher
- Developed by: BueormAI.
- Published by: BueormLLC.
Acknowledgments
Thank you for using ST3! Your feedback and support are appreciated as we continue to develop and improve our models.
If you find this model useful and would like to support further development, please consider making a donation to:
Contributions to this project are always welcome!