ST3: Simple Transformer 3

Model description

ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from OpenAI's GPT-2 architecture. It was designed specifically for quick fine-tuning and experimentation, which makes it suitable for researchers and developers who need a small, efficient model for downstream tasks.

Key features:

Architecture: GPT-2-based model with 3 attention heads and 3 layers.
Embedding size: 288 (dimension of each token embedding and hidden state).
Context size: 2048 tokens, allowing for extended input/output sequences.
Pretrained on: Wikimedia/Wikipedia subset "20231101.es" (Spanish Wikipedia dump dated 2023-11-01).
Parameters: 4 million FP32 parameters.
Batch size: 32.
Training environment: 1 epoch on a Kaggle P100 GPU.
Tokenizer: Custom WordPiece tokenizer "ST3" that generates tokens with "##" as a prefix for subword units.
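The parameter figures above can be sanity-checked with a little arithmetic. The sketch below assumes a standard GPT-2 block layout with tied input/output embeddings (the Hugging Face GPT-2 default); `gpt2_param_count` is a hypothetical helper, not part of ST3 or the transformers library. ST3's vocabulary size is not stated on this card, so it is left as a parameter.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Count trainable parameters of a GPT-2-style model with tied embeddings.

    Hypothetical helper; assumes the standard GPT-2 block layout.
    """
    d = n_embd
    token_embeddings = vocab_size * d       # wte (shared with the LM head)
    position_embeddings = n_positions * d   # wpe
    # Per block: 2 LayerNorms (4d), fused QKV projection (3d^2 + 3d),
    # attention output projection (d^2 + d), MLP up (4d^2 + 4d), MLP down (4d^2 + d).
    per_layer = 12 * d * d + 13 * d
    final_layer_norm = 2 * d                # ln_f
    return token_embeddings + position_embeddings + n_layer * per_layer + final_layer_norm


if __name__ == "__main__":
    # Sanity check against GPT-2 small.
    print(gpt2_param_count(50257, 1024, 768, 12))  # 124439808

    # ST3 hyperparameters from this card. The vocabulary size is NOT stated,
    # so only the vocabulary-independent part is computed here.
    print(gpt2_param_count(0, 2048, 288, 3))       # 3587616
    print(288 // 3)                                # 96 (per-head dimension)
```

Every vocabulary entry adds 288 parameters (one embedding row), so the remainder up to the stated 4 million depends on the size of the ST3 tokenizer's vocabulary.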

Intended use

ST3 is far less capable than larger transformer models and is not a general-purpose model, but it can be used for:

Quick fine-tuning on small datasets.
Research purposes to test new ideas.
Educational and experimentation purposes.

This model has not been fine-tuned or evaluated on any benchmark, because it is not intended for state-of-the-art tasks.

Usage

To use the ST3 model, you can follow this example:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/ST3")

def clean_wordpiece_tokens(text):
    # Join "##" continuation pieces to the preceding token, then drop any leftover "##"
    return text.replace(" ##", "").replace("##", "")

input_text = "Esto es un ejemplo"
inputs = tokenizer(input_text, return_tensors="pt")

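# Pass the attention mask explicitly along with the input IDs
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=2048,
    num_return_sequences=1,
)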

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
cleaned_text = clean_wordpiece_tokens(generated_text)

print(cleaned_text)

Explanation

The ST3 tokenizer uses the WordPiece algorithm, which marks a subword unit that continues the previous token with a "##" prefix. The clean_wordpiece_tokens function deletes each " ##" sequence so that a continuation piece is joined to the preceding token, then removes any remaining "##", producing readable text.
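The same clean-up can also be done on the token list instead of the decoded string. The sketch below is a hypothetical alternative, not part of the ST3 release: `merge_wordpieces` appends every "##"-prefixed token to the previous word. The sample tokens are invented for illustration, since the card does not document how ST3 actually segments this sentence.

```python
def clean_wordpiece_tokens(text: str) -> str:
    # String-level clean-up from the usage example: remove " ##" so each
    # continuation piece joins the preceding token, then drop stray "##".
    return text.replace(" ##", "").replace("##", "")


def merge_wordpieces(tokens: list[str]) -> str:
    """Join WordPiece tokens into text (hypothetical helper).

    A token starting with "##" continues the previous word; any other
    token starts a new, space-separated word.
    """
    words: list[str] = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]   # glue the subword onto the current word
        elif tok.startswith("##"):
            words.append(tok[2:])  # continuation piece with no previous word
        else:
            words.append(tok)
    return " ".join(words)


if __name__ == "__main__":
    # Invented segmentation, for illustration only.
    tokens = ["Est", "##o", "es", "un", "ejem", "##plo"]
    print(merge_wordpieces(tokens))                            # Esto es un ejemplo
    print(clean_wordpiece_tokens("Est ##o es un ejem ##plo"))  # Esto es un ejemplo
```

With a loaded tokenizer, the token list could come from `tokenizer.convert_ids_to_tokens(outputs[0].tolist(), skip_special_tokens=True)`. Working on tokens avoids depending on how `decode` spaces the pieces.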

Limitations

Performance: ST3 lacks the power of larger models and may not perform well on complex language tasks.
No evaluation: The model hasn’t been benchmarked with metrics.
Production use: not suitable without further fine-tuning.

Training details

Dataset: Wikimedia/Wikipedia subset "20231101.es".
Number of layers: 3.
Number of attention heads: 3.
Embedding size: 288.
Parameters: 4 million.
Training: The model was trained for one epoch with a batch size of 32 on a P100 GPU provided by Kaggle.
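The card does not say how the Spanish Wikipedia text was split into training examples. Assuming, for illustration only, that the tokenized corpus was concatenated and cut into full 2,048-token blocks (the approach of the `group_texts` step in Hugging Face's `run_clm.py` example), one epoch at batch size 32 works out as below. The corpus size is made up, and both helpers are hypothetical.

```python
import math


def pack_into_blocks(token_ids: list[int], block_size: int) -> list[list[int]]:
    """Cut one long token stream into full, non-overlapping blocks.

    The trailing partial block is dropped. Assumed packing scheme,
    not documented for ST3.
    """
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


def optimizer_steps_per_epoch(num_blocks: int, batch_size: int) -> int:
    # Keep the last, possibly partial batch, so round up.
    return math.ceil(num_blocks / batch_size)


if __name__ == "__main__":
    context_size, batch_size = 2048, 32   # values from this card
    corpus_tokens = 1_000_000             # hypothetical corpus size

    num_blocks = corpus_tokens // context_size
    print(num_blocks)                                         # 488
    print(optimizer_steps_per_epoch(num_blocks, batch_size))  # 16
    print(batch_size * context_size)                          # 65536 tokens per step
    print(pack_into_blocks(list(range(10)), 4))               # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Under this assumption each optimizer step covers 32 × 2,048 = 65,536 tokens, so the number of steps in the single epoch scales directly with the size of the tokenized corpus.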

Developer and publisher

Developed by: BueormAI.
Published by: BueormLLC.

Acknowledgments

Thank you for using ST3! Your feedback and support are appreciated as we continue to develop and improve our models.

If you find this model useful and would like to support further development, please consider making a donation to:

Contributions to this project are always welcome!
