Model Card
This repository contains an LSTM-based next word prediction model implemented in PyTorch. The model incorporates an extra fully connected layer with ReLU and dropout, layer normalization, a label-smoothing loss, gradient clipping, and learning-rate scheduling to improve performance, and it uses SentencePiece for subword tokenization.
Model Details
Model Description
The LSTM Next Word Predictor is designed to predict the next word or subword given an input sentence. The model is trained on a dataset provided in CSV format (with a 'data' column) and uses an LSTM network with the enhancements described above.
- Developed by: Aarohan Verma
- Model type: LSTM-based Next Word Prediction
- Language(s) (NLP): English
- License: Apache-2.0
Model Sources
- Repository: https://huggingface.co/aarohanverma/lstm-next-word-predictor
- Demo: LSTM Next Word Predictor Demo
Uses
Direct Use
This model can be directly used for next word prediction in text autocompletion.
Downstream Use
The model can be fine-tuned for related tasks such as:
- Text generation.
- Language modeling for specific domains.
Out-of-Scope Use
This model is not suitable for:
- Tasks requiring deep contextual understanding beyond next-word prediction.
- Applications where transformer-based architectures are preferred for longer contexts.
- Sensitive applications where data bias could lead to unintended outputs.
Risks and Limitations
- Risks: Inaccurate or unexpected predictions may occur if the input context is too complex or ambiguous.
- Limitations: The model’s performance is bounded by the size and quality of the training data as well as the inherent limitations of LSTM architectures in modeling long-range dependencies.
Recommendations
Users should be aware of the above limitations and conduct appropriate evaluations before deploying the model in production. Consider further fine-tuning or additional data preprocessing if the model is applied in sensitive contexts.
How to Get Started with the Model
Use the commands below to train the model and run inference.
Training:
- Ensure you have a CSV file with a column named `data` containing your training sentences.
- Run training with: `python next_word_prediction.py --data_path data.csv --train`
- This will train the model, save a checkpoint (`best_model.pth`), and export a TorchScript version (`best_model_scripted.pt`).
Inference:
- To predict the next word, run: `python next_word_prediction.py --inference "Your partial sentence"`
- The model will output the top predicted word or subword.
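For programmatic use, the exported TorchScript model can also be loaded directly in Python. The sketch below is a minimal example under assumptions: it uses the `best_model_scripted.pt` file from the training step, assumes a SentencePiece model file (here called `spm.model`) is available from preprocessing, and assumes the model returns logits over the vocabulary; adjust paths and pre/post-processing to match the actual script.

```python
# Minimal inference sketch (assumed file names and output shape; adjust to your setup).
import torch
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")   # assumed tokenizer file name
model = torch.jit.load("best_model_scripted.pt")          # TorchScript export from training
model.eval()

text = "Your partial sentence"
ids = torch.tensor([sp.encode(text, out_type=int)])       # shape: (1, seq_len)

with torch.no_grad():
    logits = model(ids)                                    # assumed shape: (1, vocab_size)
    next_id = int(logits[0].argmax())

print(sp.id_to_piece(next_id))                             # top predicted subword
```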
Training Details
Training Data
- Data Source: CSV file with a `data` column containing sentences.
- Preprocessing: SentencePiece is used for subword tokenization (see the sketch after this list).
- Dataset: The training and validation datasets are split based on a user-defined ratio.
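For illustration, the snippet below shows the SentencePiece side of this preprocessing: training a subword model on a one-sentence-per-line text dump of the `data` column and encoding a sentence. The file names, vocabulary size, and model type here are assumptions, not values taken from `next_word_prediction.py`.

```python
# Illustrative SentencePiece preprocessing (file names and hyperparameters are assumptions).
import sentencepiece as spm

# Train a subword model on one sentence per line (e.g., the CSV's 'data' column dumped to text).
spm.SentencePieceTrainer.train(
    input="sentences.txt",
    model_prefix="spm",
    vocab_size=8000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="spm.model")
print(sp.encode("The quick brown fox", out_type=str))  # e.g., ['▁The', '▁quick', '▁brown', '▁fox']
```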
Training Procedure
- Preprocessing: Tokenization using a SentencePiece model.
- Training Hyperparameters:
  - Batch Size: Configurable via `--batch_size` (default: 512)
  - Learning Rate: Configurable via `--learning_rate` (default: 0.001)
  - Epochs: Configurable via `--num_epochs` (default: 25)
  - LSTM Parameters: Configurable number of layers, dropout, and hidden dimensions.
  - Label Smoothing: Applied with a configurable factor (default: 0.1)
- Optimization: Adam optimizer with weight decay and gradient clipping (a sketch of this setup follows the list).
- Learning Rate Scheduling: ReduceLROnPlateau scheduler based on validation loss.
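For concreteness, the sketch below shows how this optimization setup typically fits together in PyTorch. The weight-decay value and clipping norm are placeholders, and the learning rate and smoothing factor are the defaults listed above; none of these are guaranteed to match `next_word_prediction.py` exactly.

```python
# Sketch of the optimization setup described above (values are illustrative).
import torch
import torch.nn as nn

def make_optimization(model: nn.Module):
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)           # label-smoothing loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 weight_decay=1e-5)                 # Adam with weight decay
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", patience=2)                          # LR scheduling on validation loss
    return criterion, optimizer, scheduler

def training_step(model, inputs, targets, criterion, optimizer, max_norm=1.0):
    optimizer.zero_grad()
    logits = model(inputs)                                          # forward pass
    loss = criterion(logits, targets)                               # label-smoothed cross-entropy
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)    # gradient clipping
    optimizer.step()
    return loss.item()

# After each validation pass: scheduler.step(val_loss)
```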
Speeds, Sizes, Times
- Checkpoint and TorchScript models are saved during training for later inference.
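A minimal sketch of how these artifacts are typically produced, using the file names from the steps above:

```python
# Sketch: save a weight checkpoint and a TorchScript export (file names from the steps above).
import torch

def save_artifacts(model: torch.nn.Module) -> None:
    torch.save(model.state_dict(), "best_model.pth")         # checkpoint with model weights
    torch.jit.script(model).save("best_model_scripted.pt")   # TorchScript export for inference
```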
Evaluation
Testing Data, Factors & Metrics
- Testing Data: Derived from the same CSV, split from the training data.
- Metrics: Primary metric is the loss (with label smoothing), with qualitative evaluation based on next-word accuracy.
- Factors: Evaluations may vary based on sentence length and dataset diversity.
Summary
- The model demonstrates promising performance on next word prediction tasks; however, quantitative results (e.g., accuracy, loss) should be validated on your specific dataset.
Model Examination
- Interpretability techniques such as examining predicted token distributions can be applied to further understand model behavior.
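As one example, the snippet below turns the model's output logits into a probability distribution and lists the top-k candidates; it assumes the scripted model and SentencePiece processor from the inference example above.

```python
# Sketch: inspect the predicted next-token distribution for a given context.
import torch

def top_k_predictions(model, sp, text: str, k: int = 5):
    ids = torch.tensor([sp.encode(text, out_type=int)])
    with torch.no_grad():
        logits = model(ids)                                # assumed shape: (1, vocab_size)
    probs = torch.softmax(logits[0], dim=-1)
    top = torch.topk(probs, k)
    return [(sp.id_to_piece(int(i)), float(p)) for p, i in zip(top.values, top.indices)]
```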
Technical Specifications
Model Architecture and Objective
- Architecture: LSTM-based network with enhancements such as an extra fully connected layer, dropout, and layer normalization (sketched below).
- Objective: Predict the next word/subword given a sequence of tokens.
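The module below is a hedged sketch of such an architecture (embedding, multi-layer LSTM, layer normalization, an extra fully connected layer with ReLU and dropout, and a projection to the vocabulary); the layer sizes are placeholders, not the values used in this repository.

```python
# Hedged architecture sketch; layer sizes are placeholders, not this repository's values.
import torch
import torch.nn as nn

class LSTMNextWordModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512,
                 num_layers: int = 2, dropout: float = 0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.layer_norm = nn.LayerNorm(hidden_dim)         # layer normalization
        self.fc = nn.Linear(hidden_dim, hidden_dim)        # extra fully connected layer
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(hidden_dim, vocab_size)       # projection to the vocabulary

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(x)                  # (batch, seq_len, embed_dim)
        out, _ = self.lstm(emb)                  # (batch, seq_len, hidden_dim)
        last = self.layer_norm(out[:, -1, :])    # last timestep, normalized
        hidden = self.dropout(torch.relu(self.fc(last)))
        return self.out(hidden)                  # (batch, vocab_size) logits
```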
Citation
BibTeX:
```bibtex
@misc{aarohan_verma_2025,
  author    = {Aarohan Verma},
  title     = {lstm-next-word-predictor},
  year      = {2025},
  url       = {https://huggingface.co/aarohanverma/lstm-next-word-predictor},
  doi       = {10.57967/hf/4882},
  publisher = {Hugging Face}
}
```
Model Card Contact
For inquiries or further information, please contact:
LinkedIn: https://www.linkedin.com/in/aarohanverma/
Email: [email protected]