Model Card

This repository contains an LSTM-based next word prediction model implemented in PyTorch. The model incorporates an extra fully connected layer with ReLU and dropout, layer normalization, a label smoothing loss, gradient clipping, and learning rate scheduling to improve performance, and it uses SentencePiece for subword tokenization.

Model Details

Model Description

The LSTM Next Word Predictor is designed to predict the next word or subword given an input sentence. The model is trained on a dataset provided in CSV format (with a 'data' column) and uses an LSTM network with the enhancements described above.

  • Developed by: Aarohan Verma
  • Model type: LSTM-based Next Word Prediction
  • Language(s) (NLP): English
  • License: Apache-2.0

Model Sources

  • Repository: https://huggingface.co/aarohanverma/lstm-next-word-predictor

Uses

Direct Use

This model can be used directly for next word prediction, for example in text autocompletion.

Downstream Use

The model can be fine-tuned for related tasks such as:

  • Text generation.
  • Language modeling for specific domains.

Out-of-Scope Use

This model is not suitable for:

  • Tasks requiring deep contextual understanding beyond next-word prediction.
  • Applications where transformer-based architectures are preferred for longer contexts.
  • Sensitive applications where data bias could lead to unintended outputs.

Risks and Limitations

  • Risks: Inaccurate or unexpected predictions may occur if the input context is too complex or ambiguous.
  • Limitations: The model’s performance is bounded by the size and quality of the training data as well as the inherent limitations of LSTM architectures in modeling long-range dependencies.

Recommendations

Users should be aware of the above limitations and conduct appropriate evaluations before deploying the model in production. Consider further fine-tuning or additional data preprocessing if the model is applied in sensitive contexts.

How to Get Started with the Model

To get started with the model, follow these steps:

  1. Training:

    • Ensure you have a CSV file with a column named data containing your training sentences.
    • Run training with:
      python next_word_prediction.py --data_path data.csv --train
      
    • This will train the model, save a checkpoint (best_model.pth), and export a TorchScript version (best_model_scripted.pt).
  2. Inference:

    • To predict the next word, run:
      python next_word_prediction.py --inference "Your partial sentence"
      
    • The model will output the top predicted word or subword.
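
Beyond the command line, the exported TorchScript model can also be loaded directly in Python. The snippet below is a minimal sketch, not the repository's exact code: the tokenizer file name (spm.model) and the model's output shape are assumptions, and a greedy top-1 decode is used.

    # Minimal inference sketch; file names and output shape are assumptions.
    import torch
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="spm.model")   # assumed tokenizer path
    model = torch.jit.load("best_model_scripted.pt")           # TorchScript export from training
    model.eval()

    context = "Your partial sentence"
    ids = torch.tensor([sp.encode(context, out_type=int)], dtype=torch.long)  # (1, seq_len)

    with torch.no_grad():
        logits = model(ids)                    # assumed shape: (1, vocab_size)
        next_id = int(logits.argmax(dim=-1))

    print(sp.decode([next_id]))                # predicted next word/subword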

Training Details

Training Data

  • Data Source: CSV file with a column data containing sentences.
  • Preprocessing: Uses SentencePiece for subword tokenization.
  • Dataset: The training and validation datasets are split based on a user-defined ratio.
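
As a rough illustration of this preprocessing pipeline (the actual logic lives in next_word_prediction.py), the sketch below reads the 'data' column, trains a SentencePiece model, and splits the encoded sentences; the vocabulary size, intermediate file names, and split ratio are illustrative assumptions.

    # Data preparation sketch; vocab size, file names, and split ratio are assumptions.
    import pandas as pd
    import sentencepiece as spm

    sentences = pd.read_csv("data.csv")["data"].dropna().astype(str).tolist()

    # Train a SentencePiece subword tokenizer on the raw sentences.
    with open("corpus.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(sentences))
    spm.SentencePieceTrainer.train(input="corpus.txt", model_prefix="spm", vocab_size=8000)

    sp = spm.SentencePieceProcessor(model_file="spm.model")
    encoded = [sp.encode(s, out_type=int) for s in sentences]

    # User-defined train/validation split (90/10 shown as an example).
    split = int(0.9 * len(encoded))
    train_ids, val_ids = encoded[:split], encoded[split:]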

Training Procedure

  • Preprocessing: Tokenization using a SentencePiece model.
  • Training Hyperparameters:
    • Batch Size: Configurable via --batch_size (default: 512)
    • Learning Rate: Configurable via --learning_rate (default: 0.001)
    • Epochs: Configurable via --num_epochs (default: 25)
    • LSTM Parameters: Configurable number of layers, dropout, and hidden dimensions.
    • Label Smoothing: Applied with a configurable factor (default: 0.1)
    • Optimization: Uses Adam optimizer with weight decay and gradient clipping.
    • Learning Rate Scheduling: ReduceLROnPlateau scheduler based on validation loss.
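
These choices map onto standard PyTorch components. The sketch below shows how they fit together in a training loop; model, the data loaders, the evaluate helper, and the numeric values are illustrative placeholders rather than the script's exact configuration.

    # Optimization sketch; model, data loaders, evaluate(), and values are placeholders.
    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)     # label smoothing loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                           factor=0.5, patience=2)

    for epoch in range(25):                                   # default: 25 epochs
        model.train()
        for inputs, targets in train_loader:                  # (context, next-token) batches
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
            optimizer.step()

        val_loss = evaluate(model, val_loader, criterion)     # hypothetical validation helper
        scheduler.step(val_loss)                              # LR scheduling on validation loss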

Speeds, Sizes, Times

  • Checkpoint and TorchScript models are saved during training for later inference.

Evaluation

Testing Data, Factors & Metrics

  • Testing Data: Derived from the same CSV file, held out from the training data according to the user-defined split ratio.
  • Metrics: The primary metric is the validation loss (computed with label smoothing); next-word (top-1) accuracy serves as a complementary check, as sketched after this list.
  • Factors: Results may vary with sentence length and dataset diversity.
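
For the next-word accuracy mentioned above, one simple formulation is top-1 accuracy over held-out (context, next-token) pairs. A minimal sketch, assuming a validation DataLoader that yields such pairs:

    # Top-1 next-token accuracy sketch; model and val_loader are assumed to exist.
    import torch

    def next_word_accuracy(model, val_loader):
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for inputs, targets in val_loader:
                preds = model(inputs).argmax(dim=-1)   # assumed logits shape: (batch, vocab_size)
                correct += (preds == targets).sum().item()
                total += targets.numel()
        return correct / max(total, 1)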

Summary

  • The model demonstrates promising performance on next word prediction tasks; however, quantitative results (e.g., accuracy, loss) should be validated on your specific dataset.

Model Examination

  • Interpretability techniques such as examining predicted token distributions can be applied to further understand model behavior.
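
One concrete way to do this is to look at the softmax distribution over the next token for a given context. A small sketch, reusing the assumed model and SentencePiece processor from the inference example above:

    # Inspect the top-k next-token distribution; model and sp are as in the inference sketch.
    import torch

    def top_k_next_tokens(model, sp, context, k=10):
        ids = torch.tensor([sp.encode(context, out_type=int)], dtype=torch.long)
        with torch.no_grad():
            probs = torch.softmax(model(ids), dim=-1).squeeze(0)   # assumed shape: (vocab_size,)
        top_probs, top_ids = probs.topk(k)
        return [(sp.decode([int(i)]), float(p)) for i, p in zip(top_ids, top_probs)]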

Technical Specifications

Model Architecture and Objective

  • Architecture: LSTM-based network with enhancements such as an extra fully connected layer, dropout, and layer normalization.
  • Objective: Predict the next word/subword given a sequence of tokens.
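
The description above corresponds to a fairly standard PyTorch module layout: embedding, LSTM, layer normalization, an extra fully connected layer with ReLU and dropout, and an output projection. The sketch below is one plausible arrangement; the dimensions and layer counts are illustrative, not the released configuration.

    # Architecture sketch; dimensions and layer counts are illustrative only.
    import torch
    import torch.nn as nn

    class LSTMNextWordModel(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_layers=2, dropout=0.3):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                                dropout=dropout, batch_first=True)
            self.layer_norm = nn.LayerNorm(hidden_dim)          # layer normalization
            self.fc_extra = nn.Linear(hidden_dim, hidden_dim)   # extra fully connected layer
            self.dropout = nn.Dropout(dropout)
            self.fc_out = nn.Linear(hidden_dim, vocab_size)     # projection to vocabulary logits

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)                # (batch, seq_len, embed_dim)
            outputs, _ = self.lstm(embedded)
            last_hidden = self.layer_norm(outputs[:, -1, :])    # last time step, normalized
            hidden = self.dropout(torch.relu(self.fc_extra(last_hidden)))
            return self.fc_out(hidden)                          # (batch, vocab_size) logits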

Citation

BibTeX:

@misc{aarohan_verma_2025,
    author    = {Aarohan Verma},
    title     = {lstm-next-word-predictor},
    year      = {2025},
    url       = {https://huggingface.co/aarohanverma/lstm-next-word-predictor},
    doi       = {10.57967/hf/4882},
    publisher = {Hugging Face}
}

Model Card Contact

For inquiries or further information, please contact:

LinkedIn: https://www.linkedin.com/in/aarohanverma/

Email: [email protected]
