mpt-125m-c4

Model Description

A pretrained MPT-125M model trained on the C4 dataset.

Training data

Trained on the C4 dataset from the Hugging Face Hub.
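
Below is a minimal sketch of loading the dataset with the Hugging Face datasets library; the exact dataset ID (allenai/c4) and the "en" config are assumptions, since the card does not pin a specific dataset revision.

```python
# Minimal sketch: stream the C4 training split (assumed to be
# allenai/c4, "en" config) without downloading the full corpus.
from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Peek at the first document's text field.
print(next(iter(c4))["text"][:200])
```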

Training procedure

This model was trained on C4 for ~2.5B tokens. Training took ~1 hour on 104 A100-40GB GPUs.

Intended Use and Limitations

This model is primarily intended for generating text from a prompt. Its purpose is to explore pretraining models for research.
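
A minimal generation sketch using the transformers library is shown below. MPT architectures ship custom modeling code, so trust_remote_code=True is assumed to be required; the prompt and sampling parameters are illustrative only, not recommendations from the model authors.

```python
# Minimal sketch: generate text from a prompt with wtang06/mpt-125m-c4.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wtang06/mpt-125m-c4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# MPT models rely on custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

inputs = tokenizer("The history of language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```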
