title: GPT From Scratch
emoji: ⚡
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.4.0
app_file: app.py
pinned: false
license: mit
GPT from scratch
This repo contains code to train a GPT model from scratch. The training data is a sample drawn from the RedPajama 1-trillion-token dataset, not the full corpus. The transformer implementation is similar to the one in LitGPT.
The trained model has about 160M parameters. The final training loss was 3.2154.
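To put these two numbers in context, here is a minimal sketch of how a parameter count near 160M can arise and what a loss of 3.2154 means as perplexity. The layer count, embedding width, and vocabulary size below are hypothetical values chosen for illustration; they are not taken from this repo. The perplexity conversion assumes the reported loss is a mean cross-entropy in nats.

```python
import math

def estimate_gpt_params(n_layer, n_embd, vocab_size, tie_embeddings=False):
    """Rough parameter count for a decoder-only transformer.

    Ignores biases and normalization weights, and assumes no learned
    positional embedding table (e.g. rotary embeddings).
    """
    attn = 4 * n_embd * n_embd   # Q, K, V and output projections
    mlp = 8 * n_embd * n_embd    # two linear layers with a 4x hidden width
    blocks = n_layer * (attn + mlp)
    embed = vocab_size * n_embd  # token embedding table
    head = 0 if tie_embeddings else vocab_size * n_embd  # output projection
    return blocks + embed + head

def perplexity(loss_nats):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)

# Hypothetical configuration, NOT read from the repo.
n_params = estimate_gpt_params(n_layer=12, n_embd=768, vocab_size=50304)
print(f"{n_params / 1e6:.1f}M parameters")        # 162.2M parameters
print(f"perplexity ~ {perplexity(3.2154):.1f}")   # perplexity ~ 24.9
```

Under these assumptions, roughly half of the parameters sit in the embedding and output matrices, which is typical for models of this size.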
Training details can be found in the attached notebooks. The initial run was stopped when the loss was around 4.
Training was then resumed from that checkpoint and stopped once the loss fell below 3.5.
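The stop-and-resume pattern described above can be sketched in a framework-agnostic way. This is only an illustration: `train_until`, `toy_step`, and the JSON checkpoint are hypothetical stand-ins, and the notebooks' actual training loop and checkpoint format are not reproduced here. A real checkpoint would also hold the model and optimizer state.

```python
import json
import os
import tempfile

def train_until(target_loss, step_fn, ckpt_path, max_steps=10_000):
    """Run step_fn until the loss drops below target_loss.

    If ckpt_path exists, training resumes from the step stored there.
    """
    state = {"step": 0, "loss": float("inf")}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume from the saved step and loss
    while state["loss"] >= target_loss and state["step"] < max_steps:
        state["step"] += 1
        state["loss"] = step_fn(state["step"])
    with open(ckpt_path, "w") as f:
        json.dump(state, f)  # save progress so a later run can resume
    return state

def toy_step(step):
    """Stand-in for one training step: a loss curve that decays toward 2.0."""
    return 2.0 + 8.0 / (1 + 0.01 * step)

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train_until(4.0, toy_step, ckpt)   # initial run, stops below 4.0
second = train_until(3.5, toy_step, ckpt)  # resumed run, stops below 3.5
print(first["step"], second["step"])
```

The second call picks up at the step where the first one stopped, so no work is repeated between the two runs.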
GitHub link: https://github.com/mkthoma/gpt_from_scratch