
Jam-CGPT

Jam-CGPT is a family of GPT-2-like models. We follow jam's pretraining procedure to pretrain models ranging from 38 million to 350 million parameters, then finetune them on comments generated by GPT-3.5, with finetuning dataset sizes ranging from 170k to 2.15m.

Jam-CGPT Training Details

We follow jam's pretraining procedure and use the same data to pretrain our 38m, 110m, and 350m parameter models.
We finetune Jam-CGPT on summaries generated by GPT-3.5, using four different sizes of the Jam-CGPT dataset.
We finetune our models for 3 epochs.
Our GitHub repo contains the code for reproduction using the same data.

Jam-CGPT 38 million parameters model

Hyperparameter	Description	Value
e	embedding dimensions	512
L	number of layers	4
h	attention heads	4
c	block size / context length	256
b	batch size	64
a	accumulation steps	2
d	dropout	0.20
r	learning rate	3e-5
y	weight decay	1e-5
iter	number of iterations after pretraining	757,000

Jam-CGPT 110 million parameters model

Hyperparameter	Description	Value
e	embedding dimensions	768
L	number of layers	10
h	attention heads	8
c	block size / context length	256
b	batch size	32
a	accumulation steps	4
d	dropout	0.20
r	learning rate	3e-5
y	weight decay	1e-5
iter	number of iterations after pretraining	762,000

Jam-CGPT 350 million parameters model

Hyperparameter	Description	Value
e	embedding dimensions	1024
L	number of layers	24
h	attention heads	16
c	block size / context length	256
b	batch size	4
a	accumulation steps	32
d	dropout	0.20
r	learning rate	3e-5
y	weight decay	1e-5
iter	iterations	272,000
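
As a rough sanity check on the three tables above, the sketch below collects the architecture hyperparameters into a small config object and estimates each model's parameter count. It is an illustration only, not the repository's code: the `JamCGPTConfig` class, its field names, and `estimate_params` are hypothetical, and the vocabulary size of 50257 (the GPT-2 BPE vocabulary) is an assumption that this card does not state. The estimate uses the common approximation of 12 * e^2 weights per transformer layer plus the token embedding, and ignores position embeddings, biases, and layer norms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JamCGPTConfig:
    """Hypothetical container for the architecture rows of the tables."""
    n_embd: int              # e: embedding dimensions
    n_layer: int             # L: number of layers
    n_head: int              # h: attention heads
    block_size: int = 256    # c: block size / context length
    dropout: float = 0.20    # d: dropout
    vocab_size: int = 50257  # ASSUMPTION: GPT-2 BPE vocabulary, not stated in the card


# Values copied from the three hyperparameter tables.
CONFIGS = {
    "38m": JamCGPTConfig(n_embd=512, n_layer=4, n_head=4),
    "110m": JamCGPTConfig(n_embd=768, n_layer=10, n_head=8),
    "350m": JamCGPTConfig(n_embd=1024, n_layer=24, n_head=16),
}


def estimate_params(cfg: JamCGPTConfig) -> int:
    """Approximate parameter count for a GPT-2-style decoder.

    Each layer has about 4*e^2 attention weights and 8*e^2 MLP weights;
    the token embedding (assumed tied with the output head) adds vocab*e.
    """
    per_layer = 12 * cfg.n_embd ** 2
    token_embedding = cfg.vocab_size * cfg.n_embd
    return cfg.n_layer * per_layer + token_embedding


for name, cfg in CONFIGS.items():
    # The embedding width must split evenly across attention heads.
    assert cfg.n_embd % cfg.n_head == 0
    print(f"{name}: ~{estimate_params(cfg) / 1e6:.1f}M parameters")
```

Under these assumptions the estimates land close to the nominal 38m, 110m, and 350m sizes, which suggests the table values are internally consistent.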

Note that you can adjust the batch size and accumulation steps to fit your GPU memory, but batch size * accumulation steps should equal 128.
If you finetune with multiple GPUs, reduce the accumulation steps accordingly. For example, if you finetune with 2 GPUs, halve the accumulation steps.
We pretrained the 38m and 110m models for 3 epochs.
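
The batch-size rule above can be sketched as a small helper. This is a minimal illustration, not part of the released code: the function name `accumulation_steps` and its parameters are hypothetical, and it assumes each GPU processes one batch of `batch_size` per step, so that batch size * number of GPUs * accumulation steps stays at 128.

```python
TARGET_EFFECTIVE_BATCH = 128  # batch size * accumulation steps, per the note above


def accumulation_steps(batch_size: int, num_gpus: int = 1,
                       target: int = TARGET_EFFECTIVE_BATCH) -> int:
    """Return the accumulation steps that keep the effective batch at `target`.

    Assumes every GPU processes `batch_size` samples per forward/backward pass.
    """
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    samples_per_step = batch_size * num_gpus
    if target % samples_per_step != 0:
        raise ValueError(
            f"batch_size * num_gpus = {samples_per_step} does not divide {target}"
        )
    return target // samples_per_step


# Single-GPU settings from the tables: 64 -> 2, 32 -> 4, 4 -> 32.
for bs in (64, 32, 4):
    print(bs, accumulation_steps(bs))

# Two GPUs with batch size 4: the 32 accumulation steps are halved.
print(accumulation_steps(4, num_gpus=2))
```

Raising an error when the product does not divide 128 is a design choice for this sketch; rounding instead would silently change the effective batch size.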
