Jam-CGPT

Jam-CGPT is a GPT-2-like model. It follows jam's pretraining procedure to pretrain models ranging from 38 million to 350 million parameters, and is then finetuned on comments generated by GPT-3.5, with finetuning dataset sizes ranging from 170k to 2.15m examples.

Jam-CGPT Training Details

  • We follow jam's pretraining procedure and use the same data to pretrain our 38m, 110m, and 350m parameter models.
  • We finetune Jam-CGPT on the summaries generated by GPT-3.5, using four different sizes of the Jam-CGPT dataset.
  • We finetune our models for 3 epochs.
  • Our GitHub repo contains the code for reproduction using the same data.

Jam-CGPT 38 million parameter model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 512 |
| L | number of layers | 4 |
| h | attention heads | 4 |
| c | block size / context length | 256 |
| b | batch size | 64 |
| a | accumulation steps | 2 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | number of iterations after pretraining | 757,000 |
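
For reference, the 38m settings above map onto a flat, nanoGPT-style training config roughly as follows. This is a sketch under the assumption that Jam-CGPT follows jam's nanoGPT-style configuration; the exact variable names in the repository may differ.

```python
# Illustrative nanoGPT-style config for the Jam-CGPT 38m model.
# Values mirror the table above; names are assumptions, not the repo's exact keys.
n_embd = 512                      # e: embedding dimensions
n_layer = 4                       # L: number of layers
n_head = 4                        # h: attention heads
block_size = 256                  # c: block size / context length
batch_size = 64                   # b: batch size per step
gradient_accumulation_steps = 2   # a: accumulation steps (64 * 2 = 128 effective)
dropout = 0.20                    # d: dropout
learning_rate = 3e-5              # r: finetuning learning rate
weight_decay = 1e-5               # y: weight decay
max_iters = 757_000               # iter: iteration count reached after pretraining
```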

Jam-CGPT 110 million parameter model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 768 |
| L | number of layers | 10 |
| h | attention heads | 8 |
| c | block size / context length | 256 |
| b | batch size | 32 |
| a | accumulation steps | 4 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | number of iterations after pretraining | 762,000 |

Jam-CGPT 350 million parameter model

| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |
| iter | iterations | 272,000 |
  • Note that you can adjust the batch size and accumulation steps based on your GPU memory, but batch size × accumulation steps should remain 128 (see the sketch after this list).
  • If you finetune with multiple GPUs, you can reduce the accumulation steps proportionally; for example, with 2 GPUs you should halve the accumulation steps.
  • We pretrained the 38m and 110m models for 3 epochs.
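
As a quick check of the note above, the sketch below computes the accumulation steps from the per-GPU batch size and the GPU count so that the effective batch size stays at 128. The helper is illustrative only and not part of the Jam-CGPT codebase.

```python
# Keep the effective batch size fixed at 128 = batch_size * accumulation_steps * num_gpus.
EFFECTIVE_BATCH_SIZE = 128

def accumulation_steps(batch_size: int, num_gpus: int = 1) -> int:
    """Pick gradient-accumulation steps so the effective batch size stays at 128."""
    total = batch_size * num_gpus
    assert EFFECTIVE_BATCH_SIZE % total == 0, "batch_size * num_gpus must divide 128"
    return EFFECTIVE_BATCH_SIZE // total

# Values from the tables above:
assert accumulation_steps(batch_size=64, num_gpus=1) == 2    # 38m config
assert accumulation_steps(batch_size=32, num_gpus=1) == 4    # 110m config
assert accumulation_steps(batch_size=4,  num_gpus=1) == 32   # 350m config
# With 2 GPUs, halve the accumulation steps (e.g. 350m: 32 -> 16):
assert accumulation_steps(batch_size=4,  num_gpus=2) == 16
```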