mega-ar-small-4096-NC-minipile-v1

A 65M-parameter MEGA autoregressive (causal language) model, initialized from scratch and trained on:

  1. pszemraj/simple_wikipedia_LM
  2. JeanKaddour/minipile

It achieves the following results on the evaluation set:

  • Loss: 3.7502
  • Accuracy: 0.3650
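The evaluation loss is a mean per-token cross-entropy. Assuming it is measured in nats (the PyTorch/Transformers default), the corresponding perplexity is exp(loss). The snippet below is a minimal sketch of that conversion; the nats assumption is ours and is not stated in the card:

```python
import math

# Evaluation loss reported above (mean per-token cross-entropy, assumed to be in nats)
eval_loss = 3.7502

# Perplexity is the exponential of the mean per-token cross-entropy
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

This works out to a perplexity of roughly 42.5 on the evaluation set.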

Evaluation

Initial zero-shot results ("getting our feet wet"), from EleutherAI's lm-evaluation-harness:

hf-causal-experimental (pretrained=pszemraj/mega-ar-small-4096-sw_minipile,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

Task             Version  Metric     Value       Stderr
arc_easy         0        acc        0.3173      ± 0.0096
                          acc_norm   0.3022      ± 0.0094
boolq            1        acc        0.4107      ± 0.0086
lambada_openai   0        ppl        6843.1824   ± 295.0792
                          acc        0.0155      ± 0.0017
openbookqa       0        acc        0.1220      ± 0.0147
                          acc_norm   0.2480      ± 0.0193
piqa             0        acc        0.5609      ± 0.0116
                          acc_norm   0.5566      ± 0.0116
winogrande       0        acc        0.5059      ± 0.0141

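There is still some way to go: several scores sit at or near chance level. For example, winogrande is a two-way choice task, so random guessing already scores about 0.5.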
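One way to see which zero-shot scores in the table can actually be told apart from random guessing is to form an approximate 95% interval, value ± 1.96 × stderr. This assumes the reported stderr is a standard error under a normal approximation. The sketch below applies that test to the three two-choice tasks, using an assumed uniform-guessing baseline of 0.5. The helper names `interval` and `compare_to_chance` are illustrative only and are not part of any evaluation library:

```python
# Zero-shot results copied from the table above: (accuracy, stderr, chance level).
# The chance level of 0.5 is an assumption: uniform guessing on a two-way task.
results = {
    "boolq": (0.4107, 0.0086, 0.5),
    "piqa": (0.5609, 0.0116, 0.5),
    "winogrande": (0.5059, 0.0141, 0.5),
}

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def interval(acc: float, stderr: float) -> tuple[float, float]:
    """Return an approximate 95% confidence interval for an accuracy."""
    half_width = Z_95 * stderr
    return acc - half_width, acc + half_width


def compare_to_chance(acc: float, stderr: float, chance: float) -> str:
    """Classify a score as above, below, or indistinguishable from chance."""
    low, high = interval(acc, stderr)
    if low > chance:
        return "above chance"
    if high < chance:
        return "below chance"
    return "indistinguishable from chance"


for task, (acc, stderr, chance) in results.items():
    low, high = interval(acc, stderr)
    verdict = compare_to_chance(acc, stderr, chance)
    print(f"{task:<11} [{low:.3f}, {high:.3f}] -> {verdict}")
```

With these numbers, piqa comes out above chance, winogrande is indistinguishable from chance, and boolq lands below it.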

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
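The hyperparameters above fit together as follows. The effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices. A single device is assumed here, which is consistent with 64 = 1 × 64. A `linear` scheduler with `warmup_ratio: 0.05` ramps the learning rate from 0 to its peak over the first 5% of optimizer steps, then decays it linearly to 0. The pure-Python sketch below mirrors the shape of Transformers' `get_linear_schedule_with_warmup`, but it is not that function. The total step count `TOTAL_STEPS` is hypothetical, since the card does not state how many optimizer steps the run took:

```python
import math

# Values from the hyperparameter list above
learning_rate = 3e-4
train_batch_size = 1              # per-device micro-batch
gradient_accumulation_steps = 64
num_devices = 1                   # assumption: single-device run
warmup_ratio = 0.05

# Effective (total) batch size seen by each optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to the peak LR, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, for illustration only
TOTAL_STEPS = 1000
print("effective batch size:", total_train_batch_size)
for step in (0, 25, 50, 525, 1000):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

With the hypothetical 1000 steps, warmup covers the first 50 steps. The learning rate peaks at 3e-4 at step 50 and is back to half the peak at step 525.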

Framework versions

  • Transformers 4.33.1
  • Pytorch 2.2.0.dev20230907+cu118
  • Datasets 2.14.5
  • Tokenizers 0.13.3

Model size

  • Parameters: 64.8M
  • Tensor type: F32
  • Weights format: Safetensors
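The parameter count and the F32 tensor type give a rough size for the stored weights: 4 bytes per parameter. The sketch below does that arithmetic. It counts weights only, so activations, optimizer state, and framework overhead would come on top. The 64.8M figure is itself rounded:

```python
# Parameter count and dtype from the model metadata above
num_params = 64.8e6      # 64.8M parameters (rounded, as displayed)
bytes_per_param = 4      # F32 = 32-bit float = 4 bytes

# Approximate size of the weights alone
weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e6:.1f} MB ({weight_bytes / 2**20:.1f} MiB)")
```

This comes to roughly 259 MB (about 247 MiB) of weights.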