Model description

Based on Jellywibble/dalio-pretrained-book-bs4-seed1, which was pre-trained on the Dalio Principles book, then fine-tuned on the handwritten-conversations dataset Jellywibble/dalio_handwritten-conversations.
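A minimal inference sketch using the transformers library, assuming the fine-tuned checkpoint is published on the Hugging Face Hub. The repo id below is a placeholder for this model's actual id, and the prompt format follows the 'This is a conversation where Ray ...' convention noted under Training Parameters:

```python
# Sketch: load the checkpoint and sample a completion with transformers.
# NOTE: the repo id is a placeholder; substitute this model's actual Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jellywibble/dalio-handwritten-conversations-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "This is a conversation where Ray Dalio gives advice:\n"
    "User: How should I handle failure?\nRay:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```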

Dataset Used

Jellywibble/dalio_handwritten-conversations
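A short sketch for loading and inspecting the dataset with the datasets library (the "train" split name is an assumption):

```python
# Sketch: download and inspect the fine-tuning dataset.
from datasets import load_dataset

ds = load_dataset("Jellywibble/dalio_handwritten-conversations")
print(ds)              # available splits and their sizes
print(ds["train"][0])  # first handwritten conversation (assumes a "train" split)
```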

Training Parameters

  • Deepspeed on 4xA40 GPUs
  • Ensured the EOS token <s> appears only at the beginning of each 'This is a conversation where Ray ...' training example
  • Gradient Accumulation steps = 1 (Effective batch size of 4)
  • Learning rate of 2e-6 with the AdamW optimizer
  • Block size of 1000
  • Trained for 1 epoch (additional epochs yielded a worse Hellaswag score); a configuration sketch follows this list
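A hedged sketch of the hyperparameters above, expressed as transformers TrainingArguments. The per-device batch size of 1 and the DeepSpeed config path "ds_config.json" are assumptions; with 4 GPUs, 1 example per device, and 1 accumulation step, this reproduces the stated effective batch size of 4. The block size of 1000 refers to chunking tokenized text into 1000-token blocks during preprocessing:

```python
# Sketch of the training setup described above, using transformers' Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dalio-finetuned",
    num_train_epochs=1,             # additional epochs hurt the Hellaswag score
    per_device_train_batch_size=1,  # assumption: 4xA40 -> effective batch of 4
    gradient_accumulation_steps=1,
    learning_rate=2e-6,
    optim="adamw_torch",            # AdamW optimizer
    deepspeed="ds_config.json",     # assumption: path to a local DeepSpeed config
)
```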
