---
library_name: transformers
tags:
- llm.c
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- teknium/OpenHermes-2.5
language:
- en
pipeline_tag: text-generation
---
# Model Card for llm.c GPT2_350M
## Instruction Pretraining: Fineweb-edu 10B interleaved with OpenHermes 2.5
<!-- Provide a quick summary of what the model is/does. -->
![Loss](loss_curve.png)
## Model Details
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline
p = pipeline("text-generation", "jrahn/gpt2_350M_edu_hermes")
# instruction following
p("<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\n", max_lenght=128)
#[{'generated_text': '<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\nTo fish, you can start by learning the basics of fishing. First, you need to learn how to catch fish. Fish are a type of fish that are found in the ocean. They are also known as sea fish. They are a type of fish that are found in the ocean. They are a type of fish that are found in the ocean. They are a type of fish that are found in the ocean. They are a type of fish that are found in the ocean'}]
# text completion
p("In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. ", max_length=128)
# [{'generated_text': 'In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. \nThe researchers believe that the animals were able to communicate with each other by using a unique vocalization system. The researchers believe that the animals were able to communicate with each other by using a unique vocalization system.\nThe researchers believe that the animals were able to communicate with each other by using a unique vocalization system. The researchers believe that the animals were able to communicate with each other by using a unique'}]
```
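Note: the instruction-following example wraps the prompt in the ChatML format (`<|im_start|>role ... <|im_end|>`) used for the OpenHermes portion of the training mix, while the text-completion example uses a plain prompt.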
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Datasets used: Fineweb-Edu 10B + OpenHermes 2.5 (ChatML-formatted)

Dataset proportions (FWE = Fineweb-Edu, OH = OpenHermes 2.5; the percentage is the OH share of each part, see the arithmetic check below):
- Part 1: FWE 4,836,050 + OH 100,000 (2.03%) = 4,936,050
- Part 2: FWE 4,336,051 + OH 400,000 (8.45%) = 4,736,051
- Part 3: FWE 500,000 + OH 501,551 (50.08%) = 1,001,551
Total documents: 10,669,024
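A quick check of the per-part shares, using the document counts copied from the list above:

```python
# Per-part document counts copied from the proportions above
# (FWE = Fineweb-Edu, OH = OpenHermes 2.5).
parts = [
    ("Part 1", 4_836_050, 100_000),
    ("Part 2", 4_336_051, 400_000),
    ("Part 3", 500_000, 501_551),
]
for name, fwe, oh in parts:
    total = fwe + oh
    print(f"{name}: {total:>9,} documents, OH share {oh / total:.2%}")
# Part 1: 4,936,050 documents, OH share 2.03%
# Part 2: 4,736,051 documents, OH share 8.45%
# Part 3: 1,001,551 documents, OH share 50.08%
```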
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
- Fineweb-Edu: none, just the "text" feature
- OpenHermes 2.5: applied the ChatML prompt template to "conversations" to create the "text" feature (sketched below)
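As a rough illustration (not the original preprocessing script), an OpenHermes 2.5 `conversations` entry, which uses ShareGPT-style `from`/`value` turns, can be rendered into a single ChatML `text` string as follows; the role mapping is an assumption:

```python
# Illustrative sketch: convert one OpenHermes 2.5 "conversations" entry
# into a ChatML-formatted "text" string.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(example):
    turns = [
        f"<|im_start|>{ROLE_MAP[t['from']]}\n{t['value']}<|im_end|>\n"
        for t in example["conversations"]
    ]
    return {"text": "".join(turns)}

# e.g. with Hugging Face datasets:
# oh = load_dataset("teknium/OpenHermes-2.5", split="train").map(to_chatml)
```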
#### Training Hyperparameters
- **Training regime:**
  - bf16
  - per-device batch size 16, global batch size 524,288 tokens, gradient accumulation 16
  - ZeRO stage 1
  - lr 3e-4, cosine schedule, 700 warmup steps (sketched below)
  - for more details, see the [run script](run_gpt2_350M_edu_hermes.sh)
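For reference, a minimal sketch of a warmup-plus-cosine learning-rate schedule with the hyperparameters above; the total step count is derived from ~10B tokens at a 524,288-token global batch, and the final LR fraction is an assumption (the run script has the authoritative values):

```python
import math

def lr_at(step: int,
          max_lr: float = 3e-4,
          warmup_steps: int = 700,
          max_steps: int = 19_073,    # ≈ 10e9 tokens / 524,288 tokens per step
          min_lr_frac: float = 0.0):  # final LR as a fraction of max_lr (assumption)
    """Linear warmup followed by cosine decay to min_lr_frac * max_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return max_lr * (min_lr_frac + (1.0 - min_lr_frac) * coeff)
```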
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
- Params: 355M -> checkpoint size: 710 MB
- Tokens: ~10B
- Total training time: 30 hrs
- Hardware: 2x RTX 4090
- MFU: 71% (110,000 tok/s; back-of-the-envelope check below)
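The MFU figure is consistent with the usual 6·N·D back-of-the-envelope estimate (the per-GPU peak TFLOPS below is an assumed dense BF16 figure for an RTX 4090; llm.c's own accounting is more precise):

```python
params = 355e6              # model parameters
tok_per_s = 110_000         # observed throughput across both GPUs
peak_tflops_per_gpu = 165   # assumed RTX 4090 dense BF16 tensor-core peak

achieved_tflops = 6 * params * tok_per_s / 1e12    # ≈ 234 TFLOP/s
mfu = achieved_tflops / (2 * peak_tflops_per_gpu)  # ≈ 0.71
print(f"MFU ≈ {mfu:.0%}")
```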
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Results
- HellaSwag: 34.4
- for more details, see [main.log](main.log)
## Technical Specifications [optional]
### Model Architecture and Objective
GPT-2 350M (GPT-2 Medium architecture), causal language modeling objective
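The 350M size corresponds to the standard GPT-2 Medium configuration; a minimal sketch of instantiating an equivalent architecture in `transformers` (the exact config of the exported checkpoint may differ slightly, e.g. in vocabulary padding):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2 Medium-sized configuration (~355M parameters)
config = GPT2Config(n_layer=24, n_head=16, n_embd=1024,
                    n_positions=1024, vocab_size=50257)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")  # ≈ 355M
```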
### Compute Infrastructure
#### Hardware
2x RTX 4090
#### Software
[llm.c](https://github.com/karpathy/llm.c)