---
license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---
# tangled-alpha-0.11-core

Prepare the core datasets:
```bash
time python -B prepare_core_datasets.py
```
```
i=0, min_len=0, max_len=1073741824, block_size=1025, chunk_size=16400000, len(dataset)=10913927, len(dataset) * block_size=11186775175
Total number of tokens in the optimized dataset '../core-data-0-0-1073741824-1025-16000' is 11186775175
i=1, min_len=1025, max_len=2049, block_size=2049, chunk_size=16392000, len(dataset)=893465, len(dataset) * block_size=1830709785
Total number of tokens in the optimized dataset '../core-data-1-1025-2049-2049-8000' is 1830709785
i=2, min_len=2049, max_len=4097, block_size=4097, chunk_size=16388000, len(dataset)=375104, len(dataset) * block_size=1536801088
Total number of tokens in the optimized dataset '../core-data-2-2049-4097-4097-4000' is 1536801088
i=3, min_len=4097, max_len=8193, block_size=8193, chunk_size=16386000, len(dataset)=177522, len(dataset) * block_size=1454437746
Total number of tokens in the optimized dataset '../core-data-3-4097-8193-8193-2000' is 1454437746
i=4, min_len=8193, max_len=16385, block_size=16385, chunk_size=16385000, len(dataset)=77725, len(dataset) * block_size=1273524125
Total number of tokens in the optimized dataset '../core-data-4-8193-16385-16385-1000' is 1273524125
i=5, min_len=16385, max_len=32769, block_size=32769, chunk_size=16384500, len(dataset)=22931, len(dataset) * block_size=751425939
Total number of tokens in the optimized dataset '../core-data-5-16385-32769-32769-500' is 751425939
i=6, min_len=32769, max_len=65537, block_size=65537, chunk_size=16384250, len(dataset)=4988, len(dataset) * block_size=326898556
Total number of tokens in the optimized dataset '../core-data-6-32769-65537-65537-250' is 326898556
i=7, min_len=65537, max_len=131073, block_size=131073, chunk_size=16384125, len(dataset)=1137, len(dataset) * block_size=149030001
Total number of tokens in the optimized dataset '../core-data-7-65537-131073-131073-125' is 149030001
42G ../core-data-0-0-1073741824-1025-16000
6.9G ../core-data-1-1025-2049-2049-8000
5.8G ../core-data-2-2049-4097-4097-4000
5.5G ../core-data-3-4097-8193-8193-2000
4.8G ../core-data-4-8193-16385-16385-1000
2.9G ../core-data-5-16385-32769-32769-500
1.3G ../core-data-6-32769-65537-65537-250
573M ../core-data-7-65537-131073-131073-125
```
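Each output directory name encodes its bucket's parameters, and every reported total is simply `len(dataset) * block_size`. A minimal sketch that decodes the naming convention inferred from the listing above (`core-data-<i>-<min_len>-<max_len>-<block_size>-<suffix>`); `parse_bucket` is a hypothetical helper, not part of the prepare script:
```python
# Hypothetical helper: decode the bucket parameters encoded in each
# optimized-dataset directory name, which appears to follow
#   core-data-<i>-<min_len>-<max_len>-<block_size>-<suffix>
from pathlib import Path

def parse_bucket(path: str) -> dict[str, int]:
    i, min_len, max_len, block_size, suffix = map(int, Path(path).name.split("-")[2:])
    return {"i": i, "min_len": min_len, "max_len": max_len,
            "block_size": block_size, "suffix": suffix}

bucket = parse_bucket("../core-data-1-1025-2049-2049-8000")
# Sanity check against the log above: 893,465 samples x 2,049 tokens each.
assert 893_465 * bucket["block_size"] == 1_830_709_785
```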
Pretrain the core model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_core_model_0.yaml
```
```
Seed set to 23
Time to instantiate model: 0.20 seconds.
Total parameters: 234,897,920
Verifying settings ...
Measured TFLOPs: 28077.03
Epoch 1 | iter 64 step 1 | loss train: 11.977, val: n/a | iter time: 350.96 ms (step) remaining time: 10 days, 14:14:05
Epoch 1 | iter 128 step 2 | loss train: 11.977, val: n/a | iter time: 280.36 ms (step) remaining time: 7 days, 8:25:44
Epoch 1 | iter 192 step 3 | loss train: 11.974, val: n/a | iter time: 280.80 ms (step) remaining time: 6 days, 6:28:36
Epoch 1 | iter 256 step 4 | loss train: 11.975, val: n/a | iter time: 281.44 ms (step) remaining time: 5 days, 17:28:43
Epoch 1 | iter 320 step 5 | loss train: 11.974, val: n/a | iter time: 280.13 ms (step) remaining time: 5 days, 9:40:25
Epoch 1 | iter 384 step 6 | loss train: 11.976, val: n/a | iter time: 281.50 ms (step) remaining time: 5 days, 4:26:59
Epoch 1 | iter 448 step 7 | loss train: 11.974, val: n/a | iter time: 280.34 ms (step) remaining time: 5 days, 0:43:34
Epoch 1 | iter 512 step 8 | loss train: 11.970, val: n/a | iter time: 280.74 ms (step) remaining time: 4 days, 21:55:15
Epoch 1 | iter 576 step 9 | loss train: 11.970, val: n/a | iter time: 279.90 ms (step) remaining time: 4 days, 19:44:24
Epoch 1 | iter 640 step 10 | loss train: 11.971, val: n/a | iter time: 279.74 ms (step) remaining time: 4 days, 17:59:44
# ...
```
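If the wandb run is ever unavailable, the loss curve can be recovered from the console log itself. A minimal sketch, assuming the `iter N step S | loss train: X` line format shown above (the log file name is illustrative):
```python
# Minimal sketch: pull (step, train-loss) pairs out of a saved litgpt
# pretrain log, assuming the line format shown above.
import re

PATTERN = re.compile(r"iter (\d+) step (\d+) \| loss train: ([\d.]+)")

def parse_log(path: str) -> list[tuple[int, float]]:
    points = []
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                points.append((int(m.group(2)), float(m.group(3))))
    return points

# e.g. parse_log("pretrain-core-0.log") -> [(1, 11.977), (2, 11.977), ...]
```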
Back up `wandb`:
```bash
mv wandb wandb-pretrain-core-0
```
Copy the model config so the final checkpoint directory carries its `config.json`:
```bash
cp ../config-0.json ../out/pretrain-core-0/final/config.json
```
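With `config.json` in place, the checkpoint directory is self-describing. A hedged sketch of loading it with `transformers` — this assumes the weights are in a transformers-compatible layout, which a raw litgpt checkpoint may first need conversion to reach:
```python
# Hedged sketch: load the final checkpoint with transformers. This assumes
# the directory holds transformers-compatible weights alongside the copied
# config.json; a raw litgpt checkpoint may need conversion first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "../out/pretrain-core-0/final"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```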
Chat with the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
```
Evaluate the model:
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size '4' --dtype 'bfloat16' '../out/pretrain-core-0/final'
```
```
```