Commit 1001802 (verified) by Niansuh · 1 Parent(s): d2e2bcf

Upload 10 files

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ biggie_groked_int8_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
37
+ Biggie_SmolLM_0.15B_Base_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_bf16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fa567e7fe3fb07549881f8cb50172dd3fd6a15ada0e8e9b007fd4a04ad254180
3
+ size 362923584
README.md ADDED
@@ -0,0 +1,368 @@
1
+ ---
2
+ base_model: HuggingFaceTB/SmolLM-135M
3
+ datasets:
4
+ - LDJnr/Capybara
5
+ inference:
6
+ parameters:
7
+ model_file: biggie_groked_int8_q8_0.gguf
8
+ temperature: 1
9
+ license: mit
10
+ ---
11
+
12
+ ### TINY Frankenstein of [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) upped to 0.18b
13
+ Use this frankenbase for training.
14
+ Sorry for the mislabelling: the model has 0.18B (181M) parameters, not 0.15B.
15
+ I did not expect this repo to blow up, and now all the training scripts depend on it.
16
+
17
+ * ## CITE WORK FROM THIS HF PAGE AND [@cognitivecompai](https://huggingface.co/ehartford)'s OPTIMIZER ON YOUR FUTURE PAPERS OR I WILL DRAG YOUR ORG ON TWITTER LIKE I DID WITH COHERE LOL (we're cool now btw, visited them :)
18
+ * https://github.com/cognitivecomputations/grokadamw
19
+ * https://github.com/SakanaAI/evolutionary-model-merge/
20
+ * https://huggingface.co/blog/smollm
21
+
22
+ >>[!TIP]🐧 If you're impatient, get the trained checkpoint file that runs on 1 CPU core:
23
+ >>
24
+ >>wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
25
+ >>
26
+ >>Make sure to install the latest llama.cpp first; it's easy on Linux and macOS:
27
+ >>
28
+ >> git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
29
+
30
+ Now for the magic trained finetune that runs at insane speeds:
31
+
32
+ The settings are very finicky so be careful with your experimentation
33
+ ```bash
34
+ ./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 \
35
+ -p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." \
36
+ --in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" \
37
+ -m biggie_groked_int8_q8_0.gguf -co -cnv \
38
+ -c 1024 -n 700 --temp 1.5 -ngl 0 -t 1
39
+ ```
40
+ Yup, that's no gpu, 1 cpu core.
41
+
42
+ This base model was built via semi-automated continuous merging to figure out the recipe.
43
+ Model is more coherent.
44
+
45
+ The temperature, min-p, and related settings need to be adjusted, but even with default settings at temp 0 it was coherent for the first 100 tokens.
46
+ Amazing option for further training. And this is a merge of the base, not the instruct!
47
+
48
+ ## 🧠 What's Really Going Down Here?
49
+
50
+ We're talking about a convergence of a whole bunch of stuff; more papers will be written about this:
51
+
52
+ 1. **Evolutionary Merging**:
53
+ 2. **BitNet Integration**:
54
+ 3. **Experimental GrokAdamW Optimizer**:
55
+
56
+ ## Prior work, from last week
57
+
58
+ Credits for optimizer go to [@cognitivecompai](https://github.com/cognitivecomputations/grokadamw) for laying the groundwork with the original GrokAdamW optimizer.
59
+
60
+ ## LET'S TRY OUT THE EXPERIMENTAL GROKKED FINETUNE:
61
+
62
+ ```bash
63
+ wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
64
+ ```
65
+
66
+ Yes, we will be talking with a 164 MB file that runs at 160 tokens per second on a single CPU core.
67
+ ## You read all of that correctly: yes, 1 CPU core, 160 tps https://x.com/nisten/status/1819752034305970649
68
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/nTNISjByBkN7bJZzuOvOw.png)
69
+
70
+ ## 🚀 Run it with NO GPU and only one CPU core with these settings
71
+ ```bash
72
+ ./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 -p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." -m biggie_groked_int8_q8_0.gguf -co -cnv --in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" -c 1024 -n 512 --temp 1.5 -ngl 0 -t 1
73
+ ```
74
+
75
+
76
+ ## 🏋️ Training Tutorial, MAKE YOUR OWN BIGGIE_SMOlLM
77
+
78
+
79
+ Clone the repo like you're stealing code from the future:
80
+ ```bash
81
+ git clone https://github.com/nisten/grokadamw
82
+ cd grokadamw
83
+ ```
84
+
85
+ Fire up the training script and watch the magic happen:
86
+ ```bash
87
+ python smoltrainer.py
88
+ ```
89
+
90
+ ## 💻 Do it from scratch yourself
91
+ Install the secret sauce (dependencies):
92
+ ```bash
93
+ pip install torch transformers datasets tqdm
94
+ ```
95
+
96
+ Make a file named meow.py, paste in this code, and then run it with `python meow.py`:
97
+
98
+ ```python
99
+ import torch
100
+ import torch.nn as nn
101
+ import logging
102
+ from datasets import load_dataset, Dataset
103
+ from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
104
+ from torch.cuda.amp import autocast
105
+ import warnings
106
+ from tqdm import tqdm
107
+
108
+ warnings.filterwarnings("ignore", category=FutureWarning)
109
+ warnings.filterwarnings("ignore", category=UserWarning)
110
+
111
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
112
+ logger = logging.getLogger(__name__)
113
+
114
+ MODEL_NAME = "nisten/Biggie-SmoLlm-0.15B-Base"
115
+ MAX_LENGTH = 2048
116
+ BATCH_SIZE = 8
117
+ LEARNING_RATE = 2e-4
118
+ MAX_STEPS = 3000
119
+ GRADIENT_ACCUMULATION_STEPS = 2
120
+ NUM_WARMUP_STEPS = 30
121
+ OUTPUT_DIR = "./capybara_finetuned_results"
122
+
123
+ torch.backends.cuda.matmul.allow_tf32 = True
124
+ torch.backends.cudnn.allow_tf32 = True
125
+
126
+ class GrokAdamW(torch.optim.Optimizer):
127
+     def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2,
128
+                  alpha_init=0.98, lamb=2.0, gamma=0.1, grokking_signal_fns=None,
129
+                  grokking_signal_decay_rate=0.1, gradient_clipping=1.0):
130
+         defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
131
+                         alpha_init=alpha_init, lamb=lamb, gamma=gamma,
132
+                         grokking_signal_fns=grokking_signal_fns,
133
+                         grokking_signal_decay_rate=grokking_signal_decay_rate,
134
+                         gradient_clipping=gradient_clipping)
135
+         super(GrokAdamW, self).__init__(params, defaults)
136
+
137
+     @torch.no_grad()
138
+     def step(self, closure=None):
139
+         loss = None
140
+         if closure is not None:
141
+             with torch.enable_grad():
142
+                 loss = closure()
143
+
144
+         for group in self.param_groups:
145
+             grokking_signal = self._compute_grokking_signal(group)
146
+             for i, p in enumerate(group['params']):
147
+                 if p.grad is None:
148
+                     continue
149
+                 grad = p.grad
150
+
151
+                 if group['gradient_clipping'] > 0:
152
+                     grad = torch.clamp(grad, -group['gradient_clipping'], group['gradient_clipping'])
153
+
154
+                 state = self.state[p]
155
+
156
+                 if len(state) == 0:
157
+                     state['step'] = 0
158
+                     state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
159
+                     state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
160
+                     state['grok_ema'] = torch.zeros_like(p, memory_format=torch.preserve_format)
161
+
162
+                 exp_avg, exp_avg_sq, grok_ema = state['exp_avg'], state['exp_avg_sq'], state['grok_ema']
163
+                 beta1, beta2 = group['betas']
164
+
165
+                 state['step'] += 1
166
+
167
+                 layer_beta1 = beta1 * (1 - group['gamma'])**i
168
+
169
+                 alpha = group['alpha_init'] * torch.exp(torch.tensor(-group['grokking_signal_decay_rate'] * grokking_signal))
170
+                 grok_ema.mul_(alpha).add_(grad, alpha=1 - alpha)
171
+                 grok_grad = grad + group['lamb'] * grok_ema
172
+
173
+                 exp_avg.mul_(layer_beta1).add_(grok_grad, alpha=1 - layer_beta1)
174
+                 exp_avg_sq.mul_(beta2).addcmul_(grok_grad, grok_grad, value=1 - beta2)
175
+
176
+                 denom = exp_avg_sq.sqrt().add_(group['eps'])
177
+                 step_size = group['lr']
178
+
179
+                 if group['weight_decay'] != 0:
180
+                     p.data.mul_(1 - group['lr'] * group['weight_decay'])
181
+
182
+                 p.addcdiv_(exp_avg, denom, value=-step_size)
183
+
184
+         return loss
185
+
186
+     def _compute_grokking_signal(self, group):
187
+         if group['grokking_signal_fns'] is None:
188
+             return 0.0
189
+
190
+         signals = []
191
+         for fn in group['grokking_signal_fns']:
192
+             try:
193
+                 signal = fn()
194
+                 if signal is not None:
195
+                     signals.append(signal)
196
+             except Exception as e:
197
+                 logger.warning(f"Error in grokking_signal_fn: {e}. Ignoring this function.")
198
+
199
+         if not signals:
200
+             return 0.0
201
+
202
+         return sum(signals) / len(signals)
203
+
204
+ def format_capybara_prompts(examples):
205
+     texts = []
206
+     for conversation in examples['conversation']:
207
+         formatted_text = ""
208
+         for turn in conversation:
209
+             if 'input' in turn:
210
+                 formatted_text += f"Human: {turn['input']}\n\n"
211
+             if 'output' in turn:
212
+                 formatted_text += f"Assistant: {turn['output']}\n\n"
213
+         texts.append(formatted_text.strip())
214
+     return {"text": texts}
215
+
216
+ class CustomTrainer(Trainer):
217
+     def __init__(self, *args, **kwargs):
218
+         super().__init__(*args, **kwargs)
219
+         self.grokking_signal = 0.0
220
+
221
+     def compute_loss(self, model, inputs, return_outputs=False):
222
+         labels = inputs.pop("labels")
223
+         outputs = model(**inputs)
224
+         logits = outputs.logits
225
+         shift_logits = logits[..., :-1, :].contiguous()
226
+         shift_labels = labels[..., 1:].contiguous()
227
+         loss_fct = nn.CrossEntropyLoss()
228
+         loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
229
+         return (loss, outputs) if return_outputs else loss
230
+
231
+     def training_step(self, model, inputs):
232
+         model.train()
233
+         inputs = self._prepare_inputs(inputs)
234
+
235
+         with autocast(dtype=torch.bfloat16):
236
+             loss = self.compute_loss(model, inputs)
237
+
238
+         if self.args.gradient_accumulation_steps > 1:
239
+             loss = loss / self.args.gradient_accumulation_steps
240
+
241
+         loss.backward()
242
+
243
+         self.grokking_signal = loss.item()
244
+
245
+         return loss.detach()
246
+
247
+ def grokking_signal_fn():
248
+     return trainer.grokking_signal
249
+
250
+ def main():
251
+     logger.info(f"🚀 Initializing {MODEL_NAME} finetuning with GrokAdamW")
252
+
253
+     try:
254
+         config = AutoConfig.from_pretrained(MODEL_NAME)
255
+         tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
256
+         model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
257
+     except Exception as e:
258
+         logger.error(f"❌ Failed to load model or tokenizer: {str(e)}")
259
+         return
260
+
261
+     if tokenizer.pad_token is None:
262
+         tokenizer.pad_token = tokenizer.eos_token
263
+         model.config.pad_token_id = model.config.eos_token_id
264
+
265
+     logger.info("📚 Loading Capybara dataset")
266
+     try:
267
+         capybara_dataset = load_dataset("LDJnr/Capybara", split="train")
268
+         capybara_dataset = capybara_dataset.map(format_capybara_prompts, batched=True, remove_columns=capybara_dataset.column_names)
269
+     except Exception as e:
270
+         logger.error(f"❌ Failed to load Capybara dataset: {str(e)}")
271
+         return
272
+
273
+     logger.info(f"📊 Capybara dataset size: {len(capybara_dataset)}")
274
+
275
+     def tokenize_function(examples):
276
+         return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=MAX_LENGTH)
277
+
278
+     logger.info("🔢 Tokenizing dataset")
279
+     tokenized_dataset = capybara_dataset.map(tokenize_function, batched=True, remove_columns=capybara_dataset.column_names)
280
+
281
+     logger.info("🏋️ Setting up the training arguments")
282
+     training_args = TrainingArguments(
283
+         output_dir=OUTPUT_DIR,
284
+         num_train_epochs=3,
285
+         per_device_train_batch_size=BATCH_SIZE,
286
+         gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
287
+         learning_rate=LEARNING_RATE,
288
+         weight_decay=0.01,
289
+         bf16=True,
290
+         logging_steps=10,
291
+         save_steps=300,
292
+         save_total_limit=10,
293
+         dataloader_num_workers=4,
294
+         warmup_steps=NUM_WARMUP_STEPS,
295
+         gradient_checkpointing=True,
296
+         evaluation_strategy="steps",
297
+         eval_steps=300,
298
+         max_steps=MAX_STEPS,
299
+         fp16=False,
300
+         optim="adamw_hf",
301
+         lr_scheduler_type="cosine",
302
+         load_best_model_at_end=True,
303
+         metric_for_best_model="loss",
304
+         greater_is_better=False,
305
+     )
306
+
307
+     data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
308
+
309
+     optimizer = GrokAdamW(
310
+         model.parameters(),
311
+         lr=LEARNING_RATE,
312
+         betas=(0.9, 0.999),
313
+         eps=1e-8,
314
+         weight_decay=0.01,
315
+         alpha_init=0.98,
316
+         lamb=2.0,
317
+         gamma=0.1,
318
+         grokking_signal_fns=[grokking_signal_fn],
319
+         grokking_signal_decay_rate=0.1,
320
+         gradient_clipping=1.0
321
+     )
322
+
323
+     logger.info("🏃‍♂️ Initializing Trainer with GrokAdamW")
324
+     global trainer
325
+     trainer = CustomTrainer(
326
+         model=model,
327
+         args=training_args,
328
+         train_dataset=tokenized_dataset,
329
+         eval_dataset=tokenized_dataset.select(range(min(1000, len(tokenized_dataset)))),
330
+         data_collator=data_collator,
331
+         optimizers=(optimizer, None),
332
+     )
333
+
334
+     logger.info("🔥 Starting the training with GrokAdamW")
335
+     try:
336
+         trainer.train()
337
+     except Exception as e:
338
+         logger.error(f"❌ Training failed: {str(e)}")
339
+         return
340
+
341
+     logger.info("💾 Saving the model")
342
+     try:
343
+         trainer.save_model(OUTPUT_DIR)
344
+     except Exception as e:
345
+         logger.error(f"❌ Failed to save model: {str(e)}")
346
+
347
+     logger.info("🎉 Finetuning with GrokAdamW completed!")
348
+
349
+ if __name__ == "__main__":
350
+     main()
351
+ ```
352
+ 🚀 Now go forth and train, accelerate that code!
353
+
354
+ > **Note:** You'll need about 14GB of VRAM. If you have 8GB, change to batch size 4.
355
+
356
+ Results will appear in `./capybara_finetuned_results`
357
+
358
+ ---
359
+
360
+ ### Author
361
+
362
+ **Nisten Tahiraj**
363
+ 🏢 [rakun.ai](https://rakun.ai)
364
+ 📍 Toronto, Canada
365
+
366
+ ---
367
+ Happy training!
368
+ <video controls autoplay muted src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/WCLhKzZWbrLo8BETGaKvI.qt"></video>
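Reviewer note on the README's `GrokAdamW.step`: two scalar schedules drive most of its behavior, and they are easier to see in isolation. The sketch below is plain Python with no torch dependency; the helper names `grok_alpha` and `layer_beta1` are hypothetical and exist only for illustration. It reproduces the two formulas from the README code: the EMA coefficient `alpha_init * exp(-decay_rate * signal)`, where the signal is the latest training loss reported by `grokking_signal_fn`, and the per-parameter first-moment decay `beta1 * (1 - gamma) ** i`, where `i` is the index of the parameter tensor within its group as enumerated in `step`.

```python
import math


def grok_alpha(signal, alpha_init=0.98, decay_rate=0.1):
    """EMA coefficient from the README's step(): alpha_init * exp(-decay_rate * signal)."""
    return alpha_init * math.exp(-decay_rate * signal)


def layer_beta1(index, beta1=0.9, gamma=0.1):
    """First-moment decay for the parameter tensor at position `index` in its group."""
    return beta1 * (1 - gamma) ** index


if __name__ == "__main__":
    # A higher loss shrinks alpha, so the gradient EMA forgets its history faster.
    for loss in (0.0, 1.0, 3.0):
        print(f"loss={loss:.1f} -> alpha={grok_alpha(loss):.4f}")
    # The decay drops geometrically with the tensor index.
    for i in (0, 10, 50):
        print(f"param index {i} -> layer_beta1={layer_beta1(i):.4f}")
```

With the README defaults, `layer_beta1` falls below 0.01 by index 50, so for most tensors in a 35-layer model the first-moment estimate tracks the current (EMA-amplified) gradient almost directly. Whether that is intended is not stated in the README.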
biggie_groked_int8_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658
3
+ size 163636960
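Reviewer note: the three lines above are a Git LFS pointer, not the model itself. A minimal sketch for reading such a pointer and checking the README's "164 MB" figure follows; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658
size 163636960
"""

info = parse_lfs_pointer(POINTER)
print(info["oid"])
print(f"{info['size'] / 1e6:.1f} MB")
```

The size works out to roughly 163.6 MB in decimal units, consistent with the README's rounded figure.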
config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "_name_or_path": "/Users/n/hf/SmolLM-135M",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 0,
9
+ "eos_token_id": 0,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 576,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 1536,
14
+ "max_position_embeddings": 2048,
15
+ "mlp_bias": false,
16
+ "model_type": "llama",
17
+ "num_attention_heads": 9,
18
+ "num_hidden_layers": 35,
19
+ "num_key_value_heads": 3,
20
+ "pretraining_tp": 1,
21
+ "rms_norm_eps": 1e-05,
22
+ "rope_scaling": null,
23
+ "rope_theta": 10000.0,
24
+ "tie_word_embeddings": true,
25
+ "torch_dtype": "bfloat16",
26
+ "transformers_version": "4.43.3",
27
+ "use_cache": true,
28
+ "vocab_size": 49152
29
+ }
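Reviewer note: the README says the model is 181M parameters rather than 0.15B. The sketch below estimates the parameter count from the `config.json` values above, assuming the standard Llama layout (bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, one final norm). The function `llama_param_count` is a hypothetical helper. Whether the README's 181M figure counts the output head separately from the input embedding is a guess on my part; the config sets `tie_word_embeddings` to true.

```python
def llama_param_count(cfg, count_output_head=False):
    """Estimate parameters of a Llama-style model from its config values."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attn = 2 * h * h + 2 * h * kv_dim        # q_proj and o_proj, plus k_proj and v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate_proj, up_proj, down_proj
    norms = 2 * h                            # input and post-attention RMSNorm
    embed = cfg["vocab_size"] * h
    total = cfg["num_hidden_layers"] * (attn + mlp + norms) + embed + h  # + final norm
    if count_output_head:
        total += embed                       # separate lm_head matrix (assumption)
    return total


# Values copied from config.json in this commit.
CFG = {
    "hidden_size": 576,
    "intermediate_size": 1536,
    "num_attention_heads": 9,
    "num_hidden_layers": 35,
    "num_key_value_heads": 3,
    "vocab_size": 49152,
}

tied = llama_param_count(CFG)
untied = llama_param_count(CFG, count_output_head=True)
print(f"tied embeddings:   {tied:,}")
print(f"untied output head: {untied:,}")
```

Under these assumptions the tied count is 152,215,488 (about 0.15B) and the untied count is 180,527,040 (about 181M), which would account for both labels. Setting `num_hidden_layers` to 30 gives 134,515,008, matching the SmolLM-135M base.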
gitattributes ADDED
@@ -0,0 +1,41 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ smolbiggie_bf16.gguf filter=lfs diff=lfs merge=lfs -text
37
+ Biggie_SmolLM_0.15B_Base_bf16.gguf filter=lfs diff=lfs merge=lfs -text
38
+ Biggie_SmolLM_0.15B_Base_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
39
+ biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
40
+ biggie_groked_int8_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
41
+ old-biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8868a6657a83f51c2b8d5d88cf48f96cdf0631feed5582a152cbcc77cfae4299
3
+ size 269149770
special_tokens_map.json ADDED
@@ -0,0 +1,42 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|endoftext|>",
4
+ "<|im_start|>",
5
+ "<|im_end|>",
6
+ "<repo_name>",
7
+ "<reponame>",
8
+ "<file_sep>",
9
+ "<filename>",
10
+ "<gh_stars>",
11
+ "<issue_start>",
12
+ "<issue_comment>",
13
+ "<issue_closed>",
14
+ "<jupyter_start>",
15
+ "<jupyter_text>",
16
+ "<jupyter_code>",
17
+ "<jupyter_output>",
18
+ "<jupyter_script>",
19
+ "<empty_output>"
20
+ ],
21
+ "bos_token": {
22
+ "content": "<|endoftext|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false
27
+ },
28
+ "eos_token": {
29
+ "content": "<|endoftext|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false
34
+ },
35
+ "unk_token": {
36
+ "content": "<|endoftext|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false
41
+ }
42
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,167 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<repo_name>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<reponame>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<file_sep>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "6": {
53
+ "content": "<filename>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "7": {
61
+ "content": "<gh_stars>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "8": {
69
+ "content": "<issue_start>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "9": {
77
+ "content": "<issue_comment>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "10": {
85
+ "content": "<issue_closed>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "11": {
93
+ "content": "<jupyter_start>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "12": {
101
+ "content": "<jupyter_text>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "13": {
109
+ "content": "<jupyter_code>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "14": {
117
+ "content": "<jupyter_output>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "15": {
125
+ "content": "<jupyter_script>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "16": {
133
+ "content": "<empty_output>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ }
140
+ },
141
+ "additional_special_tokens": [
142
+ "<|endoftext|>",
143
+ "<|im_start|>",
144
+ "<|im_end|>",
145
+ "<repo_name>",
146
+ "<reponame>",
147
+ "<file_sep>",
148
+ "<filename>",
149
+ "<gh_stars>",
150
+ "<issue_start>",
151
+ "<issue_comment>",
152
+ "<issue_closed>",
153
+ "<jupyter_start>",
154
+ "<jupyter_text>",
155
+ "<jupyter_code>",
156
+ "<jupyter_output>",
157
+ "<jupyter_script>",
158
+ "<empty_output>"
159
+ ],
160
+ "bos_token": "<|endoftext|>",
161
+ "clean_up_tokenization_spaces": false,
162
+ "eos_token": "<|endoftext|>",
163
+ "model_max_length": 1000000000000000019884624838656,
164
+ "tokenizer_class": "GPT2Tokenizer",
165
+ "unk_token": "<|endoftext|>",
166
+ "vocab_size": 49152
167
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff