BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu

+---
+license: apache-2.0
+base_model: BEE-spoke-data/smol_llama-220M-GQA
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: smol_llama-220M-GQA-fineweb-edu-10BT-mincols-vN
+  results: []
+---
+# smol_llama-220M-GQA-fineweb-edu-10BT-mincols-vN
+This model is a fine-tuned version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on a dataset the Trainer did not record (the model name points to the FineWeb-Edu 10BT sample).
+It achieves the following results on the evaluation set:
+- Loss: 2.7416
+- Accuracy: 0.4559
+- Num Input Tokens Seen: 10695475200
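For readers who prefer perplexity to raw loss, the conversion is short. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for a causal-LM Trainer run but is not stated in the card; the function name is illustrative.

```python
import math


def perplexity_from_loss(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


# Values copied from the card: final eval loss, and the eval loss at step 300.
final_eval_loss = 2.7416
first_eval_loss = 2.8291

final_ppl = perplexity_from_loss(final_eval_loss)
first_ppl = perplexity_from_loss(first_eval_loss)

print(f"final perplexity:    {final_ppl:.2f}")
print(f"step-300 perplexity: {first_ppl:.2f}")
```

Under that assumption, the final loss of 2.7416 corresponds to a perplexity of roughly 15.5.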
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 80085
+- gradient_accumulation_steps: 32
+- total_train_batch_size: 256
+- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.05
+- num_epochs: 1.0
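The learning-rate settings above (peak 5e-05, cosine schedule, 5% warmup) can be sketched as a small function. This is a minimal illustration of a linear-warmup, half-cosine-decay schedule, not a copy of the Trainer's code. `TOTAL_STEPS` is an assumption, extrapolated from the last logged row of the results table (step 20400 at epoch 0.9893), and rounding the warmup length up with `ceil` is also an assumption.

```python
import math

PEAK_LR = 5e-05
WARMUP_RATIO = 0.05
# ASSUMPTION: total optimizer steps, extrapolated from step 20400 at epoch 0.9893.
TOTAL_STEPS = round(20400 / 0.9893)


def cosine_lr_with_warmup(step: int,
                          total_steps: int = TOTAL_STEPS,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


# Print the learning rate at a few points along the run.
for s in (0, 300, 1200, 10200, 20400):
    print(f"step {s:>5}: lr = {cosine_lr_with_warmup(s):.3e}")
```

With this shape, the learning rate is already close to zero for the final few thousand steps, which is consistent with the validation loss flattening near 2.7416 late in the run.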
+### Training results
+| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Input Tokens Seen |
+|:-------------:|:------:|:-----:|:---------------:|:--------:|:-----------------:|
+| 2.8567        | 0.0145 | 300   | 2.8291          | 0.4450   | 157286400         |
+| 2.8517        | 0.0291 | 600   | 2.8153          | 0.4465   | 314572800         |
+| 2.8224        | 0.0436 | 900   | 2.8025          | 0.4481   | 471859200         |
+| 2.8178        | 0.0582 | 1200  | 2.7912          | 0.4495   | 629145600         |
+| 2.8001        | 0.0727 | 1500  | 2.7832          | 0.4505   | 786432000         |
+| 2.8045        | 0.0873 | 1800  | 2.7772          | 0.4512   | 943718400         |
+| 2.8019        | 0.1018 | 2100  | 2.7729          | 0.4516   | 1101004800        |
+| 2.7995        | 0.1164 | 2400  | 2.7691          | 0.4522   | 1258291200        |
+| 2.8006        | 0.1309 | 2700  | 2.7657          | 0.4526   | 1415577600        |
+| 2.7886        | 0.1455 | 3000  | 2.7631          | 0.4528   | 1572864000        |
+| 2.7907        | 0.1600 | 3300  | 2.7606          | 0.4532   | 1730150400        |
+| 2.7907        | 0.1746 | 3600  | 2.7588          | 0.4536   | 1887436800        |
+| 2.7788        | 0.1891 | 3900  | 2.7569          | 0.4537   | 2044723200        |
+| 2.7942        | 0.2037 | 4200  | 2.7552          | 0.4540   | 2202009600        |
+| 2.7930        | 0.2182 | 4500  | 2.7538          | 0.4543   | 2359296000        |
+| 2.7958        | 0.2328 | 4800  | 2.7526          | 0.4544   | 2516582400        |
+| 2.7800        | 0.2473 | 5100  | 2.7515          | 0.4547   | 2673868800        |
+| 2.7937        | 0.2619 | 5400  | 2.7506          | 0.4548   | 2831155200        |
+| 2.7717        | 0.2764 | 5700  | 2.7498          | 0.4548   | 2988441600        |
+| 2.7832        | 0.2910 | 6000  | 2.7490          | 0.4548   | 3145728000        |
+| 2.7680        | 0.3055 | 6300  | 2.7482          | 0.4550   | 3303014400        |
+| 2.7653        | 0.3201 | 6600  | 2.7476          | 0.4551   | 3460300800        |
+| 2.7843        | 0.3346 | 6900  | 2.7470          | 0.4551   | 3617587200        |
+| 2.7765        | 0.3492 | 7200  | 2.7464          | 0.4550   | 3774873600        |
+| 2.7778        | 0.3637 | 7500  | 2.7460          | 0.4552   | 3932160000        |
+| 2.7655        | 0.3783 | 7800  | 2.7455          | 0.4553   | 4089446400        |
+| 2.7943        | 0.3928 | 8100  | 2.7449          | 0.4554   | 4246732800        |
+| 2.7715        | 0.4074 | 8400  | 2.7447          | 0.4552   | 4404019200        |
+| 2.7828        | 0.4219 | 8700  | 2.7443          | 0.4554   | 4561305600        |
+| 2.7883        | 0.4365 | 9000  | 2.7440          | 0.4556   | 4718592000        |
+| 2.7627        | 0.4510 | 9300  | 2.7437          | 0.4556   | 4875878400        |
+| 2.7841        | 0.4656 | 9600  | 2.7435          | 0.4557   | 5033164800        |
+| 2.7734        | 0.4801 | 9900  | 2.7433          | 0.4557   | 5190451200        |
+| 2.7829        | 0.4947 | 10200 | 2.7430          | 0.4557   | 5347737600        |
+| 2.7810        | 0.5092 | 10500 | 2.7429          | 0.4557   | 5505024000        |
+| 2.7757        | 0.5238 | 10800 | 2.7428          | 0.4557   | 5662310400        |
+| 2.7790        | 0.5383 | 11100 | 2.7426          | 0.4559   | 5819596800        |
+| 2.7771        | 0.5529 | 11400 | 2.7425          | 0.4559   | 5976883200        |
+| 2.7828        | 0.5674 | 11700 | 2.7424          | 0.4560   | 6134169600        |
+| 2.7814        | 0.5820 | 12000 | 2.7423          | 0.4558   | 6291456000        |
+| 2.7735        | 0.5965 | 12300 | 2.7422          | 0.4559   | 6448742400        |
+| 2.7848        | 0.6111 | 12600 | 2.7420          | 0.4559   | 6606028800        |
+| 2.7748        | 0.6256 | 12900 | 2.7420          | 0.4559   | 6763315200        |
+| 2.7697        | 0.6402 | 13200 | 2.7419          | 0.4560   | 6920601600        |
+| 2.7689        | 0.6547 | 13500 | 2.7419          | 0.4560   | 7077888000        |
+| 2.7747        | 0.6692 | 13800 | 2.7419          | 0.4559   | 7235174400        |
+| 2.7860        | 0.6838 | 14100 | 2.7418          | 0.4561   | 7392460800        |
+| 2.7801        | 0.6983 | 14400 | 2.7417          | 0.4560   | 7549747200        |
+| 2.7658        | 0.7129 | 14700 | 2.7417          | 0.4561   | 7707033600        |
+| 2.7717        | 0.7274 | 15000 | 2.7417          | 0.4560   | 7864320000        |
+| 2.7717        | 0.7420 | 15300 | 2.7417          | 0.4560   | 8021606400        |
+| 2.7770        | 0.7565 | 15600 | 2.7417          | 0.4559   | 8178892800        |
+| 2.7793        | 0.7711 | 15900 | 2.7416          | 0.4560   | 8336179200        |
+| 2.7718        | 0.7856 | 16200 | 2.7416          | 0.4559   | 8493465600        |
+| 2.7757        | 0.8002 | 16500 | 2.7416          | 0.4560   | 8650752000        |
+| 2.7763        | 0.8147 | 16800 | 2.7416          | 0.4559   | 8808038400        |
+| 2.7581        | 0.8293 | 17100 | 2.7416          | 0.4559   | 8965324800        |
+| 2.7719        | 0.8438 | 17400 | 2.7416          | 0.4560   | 9122611200        |
+| 2.7609        | 0.8584 | 17700 | 2.7416          | 0.4560   | 9279897600        |
+| 2.7753        | 0.8729 | 18000 | 2.7416          | 0.4559   | 9437184000        |
+| 2.7674        | 0.8875 | 18300 | 2.7415          | 0.4560   | 9594470400        |
+| 2.7601        | 0.9020 | 18600 | 2.7416          | 0.4560   | 9751756800        |
+| 2.7823        | 0.9166 | 18900 | 2.7416          | 0.4560   | 9909043200        |
+| 2.7767        | 0.9311 | 19200 | 2.7416          | 0.4560   | 10066329600       |
+| 2.7759        | 0.9457 | 19500 | 2.7416          | 0.4560   | 10223616000       |
+| 2.7722        | 0.9602 | 19800 | 2.7415          | 0.4560   | 10380902400       |
+| 2.7764        | 0.9748 | 20100 | 2.7416          | 0.4560   | 10538188800       |
+| 2.7724        | 0.9893 | 20400 | 2.7416          | 0.4559   | 10695475200       |
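The "Input Tokens Seen" column makes it possible to back out the tokens consumed per optimizer step, and from that the implied sequence length. The sketch below uses only numbers from the card. It assumes a single training device (the card lists 8 x 32 = 256 as the total batch size with no device count) and that every sequence is packed to full length; the sequence length it derives is an inference, not a value the card states.

```python
# Hyperparameters copied from the card.
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 32

# (step, input tokens seen) for the first and last rows of the results table.
first_row = (300, 157_286_400)
last_row = (20_400, 10_695_475_200)

# ASSUMPTION: one device, so the effective batch is batch size x accumulation.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

tokens_per_step = first_row[1] // first_row[0]
implied_seq_len = tokens_per_step // effective_batch

# The last row should agree with the same per-step rate if it is constant.
rate_is_constant = last_row[1] == last_row[0] * tokens_per_step

print(f"effective batch size: {effective_batch}")
print(f"tokens per step:      {tokens_per_step}")
print(f"implied seq length:   {implied_seq_len}")
print(f"constant rate:        {rate_is_constant}")
```

The first and last rows give the same per-step rate, so the token count grows linearly with the step count throughout the run.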
+### Framework versions
+- Transformers 4.41.1
+- Pytorch 2.3.1+cu118
+- Datasets 2.19.1
+- Tokenizers 0.19.1

generation_config.json ADDED

@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "transformers_version": "4.41.1",
+  "use_cache": false
+}
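The committed config stores `"use_cache": false`, possibly a training-time setting carried over from the model config (note `_from_model_config` is true). The sketch below parses the JSON exactly as committed and builds a keyword override that turns the KV cache back on for inference. The helper name `generation_overrides` is hypothetical and exists only for illustration.

```python
import json

# Contents of generation_config.json as added in this commit.
GENERATION_CONFIG = """{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.41.1",
  "use_cache": false
}"""

config = json.loads(GENERATION_CONFIG)


def generation_overrides(cfg: dict) -> dict:
    """Return kwargs that re-enable the KV cache if the saved config disabled it."""
    if cfg.get("use_cache") is False:
        return {"use_cache": True}
    return {}


overrides = generation_overrides(config)
print(f"saved use_cache: {config['use_cache']}")
print(f"overrides:       {overrides}")
```

With a model loaded through Transformers, these overrides could be passed along with the inputs, for example `model.generate(**inputs, **overrides)`; whether that is desirable depends on how the model is served.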

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4637c98a44aa97e5b79fb81bacee9fefe41d8a02d45578ccf214d8937e4bfa74
+oid sha256:05de9a7c82dcedc415e1d974ca7dce551dbdd4b3343fe065d1ea62d9523327e3
 size 435736840
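Since only the `oid` changes while the size stays at 435736840 bytes, a downloaded `model.safetensors` can be checked against the new pointer by hashing it. The following is a minimal sketch with hypothetical helper names; it checks an in-memory byte string, so a file of this size would more sensibly be hashed in chunks.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that data has the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


# The new pointer from this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:05de9a7c82dcedc415e1d974ca7dce551dbdd4b3343fe065d1ea62d9523327e3
size 435736840
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(open("model.safetensors", "rb").read(), pointer)` would then confirm that the local file is the version this commit points to, assuming the file fits in memory.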