Model save

Files changed (5)

README.md +218 -0
config.json +18 -0
generation_config.json +4 -0
model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,218 @@

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: reverse_add_replicate_eval30
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# reverse_add_replicate_eval30
+This model is a fine-tuned version of an unspecified base model on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 4.2069
+- Accuracy: 0.0
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.001
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
+|:-------------:|:------:|:-----:|:---------------:|:--------:|
+| No log        | 0      | 0     | 2.7824          | 0.0      |
+| 4.4924        | 0.0064 | 100   | 2.3666          | 0.0      |
+| 4.4303        | 0.0128 | 200   | 2.3891          | 0.0      |
+| 4.2696        | 0.0192 | 300   | 2.3886          | 0.0      |
+| 4.1579        | 0.0256 | 400   | 2.3007          | 0.0      |
+| 3.9553        | 0.032  | 500   | 2.2664          | 0.0      |
+| 3.443         | 0.0384 | 600   | 2.1687          | 0.0      |
+| 2.9354        | 0.0448 | 700   | 2.2864          | 0.0      |
+| 3.1202        | 0.0512 | 800   | 2.3458          | 0.0      |
+| 2.6519        | 0.0576 | 900   | 2.2538          | 0.0      |
+| 2.8012        | 0.064  | 1000  | 1.9890          | 0.0      |
+| 2.6125        | 0.0704 | 1100  | 2.0957          | 0.0      |
+| 2.4014        | 0.0768 | 1200  | 2.2698          | 0.0      |
+| 2.6731        | 0.0832 | 1300  | 2.4123          | 0.0      |
+| 2.6851        | 0.0896 | 1400  | 2.5450          | 0.0      |
+| 2.5736        | 0.096  | 1500  | 1.9995          | 0.0      |
+| 2.3324        | 0.1024 | 1600  | 2.1640          | 0.0      |
+| 2.4952        | 0.1088 | 1700  | 2.3585          | 0.0      |
+| 2.2542        | 0.1152 | 1800  | 2.3618          | 0.0      |
+| 2.3456        | 0.1216 | 1900  | 2.2109          | 0.0      |
+| 2.386         | 0.128  | 2000  | 2.3208          | 0.0      |
+| 2.528         | 0.1344 | 2100  | 2.2732          | 0.0      |
+| 2.5489        | 0.1408 | 2200  | 2.0203          | 0.0      |
+| 2.4391        | 0.1472 | 2300  | 2.2699          | 0.0      |
+| 2.4452        | 0.1536 | 2400  | 2.2424          | 0.0      |
+| 2.3189        | 0.16   | 2500  | 2.1381          | 0.0      |
+| 4.4429        | 0.1664 | 2600  | 2.2751          | 0.0      |
+| 2.2243        | 0.1728 | 2700  | 2.5335          | 0.0      |
+| 2.3893        | 0.1792 | 2800  | 2.6115          | 0.0      |
+| 2.3394        | 0.1856 | 2900  | 2.6899          | 0.0      |
+| 2.6173        | 0.192  | 3000  | 2.4414          | 0.0      |
+| 2.3326        | 0.1984 | 3100  | 2.6462          | 0.0      |
+| 2.1011        | 0.2048 | 3200  | 2.6349          | 0.0      |
+| 2.5971        | 0.2112 | 3300  | 2.2816          | 0.0      |
+| 2.1758        | 0.2176 | 3400  | 2.9367          | 0.0      |
+| 2.0391        | 0.224  | 3500  | 2.5660          | 0.0      |
+| 2.683         | 0.2304 | 3600  | 2.2197          | 0.0      |
+| 2.1331        | 0.2368 | 3700  | 1.9519          | 0.0      |
+| 2.5328        | 0.2432 | 3800  | 2.8043          | 0.0      |
+| 2.307         | 0.2496 | 3900  | 3.1588          | 0.0      |
+| 2.4706        | 0.256  | 4000  | 1.7832          | 0.0      |
+| 2.2706        | 0.2624 | 4100  | 2.3912          | 0.0      |
+| 2.0165        | 0.2688 | 4200  | 3.3943          | 0.0      |
+| 2.1406        | 0.2752 | 4300  | 2.8255          | 0.0      |
+| 2.1749        | 0.2816 | 4400  | 1.9678          | 0.0      |
+| 2.3053        | 0.288  | 4500  | 2.1989          | 0.0      |
+| 2.1815        | 0.2944 | 4600  | 2.5911          | 0.0      |
+| 2.1675        | 0.3008 | 4700  | 2.7480          | 0.0      |
+| 2.0463        | 0.3072 | 4800  | 2.0872          | 0.0      |
+| 2.2269        | 0.3136 | 4900  | 2.5325          | 0.0      |
+| 1.9049        | 0.32   | 5000  | 2.7470          | 0.0      |
+| 2.0929        | 0.3264 | 5100  | 2.2618          | 0.0      |
+| 1.8538        | 0.3328 | 5200  | 2.0825          | 0.0      |
+| 1.4804        | 0.3392 | 5300  | 2.0630          | 0.0      |
+| 1.3704        | 0.3456 | 5400  | 1.7688          | 0.0      |
+| 1.2029        | 0.352  | 5500  | 2.7977          | 0.0      |
+| 0.6455        | 0.3584 | 5600  | 2.7000          | 0.0      |
+| 0.9013        | 0.3648 | 5700  | 2.7678          | 0.0      |
+| 0.4867        | 0.3712 | 5800  | 2.7487          | 0.0      |
+| 1.4135        | 0.3776 | 5900  | 2.3163          | 0.0      |
+| 0.1742        | 0.384  | 6000  | 2.7119          | 0.0      |
+| 0.5932        | 0.3904 | 6100  | 2.8409          | 0.0      |
+| 0.3346        | 0.3968 | 6200  | 2.4202          | 0.0      |
+| 0.7908        | 0.4032 | 6300  | 3.1410          | 0.0      |
+| 0.4932        | 0.4096 | 6400  | 3.6802          | 0.0      |
+| 0.3316        | 0.416  | 6500  | 2.2771          | 0.0      |
+| 0.3097        | 0.4224 | 6600  | 2.5632          | 0.0      |
+| 0.3322        | 0.4288 | 6700  | 2.7335          | 0.0      |
+| 0.2749        | 0.4352 | 6800  | 3.1795          | 0.0      |
+| 0.3139        | 0.4416 | 6900  | 3.6155          | 0.0      |
+| 0.2414        | 0.448  | 7000  | 3.1201          | 0.0      |
+| 2.3029        | 0.4544 | 7100  | 3.7881          | 0.0      |
+| 0.4029        | 0.4608 | 7200  | 3.7932          | 0.0      |
+| 0.1374        | 0.4672 | 7300  | 2.8160          | 0.0      |
+| 0.2741        | 0.4736 | 7400  | 2.4926          | 0.0      |
+| 0.2765        | 0.48   | 7500  | 3.0036          | 0.0      |
+| 0.022         | 0.4864 | 7600  | 3.4260          | 0.0      |
+| 0.1145        | 0.4928 | 7700  | 3.2066          | 0.0      |
+| 0.2077        | 0.4992 | 7800  | 2.9937          | 0.0      |
+| 0.086         | 0.5056 | 7900  | 2.7413          | 0.0      |
+| 0.0884        | 0.512  | 8000  | 3.2411          | 0.0      |
+| 0.158         | 0.5184 | 8100  | 3.4493          | 0.0      |
+| 0.1484        | 0.5248 | 8200  | 2.4997          | 0.0      |
+| 0.0335        | 0.5312 | 8300  | 3.4125          | 0.0      |
+| 0.0128        | 0.5376 | 8400  | 2.4688          | 0.0      |
+| 0.0317        | 0.544  | 8500  | 2.9963          | 0.0      |
+| 0.092         | 0.5504 | 8600  | 3.6333          | 0.0      |
+| 0.028         | 0.5568 | 8700  | 2.9518          | 0.0      |
+| 0.0442        | 0.5632 | 8800  | 3.4326          | 0.0      |
+| 0.0901        | 0.5696 | 8900  | 3.9253          | 0.0      |
+| 0.1999        | 0.576  | 9000  | 3.9988          | 0.0      |
+| 0.0344        | 0.5824 | 9100  | 3.8048          | 0.0      |
+| 0.0015        | 0.5888 | 9200  | 3.7488          | 0.0      |
+| 0.0846        | 0.5952 | 9300  | 4.5372          | 0.0      |
+| 0.0147        | 0.6016 | 9400  | 4.1808          | 0.0      |
+| 0.0282        | 0.608  | 9500  | 3.3729          | 0.0      |
+| 0.0273        | 0.6144 | 9600  | 2.6182          | 0.0      |
+| 0.0229        | 0.6208 | 9700  | 3.7613          | 0.0      |
+| 0.0005        | 0.6272 | 9800  | 4.1924          | 0.0      |
+| 0.0023        | 0.6336 | 9900  | 3.0110          | 0.0      |
+| 0.002         | 0.64   | 10000 | 3.1986          | 0.0      |
+| 0.0018        | 0.6464 | 10100 | 2.8814          | 0.0      |
+| 0.0006        | 0.6528 | 10200 | 3.2804          | 0.0      |
+| 0.0002        | 0.6592 | 10300 | 4.5229          | 0.0      |
+| 0.007         | 0.6656 | 10400 | 4.0220          | 0.0      |
+| 0.0           | 0.672  | 10500 | 4.3269          | 0.0      |
+| 0.0012        | 0.6784 | 10600 | 4.4005          | 0.0      |
+| 0.0           | 0.6848 | 10700 | 4.0689          | 0.0      |
+| 0.0003        | 0.6912 | 10800 | 3.0147          | 0.0      |
+| 0.0023        | 0.6976 | 10900 | 5.2716          | 0.0      |
+| 0.0004        | 0.704  | 11000 | 3.3269          | 0.0      |
+| 0.0006        | 0.7104 | 11100 | 3.6125          | 0.0      |
+| 0.0002        | 0.7168 | 11200 | 2.9193          | 0.0      |
+| 0.0002        | 0.7232 | 11300 | 4.0888          | 0.0      |
+| 0.0002        | 0.7296 | 11400 | 3.3349          | 0.0      |
+| 0.0           | 0.736  | 11500 | 3.4065          | 0.0      |
+| 0.0           | 0.7424 | 11600 | 3.5861          | 0.0      |
+| 0.0           | 0.7488 | 11700 | 3.6467          | 0.0      |
+| 0.0           | 0.7552 | 11800 | 3.6487          | 0.0      |
+| 0.0           | 0.7616 | 11900 | 3.6888          | 0.0      |
+| 0.0           | 0.768  | 12000 | 3.7449          | 0.0      |
+| 0.0           | 0.7744 | 12100 | 3.8121          | 0.0      |
+| 0.0           | 0.7808 | 12200 | 3.8735          | 0.0      |
+| 0.0           | 0.7872 | 12300 | 3.9032          | 0.0      |
+| 0.0           | 0.7936 | 12400 | 3.9248          | 0.0      |
+| 0.0           | 0.8    | 12500 | 3.9542          | 0.0      |
+| 0.0           | 0.8064 | 12600 | 3.9486          | 0.0      |
+| 0.0           | 0.8128 | 12700 | 3.9630          | 0.0      |
+| 0.0           | 0.8192 | 12800 | 3.9758          | 0.0      |
+| 0.0           | 0.8256 | 12900 | 3.9721          | 0.0      |
+| 0.0           | 0.832  | 13000 | 3.9492          | 0.0      |
+| 0.0           | 0.8384 | 13100 | 3.9657          | 0.0      |
+| 0.0           | 0.8448 | 13200 | 3.9868          | 0.0      |
+| 0.0           | 0.8512 | 13300 | 4.0069          | 0.0      |
+| 0.0           | 0.8576 | 13400 | 4.0213          | 0.0      |
+| 0.0           | 0.864  | 13500 | 4.0300          | 0.0      |
+| 0.0           | 0.8704 | 13600 | 4.0330          | 0.0      |
+| 0.0           | 0.8768 | 13700 | 4.0619          | 0.0      |
+| 0.0           | 0.8832 | 13800 | 4.2990          | 0.0      |
+| 0.0           | 0.8896 | 13900 | 4.2865          | 0.0      |
+| 0.0           | 0.896  | 14000 | 4.2903          | 0.0      |
+| 0.0           | 0.9024 | 14100 | 4.2958          | 0.0      |
+| 0.0           | 0.9088 | 14200 | 4.2725          | 0.0      |
+| 0.0           | 0.9152 | 14300 | 4.2739          | 0.0      |
+| 0.0           | 0.9216 | 14400 | 4.2878          | 0.0      |
+| 0.0           | 0.928  | 14500 | 4.2924          | 0.0      |
+| 0.0           | 0.9344 | 14600 | 4.2934          | 0.0      |
+| 0.0           | 0.9408 | 14700 | 4.2081          | 0.0      |
+| 0.0           | 0.9472 | 14800 | 4.2132          | 0.0      |
+| 0.0           | 0.9536 | 14900 | 4.2024          | 0.0      |
+| 0.0           | 0.96   | 15000 | 4.2036          | 0.0      |
+| 0.0           | 0.9664 | 15100 | 4.2049          | 0.0      |
+| 0.0           | 0.9728 | 15200 | 4.2054          | 0.0      |
+| 0.0           | 0.9792 | 15300 | 4.2064          | 0.0      |
+| 0.0           | 0.9856 | 15400 | 4.2069          | 0.0      |
+| 0.0           | 0.992  | 15500 | 4.2069          | 0.0      |
+| 0.0           | 0.9984 | 15600 | 4.2069          | 0.0      |
+### Framework versions
+- Transformers 4.46.0
+- Pytorch 2.5.1
+- Datasets 3.1.0
+- Tokenizers 0.20.1
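
Note on the hyperparameters in the README above: the listed values imply a few quantities the card does not state directly. The sketch below derives them under stated assumptions. It assumes a single device, so that `total_train_batch_size` is just `train_batch_size * gradient_accumulation_steps`. It infers the steps per epoch from the logged row "epoch 0.0064 at step 100", which is rounded to four decimals. It also assumes the warmup step count is rounded up and that the cosine schedule is the usual half-cosine decay after linear warmup. The function name `cosine_lr` is hypothetical and is not taken from the training script.

```python
import math

# Values copied from the "Training hyperparameters" list.
train_batch_size = 64
gradient_accumulation_steps = 2
learning_rate = 1e-3
warmup_ratio = 0.1

# Effective batch size per optimizer step (assumes a single device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# The results table logs epoch 0.0064 at step 100, which implies the
# number of optimizer steps in one epoch (subject to rounding in the log).
steps_per_epoch = round(100 / 0.0064)
approx_train_examples = steps_per_epoch * total_train_batch_size

# Assumption: warmup steps are the warmup ratio times total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * steps_per_epoch)


def cosine_lr(step, total_steps=steps_per_epoch, warmup=warmup_steps,
              base_lr=learning_rate):
    """Linear warmup followed by half-cosine decay to zero (illustrative)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size)
    print("steps per epoch:", steps_per_epoch)
    print("approx. training examples:", approx_train_examples)
    print("warmup steps:", warmup_steps)
    for step in (0, warmup_steps, steps_per_epoch // 2, steps_per_epoch):
        print(step, cosine_lr(step))
```

Under these assumptions the run covers roughly two million training examples in its single epoch, and the learning rate peaks at 0.001 about a tenth of the way through before decaying.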

config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "architectures": [
+    "NanoGPT"
+  ],
+  "bias": true,
+  "block_size": 256,
+  "dropout": 0.0,
+  "model_type": "nanogpt",
+  "n_embd": 384,
+  "n_head": 6,
+  "n_layer": 6,
+  "nonlinearity": "RELU",
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.0",
+  "use_NoPE": true,
+  "use_layernorm": true,
+  "vocab_size": 14
+}

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.46.0"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5617b31ad24eb7b1e71540f13748b7d894f8471282f0d79d98cba399b2bc7f4d
+size 42640744
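
Note on the checkpoint size: the LFS pointer size can be sanity-checked against `config.json`. The sketch below is a rough estimate, not the model's actual layout. It assumes GPT-2 style blocks (a fused QKV projection, an output projection, and an MLP with 4x expansion), biases on every linear layer (`"bias": true`), two LayerNorms per block plus a final one (`"use_layernorm": true`), and no positional embedding table (reading `"use_NoPE": true` as "no positional encoding"). The helper `block_params` is hypothetical. The estimate counts the token embedding once, so an untied output head would add another `vocab_size * n_embd` values.

```python
# Values copied from config.json and the model.safetensors LFS pointer.
n_embd, n_layer, vocab_size = 384, 6, 14
safetensors_bytes = 42_640_744
bytes_per_param = 4  # torch_dtype is float32


def block_params(d):
    """Parameter count of one assumed GPT-2 style transformer block."""
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up projection + down projection
    norms = 2 * (2 * d)                           # two LayerNorms (weight + bias)
    return attn + mlp + norms


embedding = vocab_size * n_embd   # token embedding only (no positional table assumed)
final_norm = 2 * n_embd           # final LayerNorm weight + bias
estimated_params = n_layer * block_params(n_embd) + embedding + final_norm

# Upper bound on float32 values in the file (ignores the safetensors header).
upper_bound = safetensors_bytes // bytes_per_param

if __name__ == "__main__":
    print("estimated parameters:", estimated_params)
    print("upper bound from file size:", upper_bound)
    print("ratio:", estimated_params / upper_bound)
```

With these assumptions the estimate comes to about 10.65 million parameters, within roughly 0.1% of the bound implied by the file size, which is consistent with a float32 checkpoint of a 6-layer, 384-wide model.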

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e17ee82b634fd4df3f5751ec12f577a5d53ee3ba3d70f23dc089aedaab70fb10
+size 5240