Model save

Files changed (5)

README.md +218 -0
config.json +18 -0
generation_config.json +4 -0
model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,218 @@

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: reverse_add_replicate
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# reverse_add_replicate
+This model is a fine-tuned version of an unspecified base model on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.0000
+- Accuracy: 1.0
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.001
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
+|:-------------:|:------:|:-----:|:---------------:|:--------:|
+| No log        | 0      | 0     | 2.6817          | 0.0      |
+| 4.4613        | 0.0064 | 100   | 2.3109          | 0.0      |
+| 4.335         | 0.0128 | 200   | 2.2529          | 0.0      |
+| 4.2668        | 0.0192 | 300   | 2.1999          | 0.0      |
+| 4.1908        | 0.0256 | 400   | 2.2754          | 0.0      |
+| 4.2314        | 0.032  | 500   | 2.2359          | 0.0      |
+| 3.6347        | 0.0384 | 600   | 1.8769          | 0.0      |
+| 3.4763        | 0.0448 | 700   | 1.9187          | 0.0      |
+| 3.1229        | 0.0512 | 800   | 1.9776          | 0.0      |
+| 2.8398        | 0.0576 | 900   | 1.6601          | 0.0      |
+| 3.0181        | 0.064  | 1000  | 1.6472          | 0.0      |
+| 3.0209        | 0.0704 | 1100  | 1.6118          | 0.0      |
+| 2.5603        | 0.0768 | 1200  | 1.3266          | 0.002    |
+| 2.6247        | 0.0832 | 1300  | 1.4725          | 0.0      |
+| 2.502         | 0.0896 | 1400  | 1.4013          | 0.0      |
+| 2.6392        | 0.096  | 1500  | 1.6963          | 0.0      |
+| 2.3297        | 0.1024 | 1600  | 1.5349          | 0.001    |
+| 2.4639        | 0.1088 | 1700  | 1.3084          | 0.001    |
+| 2.4555        | 0.1152 | 1800  | 1.3022          | 0.0      |
+| 2.1326        | 0.1216 | 1900  | 1.2985          | 0.002    |
+| 2.2766        | 0.128  | 2000  | 1.2175          | 0.0      |
+| 2.5442        | 0.1344 | 2100  | 1.2973          | 0.0      |
+| 2.3005        | 0.1408 | 2200  | 1.4224          | 0.0      |
+| 2.4872        | 0.1472 | 2300  | 1.3877          | 0.001    |
+| 2.3095        | 0.1536 | 2400  | 1.2081          | 0.002    |
+| 2.3245        | 0.16   | 2500  | 1.2708          | 0.001    |
+| 2.6165        | 0.1664 | 2600  | 1.5453          | 0.001    |
+| 2.2608        | 0.1728 | 2700  | 1.2128          | 0.002    |
+| 2.3363        | 0.1792 | 2800  | 1.2837          | 0.002    |
+| 2.262         | 0.1856 | 2900  | 1.2287          | 0.007    |
+| 2.1686        | 0.192  | 3000  | 1.3750          | 0.0      |
+| 2.3021        | 0.1984 | 3100  | 1.1819          | 0.005    |
+| 1.8808        | 0.2048 | 3200  | 1.1540          | 0.003    |
+| 2.5449        | 0.2112 | 3300  | 1.1970          | 0.0      |
+| 2.1555        | 0.2176 | 3400  | 1.1703          | 0.001    |
+| 1.8908        | 0.224  | 3500  | 1.2023          | 0.003    |
+| 2.074         | 0.2304 | 3600  | 1.3576          | 0.002    |
+| 2.2279        | 0.2368 | 3700  | 1.7341          | 0.0      |
+| 2.4889        | 0.2432 | 3800  | 1.2299          | 0.003    |
+| 2.0978        | 0.2496 | 3900  | 1.2305          | 0.0      |
+| 2.6161        | 0.256  | 4000  | 1.8482          | 0.002    |
+| 1.937         | 0.2624 | 4100  | 1.1050          | 0.005    |
+| 1.9751        | 0.2688 | 4200  | 1.2011          | 0.003    |
+| 2.1199        | 0.2752 | 4300  | 1.2652          | 0.004    |
+| 1.3263        | 0.2816 | 4400  | 0.7553          | 0.018    |
+| 1.6805        | 0.288  | 4500  | 1.2216          | 0.005    |
+| 1.1079        | 0.2944 | 4600  | 0.8702          | 0.012    |
+| 1.4584        | 0.3008 | 4700  | 1.0929          | 0.0      |
+| 1.1793        | 0.3072 | 4800  | 0.8990          | 0.005    |
+| 0.7387        | 0.3136 | 4900  | 0.5412          | 0.031    |
+| 1.4369        | 0.32   | 5000  | 1.4076          | 0.057    |
+| 0.4073        | 0.3264 | 5100  | 0.4967          | 0.384    |
+| 0.4319        | 0.3328 | 5200  | 0.4954          | 0.22     |
+| 0.4177        | 0.3392 | 5300  | 0.5079          | 0.461    |
+| 0.3973        | 0.3456 | 5400  | 0.4415          | 0.377    |
+| 0.7054        | 0.352  | 5500  | 0.6503          | 0.1      |
+| 0.5802        | 0.3584 | 5600  | 0.8201          | 0.063    |
+| 0.1897        | 0.3648 | 5700  | 0.2479          | 0.462    |
+| 0.3982        | 0.3712 | 5800  | 1.3623          | 0.186    |
+| 0.6079        | 0.3776 | 5900  | 0.9248          | 0.195    |
+| 0.2099        | 0.384  | 6000  | 0.4132          | 0.308    |
+| 0.1991        | 0.3904 | 6100  | 0.1490          | 0.605    |
+| 0.4226        | 0.3968 | 6200  | 0.5506          | 0.284    |
+| 1.0515        | 0.4032 | 6300  | 1.1107          | 0.129    |
+| 0.1014        | 0.4096 | 6400  | 0.2367          | 0.447    |
+| 0.2219        | 0.416  | 6500  | 0.4163          | 0.347    |
+| 2.1345        | 0.4224 | 6600  | 1.4566          | 0.0      |
+| 0.5009        | 0.4288 | 6700  | 0.5398          | 0.158    |
+| 0.1368        | 0.4352 | 6800  | 0.3955          | 0.17     |
+| 0.0253        | 0.4416 | 6900  | 0.1468          | 0.629    |
+| 0.1325        | 0.448  | 7000  | 0.3457          | 0.467    |
+| 0.1866        | 0.4544 | 7100  | 0.4352          | 0.313    |
+| 0.6098        | 0.4608 | 7200  | 0.8387          | 0.16     |
+| 0.1887        | 0.4672 | 7300  | 0.2170          | 0.453    |
+| 0.058         | 0.4736 | 7400  | 0.0872          | 0.731    |
+| 0.2518        | 0.48   | 7500  | 0.3798          | 0.267    |
+| 0.0314        | 0.4864 | 7600  | 0.3710          | 0.311    |
+| 0.5078        | 0.4928 | 7700  | 0.5315          | 0.18     |
+| 0.0894        | 0.4992 | 7800  | 0.2551          | 0.366    |
+| 0.0788        | 0.5056 | 7900  | 0.1619          | 0.468    |
+| 0.6913        | 0.512  | 8000  | 0.5418          | 0.198    |
+| 0.2068        | 0.5184 | 8100  | 0.3154          | 0.323    |
+| 0.8031        | 0.5248 | 8200  | 0.6006          | 0.149    |
+| 0.0841        | 0.5312 | 8300  | 0.1740          | 0.74     |
+| 0.1649        | 0.5376 | 8400  | 0.1316          | 0.592    |
+| 0.4631        | 0.544  | 8500  | 0.5998          | 0.226    |
+| 0.2732        | 0.5504 | 8600  | 0.7268          | 0.168    |
+| 0.2153        | 0.5568 | 8700  | 0.2141          | 0.4      |
+| 0.6022        | 0.5632 | 8800  | 0.3403          | 0.412    |
+| 0.115         | 0.5696 | 8900  | 0.0905          | 0.712    |
+| 0.1791        | 0.576  | 9000  | 0.1527          | 0.554    |
+| 0.2843        | 0.5824 | 9100  | 0.3514          | 0.319    |
+| 0.0359        | 0.5888 | 9200  | 0.0447          | 0.829    |
+| 0.018         | 0.5952 | 9300  | 0.0565          | 0.781    |
+| 0.0363        | 0.6016 | 9400  | 0.1747          | 0.507    |
+| 0.1352        | 0.608  | 9500  | 0.3075          | 0.498    |
+| 0.0642        | 0.6144 | 9600  | 0.2735          | 0.475    |
+| 0.0619        | 0.6208 | 9700  | 0.0728          | 0.773    |
+| 0.0305        | 0.6272 | 9800  | 0.2225          | 0.694    |
+| 0.1128        | 0.6336 | 9900  | 0.1043          | 0.649    |
+| 0.1403        | 0.64   | 10000 | 0.0730          | 0.692    |
+| 0.1471        | 0.6464 | 10100 | 0.1880          | 0.497    |
+| 0.0632        | 0.6528 | 10200 | 0.1933          | 0.657    |
+| 0.0757        | 0.6592 | 10300 | 0.0467          | 0.806    |
+| 0.0969        | 0.6656 | 10400 | 0.3012          | 0.546    |
+| 0.0552        | 0.672  | 10500 | 0.2214          | 0.37     |
+| 0.0821        | 0.6784 | 10600 | 0.2411          | 0.504    |
+| 0.0254        | 0.6848 | 10700 | 0.1192          | 0.619    |
+| 0.0058        | 0.6912 | 10800 | 0.0409          | 0.901    |
+| 0.0343        | 0.6976 | 10900 | 0.1508          | 0.671    |
+| 0.0357        | 0.704  | 11000 | 0.0646          | 0.766    |
+| 0.1314        | 0.7104 | 11100 | 0.1610          | 0.558    |
+| 0.3291        | 0.7168 | 11200 | 1.1259          | 0.282    |
+| 0.0217        | 0.7232 | 11300 | 0.0448          | 0.855    |
+| 0.0486        | 0.7296 | 11400 | 0.1727          | 0.719    |
+| 0.0055        | 0.736  | 11500 | 0.0911          | 0.715    |
+| 0.028         | 0.7424 | 11600 | 0.0281          | 0.904    |
+| 0.0518        | 0.7488 | 11700 | 0.2969          | 0.421    |
+| 0.0049        | 0.7552 | 11800 | 0.0311          | 0.871    |
+| 0.0044        | 0.7616 | 11900 | 0.0091          | 0.955    |
+| 0.0158        | 0.768  | 12000 | 0.0036          | 0.979    |
+| 0.0015        | 0.7744 | 12100 | 0.0169          | 0.919    |
+| 0.0099        | 0.7808 | 12200 | 0.0078          | 0.961    |
+| 0.0098        | 0.7872 | 12300 | 0.0123          | 0.952    |
+| 0.0006        | 0.7936 | 12400 | 0.0065          | 0.966    |
+| 0.0015        | 0.8    | 12500 | 0.0058          | 0.971    |
+| 0.0           | 0.8064 | 12600 | 0.0031          | 0.984    |
+| 0.0002        | 0.8128 | 12700 | 0.0124          | 0.961    |
+| 0.0002        | 0.8192 | 12800 | 0.0024          | 0.988    |
+| 0.0           | 0.8256 | 12900 | 0.0034          | 0.987    |
+| 0.0           | 0.832  | 13000 | 0.0055          | 0.98     |
+| 0.0           | 0.8384 | 13100 | 0.0063          | 0.979    |
+| 0.0063        | 0.8448 | 13200 | 0.0082          | 0.958    |
+| 0.0003        | 0.8512 | 13300 | 0.0016          | 0.993    |
+| 0.0001        | 0.8576 | 13400 | 0.0007          | 0.996    |
+| 0.0002        | 0.864  | 13500 | 0.0009          | 0.996    |
+| 0.0           | 0.8704 | 13600 | 0.0004          | 0.997    |
+| 0.0           | 0.8768 | 13700 | 0.0072          | 0.971    |
+| 0.0012        | 0.8832 | 13800 | 0.0011          | 0.995    |
+| 0.0           | 0.8896 | 13900 | 0.0059          | 0.986    |
+| 0.0           | 0.896  | 14000 | 0.0091          | 0.981    |
+| 0.0           | 0.9024 | 14100 | 0.0081          | 0.984    |
+| 0.0           | 0.9088 | 14200 | 0.0023          | 0.991    |
+| 0.0           | 0.9152 | 14300 | 0.0031          | 0.991    |
+| 0.0           | 0.9216 | 14400 | 0.0001          | 0.999    |
+| 0.0           | 0.928  | 14500 | 0.0001          | 1.0      |
+| 0.0           | 0.9344 | 14600 | 0.0001          | 1.0      |
+| 0.0           | 0.9408 | 14700 | 0.0001          | 1.0      |
+| 0.0           | 0.9472 | 14800 | 0.0000          | 1.0      |
+| 0.0           | 0.9536 | 14900 | 0.0001          | 1.0      |
+| 0.0           | 0.96   | 15000 | 0.0000          | 1.0      |
+| 0.0001        | 0.9664 | 15100 | 0.0000          | 1.0      |
+| 0.0           | 0.9728 | 15200 | 0.0000          | 1.0      |
+| 0.0           | 0.9792 | 15300 | 0.0000          | 1.0      |
+| 0.0           | 0.9856 | 15400 | 0.0000          | 1.0      |
+| 0.0           | 0.992  | 15500 | 0.0000          | 1.0      |
+| 0.0           | 0.9984 | 15600 | 0.0000          | 1.0      |
+### Framework versions
+- Transformers 4.46.0
+- Pytorch 2.5.1
+- Datasets 3.1.0
+- Tokenizers 0.20.1
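
The hyperparameters and the Epoch/Step columns in the model card above imply a few quantities that the card does not state directly. The sketch below derives them in plain Python. It is not part of the commit. It assumes the total step count can be inferred from the logged ratio of 100 steps per 0.0064 epochs, and that the schedule follows the usual linear-warmup-then-cosine shape with half a cosine cycle. The names `steps_per_epoch`, `warmup_steps`, and `lr_at` are illustrative only.

```python
import math

# Values copied from the model card.
LEARNING_RATE = 1e-3
TOTAL_TRAIN_BATCH_SIZE = 128   # train_batch_size 64 x gradient_accumulation_steps 2
WARMUP_RATIO = 0.1

# Inferred, not stated in the card: the table logs 0.0064 epochs per 100 steps.
steps_per_epoch = round(100 / 0.0064)
# Approximate: the final batch of the epoch may be partial.
examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Assumption: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(steps_per_epoch * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / (steps_per_epoch - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("optimizer steps per epoch:", steps_per_epoch)
    print("approx. examples per epoch:", examples_per_epoch)
    print("warmup steps:", warmup_steps)
    for step in (0, warmup_steps, 5000, 10000, steps_per_epoch):
        print(f"step {step:>6}: lr = {lr_at(step):.6f}")
```

Under these assumptions, the accuracy jump around steps 5000 to 5100 in the table happens while the learning rate is still above three quarters of its peak, and the final plateau at accuracy 1.0 coincides with the tail of the cosine decay.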

config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "architectures": [
+    "NanoGPT"
+  ],
+  "bias": true,
+  "block_size": 256,
+  "dropout": 0.0,
+  "model_type": "nanogpt",
+  "n_embd": 384,
+  "n_head": 6,
+  "n_layer": 6,
+  "nonlinearity": "RELU",
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.0",
+  "use_NoPE": true,
+  "use_layernorm": true,
+  "vocab_size": 14
+}
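
As a rough sanity check, the configuration above can be compared against the size of `model.safetensors` in this commit (42640744 bytes, stored as float32). The sketch below is an estimate and not the repository's own code. It assumes a standard nanoGPT-style block (fused QKV projection, output projection, a 4x-wide MLP, two layer norms per block plus a final one) and no positional-embedding table because `use_NoPE` is true. Whether the output head shares weights with the token embedding is unknown, so both cases are computed. The function `estimate_params` is a hypothetical helper.

```python
def estimate_params(n_embd: int, n_layer: int, vocab_size: int,
                    bias: bool = True, tied_head: bool = True) -> int:
    """Estimate the parameter count of a nanoGPT-style model without positional embeddings."""
    b = 1 if bias else 0
    # Attention: fused QKV projection plus output projection.
    attn = n_embd * 3 * n_embd + b * 3 * n_embd + n_embd * n_embd + b * n_embd
    # MLP: assumed hidden width of 4 * n_embd.
    mlp = n_embd * 4 * n_embd + b * 4 * n_embd + 4 * n_embd * n_embd + b * n_embd
    # Two layer norms per block (weight, and bias if enabled).
    norms = 2 * (n_embd + b * n_embd)
    total = n_layer * (attn + mlp + norms)
    total += vocab_size * n_embd        # token embedding
    total += n_embd + b * n_embd        # final layer norm
    if not tied_head:
        total += vocab_size * n_embd    # separate output head, assumed bias-free
    return total


# Values from config.json and the model.safetensors pointer in this commit.
CONFIG = {"n_embd": 384, "n_layer": 6, "vocab_size": 14, "bias": True}
SAFETENSORS_BYTES = 42640744
BYTES_PER_FLOAT32 = 4

tied = estimate_params(**CONFIG, tied_head=True)
untied = estimate_params(**CONFIG, tied_head=False)
# Upper bound: the file also contains a JSON header, so the true count is lower.
upper_bound = SAFETENSORS_BYTES // BYTES_PER_FLOAT32

if __name__ == "__main__":
    print("estimate (tied head):  ", tied)
    print("estimate (untied head):", untied)
    print("upper bound from file: ", upper_bound)
```

Both estimates land within about 0.1% of the bound implied by the file size, which is consistent with a model of roughly 10.7 million parameters. The remaining gap is small enough to be the safetensors header, but that attribution is a guess.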

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.46.0"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e690fbbbc9ca587e0419aa902d1bcbf5f20545ed219a9a32e047b1caa1b7915
+size 42640744
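
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores the spec version, a SHA-256 object ID, and the byte size, while the actual file lives in LFS storage. The following sketch shows one way to parse such a pointer and check a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the model.safetensors entry in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3e690fbbbc9ca587e0419aa902d1bcbf5f20545ed219a9a32e047b1caa1b7915
size 42640744
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy of the weights would then confirm that the download is complete and uncorrupted.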

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa0fdf48f20c59e3d6da245ddade0ec393ed2bdef608d2fb69b015c1077fe2e6
+size 5240