Model save

Files changed (5)

README.md +218 -0
config.json +18 -0
generation_config.json +4 -0
model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,218 @@

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: reverse_add_replicate_eval18
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# reverse_add_replicate_eval18
+This model was trained on an unknown dataset; the Trainer did not record a base model.
+It achieves the following results on the evaluation set:
+- Loss: 0.0462
+- Accuracy: 0.822
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.001
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
+|:-------------:|:------:|:-----:|:---------------:|:--------:|
+| No log        | 0      | 0     | 2.6252          | 0.0      |
+| 4.5838        | 0.0064 | 100   | 2.3366          | 0.0      |
+| 4.431         | 0.0128 | 200   | 2.2655          | 0.0      |
+| 4.337         | 0.0192 | 300   | 2.3128          | 0.0      |
+| 4.2999        | 0.0256 | 400   | 2.1857          | 0.0      |
+| 4.3044        | 0.032  | 500   | 2.2308          | 0.0      |
+| 3.9499        | 0.0384 | 600   | 2.0985          | 0.0      |
+| 3.778         | 0.0448 | 700   | 2.0666          | 0.0      |
+| 3.3655        | 0.0512 | 800   | 1.7091          | 0.0      |
+| 3.1301        | 0.0576 | 900   | 2.0713          | 0.0      |
+| 3.3799        | 0.064  | 1000  | 1.7607          | 0.0      |
+| 2.7031        | 0.0704 | 1100  | 1.4787          | 0.0      |
+| 2.4529        | 0.0768 | 1200  | 1.4061          | 0.0      |
+| 3.1457        | 0.0832 | 1300  | 1.5655          | 0.0      |
+| 2.7466        | 0.0896 | 1400  | 1.5409          | 0.0      |
+| 2.545         | 0.096  | 1500  | 1.6788          | 0.0      |
+| 2.215         | 0.1024 | 1600  | 1.4968          | 0.0      |
+| 2.4219        | 0.1088 | 1700  | 1.7860          | 0.0      |
+| 2.3565        | 0.1152 | 1800  | 1.3797          | 0.0      |
+| 2.5352        | 0.1216 | 1900  | 1.4154          | 0.0      |
+| 2.4223        | 0.128  | 2000  | 1.3751          | 0.0      |
+| 2.3618        | 0.1344 | 2100  | 1.2539          | 0.002    |
+| 2.7171        | 0.1408 | 2200  | 1.4281          | 0.0      |
+| 2.3209        | 0.1472 | 2300  | 1.2035          | 0.002    |
+| 2.6747        | 0.1536 | 2400  | 1.9148          | 0.0      |
+| 2.3485        | 0.16   | 2500  | 1.4721          | 0.0      |
+| 3.1889        | 0.1664 | 2600  | 1.3520          | 0.0      |
+| 2.2109        | 0.1728 | 2700  | 1.1634          | 0.001    |
+| 2.2771        | 0.1792 | 2800  | 1.2053          | 0.003    |
+| 2.4756        | 0.1856 | 2900  | 1.2916          | 0.0      |
+| 2.085         | 0.192  | 3000  | 1.2781          | 0.001    |
+| 2.496         | 0.1984 | 3100  | 1.2191          | 0.001    |
+| 2.1021        | 0.2048 | 3200  | 1.2496          | 0.001    |
+| 2.5637        | 0.2112 | 3300  | 1.4160          | 0.001    |
+| 2.2194        | 0.2176 | 3400  | 1.2784          | 0.006    |
+| 1.9931        | 0.224  | 3500  | 1.3809          | 0.001    |
+| 2.2398        | 0.2304 | 3600  | 1.2137          | 0.001    |
+| 2.2651        | 0.2368 | 3700  | 1.2654          | 0.001    |
+| 2.3313        | 0.2432 | 3800  | 1.1569          | 0.007    |
+| 2.3756        | 0.2496 | 3900  | 1.2921          | 0.004    |
+| 2.5309        | 0.256  | 4000  | 1.3908          | 0.0      |
+| 2.2903        | 0.2624 | 4100  | 1.2230          | 0.001    |
+| 2.0149        | 0.2688 | 4200  | 1.1809          | 0.002    |
+| 2.2713        | 0.2752 | 4300  | 1.4983          | 0.001    |
+| 2.2298        | 0.2816 | 4400  | 1.2858          | 0.004    |
+| 2.4944        | 0.288  | 4500  | 1.4696          | 0.0      |
+| 2.2138        | 0.2944 | 4600  | 1.2601          | 0.001    |
+| 2.1888        | 0.3008 | 4700  | 1.2084          | 0.008    |
+| 2.6511        | 0.3072 | 4800  | 1.3996          | 0.003    |
+| 2.0815        | 0.3136 | 4900  | 1.1967          | 0.003    |
+| 2.296         | 0.32   | 5000  | 1.2770          | 0.001    |
+| 2.3372        | 0.3264 | 5100  | 1.2541          | 0.0      |
+| 2.0872        | 0.3328 | 5200  | 1.1923          | 0.001    |
+| 2.1522        | 0.3392 | 5300  | 1.2390          | 0.005    |
+| 1.9354        | 0.3456 | 5400  | 1.1832          | 0.005    |
+| 2.5261        | 0.352  | 5500  | 1.5203          | 0.001    |
+| 2.134         | 0.3584 | 5600  | 1.1531          | 0.004    |
+| 1.7733        | 0.3648 | 5700  | 1.2219          | 0.015    |
+| 1.8473        | 0.3712 | 5800  | 1.1778          | 0.018    |
+| 2.1981        | 0.3776 | 5900  | 1.3320          | 0.001    |
+| 2.0556        | 0.384  | 6000  | 1.5240          | 0.01     |
+| 1.9013        | 0.3904 | 6100  | 1.3080          | 0.008    |
+| 1.9382        | 0.3968 | 6200  | 1.0881          | 0.017    |
+| 1.9539        | 0.4032 | 6300  | 1.1345          | 0.025    |
+| 1.9041        | 0.4096 | 6400  | 1.5530          | 0.005    |
+| 2.0314        | 0.416  | 6500  | 1.1389          | 0.023    |
+| 1.9645        | 0.4224 | 6600  | 1.1751          | 0.018    |
+| 1.9642        | 0.4288 | 6700  | 1.1277          | 0.03     |
+| 1.8727        | 0.4352 | 6800  | 1.2153          | 0.001    |
+| 1.6457        | 0.4416 | 6900  | 1.5273          | 0.059    |
+| 1.3439        | 0.448  | 7000  | 0.8178          | 0.055    |
+| 0.6969        | 0.4544 | 7100  | 0.6332          | 0.092    |
+| 0.9132        | 0.4608 | 7200  | 0.9930          | 0.032    |
+| 0.5933        | 0.4672 | 7300  | 0.5563          | 0.292    |
+| 0.4775        | 0.4736 | 7400  | 0.6892          | 0.275    |
+| 0.8709        | 0.48   | 7500  | 1.1903          | 0.074    |
+| 0.6346        | 0.4864 | 7600  | 0.8924          | 0.159    |
+| 0.756         | 0.4928 | 7700  | 1.1650          | 0.135    |
+| 0.283         | 0.4992 | 7800  | 0.7111          | 0.105    |
+| 0.2729        | 0.5056 | 7900  | 0.3804          | 0.135    |
+| 1.1775        | 0.512  | 8000  | 1.3814          | 0.122    |
+| 0.1442        | 0.5184 | 8100  | 0.3274          | 0.491    |
+| 0.9447        | 0.5248 | 8200  | 0.7572          | 0.203    |
+| 0.5056        | 0.5312 | 8300  | 0.4948          | 0.166    |
+| 0.5298        | 0.5376 | 8400  | 0.5869          | 0.279    |
+| 0.1373        | 0.544  | 8500  | 0.7485          | 0.2      |
+| 0.0449        | 0.5504 | 8600  | 0.2730          | 0.375    |
+| 0.1203        | 0.5568 | 8700  | 0.3131          | 0.258    |
+| 0.0388        | 0.5632 | 8800  | 0.1571          | 0.477    |
+| 0.0707        | 0.5696 | 8900  | 0.1798          | 0.459    |
+| 0.8594        | 0.576  | 9000  | 0.7271          | 0.156    |
+| 0.1756        | 0.5824 | 9100  | 0.3364          | 0.307    |
+| 0.4308        | 0.5888 | 9200  | 0.3278          | 0.334    |
+| 0.2429        | 0.5952 | 9300  | 0.6799          | 0.068    |
+| 0.008         | 0.6016 | 9400  | 0.1588          | 0.443    |
+| 0.0404        | 0.608  | 9500  | 0.2014          | 0.43     |
+| 0.0879        | 0.6144 | 9600  | 0.5365          | 0.136    |
+| 0.6424        | 0.6208 | 9700  | 0.6502          | 0.228    |
+| 0.1784        | 0.6272 | 9800  | 0.5427          | 0.088    |
+| 0.0782        | 0.6336 | 9900  | 1.0986          | 0.211    |
+| 0.0053        | 0.64   | 10000 | 0.1458          | 0.632    |
+| 0.0158        | 0.6464 | 10100 | 0.1768          | 0.456    |
+| 0.0506        | 0.6528 | 10200 | 0.1966          | 0.409    |
+| 0.017         | 0.6592 | 10300 | 0.2878          | 0.195    |
+| 0.0401        | 0.6656 | 10400 | 0.3751          | 0.246    |
+| 0.0371        | 0.672  | 10500 | 0.2150          | 0.389    |
+| 0.0237        | 0.6784 | 10600 | 0.0889          | 0.567    |
+| 0.0158        | 0.6848 | 10700 | 0.0455          | 0.787    |
+| 0.0112        | 0.6912 | 10800 | 0.2969          | 0.454    |
+| 0.0105        | 0.6976 | 10900 | 0.4749          | 0.454    |
+| 0.0051        | 0.704  | 11000 | 0.0889          | 0.732    |
+| 0.0072        | 0.7104 | 11100 | 0.1155          | 0.723    |
+| 0.0009        | 0.7168 | 11200 | 0.1212          | 0.701    |
+| 0.0012        | 0.7232 | 11300 | 0.1257          | 0.574    |
+| 0.0071        | 0.7296 | 11400 | 0.1758          | 0.618    |
+| 0.0006        | 0.736  | 11500 | 0.0439          | 0.867    |
+| 0.0008        | 0.7424 | 11600 | 0.2523          | 0.511    |
+| 0.0129        | 0.7488 | 11700 | 0.1786          | 0.612    |
+| 0.0001        | 0.7552 | 11800 | 0.0333          | 0.838    |
+| 0.0017        | 0.7616 | 11900 | 0.1826          | 0.524    |
+| 0.0002        | 0.768  | 12000 | 0.1427          | 0.499    |
+| 0.0001        | 0.7744 | 12100 | 0.0132          | 0.952    |
+| 0.0003        | 0.7808 | 12200 | 0.0720          | 0.692    |
+| 0.0001        | 0.7872 | 12300 | 0.0181          | 0.935    |
+| 0.0004        | 0.7936 | 12400 | 0.0166          | 0.926    |
+| 0.0062        | 0.8    | 12500 | 0.0919          | 0.642    |
+| 0.0003        | 0.8064 | 12600 | 0.0160          | 0.915    |
+| 0.0           | 0.8128 | 12700 | 0.0232          | 0.911    |
+| 0.0001        | 0.8192 | 12800 | 0.0177          | 0.921    |
+| 0.0001        | 0.8256 | 12900 | 0.0448          | 0.812    |
+| 0.0001        | 0.832  | 13000 | 0.0027          | 0.984    |
+| 0.0001        | 0.8384 | 13100 | 0.0877          | 0.717    |
+| 0.028         | 0.8448 | 13200 | 0.0960          | 0.804    |
+| 0.0078        | 0.8512 | 13300 | 0.0969          | 0.66     |
+| 0.0           | 0.8576 | 13400 | 0.0824          | 0.736    |
+| 0.0001        | 0.864  | 13500 | 0.0756          | 0.718    |
+| 0.0           | 0.8704 | 13600 | 0.0649          | 0.778    |
+| 0.0           | 0.8768 | 13700 | 0.0152          | 0.927    |
+| 0.0002        | 0.8832 | 13800 | 0.0610          | 0.813    |
+| 0.0           | 0.8896 | 13900 | 0.0067          | 0.968    |
+| 0.0015        | 0.896  | 14000 | 0.0314          | 0.867    |
+| 0.0005        | 0.9024 | 14100 | 0.0174          | 0.92     |
+| 0.0           | 0.9088 | 14200 | 0.0864          | 0.716    |
+| 0.0001        | 0.9152 | 14300 | 0.0513          | 0.807    |
+| 0.0006        | 0.9216 | 14400 | 0.0106          | 0.95     |
+| 0.0009        | 0.928  | 14500 | 0.0238          | 0.905    |
+| 0.0001        | 0.9344 | 14600 | 0.0335          | 0.856    |
+| 0.0           | 0.9408 | 14700 | 0.0411          | 0.829    |
+| 0.0           | 0.9472 | 14800 | 0.0456          | 0.822    |
+| 0.0           | 0.9536 | 14900 | 0.0425          | 0.833    |
+| 0.0           | 0.96   | 15000 | 0.0460          | 0.821    |
+| 0.0           | 0.9664 | 15100 | 0.0457          | 0.821    |
+| 0.0           | 0.9728 | 15200 | 0.0460          | 0.823    |
+| 0.0           | 0.9792 | 15300 | 0.0477          | 0.821    |
+| 0.0023        | 0.9856 | 15400 | 0.0474          | 0.821    |
+| 0.0           | 0.992  | 15500 | 0.0464          | 0.822    |
+| 0.0           | 0.9984 | 15600 | 0.0462          | 0.822    |
+### Framework versions
+- Transformers 4.46.0
+- Pytorch 2.5.1
+- Datasets 3.1.0
+- Tokenizers 0.20.1
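
The hyperparameters and the Step/Epoch columns in the model card pin down a few quantities the card leaves implicit. The sketch below is illustrative and is not the training script: the function names are hypothetical, the steps-per-epoch figure is inferred from the logged row "step 100 = epoch 0.0064", and the schedule assumes the usual linear warmup followed by a half-cosine decay to zero, with the warmup length rounded up.

```python
import math

# Values copied from the model card.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 1

# Inferred, not stated in the card: step 100 is logged as epoch 0.0064,
# so one epoch is 100 / 0.0064 = 15625 optimizer steps.
STEPS_PER_EPOCH = round(100 / 0.0064)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=LEARNING_RATE,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step (assumed schedule shape)."""
    # Assumption: warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
    print("effective batch size:", batch)
    print("optimizer steps per epoch:", STEPS_PER_EPOCH)
    print("approx. training examples:", batch * TOTAL_STEPS)
    for s in (0, 1563, 7800, 15600):
        print(f"step {s:>5}: lr = {cosine_lr(s):.6f}")
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 128 (which implies a single device), and the run covers roughly 15625 × 128 = 2,000,000 training examples.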

config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "architectures": [
+    "NanoGPT"
+  ],
+  "bias": true,
+  "block_size": 256,
+  "dropout": 0.0,
+  "model_type": "nanogpt",
+  "n_embd": 384,
+  "n_head": 6,
+  "n_layer": 6,
+  "nonlinearity": "RELU",
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.0",
+  "use_NoPE": true,
+  "use_layernorm": true,
+  "vocab_size": 14
+}
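
As a rough sanity check on this configuration, the sketch below estimates the parameter count it implies. The custom `NanoGPT` class is not part of this commit, so the layout is an assumption: a standard GPT-2-style block (fused QKV projection, output projection, MLP with 4x expansion, two LayerNorms per block, a final LayerNorm), no positional embedding table when `use_NoPE` is true, and an output head tied to the token embedding. The function name `estimate_params` and its `mlp_ratio` and `tied_lm_head` parameters are hypothetical.

```python
# Fields copied from config.json.
config = {
    "n_layer": 6,
    "n_head": 6,
    "n_embd": 384,
    "vocab_size": 14,
    "block_size": 256,
    "bias": True,
    "use_NoPE": True,
    "use_layernorm": True,
}


def estimate_params(cfg, mlp_ratio=4, tied_lm_head=True):
    """Estimate parameters for an ASSUMED GPT-2-style layout."""
    d = cfg["n_embd"]
    b = 1 if cfg["bias"] else 0
    # Attention: fused QKV projection plus output projection.
    attn = d * 3 * d + b * 3 * d + d * d + b * d
    # MLP: expand to mlp_ratio * d, then project back to d.
    hidden = mlp_ratio * d
    mlp = d * hidden + b * hidden + hidden * d + b * d
    # LayerNorm: weight, plus bias when enabled.
    ln = (d + b * d) if cfg["use_layernorm"] else 0
    per_layer = attn + mlp + 2 * ln
    token_emb = cfg["vocab_size"] * d
    # NoPE: assume no learned position table at all.
    pos_emb = 0 if cfg["use_NoPE"] else cfg["block_size"] * d
    head = 0 if tied_lm_head else cfg["vocab_size"] * d
    return cfg["n_layer"] * per_layer + ln + token_emb + pos_emb + head


if __name__ == "__main__":
    n = estimate_params(config)
    print("head dimension:", config["n_embd"] // config["n_head"])
    print("estimated parameters:", n)
    print("estimated float32 bytes:", n * 4)
```

Under these assumptions the estimate is 10,652,928 parameters, or 42,611,712 bytes in float32, which is slightly below the 42,640,744-byte `model.safetensors` listed in this commit. The remainder would have to be the file header plus any tensors the sketch does not model, so treat the match as approximate.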

generation_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "_from_model_config": true,
+  "transformers_version": "4.46.0"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d4f893dc72d887fc9d48104abfce7b616975e4955477a35833d02c20c77f98e5
+size 42640744

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:837a743ebcf1e835c6bc1584f401590db21b635ceba27b24736e397a64f31e42
+size 5240
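
The two binary files are stored as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is the `model.safetensors` entry with the diff's leading `+` removed.

```python
import hashlib

# Pointer for model.safetensors, copied from the diff without the "+" prefix.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d4f893dc72d887fc9d48104abfce7b616975e4955477a35833d02c20c77f98e5
size 42640744
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
    # With the real file on disk, one could check it like this:
    # with open("model.safetensors", "rb") as f:
    #     print(matches_pointer(f.read(), pointer))
```

For large files it would be better to hash in chunks rather than read everything into memory; the one-shot version is kept here for brevity.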