---
base_model:
- google/flan-t5-large
datasets:
- deepmind/math_dataset
language:
- en
library_name: transformers
metrics:
- exact_match
---
# Model Card for CyberSolve LinAlg 1.2
Welcome to the 🤖🧮CyberSolve LinAlg 1.2🧠📐 model card!
We introduce **CyberSolve LinAlg 1.2**, a text-to-text large language model trained to solve linear equations. Specifically, *CyberSolve LinAlg 1.2* is a downstream version of the *FLAN-T5 large*
model, [Google/FLAN-T5-large](https://huggingface.co/google/flan-t5-large), fine-tuned on the one-dimensional linear algebra split of the Google DeepMind mathematics dataset.
The model weights of *CyberSolve LinAlg 1.2* are a further downstream checkpoint from the original *CyberSolve LinAlg 1.1* checkpoint, trained for additional epochs to improve model capability.
**Note**: This is currently the most capable version of CyberSolve LinAlg. See this model demoed in the [CyberSolve LinAlg 1.2 🤖 Space](https://huggingface.co/spaces/MarioBarbeque/CyberSolveLinAlg1.2).
## Model Details
### Model Description and Overview
To construct **CyberSolve LinAlg 1.2**, the *FLAN-T5 large* model is fine-tuned using a custom PyTorch training loop optimized for multiple Nvidia A100 GPUs. We run supervised training of *FLAN-T5 large* on the *algebra__linear_1d* split of the Google DeepMind mathematics dataset, an open source
dataset from Google DeepMind available through the 🤗 hub at [deepmind/math_dataset](https://huggingface.co/datasets/deepmind/math_dataset). This large dataset consists of code that generates mathematical problems and their solutions for a variety of tasks across distinct mathematical disciplines.
In this preliminary family of CyberSolve models, we are specifically interested in understanding the ability of neural models to solve non-trivial mathematical tasks. As such, the CyberSolve **LinAlg 1.x** family of models is trained on a set of 2M simpler, one-dimensional linear equations.
We preprocessed the data and simulated the training on a smaller, downsampled set of the dataset before training for multiple epochs over the dataset's entirety. This model in particular has been trained for 2 additional epochs, limited only by funds, beyond the original *CyberSolve LinAlg 1.1* checkpoint.
Version 1.2 is the most capable version of CyberSolve LinAlg, achieving an exact match score of **90.75** on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement over the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
- **Developed by:** John Graham Reynolds
- **Funded by:** Vanderbilt University
- **Model type:** Text-to-Text Generation
- **Language(s) (NLP):** English
- **Finetuned from model:** "Google/FLAN-T5-large"
### Model Source
<!-- Provide the basic links for the model. -->
- **Repository:** TODO
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
To query the model's ability to solve linear equations, tokenize a string of the format `"Solve <any one-dimensional linear equation of variable x> for x."` and pass the resulting input ids to the model's `generate` method.
An example input string is `input_text = "Solve 24 = 1601*c - 1605*c for c."`. The model will attempt to solve the equation, outputting its prediction in a simple numeric format. See the example below.
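As a minimal sketch of the prompt format described above, the helper below assembles the query string from an equation and its variable. The function name `build_prompt` is a hypothetical convenience for illustration and is not part of the model or of 🤗 Transformers.

```python
def build_prompt(equation: str, variable: str) -> str:
    """Format a one-dimensional linear equation as a CyberSolve-style query.

    `build_prompt` is a hypothetical helper; only the output format
    ("Solve <equation> for <variable>.") comes from the model card.
    """
    return f"Solve {equation.strip()} for {variable.strip()}."


input_text = build_prompt("24 = 1601*c - 1605*c", "c")
print(input_text)
```

Keeping the wording identical to the training format matters because the fine-tuned model is specialized to this phrasing.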
## How to Use and Query the Model
Use the code below to get started with the model. Users pass an `input_text` string (again, of the form `input_text = "Solve 24 = 1601*c - 1605*c for c."`) which prompts the model to solve a one-dimensional linear equation.
Model prediction is significantly faster on a GPU, so it is best practice to call `.to('cuda')` on both the model and the input ids to place them on the GPU.
Furthermore, the FLAN-T5 model architecture makes use
of many normalization layers, as is common in the transformer architecture. By default, CyberSolve uses the T5 model's `T5LayerNorm` Python class; it is highly recommended that users install the Nvidia `Apex` package for Nvidia GPUs
or the ROCm `Apex` package for AMD GPUs. Once installed, the model will default to using the `apex.normalization.FusedRMSNorm` class when computing the normalization layers. The `FusedRMSNorm` class from `apex` makes use of an optimized fused kernel
that is much faster than the standard `T5LayerNorm` class, thereby significantly speeding up both inference and training.
The base FLAN-T5 model is capable of answering a variety of prompts, but the domain-adapted CyberSolve LinAlg model is designed specifically for solving linear equations. As such, users should take care in their prompt
engineering to issue a coherent, relevant query as outlined above and below.
``` python
# import apex
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
model = T5ForConditionalGeneration.from_pretrained("MarioBarbeque/CyberSolve-LinAlg-1.2").to("cuda")
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large") # CyberSolve uses the same tokenizer as the base FLAN-T5 model
# Pass the model instruction to solve a linear equation in the following simple format
input_text = "Solve 24 = 1601*c - 1605*c for c."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
This code outputs the following:
``` python
-6
```
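The GPU and Apex recommendations above can be checked before loading the model. The sketch below uses only the standard library to detect whether the packages are installed, then falls back to the CPU when CUDA is unavailable. The helper names `has_module` and `pick_device` are hypothetical, and finding a top-level `apex` module does not by itself guarantee it is the Nvidia or ROCm build.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if has_module("torch"):
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


device = pick_device()
print(f"device: {device}")
# A top-level `apex` module is only a hint that the fused kernels may be available.
print(f"apex installed: {has_module('apex')}")
```

The returned `device` string can be passed to `.to(device)` in place of the hard-coded `"cuda"` in the example above.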
## Training Details
### Training Data / Preprocessing
The data used comes from Google DeepMind and the 🤗 hub. The dataset card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). The DeepMind Mathematics `DatasetDict` object is composed of a wide variety of underlying mathematics datasets.
Each of the underlying datasets contains a specific class of mathematical problems and their solutions. For the CyberSolve LinAlg *1.x* family of models, we are interested specifically in the solving of one-dimensional linear equations, so we use the *algebra__linear_1d* split.
The training and evaluation splits of the 1D linear algebra dataset are preprocessed in the following way: we convert the raw problems and their solutions, which have the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"`, into the much cleaner `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`.
All inputs and labels are then tokenized. We subsequently evaluate the length of each *input_ids* vector and each *labels* vector to ensure there are no outliers and no inputs that need to be truncated. For later ease of loading, we upload these preprocessed and tokenized training and evaluation datasets
to the 🤗 hub at the following locations: [MarioBarbeque/DeepMind-LinAlg-1D-train](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-train) and [MarioBarbeque/DeepMind-LinAlg-1D-eval](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-eval).
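The cleaning step described above can be sketched as follows. This is an illustrative reconstruction, not the preprocessing code actually used for the published datasets: the function name `clean_record` is hypothetical, and it assumes each raw field is a string carrying a `b'...'` wrapper and a literal backslash-n suffix, as in the examples shown.

```python
def clean_record(raw: str) -> str:
    """Strip the b'...' wrapper and the trailing literal "\\n" from a raw field.

    Hypothetical helper; assumes the raw format shown in the model card.
    """
    text = raw
    # Remove the bytes-literal wrapper: leading b' and trailing '
    if text.startswith("b'") and text.endswith("'"):
        text = text[2:-1]
    # Remove the escaped newline (a backslash followed by the letter n)
    if text.endswith("\\n"):
        text = text[:-2]
    return text.strip()


print(clean_record("b'Solve 65*l - 361 + 881 = 0 for l.\\n'"))
print(clean_record("b'-8\\n'"))
```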
### Training Procedure
The model was trained locally on a single node with multiple Nvidia A100 GPUs using 🤗 Transformers, 🤗 Tokenizers, and a custom PyTorch training loop that made use of both Nvidia Apex and 🤗 Accelerate.
#### Training Hyperparameters
- **Precision:** We trained the model in bfloat16 and publish it in FP32, the same precision as the base "google/flan-t5-large" model.
- **Optimizer:** `apex.optimizers.FusedAdam`, a fused kernel version of the AdamW optimizer from Nvidia Apex
- **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 1e-4 to further adjust the CyberSolve LinAlg **1.1** weights
- **Batch Size:** 64
- **Number of Training Steps**: 1918 training steps over 2 additional epochs (CyberSolve LinAlg **1.2**) - beyond the original 2877 total steps over 3 epochs (CyberSolve LinAlg **1.1**)
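To make the learning rate schedule concrete, the sketch below evaluates a linear decay over the 1918 additional training steps, starting from the stated initial rate of 1e-4. It assumes no warmup and a decay to zero at the final step; the model card does not state either detail, so both are assumptions, and `linear_lr` is a hypothetical name.

```python
def linear_lr(step: int, total_steps: int = 1918, initial_lr: float = 1e-4) -> float:
    """Learning rate at `step` under a linear decay to zero.

    Assumes no warmup and an end value of 0.0 (neither is stated in the card).
    """
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


for step in (0, 959, 1918):
    print(step, linear_lr(step))
```

Under these assumptions the rate halves by the end of the first additional epoch (step 959) and reaches zero at step 1918.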
## Evaluation / Metrics
We evaluate our text-to-text linear equation solver by using the `exact_match` metric to compare the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* achieves an exact match score of **90.75**
on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement over the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
Additionally, we construct a partial correctness dataset available at the following dataset card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark).
This dataset was created with the goal of analyzing both the token-to-token and decoded-sequence-to-decoded-sequence partial correctness of CyberSolve's predictions in detail, beyond just its ability to get answers flat out right or wrong. Similar partial correctness benchmark datasets were created for the
initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the [zeroth-generation downsampled training](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-DeepMind-LinAlg-1D-downsample-benchmark-v2) of CyberSolve, and
the [1.1 version](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.1-correctness-benchmark) of the model. *We have yet to complete partial correctness analysis of the various model versions and their predictions, but we look forward to better understanding the mathematical
reasoning capabilities of models and publishing our results when complete!*
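The exact match comparison described in this section can be sketched in plain Python as below. This is an illustration of the idea rather than the evaluation code used for the reported scores: `exact_match_score` is a hypothetical name, and stripping surrounding whitespace before comparing is an assumption.

```python
def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """Percentage of decoded predictions that equal their reference label.

    Hypothetical helper; whitespace stripping is an assumption.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(predictions)


# Three of the four toy predictions match their labels
print(exact_match_score(["-6", "3", "10", "-8"], ["-6", "4", "10", "-8"]))
```

A prediction counts only when the whole decoded string matches, which is why the partial correctness benchmarks are needed to study near misses.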
### Testing Data, Factors & Metrics
#### Testing Data
The 1D Linear Algebra split of the Google DeepMind Mathematics dataset comes pre-split into training and evaluation datasets of 2M and 10k records, respectively. Before training *CyberSolve LinAlg 1.1*, we trained a zeroth-generation, downsampled version of CyberSolve
by using scikit-learn's `train_test_split` to divide the set of 2M training records into much smaller training and evaluation datasets. We used this smaller set to evaluate the less-interesting zeroth-generation model, while we used the standard set of 10k evaluation records for evaluating
both *CyberSolve LinAlg 1.1* and *CyberSolve LinAlg 1.2*.
### Results
We find the following benchmark scores for each of our neural models after the corresponding epoch of training.
| model                            | epoch | exact_match score |
|----------------------------------|-------|-------------------|
| CyberSolve LinAlg **1.2**        | 1     | 90.75             |
| CyberSolve LinAlg **1.2**        | 0     | 83.12             |
| CyberSolve LinAlg **1.1**        | 2     | 86.56             |
| CyberSolve LinAlg **1.1**        | 1     | 73.80             |
| CyberSolve LinAlg **1.1**        | 0     | 55.35             |
| CyberSolve LinAlg **Downsample** | 2     | 44.99             |
| CyberSolve LinAlg **Downsample** | 1     | 39.69             |
| CyberSolve LinAlg **Downsample** | 0     | 32.21             |
#### Summary
We train this model to research the mathematical reasoning abilities of transformer-based neural models, in terms of both full correctness and partial correctness.
Our efforts made use of the 🤗 ecosystem, a system of parallelized Nvidia A100 GPUs in an Azure Databricks environment, custom PyTorch training and evaluation code, novel high-performance computing and deep learning libraries like Nvidia Apex, and more.
We learned a great deal and look forward to finalizing our research on the partial correctness reasoning abilities of these preliminary models. We also eagerly plan to further improve the CyberSolve family of models to tackle more difficult mathematical tasks.
As we look forward, CyberSolve LinAlg *2.x* will likely incorporate knowledge of systems of composed one-dimensional linear equations and more general multiple variable linear equations. Finally, methods related to reinforcement learning are equally enticing
for improving neural reasoning abilities; the future is bright for teaching mathematics to AI!
We look forward to taking part in this great and worthy endeavor.
## Environmental Impact
- **Hardware Type:** Nvidia Ampere A100 80GB
- **Hours used:** 21.5
- **Cloud Provider:** Microsoft Azure
- **Compute Region:** EastUS
- **Carbon Emitted:** 3.18 kgCO2
Experiments were conducted using Azure in region eastus, which has a carbon efficiency of 0.37 kgCO$_2$eq/kWh. A cumulative 21.5 hours of computation was performed on hardware of type A100 SXM4 80 GB (TDP of 400W).
Total emissions are estimated to be 3.18 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.
Estimations were conducted using the Machine Learning Impact calculator presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
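The emissions estimate above can be reproduced from the stated figures with a short calculation. The sketch assumes the hardware draws its full 400 W TDP for all 21.5 hours, which is a simplification; `estimate_emissions_kg` is a hypothetical helper name.

```python
def estimate_emissions_kg(hours: float, tdp_watts: float, kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times regional carbon intensity.

    Assumes the device runs at its full TDP for the whole duration.
    """
    energy_kwh = hours * tdp_watts / 1000.0
    return energy_kwh * kg_co2_per_kwh


# 21.5 h * 0.4 kW = 8.6 kWh; 8.6 kWh * 0.37 kgCO2eq/kWh is about 3.18 kgCO2eq
print(round(estimate_emissions_kg(21.5, 400, 0.37), 2))
```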
#### Hardware
The model was trained locally in an Azure Databricks workspace using a single node cloud compute instance with 2 Nvidia A100 80GB GPUs for 21.5 GPU Hours.
#### Software
Training utilized PyTorch, Nvidia Apex, 🤗 Transformers, 🤗 Tokenizers, 🤗 Datasets, 🤗 Accelerate, and more in an Azure Databricks execution environment.
#### Cite This Model
@misc{cybersolve,
author = {John Graham Reynolds},
title = {{CyberSolve-LinAlg-1.2 Hugging Face model card}},
year = 2025,
url = {https://huggingface.co/MarioBarbeque/CyberSolve-LinAlg-1.2},
urldate = {date-you-accessed}
}
#### Citations
@article{lacoste2019quantifying,
title={Quantifying the Carbon Emissions of Machine Learning},
author={Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
journal={arXiv preprint arXiv:1910.09700},
year={2019}
}