---
language:
- en
license: apache-2.0
widget:
- text: "System:\n You're an expert Python programmer and mathematician. Help the user to solve this problem using code when necessary. Make sure to put the answer (and only answer) inside \\boxed{}.\n\nUser:\nHigh school graduating class is made up of 466 students. There are 212 more girls than boys. How many boys are in the class? \n\nAssistant:\n"
- text: "System:\n You're an expert Python programmer and mathematician. Help the user to solve this problem using code when necessary. Make sure to put the answer (and only answer) inside \\boxed{}.\n\nUser:\nSuppose you invested 10,000 dollars , part at 6 % annual interest and the rest at 9 % annual interest. If you received 684 dollars in interest after one year , how much did you invest at 6 % annual interest and at 9 % annual interest? \n\nAssistant:\n"
- text: "System:\n You're an expert Python programmer and mathematician. Help the user to solve this problem using code when necessary. Make sure to put the answer (and only answer) inside \\boxed{}.\n\nUser:\nYou have exactly 537 dollars to spend on party gifts for your rich uncle 's birthday party. You decide to get watches for the ladies at 27.98 dollars each , and beepers for the men at 23.46 dollars each. You know that the number of watches required will be 3 times as much as the number of beepers. How many watches do you buy? How many beepers do you buy? \n\nAssistant:\n"
---
# TinyLlama-1.1B
We used this version of TinyLlama as a base model: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

The goal was to improve performance on basic algebra (i.e., solving systems of linear equations). The base model was fine-tuned on 8k rows of synthetic solution data generated by [OpenMath-Mistral-7B-v0.1-hf](https://huggingface.co/nvidia/OpenMath-Mistral-7B-v0.1-hf) on [ALG-514](https://paperswithcode.com/sota/math-word-problem-solving-on-alg514). We used the [NeMo Skills](https://github.com/Kipok/NeMo-Skills) pipeline for inference with code execution and for generating the synthetic data. Hugging Face's SFTTrainer was used for fine-tuning, as the NeMo Skills pipeline was too difficult to work with. Fine-tuning took 30 minutes on an RTX 3090.

##### Sample Input:
```
System:
You're an expert Python programmer and mathematician. Help the user to solve this problem using code when necessary. Make sure to put the answer (and only answer) inside \boxed{}.

User:
High school graduating class is made up of 466 students. There are 212 more girls than boys. How many boys are in the class?

Assistant:
```

###### Sample Output:
```
Let's solve this problem using Python code.

total_students = 466
more_girls = 212
number_of_girls = total_students - more_girls
number_of_boys = number_of_girls / 2
number_of_boys

127.0

Thus the number of boys in the class is \boxed{127}.
```

The code interpreter halts generation at the end of a generated code block, inserts the result of executing that code as a code-output block, then continues generation. If there is no code interpreter, the model will just guess numbers for the output.

Notes from previous model cards:

> We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.

#### Eval

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64388bdd43d932c4623e4983/H07dGzwOfzcvP1GFA1GUq.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64388bdd43d932c4623e4983/Qr7rvIms3AL67jltHBXnr.png)

| Checkpoint | Coverage |
|------------|----------|
| 1600       | 0.890244 |
| 2200       | 0.890244 |
| 2400       | 0.890244 |
| **2600**   | 0.878049 |
| 1200       | 0.878049 |
| 2800       | 0.853659 |
| 2000       | 0.853659 |
| 800        | 0.841463 |
| 1000       | 0.829268 |
| 1800       | 0.829268 |
| 1400       | 0.817073 |
| mistral    | 0.804878 |
| 3000       | 0.780488 |
| 600        | 0.768293 |
| 400        | 0.731707 |
| 200        | 0.682927 |
| 0          | 0.000000 |

Note that `checkpoint_0` is the base model and `checkpoint_mistral` is OpenMath-Mistral-7B-v0.1-hf. Also note that after 800 steps the fine-tuned model had better coverage than the much larger teacher model.

The zero-shot performance is _not good_™, but this model could be used to quickly generate synthetic data, since its coverage is decent. The uploaded model is checkpoint-2600 (best zero-shot performance and top-4 coverage).

People involved in creating this fine-tune:
- Coulton Theuer [theuerc@umich.edu]
- Bret Ellenbogen [bretelle@umich.edu]
- Victoria Chang [vgc@umich.edu]
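
#### Example usage (sketch)

A minimal inference sketch using the standard `transformers` `AutoTokenizer`/`AutoModelForCausalLM` APIs and the prompt format shown in the Sample Input section. The repository id is a placeholder and the generation settings are illustrative; without a code interpreter attached, the model will only guess the numeric result.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "REPLACE_WITH_THIS_REPO_ID"  # placeholder: use this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

# Prompt format matches the "Sample Input" section above.
prompt = (
    "System:\nYou're an expert Python programmer and mathematician. "
    "Help the user to solve this problem using code when necessary. "
    "Make sure to put the answer (and only answer) inside \\boxed{}.\n\n"
    "User:\nHigh school graduating class is made up of 466 students. "
    "There are 212 more girls than boys. How many boys are in the class?\n\n"
    "Assistant:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```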
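
#### Fine-tuning sketch

For reference, a hedged sketch of an SFTTrainer run on the synthetic solutions. The dataset file, field name, and hyperparameters are illustrative assumptions rather than the exact recipe used for this model, and the keyword arguments shown match trl versions from around early 2024 (newer releases move several of them into `SFTConfig`).

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical JSONL file: one "text" field per row containing the full prompt + solution.
dataset = load_dataset("json", data_files="synthetic_solutions.jsonl", split="train")

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # the base model named above
    train_dataset=dataset,
    dataset_text_field="text",   # assumed field name
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="tinyllama-alg514-sft",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-5,
        fp16=True,
        logging_steps=50,
        save_steps=200,          # checkpoint every 200 steps, matching the eval table spacing
    ),
)
trainer.train()
```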