Author of this model: Microsoft, 2024. License: MIT.

Link to the original card: https://huggingface.co/microsoft/rho-math-7b-interpreter-v0.1

Prompt template: ChatML (according to llama.cpp's server)? Mistral (according to tokenizer_config.json)? Alpaca (according to text-generation-webui)? All three seem to work.
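For instance, a ChatML-style prompt would look like the sketch below (my own helper, not from the original card; adjust the system message to taste):

```python
# Hypothetical helper: builds a ChatML-style prompt string for this model.
def chatml_prompt(question: str, system: str = "You are a helpful math assistant.") -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("Width of circle is 3cm, what is its area?"))
```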

Context length: ?

According to their paper on arXiv, rho-math-7b-v0.1 is continued pretraining of Mistral-7B, while their 1B model is continued pretraining of TinyLlama-1.1B.

## My experience

Unlike the non-interpreter variant, this -interpreter variant always answers with Python code. Example:

Input: Width of circle is 3cm, what is its area?

With settings { presence_penalty: 0, frequency_penalty: 0, top_p: 1, min_p: 0, top_k: 0, temperature: 0.8 }, the unquantized GGUF outputs:

```python
from sympy import pi, Rational

def circle_area():
    """Width of circle is 3cm, what is its area?"""
    r = Rational(3, 2)  # Radius of the circle
    area = pi * r**2  # Area of the circle

    return area

result = circle_area()
print(result)
```
```output
27*pi/4
```
The area of the circle is $\boxed{\frac{27\pi}{4}}$ square cm.

??? It should have been 9*pi/4. Am I using this model wrong? Same result with temperature=0.0, top_k=1.
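For what it's worth, the generated code itself evaluates to 9*pi/4 when actually run; the wrong value appears only in the model-written `output` block, which (if nothing really executes the code) is just more model-generated text. A quick check:

```python
from sympy import pi, Rational

def circle_area():
    """Same function the model produced."""
    r = Rational(3, 2)   # radius = half of the 3cm width
    return pi * r**2     # pi * (3/2)^2

print(circle_area())     # prints 9*pi/4, not 27*pi/4
```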

For comparison, the Q6_K of the non-interpreter variant with temp=0.0 outputs:

The area of a circle is given by the formula A = πr^2, where r is the radius of the circle. Since the width of the circle is 3cm, the radius is half of that, or 1.5cm. Substituting this value into the formula, we get:

A = π(1.5)^2
A = π(2.25)
A = 6.9887654321

So, the area of the circle is approximately 6.99 cm^2.

(close to the expected 7.06 which deepseek-math-7b-rl.Q8_0.gguf outputs)
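For reference, the exact value is easy to check with plain Python (my own sanity check, nothing model-specific):

```python
import math

radius = 3 / 2                 # half of the 3cm width
area = math.pi * radius**2
print(round(area, 4))          # 7.0686, i.e. about 7.07 cm^2
```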

## imatrix

I created the imatrix with

`./imatrix --mlock --verbosity 2 -m /tmp/rho-math-7b-interpreter-v0.1.f32.gguf -f ~/Downloads/groups_merged_forkOfArzeth.txt -c 32768 -o rho-math-7b-interpreter-v0.1.f32.ctx32768imatrix.dat`

which took 1665 seconds (~28 minutes) on my GTX 1660 Super and used only 1 thread on a Ryzen 2600 downclocked to 3000 MHz. imatrix consumed 35685 MiB of RAM (3200 MHz) and 3158 MiB of VRAM.

## quantize

Quantized with llama.cpp b2661 (2024-04-12), compiled with LLAMA_CUDA_FORCE_MMQ=1 for a big speedup (the GTX 1660 Super has no tensor cores, so MMQ is better than nothing). Full command: `make -j6 LLAMA_CUDA_FORCE_MMQ=1 LLAMA_CUDA=1 LLAMA_FAST=1 LLAMA_OPENBLAS=1 LLAMA_BLAS_VENDOR=OpenBLAS`.
