Congratulations!
Congratulations! Average score: 80.48
@LoneStriker Would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.
First model to reach 80%!
> @LoneStriker Would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.
Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...
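For reference, once exllamav2 adds Qwen support, the conversion is driven by the repo's convert.py script. Here's a minimal sketch wrapping that invocation in Python; the model paths are placeholders and the flags should be double-checked against the exllamav2 README:

```python
# Hypothetical sketch of an exllamav2 8.0bpw conversion (paths are placeholders).
import subprocess

subprocess.run(
    [
        "python", "convert.py",           # script from the exllamav2 repo
        "-i", "models/qwen-72b-hf",       # input HF model directory
        "-o", "work/",                    # working dir for the measurement pass
        "-cf", "models/qwen-72b-8.0bpw",  # compiled output directory
        "-b", "8.0",                      # target bits per weight
    ],
    check=True,
)
```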
Nice!
This model is derived from Qwen-72B, so take the scores with a grain of salt. Qwen is one of those base models whose pretraining likely included benchmark test data, so mentally give other models a handicap for a fair comparison.
Regardless, thanks for sharing this new model @ArkaAbacus and team @abacusai! :)
If you have the spare compute to take requests/challenges, I'm very curious to see if your training method can improve upon https://huggingface.co/allenai/tulu-2-dpo-70b, a Llama-2-70b-based model, for a more direct comparison of efficacy in pushing the envelope.
@Ont
Qwen-72B is doing really well on EQ-Bench, which is definitely not the result of training on test data.
https://eqbench.com/
Just ran the fresh correlations to Arena Elo, and EQ-Bench looks really promising (a quick sketch for reproducing these numbers follows the list).
Spearman correlations:
- EQ-Bench v2: 0.863
- MT-Bench: 0.891
- Alpaca v2: 0.899

Kendall's tau:
- EQ-Bench v2: 0.730
- MT-Bench: 0.759
- Alpaca v2: 0.759
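For anyone who wants to reproduce this kind of check, here's a minimal sketch using scipy.stats. The score arrays below are made-up placeholders, not the real leaderboard data:

```python
# Minimal sketch: rank correlations between Arena Elo and a benchmark.
# The values below are illustrative placeholders, not real scores.
from scipy.stats import spearmanr, kendalltau

arena_elo = [1250, 1180, 1120, 1090, 1050]  # hypothetical Elo ratings
eq_bench  = [82.1, 78.4, 71.0, 69.5, 60.2]  # hypothetical EQ-Bench v2 scores

rho, _ = spearmanr(arena_elo, eq_bench)
tau, _ = kendalltau(arena_elo, eq_bench)
print(f"Spearman: {rho:.3f}, Kendall's tau: {tau:.3f}")
```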
Now, does this mean that the base model does well on everything? Definitely not, but it shows that it's not simply a number-gymnastics model. Anyone who has tried Qwen probably knows this already, though.
(Also, notice all the Dolphin models up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen plus the marine biologist guy looks like a good combination to me.)
> @LoneStriker Would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.

> Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...
I think this is a llamafied version. It just uses a different tokenizer, so it cannot be converted to GGUF, but possibly to exl2?
Unfortunately, the exl2 quant fails. I was able to get the model to convert with the llama.cpp GGUF conversion, but the resulting GGUF file was not loadable for me, so I've taken my GGUF quants offline for now until I can figure out why it's not loading.
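If it helps with debugging, one way to narrow down a GGUF loading failure is to inspect the file's metadata directly. Here's a minimal sketch using the gguf Python package that ships with llama.cpp (pip install gguf); the file path is a placeholder:

```python
# Hypothetical debugging sketch: dump GGUF metadata to spot a missing or
# unexpected key (e.g. general.architecture), a common cause of load failures.
from gguf import GGUFReader

reader = GGUFReader("qwen-72b-q4_k_m.gguf")  # placeholder path

for name in reader.fields:   # metadata keys stored in the file
    print(name)

print(f"{len(reader.tensors)} tensors")  # sanity-check the tensor count
```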
> (Also, notice all the Dolphin models up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen plus the marine biologist guy looks like a good combination to me.)
This work is unrelated; it was led by @ArkaAbacus.