It's incredible!
Nice work, thanks for sharing!
Can you share the code?
Glad you like it! You can replicate it with wassname's notebook: https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af
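The core step is roughly this (a minimal sketch of the refusal-direction orthogonalization idea, not the notebook's exact code; which layer you read activations from, which prompt sets you use, and which weight matrices you edit are choices you make yourself):

```python
# Minimal sketch of refusal-direction removal, assuming you have already
# collected residual-stream activations for harmful and harmless prompts.
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Unit vector along the mean activation difference.

    harmful_acts / harmless_acts: [n_prompts, d_model] activations taken at
    one layer/position for the two prompt sets.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of each output column that points along `direction`,
    so this layer can no longer write that direction into the residual stream.

    weight: [d_model, d_in] matrix that writes into the residual stream
    (e.g. attention output projection or MLP down-projection).
    """
    direction = direction / direction.norm()
    return weight - torch.outer(direction, direction @ weight)
```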
Thanks for the answer! What is the difference between your models?
Just kept retrying until there were only 2 failures out of 128. I plan to do one with ~500 instructions too.
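For counting failures I just look for refusal-looking replies over the test instructions, something along these lines (a rough sketch; `generate_fn` is a placeholder for whatever generation wrapper you already have, and the marker strings are my own guesses, not from the notebook):

```python
# Crude refusal counter: string matching is rough, but it's enough to
# compare one ablation run against another.
REFUSAL_MARKERS = ("I cannot", "I can't", "I'm sorry", "As an AI")

def count_refusals(generate_fn, instructions: list[str]) -> int:
    failures = 0
    for inst in instructions:
        reply = generate_fn(inst)  # placeholder generation wrapper
        if any(marker.lower() in reply.lower() for marker in REFUSAL_MARKERS):
            failures += 1
    return failures

# e.g. re-run the ablation with a different layer/direction until
# count_refusals(generate_fn, test_instructions) <= 2 out of 128
```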
Released both 500 and 1000 now, though I haven't had the time yet to try them and see whether response quality scales with the number of instructions:
https://huggingface.co/Apel-sin/llama-3-8B-ortho-by-edgerunners-v4_exl2_8.0bpw
Sometimes (consistently on certain queries) the model cuts the message off abruptly. But in my short, subjective tests, this is the best version so far.
Have you ever used the cognitivecomputations/Llama-3-8B-Instruct-abliterated-v2 model? In my short tests, it works more stably and predictably.
What GPU are you using for experiments?
I haven't used it; an H100 SXM.
Try it, it's really interesting. Based on: https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated/blob/main/ortho_cookbook.ipynb
Unfortunately, I don't understand how it works :(
At a quick glance, that's just wassname's notebook copy-pasted; it uses the default CSV instead of 3000 instructions.
Can you share your "3000 instructions" dataset? It would be very interesting to see :)
It's not my own for now; I just concatenated a bunch of toxic DPO datasets. I'm working on a dedicated one, however, and will open-source it when it's ready.
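Roughly what "concatenated a bunch of toxic DPO datasets" looks like in practice (a sketch with placeholder dataset names and column names, not the exact sets I used):

```python
# Pull the prompt column out of each DPO-style dataset and stack them into
# one instruction list to feed the notebook instead of its default CSV.
from datasets import load_dataset, concatenate_datasets

SOURCES = [
    ("some-org/toxic-dpo-set-a", "prompt"),   # placeholder repo/column
    ("some-org/toxic-dpo-set-b", "prompt"),   # placeholder repo/column
]

parts = []
for repo_id, prompt_col in SOURCES:
    ds = load_dataset(repo_id, split="train")
    parts.append(ds.select_columns([prompt_col]).rename_column(prompt_col, "instruction"))

instructions = concatenate_datasets(parts)
instructions.to_csv("instructions.csv")  # drop-in replacement for the notebook's default CSV
```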