Robert collins

robbiemu

AI & ML interests

None yet

Recent Activity

new activity 1 day ago

bartowski/Sailor2-20B-Chat-GGUF:Question on target-language quantization

new activity 4 days ago

sail/Sailor2-1B:context-size is 32k right?

new activity 6 days ago

sailor2/sailor2-pretrain-data-stage1:Language samples

robbiemu's activity

New activity in bartowski/Sailor2-20B-Chat-GGUF 1 day ago

Question on target-language quantization

#1 opened 24 days ago by

New activity in sail/Sailor2-1B 4 days ago

context-size is 32k right?

#2 opened 6 days ago by

New activity in sailor2/sailor2-pretrain-data-stage1 6 days ago

Language samples

#1 opened 6 days ago by

reacted to bartowski's post with 👍 13 days ago

Post

Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method: online repacking (I prefer the term on-the-fly). If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance. I think there are minor losses from using intrinsics instead of assembly, but intrinsics are more maintainable.

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms).

As such, I'll stop making those newer model formats soon, probably at the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

IQ4_NL also supports repacking, though not in as many shapes yet, and should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since it uses the GPU and doesn't benefit from the repacking of weights.
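
To make "repacking the weights into interleaved rows" a bit more concrete, here is a toy sketch of the general idea only. It is not llama.cpp's actual memory layout or code: the function name `repack_interleaved` and the `group` and `chunk` parameters are hypothetical, and the rows are plain Python lists standing in for quantized blocks. The point is that pieces from several consecutive rows end up next to each other in memory, so a kernel can process several output rows in one pass.

```python
def repack_interleaved(rows, group=4, chunk=4):
    """Toy illustration: interleave `chunk`-sized pieces from each set of `group` rows.

    Hypothetical helper, not the llama.cpp implementation. `rows` is a list of
    equal-length lists standing in for rows of quantized weights.
    """
    if not rows or len(rows) % group != 0:
        raise ValueError("row count must be a non-zero multiple of `group`")
    width = len(rows[0])
    if any(len(r) != width for r in rows) or width % chunk != 0:
        raise ValueError("rows must share a width that is a multiple of `chunk`")

    packed = []
    for g in range(0, len(rows), group):      # each set of `group` consecutive rows
        for c in range(0, width, chunk):      # walk across the rows chunk by chunk
            for row in rows[g:g + group]:     # same chunk position from every row in the set
                packed.extend(row[c:c + chunk])
    return packed


# Two rows, interleaved two values at a time.
example = repack_interleaved([[0, 1, 2, 3], [10, 11, 12, 13]], group=2, chunk=2)
print(example)
```

Doing this at load time, rather than shipping a separate pre-interleaved file, is what lets a single Q4_0 download serve every platform.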

New activity in sail/Sailor2-20B-Chat 13 days ago

Hi Sail team! a request for data for quantization of your model

#2 opened 13 days ago by

liked a model about 1 month ago

Qwen/QwQ-32B-Preview

Text Generation • Updated about 11 hours ago • 134k • 1.53k

liked a Space about 2 months ago

Qwen2.5 Turbo 1M Demo

liked 2 models about 2 months ago

black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Aug 16, 2024 • 1.25M • 7.93k

lucasnewman/f5-tts-mlx

Updated 30 days ago • 38

liked a Space about 2 months ago

European Leaderboard

liked 3 models 2 months ago

infly/OpenCoder-1.5B-Instruct

Text Generation • Updated Nov 14, 2024 • 3.97k • 37

bartowski/Qwen2.5-Coder-32B-Instruct-GGUF

Text Generation • Updated Nov 10, 2024 • 36.8k • 48

infly/OpenCoder-8B-Instruct

Text Generation • Updated Nov 14, 2024 • 2.97k • 180

New activity in BSC-LT/salamandra-7b 2 months ago

possible issue with tokenizer

#2 opened 3 months ago by

New activity in pkupie/mc2_corpus 2 months ago

Fine tuning question

#2 opened 2 months ago by

liked a dataset 2 months ago

pkupie/mc2_corpus

Viewer • Updated Jun 15, 2024 • 504k • 33 • 6

New activity in bartowski/Qwen2.5-32B-Instruct-GGUF 3 months ago

this model in Ollama

#5 opened 3 months ago by

updated a collection 3 months ago

salamandra

2 items • Updated Oct 18, 2024