convert to GGUF format?

#8
by kalle07 - opened

any chance to do this? ;)

HITsz-Text Machine Group org

We apologize for our limited familiarity with the GGUF model format.
If you would like to perform a GGUF conversion, please feel free to reach out or submit a pull request.

all home users use GGUF :)

this is the main repo for converting
https://github.com/ggml-org/llama.cpp
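the conversion script in that repo is convert_hf_to_gguf.py, i think it is called roughly like this (just a sketch, i have not tried it on your model, flags can differ per version):

python convert_hf_to_gguf.py /path/to/your-hf-model \
  --outfile model.f16.gguf \
  --outtype f16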

you need a compatible format.

so llama has gguf
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
and many more: mistral, qwen, deepseek ...

and embedders too
https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF
https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF

i can not help that much ...

HITsz-Text Machine Group org

this is the main repo for converting
https://github.com/ggml-org/llama.cpp

Thank you for your suggestion. I will consider attempting a conversion to GGUF.

HITsz-Text Machine Group org

I attempted the conversion, but encountered an error: ERROR:hf-to-gguf:Model Qwen2Model is not supported
It appears that llama.cpp currently only supports the Qwen2ForCausalLM model and does not support the bare Qwen2Model without the lm_head.

I will need some time to investigate how to modify it before attempting the conversion again.
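One possible direction (a rough, unverified sketch; it assumes transformers is available and that the 0.5B backbone ties its word embeddings, so the missing lm_head can be recovered from embed_tokens) is to re-save the checkpoint under the Qwen2ForCausalLM architecture so that the llama.cpp converter accepts it:

from transformers import AutoTokenizer, Qwen2ForCausalLM

src = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5"
dst = "./KaLM-embedding-as-causal-lm"  # hypothetical output directory

# Loading the Qwen2Model checkpoint through the *ForCausalLM class keeps the backbone
# weights and (with tied embeddings) derives lm_head from embed_tokens.
model = Qwen2ForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst)       # config.json now lists Qwen2ForCausalLM
tokenizer.save_pretrained(dst)
# afterwards: python convert_hf_to_gguf.py ./KaLM-embedding-as-causal-lm --outtype f32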

all fine :)

HITsz-Text Machine Group org

@kalle07

hi, Mueller

We have uploaded several models in GGUF format under the project HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5-GGUF (the different files use different precision formats).

We converted the KaLM-embedding-multilingual-mini-instruct-v1.5 Huggingface model from Qwen2Model to Qwen2ForCausalLM, and then used the llama.cpp script to convert it to GGUF format. Therefore, we are not yet certain whether this architecture loads and functions correctly as an embedding model (although other models, such as GTE, appear to use a similar architecture).

Could you please test and evaluate whether the model parameters are functioning correctly?

@YanshekWoo

you are fast ;)

Unfortunately, it doesn't work straight away ...
it takes a while to get in touch with the programmers to find out whether it's the model or the programming . . .
it is recognized but the embedding process does not start
maybe it depends on qwen... all the other embedders are "bert"

if you have time you can also take a look here, it's a gguf header viewer/editor (i don't know much about it)
https://huggingface.co/spaces/CISCai/gguf-editor

you can compare yours with some of my collection... maybe you get an idea
https://huggingface.co/kalle07/embedder_collection

HITsz-Text Machine Group org

Unfortunately, it doesn't work straight away ...
it takes a while to get in touch with the programmers to find out whether it's the model or the programming . . .
it is recognized but the embedding process does not start
maybe it depends on qwen... all the other embedders are "bert"

Thank you for your efforts.
We may need to conduct further testing and validation.
We will also explore other model formats as much as possible.

HITsz-Text Machine Group org

Well, I have run a preliminary test on the GGUF model in this repository.
After installing llama.cpp, it runs correctly. I obtained the embedding vectors by running the following command:

./llama-embedding \
  --batch-size 512 \
  --ctx-size 512 \
  -m KaLM-embedding-multilingual-mini-instruct-v1.5-GGUF/model.f32.gguf \
  --pooling mean \
  -p "this is a test sentence for llama cpp"

The vectors encoded this way have a cumulative error of 0.00021034753131761276 across all dimensions compared to the Huggingface model, which I think is an acceptable level of discrepancy.
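
For reference, a sketch of how the Huggingface-side vector can be reproduced for this comparison (assuming sentence-transformers with the model's default mean pooling; the cumulative error here is the summed absolute difference):

from sentence_transformers import SentenceTransformer
import numpy as np

# encode the same sentence with the Huggingface model (mean pooling by default)
model = SentenceTransformer("HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5")
hf_vec = model.encode("this is a test sentence for llama cpp")

# gguf_vec = np.array([...])  # the 896-dim vector printed by ./llama-embedding above
# print(np.abs(hf_vec - gguf_vec).sum())  # cumulative error across all dimensions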

HITsz-Text Machine Group org
./llama-embedding \
  --batch-size 512 \
  --ctx-size 512 \
  -m KaLM-embedding-multilingual-mini-instruct-v1.5-GGUF/model.f32.gguf \
  --pooling mean \
  -p "this is a test sentence for llama cpp"

GGUF

The following is the runtime log of llama.cpp:

register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz)
load_backend: failed to find ggml_backend_init in llama.cpp/build/bin/libggml-cpu.so
build: 4826 (3ccbfe5a) with cc (GCC) 8.5.0 20210514 (TencentOS 8.5.0-18) for x86_64-redhat-linux (debug)
llama_model_loader: loaded meta data with 25 key-value pairs and 290 tensors from model.f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   4:                           general.basename str              = Qwen2
llama_model_loader: - kv   5:                         general.size_label str              = 0.5B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 24
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 0
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151643
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
llama_model_loader: - kv  24:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  290 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = all F32
print_info: file size   = 1.84 GiB (32.00 BPW) 
load: special tokens cache size = 3
load: token to piece cache size = 0.9308 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 896
print_info: n_layer          = 24
print_info: n_head           = 14
print_info: n_head_kv        = 2
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 7
print_info: n_embd_k_gqa     = 128
print_info: n_embd_v_gqa     = 128
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 4864
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1B
print_info: model params     = 494.03 M
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151643 '<|endoftext|>'
print_info: EOT token        = 151643 '<|endoftext|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors:   CPU_Mapped model buffer size =  1884.59 MiB
..........................................................................
llama_init_from_model: n_seq_max     = 1
llama_init_from_model: n_ctx         = 512
llama_init_from_model: n_ctx_per_seq = 512
llama_init_from_model: n_batch       = 512
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 0
llama_init_from_model: freq_base     = 1000000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (512) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 512, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
llama_kv_cache_init:        CPU KV buffer size =     6.00 MiB
llama_init_from_model: KV self size  =    6.00 MiB, K (f16):    3.00 MiB, V (f16):    3.00 MiB
llama_init_from_model:        CPU  output buffer size =     0.00 MiB
llama_init_from_model:        CPU compute buffer size =   302.26 MiB
llama_init_from_model: graph nodes  = 849
llama_init_from_model: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 512
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 5 (n_threads_batch = 5) / 10 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 
main: last token in the prompt is not SEP
main: 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
batch_decode: n_tokens = 8, n_seq = 1

embedding 0: -0.005617 -0.019870  0.007216 -0.018676 -0.015974 -0.007706 -0.030192 -0.015380 -0.021032  0.019494 -0.034109  0.007551  0.032696 -0.026132  0.005154  0.011971 -0.084249  0.052203 -0.038795  0.013623 -0.068079  0.040654 -0.020324  0.022545 -0.037819  0.020438  0.001824 -0.064099 -0.047595  0.038570 -0.010898 -0.010271 -0.011100 -0.044097  0.033119  0.037557 -0.002348  0.032553  0.011929  0.005502  0.025088  0.003978  0.006904 -0.035083 -0.010470 -0.034227 -0.013858 -0.035703  0.014180 -0.025907 -0.026851  0.010911 -0.021055  0.000067 -0.002218 -0.052512 -0.057080  0.013732  0.012660  0.045274 -0.026766  0.240441  0.034132  0.011154  0.226650 -0.024238 -0.007814  0.027466  0.009281 -0.021542 -0.004340 -0.021386  0.004752  0.033506  0.014920  0.003992  0.009111  0.025031  0.020139 -0.036925 -0.012059  0.000137  0.011283  0.018176 -0.010696  0.042692 -0.006491  0.013400  0.001548  0.011421 -0.025938  0.044228  0.020244  0.005695  0.029718 -0.017172 -0.014550 -0.000964 -0.021772  0.007865 -0.031828 -0.010778  0.003041  0.013125  0.009505 -0.003231  0.036203 -0.019547  0.004541 -0.007354  0.022178  0.006011  0.027195 -0.004928  0.022300 -0.018572  0.005070  0.002185 -0.014515  0.012633  0.016932  0.001676  0.056015 -0.012552  0.009997  0.019066  0.009447  0.020058 -0.013706  0.020886  0.027766 -0.006052 -0.022245 -0.021614 -0.014305 -0.020005  0.020341  0.014045 -0.014578 -0.008030  0.012796  0.006812 -0.007478 -0.001743 -0.016854  0.034287 -0.031516  0.005336 -0.003554 -0.003763 -0.021620 -0.006162  0.010351 -0.003973  0.019446  0.008245  0.113991 -0.006209 -0.066679 -0.006042 -0.001526 -0.013945 -0.038080 -0.030242  0.011369  0.010969 -0.056905 -0.013191  0.018406 -0.012196  0.100413 -0.003803  0.047638 -0.052544  0.016229 -0.032888 -0.008713 -0.012568 -0.010421  0.030263  0.005434 -0.003263 -0.023139  0.011722 -0.018269 -0.017781 -0.017527 -0.029290  0.018505  0.001415  0.014066  0.012369 -0.002930  0.001423  0.041169  0.025884  0.037923  0.030894 -0.028028  0.051546 -0.005923 -0.011091 -0.021656  0.011613 -0.027665 -0.013124 -0.036293 -0.014018 -0.063891 -0.027315 -0.026443 -0.006250  0.014383 -0.001479 -0.056613  0.001095  0.026627 -0.000944 -0.030141 -0.001225  0.008129  0.248631  0.015677 -0.031157  0.021244 -0.015295 -0.013410  0.029758 -0.012815  0.004726  0.046954  0.014392  0.028247  0.028521 -0.023322 -0.004665  0.003804 -0.009556 -0.022984 -0.001934  0.018092 -0.002682 -0.017452 -0.007649 -0.021728  0.027157 -0.024658 -0.007079 -0.015441 -0.003006 -0.004402 -0.028493  0.006587 -0.007931 -0.022351  0.005302 -0.021761 -0.031655  0.029501  0.025734  0.000642 -0.001442 -0.022538  0.034000  0.029379  0.010521 -0.050287 -0.004443 -0.000685 -0.009870 -0.001232 -0.007148  0.015507  0.009327  0.026938 -0.019117  0.026526  0.010721  0.029157  0.032474 -0.051724 -0.014679  0.005629  0.001107 -0.005225 -0.001200 -0.020244  0.020685 -0.000165 -0.012582  0.014664  0.005036 -0.019009 -0.006418 -0.003324 -0.045237 -0.003849  0.005666 -0.015978 -0.011380 -0.022220 -0.002280  0.019090 -0.024317 -0.020747 -0.027213  0.024720  0.018327 -0.023268 -0.006169 -0.027234  0.001093 -0.015938  0.032284  0.001322 -0.023150 -0.007599 -0.012956 -0.006195  0.014020  0.001698  0.002744  0.005406  0.006648  0.003490 -0.006738 -0.014581  0.010782 -0.013065 -0.036367 -0.001319  0.012296  0.035180  0.039817  0.000123  0.001552  0.017111  0.012783  0.028645  0.017416 -0.008514  0.003262 -0.020166 -0.015713 -0.042732 -0.002901 -0.002392  0.003525  0.010508 -0.026582  0.007347 -0.002226 -0.014452  0.032119  
0.004608 -0.011655  0.008659  0.004121 -0.009991  0.027236  0.032909 -0.013406 -0.010420  0.030114 -0.040390 -0.013009  0.045277  0.003138  0.015846 -0.004817 -0.014542 -0.016995 -0.015571 -0.022336  0.000652  0.008659  0.003647  0.030347 -0.004236  0.019518 -0.007802  0.005945 -0.029225  0.020781  0.008442  0.023562  0.010636  0.017550 -0.009493  0.013518  0.041164 -0.021737 -0.005231 -0.015161  0.008169 -0.008486 -0.020019 -0.025472 -0.009455  0.028613  0.002884  0.001704  0.002021  0.006508  0.034076 -0.032708  0.002081  0.011794 -0.358627 -0.003396 -0.012406  0.021330  0.011522  0.027008 -0.006293 -0.019328  0.003104  0.006715  0.007459  0.002751  0.024373  0.005379 -0.029243  0.015721  0.012940 -0.017350  0.023347  0.011787 -0.000529  0.079726 -0.009040 -0.000276  0.000789 -0.001583  0.005701  0.005396  0.015551  0.013393 -0.019744  0.007403  0.014565  0.037828 -0.019929 -0.009646 -0.021970 -0.014830  0.002508 -0.014607 -0.001000 -0.077304  0.035121 -0.002140  0.015090 -0.024016 -0.029027  0.011078  0.010452  0.023174  0.000090 -0.025405  0.024855 -0.013592 -0.216733  0.045035  0.005948 -0.004701  0.033120  0.009768 -0.000656 -0.000624 -0.040595  0.003741  0.002131  0.071467 -0.011703 -0.011033 -0.002513  0.007965 -0.004810 -0.015782 -0.013540  0.000663  0.006253  0.022323 -0.038397 -0.004674  0.007017  0.011784  0.006644  0.007504  0.000000  0.002546  0.037059  0.019085  0.019502  0.016024  0.024690  0.036413 -0.038088  0.007672  0.026648  0.010480 -0.006988 -0.031500  0.002951  0.040907  0.033623  0.028169  0.036273 -0.003436  0.007460  0.020984  0.050378  0.077265  0.012230 -0.036533 -0.015271  0.030179 -0.042372  0.019406  0.003865  0.002644 -0.012561 -0.028489 -0.000402 -0.012689 -0.020233  0.017512  0.027022 -0.014631 -0.000434 -0.036729  0.049228 -0.002222  0.023333 -0.022051  0.021496 -0.024179  0.020848 -0.037961  0.004548 -0.000319 -0.027322  0.026532  0.021305 -0.029285 -0.012464 -0.002898 -0.017293  0.029080 -0.001370 -0.000069 -0.017206 -0.015163  0.017233 -0.023382 -0.002775  0.025073 -0.049416 -0.046540 -0.014044 -0.056435  0.027473  0.010392  0.012139 -0.001258 -0.001119  0.012418 -0.022989 -0.018151 -0.028062  0.008998 -0.011936 -0.009467 -0.008216 -0.027132  0.046030  0.027485  0.059422 -0.016048  0.030681  0.032383  0.019369 -0.017205  0.013314 -0.004900 -0.009567 -0.002062 -0.045611 -0.022293 -0.038517  0.003023  0.018243  0.007150 -0.007390  0.015162 -0.024271  0.029279 -0.007550 -0.030207  0.004360 -0.365857  0.005882 -0.005050 -0.036975  0.011628  0.021058 -0.023035  0.012861  0.028630  0.010338 -0.012606 -0.017875 -0.028207  0.006685  0.031675 -0.011309 -0.023728 -0.004065  0.035603 -0.030251  0.029831  0.022795 -0.012442  0.011129  0.007191 -0.000478 -0.009001 -0.012683 -0.022595  0.019913 -0.024347  0.017669  0.034818 -0.020829  0.002344  0.007547 -0.020343 -0.012401 -0.001203 -0.015515  0.005628 -0.008277 -0.021448 -0.012650  0.003852 -0.011257 -0.013688  0.006220 -0.001710 -0.002581 -0.007556 -0.023357 -0.032169 -0.051290 -0.037534 -0.015768  0.011715  0.014255 -0.039310 -0.001915 -0.007593 -0.015476 -0.000399 -0.005438 -0.005709 -0.028625 -0.002236 -0.015581  0.044567 -0.002666  0.013548 -0.016670  0.007760 -0.017653  0.014772  0.030649  0.026002 -0.003641  0.001947 -0.009069  0.003349  0.000091  0.027386 -0.020086  0.009576  0.010376  0.002962 -0.039752 -0.002895  0.156908  0.021292 -0.014902  0.009720  0.022725  0.008652 -0.010268  0.021037  0.023687 -0.013690  0.003067 -0.021830  0.016671 -0.028010 -0.012623  0.031568  0.016489  0.021678  0.009317  
0.013080  0.010384 -0.015328 -0.004561 -0.041641  0.006372 -0.032361 -0.031051 -0.013893  0.022432 -0.013150 -0.022239 -0.010641  0.011278  0.013376  0.011085  0.025439 -0.043333  0.053947 -0.034277 -0.000973 -0.010825 -0.014607  0.018868 -0.017369 -0.039586 -0.000656 -0.041166 -0.039249  0.018274  0.011828  0.005536 -0.030555 -0.014202 -0.027146  0.042323  0.003639  0.042261  0.020552  0.001367 -0.020050 -0.027661  0.033719 -0.017982 -0.008464  0.011777 -0.006478 -0.018065  0.005417  0.013769 -0.010551 -0.010975  0.015174 -0.018635  0.001440  0.008257  0.023767  0.026802  0.016065 -0.028166  0.018631 -0.004433 -0.001053  0.035733 -0.011873 -0.010595 -0.020404 -0.044702  0.000888  0.063007 -0.006687 -0.013402  0.002370 -0.036532 -0.005683  0.022250  0.003164  0.015157  0.015208  0.003077  0.054498 -0.027395  0.004026  0.008425 -0.032174 -0.015680  0.002199 -0.043691  0.003855 -0.030771 -0.031354 -0.021120 -0.011773  0.010275  0.016070  0.012510 -0.027697 -0.017227  0.028703  0.024185 -0.017291 -0.036084  0.023574 -0.032079 -0.011543 -0.047453 -0.014749 -0.006997 -0.031346 -0.000659  0.010592 -0.006548  0.026844  0.017979 -0.000166  0.033410 -0.021006  0.007278 -0.014654 -0.028600 -0.039145 -0.003032 -0.010971 -0.011285 -0.012691 -0.000897 -0.004542 -0.016813  0.005858  0.013192 -0.010779  0.016515 -0.020045  0.010913  0.003377  0.016918 -0.058516  0.015476  0.007361  0.005792  0.018753 -0.001358  0.136355 -0.005914  0.013665 -0.010866  0.013500 -0.028329 -0.035792 -0.029685 -0.015896  0.001282 -0.003509  0.004344 -0.004689  0.024120  0.023163 -0.026339 -0.019659 -0.014681  0.016899  0.025041 -0.008244  0.005899  0.025252  0.008044  0.035538  0.001999  0.035375 -0.016333  0.060742 -0.004311  0.007077  0.046187 -0.015406 -0.012033 -0.014907 -0.004995  0.011808  0.028273  0.016329  0.035711 -0.028299 -0.012884 -0.046488  0.033066 -0.017859 

llama_perf_context_print:        load time =   68518.11 ms
llama_perf_context_print: prompt eval time =     548.03 ms /     8 tokens (   68.50 ms per token,    14.60 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     555.06 ms /     9 tokens

Huggingface & sentence-transformers

The encoded result from Huggingface is:

-5.57697145e-03 -1.98730174e-02  7.22556980e-03 -1.86812021e-02  -1.59614719e-02 -7.72509351e-03 -3.01991403e-02 -1.53750358e-02  -2.10340135e-02  1.94924530e-02 -3.40934135e-02  7.53673865e-03   3.26957628e-02 -2.61022113e-02  5.15349489e-03  1.19782509e-02  -8.42278823e-02  5.22327758e-02 -3.88104953e-02  1.36200106e-02  -6.81082755e-02  4.06441055e-02 -2.03027818e-02  2.25604996e-02  -3.78140658e-02  2.04304792e-02  1.82246906e-03 -6.40756190e-02  -4.75939251e-02  3.85923944e-02 -1.08847823e-02 -1.02679981e-02  -1.10795191e-02 -4.40871045e-02  3.31121385e-02  3.75693068e-02  -2.34073424e-03  3.25505771e-02  1.19109955e-02  5.49513660e-03   2.51086671e-02  3.98109481e-03  6.89815078e-03 -3.51020843e-02  -1.04470123e-02 -3.42466533e-02 -1.38593521e-02 -3.57159562e-02   1.41777918e-02 -2.58986428e-02 -2.68382281e-02  1.09213693e-02  -2.10487247e-02  6.54289979e-05 -2.22613523e-03 -5.24990335e-02  -5.70895709e-02  1.37300855e-02  1.26756206e-02  4.52725738e-02  -2.67530158e-02  2.40445167e-01  3.41307297e-02  1.11620296e-02   2.26649165e-01 -2.42452864e-02 -7.80050224e-03  2.74598412e-02   9.29638185e-03 -2.15656590e-02 -4.36038198e-03 -2.13836581e-02   4.75854799e-03  3.35093886e-02  1.49079422e-02  4.01665224e-03   9.10849217e-03  2.50420943e-02  2.01584082e-02 -3.69223393e-02  -1.20561458e-02  1.42877238e-04  1.12698451e-02  1.81805976e-02  -1.06973574e-02  4.26982976e-02 -6.49857661e-03  1.34049142e-02   1.55478483e-03  1.14426091e-02 -2.59461738e-02  4.42498401e-02   2.02334449e-02  5.69382403e-03  2.96979416e-02 -1.71532203e-02  -1.45623656e-02 -9.82111203e-04 -2.17724387e-02  7.87042640e-03  -3.18450890e-02 -1.07864179e-02  3.06761451e-03  1.31238783e-02   9.51057021e-03 -3.21750576e-03  3.62032242e-02 -1.95494760e-02   4.54380969e-03 -7.36792153e-03  2.21951380e-02  6.02101116e-03   2.71941982e-02 -4.93130973e-03  2.22848412e-02 -1.85675044e-02   5.07238507e-03  2.18385761e-03 -1.44953895e-02  1.26400217e-02   1.69351175e-02  1.68947864e-03  5.60117327e-02 -1.25632258e-02   9.99260694e-03  1.90427378e-02  9.44491569e-03  2.00470090e-02  -1.37149058e-02  2.08880212e-02  2.77626570e-02 -6.05098717e-03  -2.22213659e-02 -2.16036811e-02 -1.43106114e-02 -1.99965239e-02   2.03542132e-02  1.40532572e-02 -1.45753669e-02 -8.04468989e-03   1.27925063e-02  6.81551127e-03 -7.47127272e-03 -1.72407413e-03  -1.68585423e-02  3.42764221e-02 -3.15014087e-02  5.33294166e-03  -3.54403700e-03 -3.77018354e-03 -2.16424670e-02 -6.17139181e-03   1.03439959e-02 -3.99222784e-03  1.94566939e-02  8.26581661e-03   1.13969833e-01 -6.20758440e-03 -6.66790679e-02 -6.02290640e-03  -1.52268517e-03 -1.39446221e-02 -3.80835496e-02 -3.02304719e-02   1.13672093e-02  1.09620495e-02 -5.69039956e-02 -1.31847709e-02   1.84113085e-02 -1.21928398e-02  1.00435428e-01 -3.81564256e-03   4.76389900e-02 -5.25613427e-02  1.62280183e-02 -3.28798182e-02  -8.71803053e-03 -1.25687737e-02 -1.04282741e-02  3.02694496e-02   5.44682797e-03 -3.25071113e-03 -2.31300630e-02  1.17300088e-02  -1.82678606e-02 -1.77552402e-02 -1.75458454e-02 -2.92972699e-02   1.84895042e-02  1.42284064e-03  1.40535347e-02  1.23894867e-02  -2.92930100e-03  1.43549195e-03  4.11476716e-02  2.59145107e-02   3.79197225e-02  3.08965296e-02 -2.80330051e-02  5.15487082e-02  -5.93384681e-03 -1.11153303e-02 -2.16616243e-02  1.16072027e-02  -2.76754610e-02 -1.31209679e-02 -3.62827852e-02 -1.40160443e-02  -6.38801977e-02 -2.73119807e-02 -2.64322199e-02 -6.26668520e-03   1.43819461e-02 -1.50322844e-03 -5.66346124e-02  1.08333665e-03   2.66205892e-02 -9.54671821e-04 
-3.01309880e-02 -1.22544065e-03   8.13654158e-03  2.48654991e-01  1.56702194e-02 -3.11396271e-02   2.12434344e-02 -1.52876657e-02 -1.33979376e-02  2.97607984e-02  -1.28122494e-02  4.73988149e-03  4.69535738e-02  1.43949287e-02   2.82487664e-02  2.85314545e-02 -2.33043935e-02 -4.67049237e-03   3.79528967e-03 -9.54367407e-03 -2.29835305e-02 -1.93948357e-03   1.80765744e-02 -2.69613485e-03 -1.74651984e-02 -7.64335738e-03  -2.17167195e-02  2.71764621e-02 -2.46457700e-02 -7.06881797e-03  -1.54379904e-02 -2.99909920e-03 -4.41800524e-03 -2.84979250e-02   6.58861920e-03 -7.92315044e-03 -2.23399810e-02  5.30284783e-03  -2.17359960e-02 -3.16483602e-02  2.94856615e-02  2.57354528e-02   6.39239326e-04 -1.44539052e-03 -2.25404408e-02  3.39910388e-02   2.93597039e-02  1.05090365e-02 -5.03189862e-02 -4.44520824e-03  -6.77050499e-04 -9.86775383e-03 -1.24587992e-03 -7.14954175e-03   1.55252069e-02  9.32221208e-03  2.69465167e-02 -1.91160124e-02   2.65186951e-02  1.07092094e-02  2.91562509e-02  3.24639529e-02  -5.17363064e-02 -1.46750053e-02  5.63439261e-03  1.09133881e-03  -5.22624934e-03 -1.20662327e-03 -2.02620197e-02  2.06801500e-02  -1.51681132e-04 -1.25444150e-02  1.46678602e-02  5.04902471e-03  -1.90128945e-02 -6.43163687e-03 -3.32771125e-03 -4.52281870e-02  -3.84433335e-03  5.67629794e-03 -1.59876905e-02 -1.13902800e-02  -2.22079065e-02 -2.28079525e-03  1.90855637e-02 -2.43371930e-02  -2.07333602e-02 -2.72013601e-02  2.47170068e-02  1.83353201e-02  -2.32759248e-02 -6.18695887e-03 -2.72174906e-02  1.09003845e-03  -1.59440972e-02  3.22777629e-02  1.31914346e-03 -2.31631175e-02  -7.60893244e-03 -1.29640587e-02 -6.21106150e-03  1.40329050e-02   1.70334743e-03  2.72474578e-03  5.38880145e-03  6.65341737e-03   3.48450197e-03 -6.72638509e-03 -1.45758139e-02  1.07695153e-02  -1.30698550e-02 -3.63578424e-02 -1.30930054e-03  1.22959446e-02   3.51713449e-02  3.98006961e-02  1.13533039e-04  1.55860325e-03   1.71262696e-02  1.27939945e-02  2.86516659e-02  1.74097605e-02  -8.49448424e-03  3.24302749e-03 -2.01688763e-02 -1.56975500e-02  -4.27202769e-02 -2.89990613e-03 -2.39464082e-03  3.52220028e-03   1.05191972e-02 -2.65851356e-02  7.35319313e-03 -2.22821161e-03  -1.44366790e-02  3.21129337e-02  4.58458113e-03 -1.16603561e-02   8.65670480e-03  4.11597127e-03 -9.98925045e-03  2.72387732e-02   3.29088941e-02 -1.34018203e-02 -1.04332659e-02  3.01088933e-02  -4.03900556e-02 -1.30102905e-02  4.52716239e-02  3.14603955e-03   1.58288386e-02 -4.82115103e-03 -1.45483082e-02 -1.69961210e-02  -1.55637544e-02 -2.23341957e-02  6.54461386e-04  8.66429787e-03   3.65598709e-03  3.03569399e-02 -4.23529977e-03  1.95291899e-02  -7.78836804e-03  5.91803342e-03 -2.92320661e-02  2.07669064e-02   8.44980590e-03  2.35637836e-02  1.06343143e-02  1.75589714e-02  -9.50045232e-03  1.35114659e-02  4.11546752e-02 -2.17181705e-02  -5.23178140e-03 -1.51565159e-02  8.15375336e-03 -8.51122383e-03  -2.00185869e-02 -2.54807323e-02 -9.47080553e-03  2.86119413e-02   2.86588143e-03  1.70071109e-03  2.02810438e-03  6.50638249e-03   3.40775736e-02 -3.26975323e-02  2.07829406e-03  1.18065933e-02  -3.58599395e-01 -3.39135784e-03 -1.23927398e-02  2.13247854e-02   1.15310391e-02  2.69975830e-02 -6.29391195e-03 -1.93247944e-02   3.12104821e-03  6.72180951e-03  7.45858345e-03  2.76091881e-03   2.43706349e-02  5.37990080e-03 -2.92320792e-02  1.57182608e-02   1.29471524e-02 -1.73353795e-02  2.33381428e-02  1.17946249e-02  -5.29733661e-04  7.97420144e-02 -9.04254336e-03 -2.79240601e-04   7.88676727e-04 -1.58570660e-03  5.68120042e-03  5.37936389e-03   
1.55465584e-02  1.33815510e-02 -1.97546165e-02  7.38881808e-03   1.45640429e-02  3.78354378e-02 -1.99110936e-02 -9.63829085e-03  -2.19621602e-02 -1.48309292e-02  2.51102610e-03 -1.46127148e-02  -9.82812606e-04 -7.72897527e-02  3.51236053e-02 -2.14141072e-03   1.50942262e-02 -2.39966065e-02 -2.90010180e-02  1.10786464e-02   1.04462150e-02  2.31839232e-02  5.59888649e-05 -2.54088473e-02   2.48623230e-02 -1.35960579e-02 -2.16761321e-01  4.50234152e-02   5.94861899e-03 -4.71180864e-03  3.31092663e-02  9.78821050e-03  -6.63879968e-04 -6.18885329e-04 -4.05981988e-02  3.74979666e-03   2.13280693e-03  7.14484230e-02 -1.16921822e-02 -1.10173663e-02  -2.51139514e-03  7.95071386e-03 -4.78974218e-03 -1.57661233e-02  -1.35513758e-02  6.82382088e-04  6.25080802e-03  2.23193001e-02  -3.83937359e-02 -4.69671935e-03  7.01672817e-03  1.17895491e-02   6.64133811e-03  7.51076080e-03  2.50098539e-14  2.54897564e-03   3.70785557e-02  1.90741308e-02  1.95058845e-02  1.60104912e-02   2.46973895e-02  3.64090763e-02 -3.80794182e-02  7.68633978e-03   2.66483147e-02  1.04762921e-02 -6.99653057e-03 -3.15089896e-02   2.95955292e-03  4.09021638e-02  3.36203054e-02  2.81690136e-02   3.62439863e-02 -3.42736510e-03  7.44277053e-03  2.09930670e-02   5.03731929e-02  7.72599056e-02  1.22150583e-02 -3.65557708e-02  -1.52818095e-02  3.01804915e-02 -4.23824079e-02  1.93953887e-02   3.88219743e-03  2.64858897e-03 -1.25439689e-02 -2.84983255e-02  -3.97911062e-04 -1.26786912e-02 -2.02478636e-02  1.75319891e-02   2.70273983e-02 -1.46281980e-02 -4.23700607e-04 -3.67127731e-02   4.92322929e-02 -2.22426746e-03  2.33252328e-02 -2.20560282e-02   2.15004981e-02 -2.41807438e-02  2.08528414e-02 -3.79475206e-02   4.53150412e-03 -3.24005348e-04 -2.73147672e-02  2.65358333e-02   2.13151220e-02 -2.92801894e-02 -1.24357874e-02 -2.90694879e-03  -1.72976572e-02  2.90884636e-02 -1.37843634e-03 -6.86925996e-05  -1.72013827e-02 -1.51781812e-02  1.72352884e-02 -2.33864207e-02  -2.77984189e-03  2.50768010e-02 -4.94102165e-02 -4.65542860e-02  -1.40231699e-02 -5.64215742e-02  2.74814367e-02  1.04006650e-02   1.21208513e-02 -1.25906966e-03 -1.11691828e-03  1.24153467e-02  -2.29914784e-02 -1.81472190e-02 -2.80725472e-02  8.99037533e-03  -1.19393393e-02 -9.47025791e-03 -8.22779816e-03 -2.71231271e-02   4.60346229e-02  2.74902172e-02  5.94225451e-02 -1.60337370e-02   3.06872092e-02  3.23893726e-02  1.93719789e-02 -1.72075666e-02   1.32973436e-02 -4.88887448e-03 -9.56196245e-03 -2.06263294e-03  -4.56209891e-02 -2.22842079e-02 -3.84920128e-02  3.02932481e-03   1.82594880e-02  7.15171825e-03 -7.40471948e-03  1.51616344e-02  -2.42563076e-02  2.92783268e-02 -7.54585722e-03 -3.02437153e-02   4.35039308e-03 -3.65870476e-01  5.87632041e-03 -5.03976736e-03  -3.69793139e-02  1.16269719e-02  2.10643746e-02 -2.30154060e-02   1.28473220e-02  2.86443066e-02  1.03484886e-02 -1.26074171e-02  -1.78815778e-02 -2.82099359e-02  6.69456320e-03  3.16779166e-02  -1.13213286e-02 -2.37054154e-02 -4.06096270e-03  3.55964489e-02  -3.02624647e-02  2.98325606e-02  2.27947813e-02 -1.24319419e-02   1.11329453e-02  7.17289560e-03 -4.68577928e-04 -9.01663955e-03  -1.26789752e-02 -2.26100180e-02  1.99131910e-02 -2.43605748e-02   1.76652055e-02  3.48193459e-02 -2.08304971e-02  2.34199432e-03   7.54433218e-03 -2.03236714e-02 -1.23901358e-02 -1.18959090e-03  -1.55339455e-02  5.64059475e-03 -8.28771200e-03 -2.14608181e-02  -1.26480907e-02  3.84531706e-03 -1.12481723e-02 -1.37060238e-02   6.21478027e-03 -1.70033018e-03 -2.57943873e-03 -7.56196724e-03  -2.33655628e-02 -3.21789421e-02 
-5.12899384e-02 -3.75279635e-02  -1.57932360e-02  1.17192315e-02  1.42671987e-02 -3.93056497e-02  -1.93070737e-03 -7.60392798e-03 -1.54707478e-02 -3.91932437e-04  -5.41904196e-03 -5.71417203e-03 -2.86057293e-02 -2.23317137e-03  -1.55804427e-02  4.45881821e-02 -2.66625150e-03  1.35298995e-02  -1.66769233e-02  7.74697587e-03 -1.76555328e-02  1.47665907e-02   3.06465775e-02  2.59970445e-02 -3.63652571e-03  1.95384631e-03  -9.06683691e-03  3.35231517e-03  1.05711959e-04  2.73783486e-02  -2.00982168e-02  9.59783327e-03  1.03657683e-02  2.96842726e-03  -3.97516638e-02 -2.90771318e-03  1.56910628e-01  2.13067587e-02  -1.49057107e-02  9.71206464e-03  2.27087475e-02  8.65180977e-03  -1.02699650e-02  2.10348349e-02  2.36905068e-02 -1.37017611e-02   3.05452570e-03 -2.18167473e-02  1.66680701e-02 -2.79920790e-02  -1.26106096e-02  3.16077806e-02  1.64948553e-02  2.16694679e-02   9.31371003e-03  1.31008979e-02  1.03718564e-02 -1.52983693e-02  -4.59562521e-03 -4.16474976e-02  6.37389766e-03 -3.23472545e-02  -3.10474560e-02 -1.38889411e-02  2.24262085e-02 -1.31572094e-02  -2.22280659e-02 -1.06591368e-02  1.12740519e-02  1.33866603e-02   1.10798739e-02  2.54484117e-02 -4.33534868e-02  5.39668202e-02  -3.42806503e-02 -9.58397519e-04 -1.08321374e-02 -1.46095892e-02   1.88584216e-02 -1.73665639e-02 -3.95855792e-02 -6.69015397e-04  -4.11577411e-02 -3.92425321e-02  1.82627495e-02  1.18297962e-02   5.53702749e-03 -3.05436235e-02 -1.42100183e-02 -2.71489453e-02   4.22992408e-02  3.63834645e-03  4.22542877e-02  2.05455553e-02   1.34507078e-03 -2.00515632e-02 -2.76522767e-02  3.37025337e-02  -1.79692544e-02 -8.47871974e-03  1.17707048e-02 -6.47335965e-03  -1.80761106e-02  5.42148435e-03  1.37814293e-02 -1.05411997e-02  -1.09713348e-02  1.51661355e-02 -1.86297763e-02  1.43355876e-03   8.26299842e-03  2.37574819e-02  2.67851781e-02  1.60630159e-02  -2.81628817e-02  1.86274555e-02 -4.43611247e-03 -1.05288555e-03   3.57172228e-02 -1.18779130e-02 -1.06139146e-02 -2.04108730e-02  -4.46909070e-02  8.93001445e-04  6.29926771e-02 -6.68423483e-03  -1.34202559e-02  2.37500970e-03 -3.65344584e-02 -5.66629646e-03   2.22534463e-02  3.14968685e-03  1.51576707e-02  1.51978321e-02   3.07043386e-03  5.44956066e-02 -2.73998752e-02  4.02815314e-03   8.43189564e-03 -3.21962237e-02 -1.56753920e-02  2.20673066e-03  -4.37045433e-02  3.87525349e-03 -3.07655055e-02 -3.13471220e-02  -2.11180709e-02 -1.17721204e-02  1.02738086e-02  1.60621069e-02   1.25185885e-02 -2.76972521e-02 -1.72367655e-02  2.87038162e-02   2.41743512e-02 -1.72909312e-02 -3.60901020e-02  2.35837847e-02  -3.20589989e-02 -1.15353176e-02 -4.74469103e-02 -1.47666531e-02  -7.00585777e-03 -3.13338786e-02 -6.51749491e-04  1.05900960e-02  -6.54525030e-03  2.68614050e-02  1.80011615e-02 -1.60778422e-04   3.34317461e-02 -2.09925994e-02  7.26128835e-03 -1.46562336e-02  -2.86063664e-02 -3.91123295e-02 -3.02121881e-03 -1.09604942e-02  -1.12981126e-02 -1.26912091e-02 -9.05065041e-04 -4.55572829e-03  -1.68177411e-02  5.84228011e-03  1.31934509e-02 -1.07685691e-02   1.65246148e-02 -2.00486723e-02  1.08937258e-02  3.38076800e-03   1.69339664e-02 -5.85281625e-02  1.54638644e-02  7.34537002e-03   5.78314485e-03  1.87482703e-02 -1.37373060e-03  1.36345446e-01  -5.91392769e-03  1.36568248e-02 -1.08666597e-02  1.35099553e-02  -2.83424798e-02 -3.57985720e-02 -2.97144018e-02 -1.59036256e-02   1.27062725e-03 -3.50649050e-03  4.35552327e-03 -4.68825456e-03   2.41288748e-02  2.31635626e-02 -2.63379682e-02 -1.96390934e-02  -1.46760615e-02  1.68808531e-02  2.50399020e-02 -8.26567505e-03   
5.90568502e-03  2.52416413e-02  8.04648828e-03  3.55328098e-02   2.00533518e-03  3.53588611e-02 -1.63028073e-02  6.07426129e-02  -4.28963779e-03  7.08375126e-03  4.61763442e-02 -1.54176736e-02  -1.20304842e-02 -1.49037121e-02 -4.97936411e-03  1.18117137e-02   2.82803699e-02  1.63390301e-02  3.57139893e-02 -2.82943826e-02  -1.29027087e-02 -4.64942046e-02  3.30686234e-02 -1.78626720e-02

Both results are encoded from the sentence "this is a test sentence for llama cpp".

if i take a look into the logs on my system
i can see that it is not handled like an embedder ... so it seems it is not recognized by the software (i use LM-Studio with Anything-LLM)

if you have time and are able to, you could change the header so that the main parts correspond to the embedding models ... but in detail i don't know which fields are important ;)
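
a small sketch for dumping the metadata keys of two gguf files so you can compare the headers side by side (assuming the gguf python package that ships with llama.cpp; filenames are only placeholders):

from gguf import GGUFReader

def dump_keys(path):
    # print every metadata key stored in the GGUF header
    for key in GGUFReader(path).fields:
        print(key)

dump_keys("model.f32.gguf")                 # the KaLM conversion
dump_keys("known-good-embedder.Q8_0.gguf")  # e.g. one model from the collection above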

HITsz-Text Machine Group org

if i take a look into the logs on my system
i can see that it is not handled like an embedder ... so it seems it is not recognized by the software (i use LM-Studio with Anything-LLM)

Oh, alright. The model's GGUF was converted using llama.cpp, and I've only tested it with llama.cpp.
I'll test it with LM-Studio when I have some free time to see what the issue might be.

HITsz-Text Machine Group org

I experimented with LM Studio and found that it can successfully call the embedding interface and return results. However, the embedding results seem to be INCORRECT.

It appears that LM Studio does not allow specifying the pooling method for embeddings. Typically, frameworks default to last-token pooling, whereas our KaLM-Embedding requires mean pooling. Consequently, LM Studio may not be suitable for directly deploying KaLM-Embedding.

I will further investigate to confirm the reasons behind the incorrect embedding results from LM Studio. Additionally, we aim to release a last-token embedding model in the future, which will better accommodate various frameworks.

here are my logs (only the first lines) from loading a model, maybe it helps ;)

yours:
Developer Logs
2025-03-14 10:07:54 [DEBUG]
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
2025-03-14 10:07:54 [DEBUG]
CUDA : ARCHS = 500,610,750,800 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
2025-03-14 10:07:54 [DEBUG]
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15209 MiB free
2025-03-14 10:07:54 [DEBUG]
llama_model_loader: loaded meta data with 25 key-value pairs and 290 tensors from:

usual LLM model:
Developer Logs
2025-03-14 10:18:51 [DEBUG]
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
2025-03-14 10:18:51 [DEBUG]
CUDA : ARCHS = 500,610,750,800 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
2025-03-14 10:18:51 [DEBUG]
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15209 MiB free
2025-03-14 10:18:51 [DEBUG]
llama_model_loader: loaded meta data with 40 key-value pairs and 339 tensors from

any other embedder model recognized by LM Studio (shown in the text-embedder list):
Developer Logs
2025-03-14 10:10:25 [DEBUG]
[INFO] [PaniniRagEngine] Loading model into embedding engine...
[WARNING] Batch size (512) is < context length (8192). Resetting batch size to context length to avoid unexpected behavior.
2025-03-14 10:10:25 [DEBUG]
[INFO] [LlamaEmbeddingEngine] Loading model from path:

HITsz-Text Machine Group org

It appears that it can operate normally, just like llama.cpp.
However, the root cause is that the GGUF conversion can only use the Qwen2ForCausalLM type, which LM Studio launches as an LLM by default.
Although its default pooling method can be called, it seems to be last-token pooling and does not support custom modification.

The best solution at the moment is to raise an issue with LM Studio to see if it can support mean pooling, as this is a capability supported by llama.cpp.
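
In the meantime, llama.cpp's own server can already serve the embeddings with mean pooling; a sketch (flag names may differ slightly between llama.cpp versions):

./llama-server \
  -m KaLM-embedding-multilingual-mini-instruct-v1.5-GGUF/model.f32.gguf \
  --embeddings \
  --pooling mean \
  --ctx-size 512
# embeddings are then served via the OpenAI-compatible /v1/embeddings endpoint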

i see ...
Thank you for your research ...

i think you are more qualified to give an accurate description of the error ... if not i will try.
i will then be happy to vote positive!

btw, some other main model types are also not supported, like Jina, Qwen, Gemma ... maybe they'll start ;)
