llama.cpp conversion problem report (about `tokenizer.json`)

#2
by DataSoul - opened

I attempted to convert this model to GGUF using the `convert_hf_to_gguf.py` script from llama.cpp, but ran into two errors:

```
FileNotFoundError: File not found: F:\OpensourceAI-models\SuperNova-Medius\tokenizer.model
Exception: data did not match any variant of untagged enum ModelWrapper at line 757443 column 3
```
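For context, the two messages correspond to two stages of the conversion: the script first looks for a SentencePiece `tokenizer.model` and, when that is absent, falls back to parsing `tokenizer.json` (where the Rust-side `ModelWrapper` deserialization error comes from). A minimal pre-flight check along those lines (a sketch, not the script's actual code; the fallback order is an assumption based on the errors above) might look like:

```python
import json
from pathlib import Path


def preflight_tokenizer(model_dir: str) -> str:
    """Report which tokenizer artifact a converter could use.

    Loosely mirrors the assumed fallback order: a SentencePiece
    tokenizer.model first, then a Hugging Face tokenizer.json.
    """
    d = Path(model_dir)
    if (d / "tokenizer.model").exists():
        return "sentencepiece"
    tok_json = d / "tokenizer.json"
    if not tok_json.exists():
        raise FileNotFoundError(f"File not found: {tok_json}")
    # A parse failure at this stage is roughly where the
    # "did not match any variant of untagged enum ModelWrapper"
    # error surfaces on the Rust tokenizers side.
    data = json.loads(tok_json.read_text(encoding="utf-8"))
    return data.get("model", {}).get("type", "unknown")
```

Running it against a model directory tells you which path the converter would take before you commit to a full conversion run.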

After downloading `tokenizer.json` from Qwen2.5-14B and replacing this model's file of the same name with it, I was able to convert the model to GGUF successfully.

I made a rough comparison of the two `tokenizer.json` files and found them mostly identical apart from some formatting differences. This model's `tokenizer.json` has one additional line, `"ignore_merges": false`, while the other parts seem unchanged.
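A rough comparison like this can be automated with a few lines of Python (a sketch; the two example dicts below are stand-ins for the parsed JSON files, not the real files). It recursively collects keys present in one file but missing from the other, which is exactly how a field like `"ignore_merges"` would show up:

```python
def key_diff(a: dict, b: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of keys present in dict a but missing from b."""
    missing = []
    for k, v in a.items():
        path = f"{prefix}{k}"
        if k not in b:
            missing.append(path)
        elif isinstance(v, dict) and isinstance(b[k], dict):
            missing.extend(key_diff(v, b[k], path + "."))
    return missing


# Toy example mirroring the report: the newer file carries "ignore_merges".
new_tok = {"model": {"type": "BPE", "ignore_merges": False, "vocab": {}}}
old_tok = {"model": {"type": "BPE", "vocab": {}}}
print(key_diff(new_tok, old_tok))  # ['model.ignore_merges']
```

In real use you would load both files with `json.load` and diff the resulting dicts in both directions.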

I am unsure of the cause, and I don't know whether others might hit the same problem, so I am reporting it here for reference.

Arcee AI org

I appreciate the report. I'll loop in @bartowski, as he did our GGUF conversions.

@Crystalcareai I did chat with the fp16 GGUF, but it's not doing very well; pretty slow, to be honest.

AWQ with dataset calibration?

Arcee AI org

If you update transformers and tokenizers, this error should go away.

Arcee AI org

I actually did have a problem with the tokenizer, but I think I got past it for the conversion because my Docker image had a more recent version than my main OS. So yes, tokenizers and/or transformers definitely needs an update.

Thanks for the suggestions. I will close this topic later. 😊

DataSoul changed discussion status to closed
