llama.cpp conversion problem report (about `tokenizer.json`)

#2
by DataSoul - opened

I attempted to convert this model to GGUF using the `convert_hf_to_gguf.py` script from llama.cpp, but ran into two errors:

```
FileNotFoundError: File not found: F:\OpensourceAI-models\SuperNova-Medius\tokenizer.model
Exception: data did not match any variant of untagged enum ModelWrapper at line 757443 column 3
```
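For context, the two messages correspond to two stages of the conversion: the script first looks for a SentencePiece `tokenizer.model` and, when that is absent, falls back to parsing `tokenizer.json` (where the Rust-side `ModelWrapper` deserialization error comes from). A minimal pre-flight check along those lines (a sketch, not the script's actual code; the fallback order is an assumption based on the errors above) might look like:

```python
import json
from pathlib import Path


def preflight_tokenizer(model_dir: str) -> str:
    """Report which tokenizer artifact a converter could use.

    Loosely mirrors the assumed fallback order: a SentencePiece
    tokenizer.model first, then a Hugging Face tokenizer.json.
    """
    d = Path(model_dir)
    if (d / "tokenizer.model").exists():
        return "sentencepiece"
    tok_json = d / "tokenizer.json"
    if not tok_json.exists():
        raise FileNotFoundError(f"File not found: {tok_json}")
    # A parse failure at this stage is roughly where the
    # "did not match any variant of untagged enum ModelWrapper"
    # error surfaces on the Rust tokenizers side.
    data = json.loads(tok_json.read_text(encoding="utf-8"))
    return data.get("model", {}).get("type", "unknown")
```

Running it against a model directory tells you which path the converter would take before you commit to a full conversion run.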

After downloading `tokenizer.json` from Qwen2.5-14B and replacing this model's file of the same name with it, I was able to convert the model to GGUF successfully.

I made a rough comparison of the two `tokenizer.json` files and found them mostly identical apart from some formatting differences. This model's `tokenizer.json` has one additional line, `"ignore_merges": false`, while the other parts seem unchanged.
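A rough comparison like this can be automated with a few lines of Python (a sketch; the two example dicts below are stand-ins for the parsed JSON files, not the real files). It recursively collects keys present in one file but missing from the other, which is exactly how a field like `"ignore_merges"` would show up:

```python
def key_diff(a: dict, b: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of keys present in dict a but missing from b."""
    missing = []
    for k, v in a.items():
        path = f"{prefix}{k}"
        if k not in b:
            missing.append(path)
        elif isinstance(v, dict) and isinstance(b[k], dict):
            missing.extend(key_diff(v, b[k], path + "."))
    return missing


# Toy example mirroring the report: the newer file carries "ignore_merges".
new_tok = {"model": {"type": "BPE", "ignore_merges": False, "vocab": {}}}
old_tok = {"model": {"type": "BPE", "vocab": {}}}
print(key_diff(new_tok, old_tok))  # ['model.ignore_merges']
```

In real use you would load both files with `json.load` and diff the resulting dicts in both directions.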

I am unsure of the cause, and I don't know whether others might hit the same problem, so I am reporting it here for reference.

Arcee AI org

I appreciate the report. I'll loop in @bartowski, as he did our GGUF conversions.

@Crystalcareai I did chat with the fp16 GGUF, but it's not doing very well; pretty slow, to be honest.

AWQ with dataset calibration?

Arcee AI org

If you update transformers and tokenizers, this error should go away.

Arcee AI org

I actually did have a problem with the tokenizer, but I think I got past it for the conversion because my Docker image had a more recent version than my main OS. So yes, tokenizers and/or transformers definitely needs an update.

Thanks for the suggestions. I will close this topic later. 😊

DataSoul changed discussion status to closed
