nvidia/Llama-3_1-Nemotron-51B-Instruct · Modified llama.cpp to generate GGUFs for Llama-3

After two weeks of on-and-off hacking, I successfully modified llama.cpp to convert and run Llama-3_1-Nemotron-51B-Instruct.
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF

Feel free to give it a try and let me know if you find anything abnormal.

By the way, I noticed a typo/bug in tokenizer_config.json, line 2055:
"eos_token": "<|eot_id|>",
should be
"eos_token": "<|end_of_text|>",

transformers allows config.json to override this typo, but llama.cpp cannot, so it increased my debugging time...
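
In case it helps anyone converting from a local copy of the repo, here is a rough sketch of how the value could be checked and patched before conversion. The function name `fix_eos_token` and the path in the usage note are made up for illustration, and the "expected" value is simply the one I suggested above, not something taken from official documentation.

```python
import json
from pathlib import Path


def fix_eos_token(config_path, expected="<|end_of_text|>"):
    """Set eos_token in a tokenizer_config.json to `expected`.

    Hypothetical helper: returns the value found before patching.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    eos = config.get("eos_token")
    # eos_token may be a plain string or a dict with a "content" field.
    old = eos.get("content") if isinstance(eos, dict) else eos
    if old != expected:
        if isinstance(eos, dict):
            eos["content"] = expected
        else:
            config["eos_token"] = expected
        # Rewrite the file only when something actually changed.
        path.write_text(
            json.dumps(config, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
    return old
```

Calling `fix_eos_token("path/to/tokenizer_config.json")` (placeholder path) returns the previous value, so you can see whether the file needed patching. Note that rewriting the file with `json.dumps` may change its formatting, so keep a backup if that matters to you.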
