Help needed with ChocoLlama-Instruct models

#1 opened by FrancescoPeriti

Hello,
Thanks for sharing this model! I have been trying to use it with both the provided code and other implementations, but I keep running into issues. Specifically, inference on a single example takes so long that I have had to terminate the execution. I have tested the model in different environments and even attempted fine-tuning, but the problem persists.
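For reference, below is a minimal sketch of the kind of loading and generation code I have been running. The `ChocoLlama/` org prefix in the model id and the specific generation settings are my assumptions, so please correct me if the intended usage differs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Org prefix is my assumption; only the model name
# "Llama-3-ChocoLlama-8B-Instruct" is mentioned explicitly.
model_id = "ChocoLlama/Llama-3-ChocoLlama-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; fp32 on CPU is extremely slow
    device_map="auto",           # place the model on GPU when one is available
)

# A single short Dutch prompt, formatted with the model's chat template
messages = [{"role": "user", "content": "Wat is de hoofdstad van België?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # avoid the pad-token warning
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Even with a setup like this, a single short prompt does not complete in a reasonable time on my hardware.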

While I have successfully fine-tuned other pre-trained models (such as Llama-2-Chat and Llama-3-Instruct), the fine-tuned versions of chocollama*-instruct remain slow and produce erratic outputs, which appear to result from tokenization errors (see the sanity check below). I am wondering whether something goes wrong during model loading. Could you kindly double-check the model or guide me further on how to use it? I have experienced the same issue with both this model and Llama-3-ChocoLlama-8B-Instruct.
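To pin down the tokenization suspicion, this is roughly the sanity check I ran on the tokenizer side (same assumed model id as above; the Dutch test sentence is arbitrary):

```python
from transformers import AutoConfig, AutoTokenizer

# Same assumed model id as in the snippet above
model_id = "ChocoLlama/Llama-3-ChocoLlama-8B-Instruct"

tok = AutoTokenizer.from_pretrained(model_id)
cfg = AutoConfig.from_pretrained(model_id)

# A mismatch between tokenizer and model vocab sizes, or a missing
# eos/pad token, could explain both garbled output and generation
# that never hits a stop token
print("tokenizer vocab size:", len(tok))
print("model config vocab size:", cfg.vocab_size)
print("bos / eos / pad tokens:", tok.bos_token, tok.eos_token, tok.pad_token)
print("chat template defined:", tok.chat_template is not None)

# Round-trip a short Dutch sentence; odd characters after decoding
# would point at the tokenizer rather than the weights
sample = "Dit is een korte Nederlandse testzin."
ids = tok(sample).input_ids
print(ids)
print(tok.decode(ids, skip_special_tokens=True))
```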

Thank you in advance for your time and support!
