Model performance
I would like to know if anyone has done some testing on this model. I have tried running a few sample prompts on it, and it does not seem as good as other models like Llama 3, 3.1, 3.2, and Mistral. I used the same prompt format for all the tests. The problem with this model is that its instruction following is not good. My test case is summarizing a document: I gave the model some instructions, such as what information to summarize, to use bullet points or list items, and what to focus on. The other models' outputs seem to follow the instructions; some are higher quality than others, but you can at least see the model trying to follow the format and capture the info I requested. With LLaDA, there is not even one bullet point or list item, and the output does not focus on anything I requested; it looks like the model completely ignores my instructions. Other than that, I tried asking some coding questions, and it just replies with something like "as an AI model, I can't help you with that", but the other models don't have this issue.
Edit: After some prompt adjustments, I was able to get a "-" before sentences, but the structure is still sentences/paragraphs, not a bullet-point list like the other models generate. I wonder if there is another template format I have to follow to make this work. It would be great if someone who gets good instruction-following results could share how they do it. Thanks.
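One thing worth ruling out is the prompt template itself: instruct-tuned models usually expect their turns wrapped in specific role markers, and a raw prompt without them can badly hurt instruction following. Below is a minimal sketch of the idea; the role tokens here are placeholders I made up, not LLaDA's actual template. In practice you would load the real template from the model repo, e.g. with Hugging Face `transformers` via `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`.

```python
# Sketch: render a chat-style prompt instead of sending raw text.
# ROLE_TOKENS are hypothetical placeholders, NOT LLaDA's real special tokens.
ROLE_TOKENS = {"user": "<|user|>", "assistant": "<|assistant|>"}

def render_prompt(messages):
    """Wrap each turn in its role marker, then cue the assistant to respond."""
    parts = [f"{ROLE_TOKENS[m['role']]}\n{m['content']}" for m in messages]
    parts.append(ROLE_TOKENS["assistant"])  # generation starts after this marker
    return "\n".join(parts)

messages = [{"role": "user",
             "content": "Summarize the document below as a bullet-point list:\n..."}]
prompt = render_prompt(messages)
print(prompt)
```

Comparing the string you actually send against what the tokenizer's own chat template produces is a quick way to tell whether the formatting, rather than the model, is the problem.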
Thank you very much for your attention!
First of all, I would like to ask whether you are using the Base model or the Instruct model. If it is the Instruct model, the performance of LLaDA is indeed still behind that of LLaMA3. There may be the following reasons for this: 1. LLaDA has not used Reinforcement Learning from Human Feedback (RLHF) yet, while models such as LLaMA and Mistral have, which gives these autoregressive models stronger instruction-following capabilities; 2. The quality of our Supervised Fine-Tuning (SFT) data is still lacking.
In any case, thank you very much for your reply. We would be extremely grateful if you could provide some bad cases, as this would be very helpful for improving LLaDA. We also look forward to the community working together to enhance LLaDA's performance.
Thanks for the reply. I guess I will try fine-tuning with some of my data.
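If you do go the fine-tuning route, a common starting point is a JSONL file of instruction/response pairs. A minimal sketch of preparing such a file is below; the field names (`instruction`, `response`) are an assumption, not a format LLaDA's training code requires, so adapt them to whatever your SFT script expects.

```python
import json

# Hypothetical SFT record layout: one instruction/response pair per JSONL line.
# Field names are assumptions; rename them to match your training script.
examples = [
    {"instruction": "Summarize the text below as a bullet-point list.\n<document text>",
     "response": "- first key point\n- second key point"},
]

with open("sft_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Reload to verify the round-trip before pointing a trainer at the file.
with open("sft_data.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Putting formatting instructions (bullet points, focus areas) directly into the training pairs is one way to target exactly the instruction-following weakness discussed above.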