FAQs

Q: How do I choose between sampling and beam search for inference?

Beam search and sampling can produce results of different quality depending on the task. Choose a decoding strategy based on the following considerations:

Consider sampling if any of the following apply:

You need faster inference.
You want streaming generation.
Your task calls for open-ended responses.

If your task is about providing deterministic answers, you might want to experiment with beam search to see if it can achieve better outcomes.
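The guidance above can be sketched as a small helper that picks keyword arguments for model.chat. This is a minimal sketch, not part of MiniCPM-V: the helper name decoding_kwargs is hypothetical, and it assumes that model.chat accepts a boolean sampling flag (with beam search used when it is False) and a stream flag that is only valid together with sampling. Check both against the version of the model code you are running.

```python
def decoding_kwargs(open_ended: bool, streaming: bool = False) -> dict:
    """Hypothetical helper: choose decoding arguments for model.chat.

    Assumes model.chat takes `sampling` and `stream` keyword arguments.
    """
    if open_ended or streaming:
        # Sampling: faster, works with streaming, and gives varied output.
        kwargs = {"sampling": True}
        if streaming:
            kwargs["stream"] = True
        return kwargs
    # Beam search: worth trying when the task expects a deterministic answer.
    return {"sampling": False}


# Usage (assumes model, tokenizer, and msgs are already set up):
# res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer,
#                  **decoding_kwargs(open_ended=False))
```

It is worth running both settings on a few representative inputs before committing to one, since the better strategy depends on the task.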

Q: How do I ensure that the model generates results of sufficient length?

During multilingual inference with MiniCPM-V 2.6, we have observed that generation sometimes ends prematurely. You can improve the results by passing the min_new_tokens parameter, which sets a lower bound on the number of newly generated tokens:

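# Assumes model, tokenizer, and msgs are already set up.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    min_new_tokens=100  # generate at least 100 new tokens before stopping
)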