### FAQs
<details>
<summary>Q: How do I choose between sampling and beam search for inference?</summary>
The quality of results from beam search and sampling can vary by scenario. Consider the following when choosing a decoding strategy:
Sampling decoding is a good fit if:
1. You need faster inference.
2. You want streaming generation.
3. Your task calls for open-ended responses.
If your task requires a deterministic answer, experiment with beam search to see whether it produces better results; a minimal sketch of both settings follows.
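As a rough illustration, the sketch below assumes `model`, `tokenizer`, and `msgs` are prepared as in the usage example, and that `model.chat` accepts the `sampling` flag used elsewhere in this repo's examples (beam search is used when it is off); the `temperature` value is illustrative.
```python
# Sampling: faster, streamable, suits open-ended responses.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7  # illustrative value; tune for your task
)

# Beam search: worth trying when the task has a deterministic answer.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=False  # sampling off; decoding falls back to beam search
)
```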
</details>
<details>
<summary>Q: How do I ensure the model generates results of sufficient length?</summary>
We've observed that during multilingual inference with MiniCPM-V 2.6, generation sometimes ends prematurely. You can improve the results by passing a `min_new_tokens` parameter, which forces the model to produce at least that many tokens before it can stop:
```python
# `model`, `tokenizer`, and `msgs` are prepared as in the usage example above.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    min_new_tokens=100  # generate at least 100 new tokens before stopping
)
```
</details>