Why can this program successfully predict the next word when only the most recently generated token is passed in? The full sequence of prompt tokens is not passed again.
#21 opened by LJUN9988
I got it: it's because llama keeps `cache_k` and `cache_v` (a KV cache), so the keys and values of all previous tokens are stored and reused, and only the new token needs to be processed each step.
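A toy sketch of the idea (this is not llama's actual code; the `cache_k`/`cache_v` names echo the repo, but the identity q/k/v projections and shapes here are made-up simplifications). It shows that appending each new token's key/value to a cache and attending over the cache gives the same result as recomputing attention over the whole prefix every step:

```python
import math

def attend(q, cache_k, cache_v):
    """Single-head attention of one query vector over all cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in cache_k]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(cache_v[0])
    return [sum(w * v[d] for w, v in zip(weights, cache_v)) for d in range(dim)]

def decode_with_cache(token_vecs):
    """Feed tokens one at a time; earlier keys/values come from the cache."""
    cache_k, cache_v = [], []
    outputs = []
    for x in token_vecs:
        # In a real model, q/k/v come from learned projections of x;
        # identity projections are an assumption made for this sketch.
        q, k, v = x, x, x
        cache_k.append(k)
        cache_v.append(v)
        outputs.append(attend(q, cache_k, cache_v))
    return outputs

def decode_full(token_vecs):
    """Recompute attention over the entire prefix at every step (no cache)."""
    outputs = []
    for t in range(1, len(token_vecs) + 1):
        prefix = token_vecs[:t]
        outputs.append(attend(prefix[-1], list(prefix), list(prefix)))
    return outputs

toks = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
cached = decode_with_cache(toks)
full = decode_full(toks)
print(all(abs(a - b) < 1e-9
          for ra, rb in zip(cached, full)
          for a, b in zip(ra, rb)))  # True
```

Because the cache already holds the keys and values for every earlier token, passing only the newest token loses no information: attention at each step still sees the whole history.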