How is the performance?
Hi,
Interesting work! Have you benchmarked its performance? Is it anywhere close to usable?
I have not had a chance to bench it yet, as I have been focusing on building a basic interface for interacting with RKLLM that is a bit more functional than the official Rockchip implementation.
If you want something that's pretty snappy and usable out of the box for this model, I would recommend: https://huggingface.co/happyme531/MiniCPM-V-2_6-rkllm
I do plan on integrating multimodal models into my app; I just have not had a chance to do so yet. If you want to play around with it, you can use my repo and configure the model, although the code will need a bit of tweaking to accept image input. Here is my RKLLM Gradio app: https://github.com/c0zaut/RKLLM-Gradio/
I was not able to use this model. I think it's because of the tiny context size?
I posted a simple screenshot and got:
[20:07:58.866] meet unkown shape, op name: matmul_qkv_rkllm_spilt_1, shape: 64, 5888, 128 2features matmul matmul run failed
So, this is just the language component. I am going to be doing a full vision flow after this latest run. Trying to pass an image as a raw RGB string will definitely exceed the context length, because the whole string is processed as text.
Try this implementation for a quick CLI app: https://huggingface.co/happyme531/MiniCPM-V-2_6-rkllm
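To put the context-length point in rough numbers, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `R,G,B;` text layout, the 4-characters-per-token ratio, and the 4096-token context length are hypothetical and are not taken from this model's actual configuration.

```python
def rgb_text_length(width: int, height: int) -> int:
    """Upper bound on characters needed to write every pixel as 'R,G,B;'."""
    # Assumed layout: three values of up to 3 digits, two commas, one separator.
    chars_per_pixel = 3 * 3 + 2 + 1
    return width * height * chars_per_pixel


def estimated_tokens(num_chars: int, chars_per_token: int = 4) -> int:
    """Very rough token estimate; chars_per_token is an assumed ratio."""
    return num_chars // chars_per_token


def fits_in_context(width: int, height: int, context_len: int = 4096) -> bool:
    """Check the estimate against a hypothetical context length."""
    return estimated_tokens(rgb_text_length(width, height)) <= context_len


if __name__ == "__main__":
    # A 1920x1080 screenshot serialized as text.
    chars = rgb_text_length(1920, 1080)
    print(chars, estimated_tokens(chars), fits_in_context(1920, 1080))
```

Even with generous assumptions, a full screenshot written out as text comes to millions of tokens, which is why image input needs a separate vision encoder rather than a text prompt.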