AttributeError: 'FalconMambaCausalLMOutput' object has no attribute 'past_key_values'
Running on FastChat, I get this:
AttributeError: 'FalconMambaCausalLMOutput' object has no attribute 'past_key_values'
Hi @surak, thanks for the issue!
This is because FastChat does not appear to have added support for FalconMamba models yet. FalconMamba is a Mamba-style state-space model, so its outputs carry recurrent cache state rather than a transformer KV cache, which is why any code that reads `past_key_values` unconditionally raises this AttributeError. Feel free to raise an issue directly at https://github.com/lm-sys/FastChat so that they can add support for the FalconMamba architecture.
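For context, here is a minimal sketch of why the attribute is missing, assuming transformers >= 4.44 (the first release with FalconMamba) and the public tiiuae/falcon-mamba-7b checkpoint:

```python
# Minimal sketch: FalconMamba outputs expose recurrent state, not a KV cache.
# Assumes transformers >= 4.44 and the public tiiuae/falcon-mamba-7b checkpoint
# (loading the 7B weights needs substantial RAM/VRAM).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

out = model(**tokenizer("Hello", return_tensors="pt"), use_cache=True)

print(hasattr(out, "cache_params"))     # True  -- recurrent SSM state
print(hasattr(out, "past_key_values"))  # False -- the source of the AttributeError
```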
Thanks! The same is true with the latest vLLM:
ERROR 09-26 12:12:14 worker_base.py:464] ValueError: Model architectures ['FalconMambaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OlmoeForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'SolarForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'LlavaOnevisionForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'MllamaForConditionalGeneration', 'BartModel', 'BartForConditionalGeneration'] [repeated 2x across cluster]
Which inference engine do you recommend?
Hi @surak, there is an ongoing issue to add Falcon Mamba support to vLLM: https://github.com/vllm-project/vllm/issues/7478. I will post a message there to check on the current status. Among other inference engines, we do support llama.cpp; see the sketch below for one way to run it.
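A minimal sketch via the llama-cpp-python bindings (the GGUF path is a placeholder: you would first convert the checkpoint with llama.cpp's conversion script or download a community GGUF):

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder for a FalconMamba GGUF you have locally.
from llama_cpp import Llama

llm = Llama(model_path="./falcon-mamba-7b.Q4_K_M.gguf", n_ctx=4096)
out = llm("Explain state-space models in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```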
Looks like the same error with TGI too:
2024-10-08T09:22:10.199681Z ERROR warmup{max_input_length=4095 max_prefill_tokens=4145 max_total_tokens=4096 max_batch_size=None}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:54: Server error: 'FalconMambaCausalLMOutput' object has no attribute 'past_key_values'
Error: Backend(Warmup(Generation("'FalconMambaCausalLMOutput' object has no attribute 'past_key_values'")))
Falcon Mamba support is on its way to vLLM, thanks to @DhiyaEddine :) https://github.com/vllm-project/vllm/pull/9325
Falcon Mamba is now supported in vLLM. For now, you need to install vLLM from source in order to use it.
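A rough sketch of what that looks like, assuming a source build that includes PR #9325 (the checkpoint name is the public one and may differ for your setup):

```python
# Minimal sketch, assuming a source build of vLLM that includes PR #9325:
#   git clone https://github.com/vllm-project/vllm.git
#   cd vllm && pip install -e .
from vllm import LLM, SamplingParams

llm = LLM(model="tiiuae/falcon-mamba-7b")
params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["What is a state-space model?"], params):
    print(output.outputs[0].text)
```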