always getting 0 in output

#3 opened by xubuild

The model always responds with token id 0 for any input, while the same prompt gets a correct response from https://huggingface.co/casperhansen/mixtral-instruct-awq

Tested with vLLM:
llm = LLM(model=model_path, quantization="awq", trust_remote_code=True, dtype="auto", enforce_eager=True, max_model_len=12288)
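
For completeness, a sketch of the generate call that reproduces it, using the llm object above (the prompt here is illustrative):

from vllm import SamplingParams

# Uses the llm object constructed above; the prompt is illustrative.
outputs = llm.generate(["[INST] Hello, who are you? [/INST]"], SamplingParams(temperature=0.0, max_tokens=64))
print(outputs[0].outputs[0].token_ids)  # reported as all zeros with this repo
print(outputs[0].outputs[0].text)       # empty string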

Same for me!

Is there any fix for that?

Also happening for me. It seems to be an issue with this model; other 8x7B AWQ models work perfectly fine, for example dolphin-2.7-8x7b.

@TheBloke Any update on this issue? I always get an empty output as well...

FYI, serving the model with:

python3 -m vllm.entrypoints.openai.api_server --model "$MODEL_NAME" --host 0.0.0.0 --port 8181 --quantization awq --dtype auto

returns an empty completion every time:

    "choices": [
        {
            "index": 0,
            "text": "",
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
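
For context, a sketch of an equivalent client call that produces the empty completion above; it assumes $MODEL_NAME points at this repo, and the prompt and max_tokens are illustrative:

import requests

# Completions request against the server started above; the prompt and
# max_tokens are illustrative.
resp = requests.post(
    "http://localhost:8181/v1/completions",
    json={
        "model": "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
        "prompt": "[INST] Write a haiku about GPUs. [/INST]",
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["text"])  # comes back as "" with this repo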

Same here; maybe it's a prompt template issue?

Any update?

Same for me.

Also seeing this in testing. Our vLLM setup:

from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.1, top_p=0.95)
model = 'TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ'

llm = LLM(
  model=model,
  gpu_memory_utilization=0.7,
  max_model_len=2048,
)

It's likely caused by the prompt; you can try changing it to f"""USER:{prompt}\nAssistant:"""
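
For reference, the prompt template on the Mixtral-8x7B-Instruct-v0.1 model card is [INST] ... [/INST]; a minimal sketch of applying it with the vLLM objects defined above (the example prompt is illustrative):

# Applies the Mixtral Instruct template before generation; llm and
# sampling_params are the objects from the setup above.
user_prompt = "Summarise the plot of Hamlet in two sentences."
formatted = f"[INST] {user_prompt} [/INST]"
outputs = llm.generate([formatted], sampling_params)
print(outputs[0].outputs[0].text)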

Same here...

I've heard that this version works fine with vLLM: https://huggingface.co/casperhansen/mixtral-instruct-awq

@umarbutler I can confirm that's what I resorted to using instead.

@jcole-laivly 👍🏻 I can confirm that it works for me as well.

Yes, I uploaded it because this repository's model is corrupted (somehow). Please requantize if you experience any problems.
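
For anyone who wants to requantize themselves, a minimal AutoAWQ sketch; the quant settings below are the common 4-bit GEMM defaults and the output path is illustrative, not taken from either repo:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Requantization sketch with AutoAWQ; quant settings are the common 4-bit
# GEMM defaults and the output path is illustrative.
base_model = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quant_path = "mixtral-instruct-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_model, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(base_model)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)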
