ibm-granite/granite-vision-3.1-2b-preview · Use Negative Feature Layer Indices

4 days ago

There is a misalignment in the feature layers that are currently being used between transformers and vLLM (current values are correct for vLLM and off by one for transformers); In transformers, the first entry is the input embedding (here). However, in vLLM, this is not the case for the way the hidden states pool are formed (here).

In other words, the hidden states array in transformers contains 28 entries:
[emb, h0, h1, ..., h27]

while the hidden states pool in vLLM contains 27 entries:
[h0, h1, ..., h27]

The config reflects the correct values for what is used in vLLM, but is off by one in transformers. Both projects support negative indexing into the hidden states (with offset handling in vLLM, since only the deepest feature layer needed it loaded) - this PR changes the vision feature layers to use negative indices, which will fix the misalignment in transformers without changing the output in vLLM (no code changes needed).

I will also submit a PR to vLLM to add the embeddings to the hidden state pool if all hidden states are requested from the visual encoder.

Use Negative Feature Layer Indicesb04f90bf

aarbelle

IBM Granite org 3 days ago

Thanks @abrooks9944

aarbelle changed pull request status to merged 3 days ago

abrooks9944

about 10 hours ago

The model uses negative feature layers now that this PR was merged, but here is the link to the relevant PR for vLLM for positive feature layers - https://github.com/vllm-project/vllm/pull/13514

aarbelle

IBM Granite org about 10 hours ago

@abrooks9944
So future models can also use positive features?
Do we need to update anything in this model?