loss and grad_norm are 0?

#10
by arunkarkii - opened

I've been attempting to fine-tune LLaVA-NeXT-Video-7B-hf using LoRA and have noticed that loss and grad_norm are always 0. Initially I assumed it was my hyperparameters, but despite all my efforts these values never change.

I've noticed that this Google Colab, which uses the same approach, has the same problem (loss = 0, grad_norm = 0): https://colab.research.google.com/drive/139XypY8_wdLgyLXYE_Zve7Hjd809fVpK?usp=sharing

I fine-tuned the model just last week and had no difficulties.

Any input would be greatly appreciated.

Llava Hugging Face org

@arunkarkii hey! Can you try increasing the max length in your training script? We updated the llava configs yesterday so that the total input length now includes the image embeddings, so each input needs at least ~1k tokens to avoid truncating the video tokens from the input.
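
For illustration, here is a minimal sketch of raising `max_length` when building inputs with the processor; the prompt format and the random placeholder clip are assumptions standing in for whatever your training script actually uses:

```python
import numpy as np
from transformers import LlavaNextVideoProcessor

processor = LlavaNextVideoProcessor.from_pretrained("llava-hf/LLaVA-NeXT-Video-7B-hf")

# Placeholder clip: 8 random 336x336 RGB frames; swap in however your script loads videos.
video = np.random.randint(0, 255, (8, 336, 336, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhat is happening in this clip? ASSISTANT: People are dancing."

inputs = processor(
    text=prompt,
    videos=video,
    padding="max_length",
    truncation=True,
    max_length=2048,  # raise this; the expanded video tokens alone can take ~1k+ positions
    return_tensors="pt",
)
print(inputs["input_ids"].shape)  # should comfortably exceed the number of video tokens
```

If `max_length` stays at a small value (e.g. 256) the video tokens get truncated away, which is consistent with the loss collapsing to 0.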

Also, please make sure you have the latest transformers installed (at least v4.46). For more info, see https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/discussions/37#674304aa5809de4a7b8d7c44
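
As a quick sanity check (a sketch, not part of the original reply), you can verify the installed version programmatically:

```python
from packaging import version
import transformers

# The reply above mentions v4.46 as the minimum; if this assertion fails,
# upgrade with: pip install -U "transformers>=4.46"
assert version.parse(transformers.__version__) >= version.parse("4.46.0"), transformers.__version__
```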
