clip_feat_dim unexpected by AsymmetricAttention.__init__()

#3
by panopstor - opened

Hi, I'm trying to hack on the code a bit to see if I can get this to run on a single <48 GB node, and I ran into a few problems.

clip_feat_dim from the YAML config isn't an expected argument, but it is passed via **block_kwargs into AsymmetricAttention.__init__(), which won't accept it.

I was able to work around this by adding **block_kwargs to the init function so any extra kwargs are effectively ignored, but just a heads up.
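For anyone following along, here is a minimal sketch of that workaround. The class below is a hypothetical stand-in, not the real AsymmetricAttention; the parameter names dim_x, dim_y and num_heads and all config values are made up for illustration. Only clip_feat_dim and **block_kwargs come from the report above.

```python
class AsymmetricAttentionSketch:
    """Hypothetical stand-in for the real class; signature is assumed."""

    def __init__(self, dim_x, dim_y, num_heads=8, **block_kwargs):
        # Named parameters are consumed as usual.
        self.dim_x = dim_x
        self.dim_y = dim_y
        self.num_heads = num_heads
        # Any unexpected config key (e.g. clip_feat_dim) lands here
        # instead of raising TypeError. Kept only for inspection.
        self.ignored_kwargs = dict(block_kwargs)


# Illustrative config values, not taken from the real YAML.
config = {"dim_x": 3072, "dim_y": 1536, "num_heads": 24, "clip_feat_dim": 768}
attn = AsymmetricAttentionSketch(**config)
print(attn.ignored_kwargs)
```

The trade-off is that a catch-all **block_kwargs also swallows typos in real parameter names, so it is a stopgap rather than a proper fix.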

Also this assert fails in t2v_synth_mochi.py:

    assert y_feat[-1].shape == (B, MAX_T5_TOKEN_LENGTH, 4096)

It seemingly matches, but tensor.shape returns a torch.Size, not a plain tuple:

    print(f"y_feat[-1].shape = {y_feat[-1].shape}")

Output:

    (T2VSynthMochiModel pid=3652095) y_feat[-1].shape = torch.Size([2, 256, 4096])

I temporarily corrected it to this, which works:

    assert y_feat[-1].shape == torch.zeros(B, MAX_T5_TOKEN_LENGTH, 4096).shape
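An alternative that avoids allocating a throwaway tensor is to normalize both sides to tuples of plain ints before comparing. This is only a sketch, assuming the mismatch comes from the container or element type rather than from the dimensions themselves. shape_matches is a hypothetical helper, not part of the repo, and a list stands in for y_feat[-1].shape so the snippet runs without torch.

```python
def shape_matches(actual, expected):
    """Compare two shapes dimension by dimension as plain ints."""
    return tuple(int(d) for d in actual) == tuple(int(d) for d in expected)


# Values taken from the printed shape above.
B = 2
MAX_T5_TOKEN_LENGTH = 256

# Stand-in for y_feat[-1].shape; any sequence of int-like values works.
reported_shape = [2, 256, 4096]

# A direct list == tuple comparison is False in Python even when the
# dimensions agree, which is the kind of mismatch the helper sidesteps.
assert reported_shape != (B, MAX_T5_TOKEN_LENGTH, 4096)
assert shape_matches(reported_shape, (B, MAX_T5_TOKEN_LENGTH, 4096))
```

In the real script the call would presumably look like shape_matches(y_feat[-1].shape, (B, MAX_T5_TOKEN_LENGTH, 4096)).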

Edit: for anyone else trying to downsize, set num_workers = 1 in infer.py (line 35). If I can get it working I'll share the code. I'm trying 16-bit and possibly bitsandbytes to see if I can get memory use down a bit...

Fork live here:
https://github.com/victorchall/genmoai-smol
Going to close since I got it working.

panopstor changed discussion status to closed

Both issues should now be fixed! Thanks for the detailed bug report, and lmk if you run into anything else.
