clip_feat_dim unexpected by AsymmetricAttention.__init__()
Hi, I'm trying to hack on the code a bit to see if I can get this to run on a single <48Gb node and ran into a few problems.
clip_feat_dim from the YAML isn't an expected argument, but it gets passed via **block_kwargs into AsymmetricAttention.__init__(), which won't accept it.
I was able to work around this by adding **block_kwargs to the __init__ signature so that any extra kwargs are effectively ignored; just a heads up.
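For anyone else poking at this, a minimal sketch of that workaround; the parameter list here is illustrative, not the class's real signature:

import torch.nn as nn

class AsymmetricAttention(nn.Module):
    # Illustrative signature: the only point is the trailing **block_kwargs.
    def __init__(self, dim_x, dim_y, num_heads=8, **block_kwargs):
        super().__init__()
        # Extra config keys like clip_feat_dim land in block_kwargs and are
        # ignored instead of raising a TypeError.
        self.dim_x = dim_x
        self.dim_y = dim_y
        self.num_heads = num_heads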
Also, this assert fails in t2v_synth_mochi.py:
assert y_feat[-1].shape == (B, MAX_T5_TOKEN_LENGTH, 4096)
The shape seemingly matches, but the .shape attribute returns a torch.Size object, not a plain tuple:
print(f"y_feat[-1].shape = {y_feat[-1].shape}")
Output:
(T2VSynthMochiModel pid=3652095) y_feat[-1].shape = torch.Size([2, 256, 4096])
I temporarily corrected it to this, which works:
assert y_feat[-1].shape == torch.zeros(B, MAX_T5_TOKEN_LENGTH, 4096).shape
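A slightly cheaper equivalent (untested here, but standard PyTorch) avoids allocating a throwaway tensor by building the expected shape directly:

# Compare against a torch.Size built from the expected dimensions:
assert y_feat[-1].shape == torch.Size((B, MAX_T5_TOKEN_LENGTH, 4096))
# ...or convert the torch.Size to a plain tuple explicitly:
assert tuple(y_feat[-1].shape) == (B, MAX_T5_TOKEN_LENGTH, 4096)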
Edit: for anyone else trying to downsize, set num_workers = 1 in infer.py line 35; if I can get it working I'll share code. I'm also trying 16-bit and possibly bitsandbytes to see if I can bring memory down further...
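For reference, the two tweaks I'm describing amount to roughly this; the model handle below is a placeholder, not the repo's actual variable name:

num_workers = 1  # infer.py: run a single worker instead of spawning several

# 16-bit experiment: the usual dtype cast, roughly halving weight memory vs fp32
model = model.to(dtype=torch.bfloat16)  # 'model' is a hypothetical handle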
Fork live here:
https://github.com/victorchall/genmoai-smol
Going to close since I got it working.
Both issues should now be fixed! Thanks for the detailed bug report, and lmk if you run into anything else.