Thank you!

#2
by Marcophono - opened

I have been searching for days for a better (a much better) SER model than the standard one from SpeechBrain. Happy that I found this one! :-)

By the way: moving the model to CUDA isn't possible, I think?

Thank you for the positive feedback!

If you have a GPU with enough VRAM and CUDA/PyTorch installed, you should be able to run on the GPU with a simple "model = model.cuda()" after loading the model.

Also, be sure to move your data to the GPU as well, for example:

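import torch

# load model, then move its parameters to the GPU
model = model.cuda()
# load data

# inference only, so gradient tracking is disabled
with torch.no_grad():
    wavs = wavs.cuda(non_blocking=True).float()
    mask = mask.cuda(non_blocking=True).float()
    pred = model(wavs, mask)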

Thanks again, 3loi! In the meantime, I solved it with:

mask = torch.ones(1, len(norm_wav)).to(device)
wavs = torch.tensor(norm_wav).unsqueeze(0).to(device)

which seems to work.
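
For readers following along, here is a minimal sketch of the device handling discussed in this thread, written so that it falls back to the CPU when no GPU is present. The names `pick_device`, `prepare_inputs`, and `DummyModel` are hypothetical and not part of this model's code; `DummyModel` is only a stand-in that accepts `(wavs, mask)` like the real model, so the snippet runs on its own. The explicit `float32` dtype is an assumption, mirroring the `.float()` calls above.

```python
import torch


def pick_device() -> torch.device:
    # Use the GPU when PyTorch can see one, otherwise stay on the CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def prepare_inputs(norm_wav, device):
    # Hypothetical helper: builds a batch of one waveform plus an all-ones mask.
    # float32 is an assumption here, matching the .float() calls in the thread.
    wavs = torch.as_tensor(norm_wav, dtype=torch.float32).unsqueeze(0).to(device)
    mask = torch.ones(1, wavs.shape[1], device=device)
    return wavs, mask


class DummyModel(torch.nn.Module):
    """Stand-in for the real SER model (hypothetical); takes (wavs, mask)."""

    def forward(self, wavs, mask):
        # Masked mean over time, only so there is something to compute.
        return (wavs * mask).sum(dim=1) / mask.sum(dim=1)


device = pick_device()
model = DummyModel().to(device).eval()

norm_wav = [0.0, 0.5, -0.5, 1.0]  # placeholder for a normalized waveform
wavs, mask = prepare_inputs(norm_wav, device)

with torch.no_grad():
    pred = model(wavs, mask)
```

With the real model, `DummyModel()` would be replaced by the loaded model; the model and both input tensors need to end up on the same device.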

Do you think I could use runpod.io to run this model? Curious to hear your thoughts.

@shaamil101

I am unfamiliar with runpod.io, but the model should be able to run on any cloud computing service, assuming it's set up correctly. So, I don't see any reason why it shouldn't.

It seems they offer cloud GPUs with 24GB up to 192GB of VRAM, which is more than enough. I am able to run this model on an RTX 3090 with 24GB VRAM just fine, with single-file inference.
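
To check how much VRAM a given machine or cloud instance actually exposes to PyTorch, something like the following sketch could be used. The helper name `total_vram_gib` is hypothetical; it relies on `torch.cuda.get_device_properties`, and returns `None` when no CUDA device is visible.

```python
import torch


def total_vram_gib(index: int = 0):
    """Return the total memory of CUDA device `index` in GiB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(index)
    return props.total_memory / 1024**3


vram = total_vram_gib()
if vram is None:
    print("No CUDA device found")
else:
    print(f"GPU 0 has {vram:.1f} GiB of VRAM")
```

This reports total memory only, not how much is currently free.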
