How can I fine tune this further?
I'd like to fine-tune this on my own spectrograms with more diverse datasets - I'm particularly interested in tuning it with more vocal music, as the results for vocals aren't brilliant at the moment.
I also have the same question, how can we fine tune this?
I'm going to guess you just have to fine-tune it like Stable Diffusion.
so this might help https://youtu.be/g9ibLuhXi1U
Using dreambooth? It might be okay, but we of course need an extra step of making spectrograms.
Interesting! Thanks for sharing the video.
I won't lie, I have zero intuition for how much compute it takes to fine-tune.
Looks like the guy in the video uses Colab to fine tune for an hour - guess we'll just have to try this ourselves :)
It feels like it would be quicker to fine-tune on top of the existing riffusion model, rather than fine tune stable diffusion from scratch. I don't know if they've published how long it took to fine-tune it?
This might also work https://www.sonicvisualiser.org/
yeah I've already got a few spectrograms up my sleeve... can share some Python code that generates spectrograms from WAVs if anyone's interested?
Currently trying to train a model to score vocal performances in a singing competition that has a big historical score database, so I have a big dataset of WAVs (and scores).
You can use librosa. There are some very good examples on Kaggle.
Additional fine-tuning and data information has been added to the model card. This was trained using approaches similar to the Hugging Face examples, but fine-tuning can be achieved with very small datasets using a DreamBooth approach.
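For the DreamBooth route, the invocation would look roughly like the Hugging Face diffusers DreamBooth example. This is a sketch only: the flag names follow the diffusers example script and may differ between versions, and the base checkpoint, data directory, prompt, and hyperparameters here are assumptions for illustration.

```shell
# Sketch: DreamBooth fine-tuning via the diffusers example script
# (examples/dreambooth/train_dreambooth.py in the diffusers repo).
# Paths, prompt, and hyperparameters below are placeholders.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="riffusion/riffusion-model-v1" \
  --instance_data_dir="./my_spectrograms" \
  --instance_prompt="a spectrogram of sks vocal music" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800 \
  --output_dir="./riffusion-finetuned"
```

The instance data directory would hold spectrogram images produced by whatever WAV-to-spectrogram step you settle on.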
Thanks for your replies and guidance. How did you make the spectrograms? I found they're a little different from the output of typical audio visualization software I've used before.