Nate's Test Org

AI & ML interests

None defined yet.

Recent Activity

nateraw authored a paper over 1 year ago

Generative Disco: Text-to-Video Generation for Music Visualization

nateraw updated a model about 3 years ago

nates-test-org/cspdarknet53

nateraw updated a model about 3 years ago

nates-test-org/convit_tiny

nates-test-org's activity

nateraw

posted an update 8 months ago

Post

3797

I just shared a blogpost on https://nateraw.com explaining the motivation + process of training nateraw/musicgen-songstarter-v0.2 - including training details, WandB logs, hparams, and notes on previous experiments.

Check it out here ⤵️
https://nateraw.com/posts/training_musicgen_songstarter.html

:) still kinda a WIP so if there's anything else you want to see, let me know.

3 replies

nateraw

posted an update 8 months ago

Post

4246

Turns out if you do a cute little hack, you can make nateraw/musicgen-songstarter-v0.2 work on vocal inputs. 👀

Now, you can hum an idea for a song and get a music sample generated with AI 🔥🔥

Give it a try: ➡️ nateraw/singing-songstarter ⬅️

It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!

ylacombe

posted an update 9 months ago

Post

6483

Yesterday, we released Parler-TTS and Data-Speech, a fully open-source reproduction of the work from the paper: Natural language guidance of high-fidelity text-to-speech with synthetic annotations (2402.01912)

Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.).

https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-models-66164ad285ba03e8ffde214c

Parler-TTS Mini v0.1 is the first iteration of the Parler-TTS model, trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).

To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data to 50k hours of speech. The v1 release of the model will be trained on this data and will include inference optimisations such as flash attention and torch compile.

parler-tts/parler_tts_mini_v0.1

Data-Speech can be used for annotating speech characteristics in a large-scale setting.

parler-tts/open-source-speech-datasets-annotated-using-data-speech-661648ffa0d3d76bfa23d534

This work is both scalable and easily modifiable and will hopefully help the TTS research community explore new ways of conditioning speech synthesis.

All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.

nateraw

authored a paper over 1 year ago

Generative Disco: Text-to-Video Generation for Music Visualization

Paper • 2304.08551 • Published Apr 17, 2023 • 7

nateraw

updated 16 models about 3 years ago

Team members 2
