Gpagejr12 committed on
Commit 984c220 · verified · 1 Parent(s): 90b9b20

Delete demos_audiogen_demo.ipynb

Files changed (1)
  1. demos_audiogen_demo.ipynb +0 -175
demos_audiogen_demo.ipynb DELETED
@@ -1,175 +0,0 @@
- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# AudioGen\n",
- "Welcome to AudioGen's demo jupyter notebook. Here you will find a series of self-contained examples of how to use AudioGen in different settings.\n",
- "\n",
- "First, we start by initializing AudioGen. For now, we provide only a medium sized model for AudioGen: `facebook/audiogen-medium` - 1.5B transformer decoder. \n",
- "\n",
- "**Important note:** This variant is different from the original AudioGen model presented at [\"AudioGen: Textually-guided audio generation\"](https://arxiv.org/abs/2209.15352) as the model architecture is similar to MusicGen with a smaller frame rate and multiple streams of tokens, allowing to reduce generation time."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from audiocraft.models import AudioGen\n",
- "\n",
- "model = AudioGen.get_pretrained('facebook/audiogen-medium')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Next, let us configure the generation parameters. Specifically, you can control the following:\n",
- "* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.\n",
- "* `top_k` (int, optional): top_k used for sampling. Defaults to 250.\n",
- "* `top_p` (float, optional): top_p used for sampling, when set to 0 top_k is used. Defaults to 0.0.\n",
- "* `temperature` (float, optional): softmax temperature parameter. Defaults to 1.0.\n",
- "* `duration` (float, optional): duration of the generated waveform. Defaults to 10.0.\n",
- "* `cfg_coef` (float, optional): coefficient used for classifier free guidance. Defaults to 3.0.\n",
- "\n",
- "When left unchanged, AudioGen will revert to its default parameters."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "model.set_generation_params(\n",
- " use_sampling=True,\n",
- " top_k=250,\n",
- " duration=5\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Next, we can go ahead and start generating sound using one of the following modes:\n",
- "* Audio continuation using `model.generate_continuation`\n",
- "* Text-conditional samples using `model.generate`"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Audio Continuation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import math\n",
- "import torchaudio\n",
- "import torch\n",
- "from audiocraft.utils.notebook import display_audio\n",
- "\n",
- "def get_bip_bip(bip_duration=0.125, frequency=440,\n",
- " duration=0.5, sample_rate=16000, device=\"cuda\"):\n",
- " \"\"\"Generates a series of bip bip at the given frequency.\"\"\"\n",
- " t = torch.arange(\n",
- " int(duration * sample_rate), device=\"cuda\", dtype=torch.float) / sample_rate\n",
- " wav = torch.cos(2 * math.pi * 440 * t)[None]\n",
- " tp = (t % (2 * bip_duration)) / (2 * bip_duration)\n",
- " envelope = (tp >= 0.5).float()\n",
- " return wav * envelope"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Here we use a synthetic signal to prompt the generated audio.\n",
- "res = model.generate_continuation(\n",
- " get_bip_bip(0.125).expand(2, -1, -1), \n",
- " 16000, ['Whistling with wind blowing', \n",
- " 'Typing on a typewriter'], \n",
- " progress=True)\n",
- "display_audio(res, 16000)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# You can also use any audio from a file. Make sure to trim the file if it is too long!\n",
- "prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\")\n",
- "prompt_duration = 2\n",
- "prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\n",
- "output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr, progress=True)\n",
- "display_audio(output, sample_rate=16000)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Text-conditional Generation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from audiocraft.utils.notebook import display_audio\n",
- "\n",
- "output = model.generate(\n",
- " descriptions=[\n",
- " 'Subway train blowing its horn',\n",
- " 'A cat meowing',\n",
- " ],\n",
- " progress=True\n",
- ")\n",
- "display_audio(output, sample_rate=16000)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.7"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }
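
For reference, the text-to-audio workflow the deleted notebook walked through can still be reproduced as a small standalone script. This is a minimal sketch based only on the cells visible in the diff above; it assumes the audiocraft package is installed and a CUDA-capable GPU is available, and it writes the results to disk with torchaudio.save (output filenames are illustrative) instead of the notebook-only display_audio helper.

# Minimal sketch of the deleted demo's core workflow (assumes `audiocraft` is installed and a GPU is available).
import torchaudio
from audiocraft.models import AudioGen

# Load the medium AudioGen checkpoint the notebook used.
model = AudioGen.get_pretrained('facebook/audiogen-medium')

# Same generation parameters the notebook set: sampling with top-k 250, 5-second clips.
model.set_generation_params(use_sampling=True, top_k=250, duration=5)

# Text-conditional generation, mirroring the notebook's final example.
output = model.generate(
    descriptions=[
        'Subway train blowing its horn',
        'A cat meowing',
    ],
    progress=True,
)

# The notebook displayed audio inline at 16 kHz; here each generated sample is saved to a wav file instead.
for i, wav in enumerate(output):
    torchaudio.save(f'audiogen_sample_{i}.wav', wav.cpu(), 16000)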