Update README.md
README.md
CHANGED
@@ -9,3 +9,100 @@ app_file: app.py
pinned: false
short_description: SText to Audio(Sound SFX) Generator
---

## TangoFlux: Text-to-Audio Generation System

TangoFlux is a state-of-the-art text-to-audio generation system that converts natural-language descriptions into high-quality audio. It is built on flow matching and CLAP-ranked preference optimization, which together aim for fast synthesis that stays faithful to the prompt.

### Key Features

**1. Advanced Audio Generation**
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates clips up to 30 seconds long
- Outputs audio at a 44.1 kHz sample rate

**2. Flexible Generation Controls**
- **Steps (10-100)**: Trades generation quality against speed (more steps take longer)
- **Guidance Scale (1-10)**: Adjusts how closely the audio follows the prompt
- **Duration (1-30s)**: Sets the length of the generated audio
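
The three controls above can be thought of as a small, range-checked settings object. The sketch below is illustrative only: `GenerationSettings`, `clamp`, and the default values are hypothetical names and numbers chosen for this example, not part of the TangoFlux code. Only the ranges (10-100, 1-10, 1-30 s) come from this README.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the three controls listed above."""

    steps: int = 50               # default is an assumption
    guidance_scale: float = 4.5   # default is an assumption
    duration_s: float = 10.0      # default is an assumption

    @classmethod
    def from_user_input(cls, steps, guidance_scale, duration_s):
        # Clamp every value to the range quoted in this README.
        return cls(
            steps=int(clamp(steps, 10, 100)),
            guidance_scale=float(clamp(guidance_scale, 1.0, 10.0)),
            duration_s=float(clamp(duration_s, 1.0, 30.0)),
        )


# Out-of-range inputs are pulled back to the nearest allowed value.
settings = GenerationSettings.from_user_input(steps=250, guidance_scale=0.5, duration_s=12)
print(settings)
```

Clamping rather than raising an error mirrors how a slider behaves in a UI; a stricter caller could raise `ValueError` instead.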

**3. Diverse Audio Capabilities**
- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)

**4. Technical Architecture**
- Uses flow matching for efficient generation
- Applies CLAP-ranked preference optimization to improve output quality
- GPU-accelerated inference with CUDA support
- Transformer-based text encoding
- Uses the `@spaces.GPU` decorator for fast generation on Hugging Face Spaces
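
To make the flow-matching bullet more concrete, here is a toy sketch of how a flow-based sampler typically produces a sample: it integrates a velocity field from t=0 to t=1 in a fixed number of Euler steps. The velocity functions are one-dimensional stand-ins, not the TangoFlux network, and the guidance formula assumes the common classifier-free guidance convention, which this README does not confirm. All function names are hypothetical.

```python
import math

TARGET = 1.0  # toy "prompt target"; an assumption for illustration


def v_uncond(x, t):
    """Toy unconditional velocity: ignores the prompt."""
    return -x


def v_cond(x, t):
    """Toy conditional velocity: pulls the state toward TARGET."""
    return TARGET - x


def guided_velocity(scale):
    """Classifier-free guidance: v_uncond + scale * (v_cond - v_uncond)."""
    def velocity(x, t):
        return v_uncond(x, t) + scale * (v_cond(x, t) - v_uncond(x, t))
    return velocity


def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with `steps` Euler updates."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x


# Closed-form x(1) for scale=1 and x0=0 in this toy problem.
exact = TARGET * (1.0 - math.exp(-1.0))
errors = {}
for steps in (10, 100):
    x1 = euler_sample(guided_velocity(1.0), 0.0, steps)
    errors[steps] = abs(x1 - exact)
    print(steps, errors[steps])  # integration error shrinks as steps grow

# A larger guidance scale pushes the result further in the prompt's direction.
low = euler_sample(guided_velocity(1.0), 0.0, 50)
high = euler_sample(guided_velocity(2.0), 0.0, 50)
```

In this toy setting, more steps mean a more accurate integration at proportionally higher cost, which is the same trade-off the Steps control exposes.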

### How It Works

1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Fine-tune the generation settings
3. **AI Processing**: The model interprets the text and generates the corresponding audio
4. **Audio Output**: Play or download the generated WAV file
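
The final step, producing a WAV file at 44.1 kHz, can be sketched with the Python standard library alone. In the example below a sine tone stands in for the model output; `placeholder_audio` and `write_wav` are hypothetical helpers, and mono 16-bit PCM is an assumption made to keep the sketch short.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, as stated in this README


def placeholder_audio(duration_s, freq_hz=440.0):
    """Stand-in for the model: a sine tone as floats in [-1, 1]."""
    n = int(duration_s * SAMPLE_RATE)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def write_wav(samples, target):
    """Write mono 16-bit PCM at SAMPLE_RATE to a path or binary file object."""
    ints = (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    pcm = struct.pack(f"<{len(samples)}h", *ints)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)        # mono is an assumption
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


# Write a 2-second clip into memory; pass a filename instead to save to disk.
buffer = io.BytesIO()
write_wav(placeholder_audio(duration_s=2.0), buffer)

# A maximum-length 30-second clip holds this many samples per channel.
max_samples = 30 * SAMPLE_RATE
```

At 44.1 kHz, a 30-second clip is 1,323,000 samples per channel, so uncompressed 16-bit mono output is roughly 2.6 MB.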

### Example Use Cases
- **Film & Video Production**: Create custom sound effects and ambiences
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create unique sound textures and effects
- **Educational Content**: Generate illustrative audio examples
- **Accessibility**: Convert text descriptions to audio experiences

The system includes more than 20 preconfigured examples, ranging from simple single sounds to complex multi-layered soundscapes.

---

## TangoFlux: Text-to-Audio Generation System (Korean version, translated)

TangoFlux is a cutting-edge text-to-audio generation system that converts text descriptions into high-quality audio. Built on flow matching and CLAP rank-based preference optimization, it provides fast and accurate audio synthesis from natural-language prompts.

### Key Features

**1. Advanced Audio Generation**
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes containing multiple elements
- Generates audio up to 30 seconds long
- 44.1 kHz high-quality audio output

**2. Flexible Generation Controls**
- **Steps (10-100)**: Adjusts generation quality versus speed
- **Guidance Scale (1-10)**: Adjusts how closely the output adheres to the prompt
- **Duration (1-30s)**: Sets the length of the generated audio

**3. Diverse Audio Generation Capabilities**
- Natural sounds (waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds chirping)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Composite soundscapes (combinations of several layers of sound)

**4. Technical Structure**
- Uses flow matching for efficient generation
- CLAP rank-based preference optimization for improved quality
- GPU-accelerated inference with CUDA support
- Transformer-based text encoding
- Optimized for fast generation with @spaces.GPU

### How It Works

1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Fine-tune the generation settings
3. **AI Processing**: The model interprets the text and generates the corresponding audio
4. **Audio Output**: Download or play the generated WAV file

### Use Cases
- **Film and Video Production**: Create custom sound effects and ambient sounds
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create distinctive sound textures and effects
- **Educational Content**: Generate explanatory audio examples
- **Accessibility**: Convert text descriptions into audio experiences

The system includes more than 20 preconfigured examples that demonstrate a range of audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.