Commit 72a850a by fantaxy (verified)
1 parent: 7abbc1b

Update README.md

Files changed (1): README.md (+97 -0)
@@ -9,3 +9,100 @@ app_file: app.py
pinned: false
short_description: Text to Audio (Sound SFX) Generator
---
## TangoFlux: Text-to-Audio Generation System

TangoFlux is a state-of-the-art text-to-audio generation system that turns natural-language descriptions into high-quality audio. It is built on flow matching and CLAP-ranked preference optimization (CRPO), which together let it synthesize audio quickly while staying faithful to the prompt.

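Flow matching generates a sample by integrating a learned velocity field from Gaussian noise to data. The minimal sketch below assumes plain Euler integration from t = 0 (noise) to t = 1 (data) and replaces the trained TangoFlux network with a toy velocity field that points along a straight line toward a known target. `toy_velocity` and `sample` are hypothetical names, not the app's API. The Steps control presumably sets how many such integration steps are taken.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned velocity network.

    On the straight path x_t = (1 - t) * x0 + t * target, the velocity that
    keeps a point on that path is (target - x_t) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=50, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1 inside the loop, so no division by zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


target = [0.5, -0.25, 0.1]  # stands in for a latent audio frame
result = sample(target, steps=10)
print(result)
```

Because the toy field is exactly straight, Euler integration lands on `target` for any step count. A learned field is only approximately straight, so fewer steps usually mean faster but less accurate samples, which is the quality-versus-speed trade-off the Steps control exposes.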
### Key Features

**1. Advanced Audio Generation**
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates clips up to 30 seconds long
- Outputs audio at a 44.1 kHz sample rate

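At 44.1 kHz, every second of audio is 44,100 samples per channel, so the 30-second limit bounds the size of each clip. A quick back-of-the-envelope helper, assuming uncompressed 16-bit PCM and a mono default (the bit depth and channel count are assumptions, not stated by the app); `pcm_size_bytes` is a hypothetical name:

```python
def pcm_size_bytes(duration_s, sample_rate=44_100, channels=1, bytes_per_sample=2):
    """Size of an uncompressed PCM clip (16-bit samples assumed by default)."""
    if not 1 <= duration_s <= 30:
        raise ValueError("duration must be between 1 and 30 seconds")
    samples_per_channel = int(round(sample_rate * duration_s))
    return samples_per_channel * channels * bytes_per_sample


# 30 s at 44.1 kHz is 1,323,000 samples per channel.
print(pcm_size_bytes(30))              # 2646000 bytes (about 2.5 MiB), mono
print(pcm_size_bytes(30, channels=2))  # 5292000 bytes, stereo
```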
**2. Flexible Generation Controls**
- **Steps (10-100)**: Number of sampling steps; more steps generally improve quality at the cost of speed
- **Guidance Scale (1-10)**: How closely the audio follows the prompt; higher values follow it more strictly
- **Duration (1-30 s)**: Length of the generated clip

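A guidance scale like this is usually implemented as classifier-free guidance: the model makes one prediction with the prompt and one without it, then extrapolates from the unconditional prediction toward the conditional one. Assuming TangoFlux follows that standard scheme (the formula below is the generic one, not taken from this app's code), a minimal sketch with the hypothetical helper `apply_cfg`:

```python
def apply_cfg(v_uncond, v_cond, guidance_scale):
    """Classifier-free guidance: v = v_uncond + w * (v_cond - v_uncond).

    w = 1 returns the conditional prediction unchanged; w > 1 pushes the
    output further in the direction the prompt suggests.
    """
    if not 1.0 <= guidance_scale <= 10.0:
        raise ValueError("guidance_scale must be between 1 and 10")
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


v_uncond = [0.0, 1.0]  # toy prediction without the prompt
v_cond = [0.5, 1.5]    # toy prediction with the prompt
print(apply_cfg(v_uncond, v_cond, 1.0))  # [0.5, 1.5]
print(apply_cfg(v_uncond, v_cond, 3.0))  # [1.5, 2.5]
```

Since the combination is applied at every sampling step, raising the scale strengthens prompt adherence throughout generation; very high values are commonly associated with less natural output.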
**3. Diverse Audio Capabilities**
- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speech, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)

**4. Technical Architecture**
- Flow matching for efficient generation
- CLAP-ranked preference optimization (CRPO) to improve audio quality and prompt alignment
- GPU-accelerated inference with CUDA support
- Transformer-based text encoding
- Generation runs in a `@spaces.GPU`-decorated function, which requests a GPU on Hugging Face Spaces for each call

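Roughly, CRPO builds preference data by generating several candidate clips per prompt, scoring each with CLAP text-audio similarity, and treating the highest- and lowest-scoring clips as the preferred and rejected examples for preference optimization. The sketch below covers only that ranking step. Real CLAP embeddings come from a pretrained model; here they are small hand-written vectors, and `clap_score` and `preference_pair` are hypothetical helper names.

```python
import math


def clap_score(text_emb, audio_emb):
    """Cosine similarity between a text embedding and an audio embedding."""
    dot = sum(t * a for t, a in zip(text_emb, audio_emb))
    norm = math.sqrt(sum(t * t for t in text_emb)) * math.sqrt(sum(a * a for a in audio_emb))
    return dot / norm


def preference_pair(text_emb, candidates):
    """Return (chosen, rejected): the best- and worst-scoring candidate names."""
    ranked = sorted(
        candidates,
        key=lambda name: clap_score(text_emb, candidates[name]),
        reverse=True,
    )
    return ranked[0], ranked[-1]


# Hypothetical 3-d embeddings for one prompt and three generated candidates.
text_emb = [1.0, 0.0, 0.0]
candidates = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.9, 0.0],
    "clip_c": [0.6, 0.6, 0.1],
}
print(preference_pair(text_emb, candidates))  # ('clip_a', 'clip_b')
```

Pairing the extremes, rather than two arbitrary candidates, presumably gives the preference loss the clearest contrast between a good and a bad clip for the same prompt.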
### How It Works

1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Adjust steps, guidance scale, and duration
3. **AI Processing**: The model interprets the text and generates matching audio
4. **Audio Output**: Play the generated WAV file or download it

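The four steps can be sketched end to end with Python's standard `wave` module. The model call is replaced by a hypothetical `fake_generate` that ignores the prompt and returns a sine tone, and 16-bit mono output is assumed; in the actual app the generation step would call TangoFlux instead. The parameter values are illustrative, not the app's defaults.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # matches the 44.1 kHz output described in this README


def fake_generate(prompt, steps=50, guidance_scale=4.5, duration=2.0):
    """Hypothetical stand-in for the model: returns float samples in [-1, 1].

    The prompt and settings are ignored; a 440 Hz tone is produced instead.
    """
    n = int(SAMPLE_RATE * duration)
    return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples as a 16-bit mono WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
    return buf.getvalue()


# 1. text input, 2. parameters, 3. generation, 4. WAV output
audio = fake_generate("rain on a tin roof", steps=50, guidance_scale=4.5, duration=2.0)
wav_bytes = to_wav_bytes(audio)
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    print(check.getframerate(), check.getnframes())  # 44100 88200
```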
### Example Use Cases
- **Film & Video Production**: Create custom sound effects and ambiences
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create unique sound textures and effects
- **Educational Content**: Generate illustrative audio examples
- **Accessibility**: Convert text descriptions to audio experiences

The app includes more than 20 preconfigured examples, ranging from simple single sounds to complex multi-layered soundscapes.