Commit b194e86 by Nathan9 · verified · 1 Parent(s): a906c0e

Update README.md

  1. README.md +125 -3
README.md CHANGED
@@ -1,3 +1,125 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ tags:
+ - audio
+ - music
+ - codec
+ - neural-audio
+ - audio-compression
+ license: apache-2.0
+ pipeline_tag: audio-to-audio
+ inference: false
+ ---
+
+ # XCodec Mini - Neural Audio Codec
+
+ ## Model Description
+
+ XCodec Mini is a neural audio codec designed for high-quality music compression and reconstruction. It combines semantic and acoustic encoding to compress audio efficiently while preserving quality.
+
+ ### Key Features
+
+ - **Dual Encoding Architecture**
+   - Semantic encoder for high-level musical features
+   - Acoustic encoder for detailed sound information
+   - Multi-scale processing for efficient compression
+
+ - **Advanced Compression**
+   - Multiple codebooks for flexible quality/size tradeoff
+   - Support for 44.1 kHz high-fidelity audio
+   - Separate processing paths for vocals and instrumentals
+
+ - **Technical Specifications**
+   - Input: Raw audio at 44.1 kHz
+   - Output: Compressed representations and reconstructed audio
+   - Model Size: [Add total size]
+   - Compression Ratio: [Add typical ratio]
+
+ ## Intended Uses
+
+ - High-quality music compression
+ - Audio archival and storage
+ - Music streaming applications
+ - Audio processing pipelines
+
+ ## Training Data
+
+ The model was trained on a diverse dataset of music, including:
+ - Various genres and styles
+ - Vocal and instrumental tracks
+ - High-quality studio recordings
+
+ ## Performance and Limitations
+
+ ### Strengths
+ - High-quality audio reconstruction
+ - Efficient compression ratios
+ - Separate handling of vocals and instrumentals
+ - Support for high sample rates
+
+ ### Limitations
+ - Computationally intensive for real-time applications
+ - Requires significant GPU memory
+ - Best suited for offline processing
+ - May introduce artifacts in extreme compression settings
+
+ ## Technical Specifications
+
+ ### Model Architecture
+ 1. **Semantic Encoder**
+    - Based on HuBERT architecture
+    - Captures high-level musical features
+    - Outputs semantic tokens
+
+ 2. **Acoustic Encoder**
+    - Multi-scale convolutional architecture
+    - Processes detailed sound information
+    - Generates acoustic tokens
+
+ 3. **Dual Decoders**
+    - Separate decoders for vocals and instrumentals
+    - Multi-stage reconstruction process
+    - Quality-focused design
+
+ ### Input Requirements
+ - Audio Format: WAV/MP3
+ - Sample Rate: 44.1 kHz
+ - Channels: Mono/Stereo
+ - Bit Depth: 16-bit
+
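The input requirements above can be checked before a file is handed to the codec. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav` is hypothetical and not part of XCodec Mini. It covers uncompressed PCM WAV files only, so MP3 input would need a separate decoder.

```python
import wave

# Expected values are taken from the "Input Requirements" list.
EXPECTED_SAMPLE_RATE = 44_100   # Hz
EXPECTED_SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit
ALLOWED_CHANNELS = (1, 2)       # mono or stereo


def check_wav(path):
    """Return a list of problems found; an empty list means the file passes.

    Hypothetical helper for illustration, not an XCodec Mini API.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"bit depth is {8 * width}-bit, expected 16-bit")
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono or stereo")
    return problems
```

Returning a list of messages rather than raising on the first mismatch lets a batch script report every issue with a file in one pass.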
+ ### Output Format
+ - Reconstructed Audio: 44.1 kHz WAV
+ - Intermediate Representations: Compressed tokens
+
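To make the idea of "compressed tokens" concrete, the sketch below compares the bitrate of uncompressed PCM audio with that of a generic codebook-based token stream. The PCM figures (44.1 kHz, 16-bit) come from the README; the frame rate, number of codebooks, and codebook size are purely illustrative assumptions and are not XCodec Mini's actual settings, which the README does not state.

```python
import math


def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def token_bitrate(frames_per_second, num_codebooks, codebook_size):
    """Bits per second of a token stream with one index per codebook per frame."""
    bits_per_index = math.log2(codebook_size)
    return frames_per_second * num_codebooks * bits_per_index


# Mono 44.1 kHz, 16-bit PCM, as listed under "Input Requirements".
raw = pcm_bitrate(44_100, 16, 1)

# ASSUMED values for illustration only, not taken from the model.
tokens = token_bitrate(frames_per_second=50, num_codebooks=8, codebook_size=1024)

ratio = raw / tokens
print(f"raw: {raw} bit/s, tokens: {tokens:.0f} bit/s, ratio: {ratio:.1f}x")
```

Using fewer codebooks lowers the token bitrate and raises the ratio, which is the quality/size tradeoff the Key Features section refers to.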
+ ## Usage Guidelines
+
+ ### Hardware Requirements
+ - GPU: NVIDIA GPU with 8 GB+ VRAM
+ - RAM: 16 GB+ recommended
+ - Storage: SSD recommended for faster processing
+
+ ### Software Requirements
+ - Python 3.8+
+ - PyTorch 2.0+
+ - CUDA 11.0+
+ - Additional dependencies listed in the installation guide
+
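The Python and PyTorch minimums above can be verified with a short script. This is a sketch under stated assumptions: the function names are hypothetical, PyTorch is assumed to be installed under the distribution name `torch`, and the CUDA version is not checked because doing so would require importing PyTorch itself.

```python
import re
import sys

MIN_PYTHON = (3, 8)   # from "Software Requirements"
MIN_TORCH = (2, 0)    # from "Software Requirements"


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of problems; an empty list means the minimums are met.

    Hypothetical helper for illustration, not an XCodec Mini API.
    """
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
        return problems  # importlib.metadata is unavailable before 3.8

    from importlib import metadata

    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} found, 2.0 or newer is required")
    return problems
```

Reading the installed version through `importlib.metadata` avoids importing PyTorch, so the check stays fast and works even when the installation is broken.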
+ ## Ethical Considerations
+
+ - **Copyright**: Users should ensure they have the rights to process copyrighted material
+ - **Attribution**: Proper attribution should be given when using this model
+ - **Data Privacy**: Consider data privacy implications when processing sensitive audio
+
+ ## Additional Information
+
+ ### Model Weights
+ The model requires several checkpoint files:
+ - Semantic Encoder: `semantic_ckpts/hf_1_325000/pytorch_model.bin`
+ - Vocal Decoder: `decoders/decoder_131000.pth`
+ - Instrumental Decoder: `decoders/decoder_151000.pth`
+ - Final Checkpoint: `final_ckpt/ckpt_00360000.pth`
+
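Because inference depends on all four checkpoint files, it can help to confirm they are present before loading anything. The sketch below assumes the files sit under a single local root directory with the relative paths listed above; the function name and the root directory are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the "Model Weights" list.
CHECKPOINTS = {
    "Semantic Encoder": "semantic_ckpts/hf_1_325000/pytorch_model.bin",
    "Vocal Decoder": "decoders/decoder_131000.pth",
    "Instrumental Decoder": "decoders/decoder_151000.pth",
    "Final Checkpoint": "final_ckpt/ckpt_00360000.pth",
}


def missing_checkpoints(root):
    """Return the names of checkpoints whose files are absent under `root`.

    Hypothetical helper for illustration, not an XCodec Mini API.
    """
    root = Path(root)
    return [name for name, rel in CHECKPOINTS.items() if not (root / rel).is_file()]
```

Calling `missing_checkpoints(".")` from the directory holding the downloaded files returns an empty list when every checkpoint is in place.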
+ ### Contact
+ For issues and questions, please use the GitHub repository's issue tracker.