Triangle104 committed
Commit b06d4b7 · verified · 1 Parent(s): 17f4402

Update README.md

Files changed (1):
  1. README.md (+22 -124)

README.md CHANGED
@@ -26,79 +26,29 @@ advanced capabilities in text generation, coding, mathematics, and
 long-context understanding. It is optimized for a wide variety of use
 cases, including conversational AI, structured data interpretation, and
 multilingual applications. It outperforms Ava 1.5 in many aspects, making
-Athena-1 the superior model.
-
-Key Features
-
-🚀 Enhanced Capabilities
-
-Instruction Following: Athena 1 has been fine-tuned for superior adherence to user prompts, making it ideal for chatbots, virtual assistants, and guided workflows.
-Coding and Mathematics: Specialized fine-tuning enhances coding problem-solving and mathematical reasoning.
-Long-Context Understanding: Handles input contexts up to 128K tokens and generates up to 8K tokens.
-
-🌐 Multilingual Support
-
 Supports 29+ languages, including:
-
 English, Chinese, French, Spanish, Portuguese, German, Italian, Russian,
 Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
-📊 Structured Data & Outputs
-
 Structured Data Interpretation: Understands and processes structured formats like tables and JSON.
 Structured Output Generation: Generates well-formatted outputs, including JSON, XML, and other structured formats.
-
-Model Details
-
 Base Model: Qwen/Qwen2.5-14B-Instruct
 Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
 Parameters: 14.7B total (13.1B non-embedding).
@@ -106,21 +56,10 @@ Layers: 48
 Attention Heads: 40 for Q, 8 for KV.
 Context Length: Up to 131,072 tokens.
-
-Applications
-
 Athena 1 is designed for a wide range of use cases:
-
 Conversational AI and chatbots.
 Code generation, debugging, and explanation.
 Mathematical problem-solving.
@@ -128,21 +67,10 @@ Large-document summarization and analysis.
 Multilingual text generation and translation.
 Structured data processing (e.g., tables, JSON).
-
-Quickstart
-
 Below is an example of how to use Athena 1 for text generation:
-
 huggingface-cli login

 # Use a pipeline as a high-level helper
@@ -160,56 +88,26 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
 model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-14B")
-
-Performance
-
 Athena 1 has been optimized for efficiency and performance on modern
 GPUs. For detailed evaluation metrics (e.g., throughput, accuracy, and
 memory requirements), refer to the Qwen2.5 performance benchmarks.
-
-Requirements
-
 To use Athena 1, ensure the following:
-
 Python >= 3.8
 Transformers >= 4.37.0 (to support Qwen models)
 PyTorch >= 2.0
 GPU with BF16 support for optimal performance.
-
-Citation
-
 If you use Athena 1 in your research or projects, please cite its base model Qwen2.5 as follows:
-
 @misc{qwen2.5,
 title = {Qwen2.5: A Party of Foundation Models},
 url = {https://qwenlm.github.io/blog/qwen2.5/},
 
 long-context understanding. It is optimized for a wide variety of use
 cases, including conversational AI, structured data interpretation, and
 multilingual applications. It outperforms Ava 1.5 in many aspects, making
+Athena-1 the superior model.
+
+Key Features
+-
+
+🚀 Enhanced Capabilities
+-
+Instruction Following: Athena 1 has been fine-tuned for superior adherence to user prompts, making it ideal for chatbots, virtual assistants, and guided workflows.
+Coding and Mathematics: Specialized fine-tuning enhances coding problem-solving and mathematical reasoning.
+Long-Context Understanding: Handles input contexts up to 128K tokens and generates up to 8K tokens.
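The instruction-following and long-context claims map onto standard transformers usage. A minimal sketch, assuming the model inherits a Qwen2.5-style chat template; the prompt and token budget are illustrative, not from the README:

```python
# Hedged sketch: instruction following via the chat template.
# The prompt and max_new_tokens value are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-14B")

messages = [{"role": "user", "content": "Explain RMSNorm in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
# Generation is stated to top out at 8K tokens; stay well under that here.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```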
 
+🌐 Multilingual Support
+-
 Supports 29+ languages, including:
 English, Chinese, French, Spanish, Portuguese, German, Italian, Russian,
 Japanese, Korean, Vietnamese, Thai, Arabic, and more.

+📊 Structured Data & Outputs
+-
 Structured Data Interpretation: Understands and processes structured formats like tables and JSON.
 Structured Output Generation: Generates well-formatted outputs, including JSON, XML, and other structured formats.
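One way to exercise the structured-output claim is to request JSON and parse the reply. A minimal sketch; the prompt is made up, and `return_full_text` is a standard text-generation pipeline option:

```python
# Hedged sketch: ask for JSON, then parse it; the prompt is illustrative.
import json
from transformers import pipeline

pipe = pipeline("text-generation", model="Spestly/Athena-1-14B")
prompt = "Reply with only a JSON object containing keys 'summary' and 'tags'."
raw = pipe(prompt, max_new_tokens=64, return_full_text=False)[0]["generated_text"]
print(json.loads(raw))  # raises json.JSONDecodeError if the reply is not valid JSON
```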
+Details
+-
 Base Model: Qwen/Qwen2.5-14B-Instruct
 Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
 Parameters: 14.7B total (13.1B non-embedding).
 Layers: 48
 Attention Heads: 40 for Q, 8 for KV.
 Context Length: Up to 131,072 tokens.
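These figures can be cross-checked against the published config. A small sketch, assuming the usual Qwen2 field names in transformers:

```python
# Hedged sketch: cross-check the model card's numbers against the config.
# Field names follow the Qwen2 convention and are assumptions, not quotes.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Spestly/Athena-1-14B")
print(cfg.num_hidden_layers)        # expect 48     (Layers)
print(cfg.num_attention_heads)      # expect 40     (Q heads)
print(cfg.num_key_value_heads)      # expect 8      (KV heads)
print(cfg.max_position_embeddings)  # expect 131072 (context length)
```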
+Applications
+-
 Athena 1 is designed for a wide range of use cases:
 Conversational AI and chatbots.
 Code generation, debugging, and explanation.
 Mathematical problem-solving.
 Large-document summarization and analysis.
 Multilingual text generation and translation.
 Structured data processing (e.g., tables, JSON).
+Quickstart
+-
 Below is an example of how to use Athena 1 for text generation:

 huggingface-cli login

 # Use a pipeline as a high-level helper

 from transformers import AutoTokenizer, AutoModelForCausalLM
 tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
 model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-14B")
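The diff elides the body of the pipeline example. A minimal runnable sketch of what that section plausibly contains; the prompt and settings are illustrative, not recovered from the original README:

```python
# Hedged completion of the elided pipeline snippet; prompt and settings
# are illustrative assumptions.
from transformers import pipeline

pipe = pipeline("text-generation", model="Spestly/Athena-1-14B")
result = pipe("Write a short poem about long context windows.", max_new_tokens=64)
print(result[0]["generated_text"])
```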
+Performance
+-
 Athena 1 has been optimized for efficiency and performance on modern
 GPUs. For detailed evaluation metrics (e.g., throughput, accuracy, and
 memory requirements), refer to the Qwen2.5 performance benchmarks.
+Requirements
+-
 To use Athena 1, ensure the following:
 Python >= 3.8
 Transformers >= 4.37.0 (to support Qwen models)
 PyTorch >= 2.0
 GPU with BF16 support for optimal performance.
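To act on the BF16 recommendation at load time, one common pattern is below; `torch_dtype` and `device_map` are standard `from_pretrained` options, chosen here as an assumption rather than prescribed by the README:

```python
# Hedged sketch: load the model in BF16 on available GPUs.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Spestly/Athena-1-14B",
    torch_dtype=torch.bfloat16,  # needs a GPU with BF16 support
    device_map="auto",           # needs the accelerate package installed
)
```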
+Citation
+-
 If you use Athena 1 in your research or projects, please cite its base model Qwen2.5 as follows:
 @misc{qwen2.5,
 title = {Qwen2.5: A Party of Foundation Models},
 url = {https://qwenlm.github.io/blog/qwen2.5/},