Commit 766c9b2 · Minor update
Parent: 91f17ff

app.py CHANGED
@@ -9,12 +9,21 @@ demo = gr.Interface(fn=gpt.get_response, inputs=["textbox",
     gr.Slider(0.1, 2.0, value=1.0),
     gr.Dropdown(
         ["mike-chat", "mike-code", "mike-code-600m"], value="mike-chat"),
-], outputs=gr.Markdown(line_breaks=True), title="Mike Chat", article="""
+], outputs=gr.Markdown(line_breaks=True), title="Mike Chat", article="""
+Notice: if you have a GPU, I would highly recommend cloning the space and running it locally. The CPU provided by spaces isn't very fast.
+
+Mike is a small GPT-style language model. It was trained for about 8 hrs on my PC using fineweb-edu and open orca datasets. While it hallucinates a lot, it seems to be about on par with other LMs of its size (about 160M params). Model details:
 block_size: 512
 n_layers: 12
 n_heads: 12
 d_model: 768
-(Same as gpt-2 but without weight tying)
+(Same as gpt-2 but without weight tying)
+
+Architecture for Mike-Code-600m:
+block_size: 256
+n_layers: 16
+n_heads: 12
+d_model: 1536""")


 if __name__ == "__main__":
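The article text added in this commit claims "about 160M params" for the 12-layer, d_model-768 model without weight tying, and the larger model is named Mike-Code-600m. A back-of-the-envelope parameter count roughly confirms both figures. This is a sketch, not the author's code: it assumes the GPT-2 vocabulary size of 50257 (not stated in the diff) and counts only the dominant weight matrices (token/position embeddings, the usual 12·d² attention+MLP weights per block, and an untied output head), ignoring biases and LayerNorms.

```python
def gpt_param_count(n_layers: int, d_model: int, block_size: int,
                    vocab_size: int = 50257) -> int:
    """Rough parameter count for a GPT-style model without weight tying.

    Counts only the large weight matrices; biases and LayerNorm
    parameters (a fraction of a percent) are ignored.
    """
    embeddings = vocab_size * d_model + block_size * d_model  # token + position
    blocks = n_layers * 12 * d_model ** 2                     # 4d^2 attn + 8d^2 MLP per layer
    lm_head = vocab_size * d_model                            # untied output projection
    return embeddings + blocks + lm_head

# mike-chat config from the diff: 12 layers, d_model 768, block_size 512
print(gpt_param_count(12, 768, 512) / 1e6)   # ≈ 162.5M, matching "about 160M params"

# mike-code-600m config: 16 layers, d_model 1536, block_size 256
print(gpt_param_count(16, 1536, 256) / 1e6)  # ≈ 607.8M, matching the "600m" name
```

Note that without weight tying the untied head alone adds ~38M parameters at d_model 768, which is why the total lands at ~162M rather than GPT-2's ~124M at the same depth and width.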