Christoph Holthaus committed
Commit · b8c846d
1 Parent(s): d65f135

switch over to gradio "native"

Files changed:
- README.md +4 -8
- gradio_app.py → app.py +2 -2
- requirements.txt +2 -1
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 title: Test
 emoji: 🔥
-colorFrom:
+colorFrom: red
 colorTo: yellow
-sdk:
+sdk: gradio
 pinned: false
-license:
+license: mit
 ---

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
@@ -14,17 +14,13 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 This is a test ...

 TASKS:
-- for fast debug: Add a debug mode that enables me to run direct cli commands? -> Never for prod!
-- prod harden docker with proper users etc. OR mention this is only a dev build an intended for messing with, no readonly filesystem etc.
 - rewrite generation from scratch or use the one of mistral space if possible. alternative use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py
 - write IN LARGE LETTERS that this is not the original model but a quantified one that is able to run on free CPU Inference
 - test multimodal with llama?
-- can i use swap in docker to maximize usable memory?
 - proper token handling - make it a real chat (if not auto by chatcompletion interface ...)
-- maybe run as webserver locally and gradio only uses the webserver as backend? (better for async but maybe worse to control - just an idea)
 - check ho wmuch parallel generation is possible or only one que and set
 - move model to DL into env-var with proper error handling
-- chore: cleanup ignore,
+- chore: cleanup ignore, etc.
 - update all deps to one up to date version, then PIN them!
 - make a short info on how to clone and run custom 7b models in separate spaces
 - make a pr for popular repos to include in their readme etc.
gradio_app.py → app.py RENAMED
@@ -5,8 +5,8 @@ import gradio as gr
 import psutil

 # Initing things
-print("
-llm = Llama(model_path="./model.bin")
+print("debug: init model")
+llm = Llama(model_path="./model.bin") # LLaMa model
 llama_model_name = "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF"
 print("! INITING DONE !")

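One TASKS item asks to move the model download into an env-var with proper error handling, instead of hardcoding ./model.bin as app.py does above. A rough sketch, assuming huggingface_hub handles the download; the MODEL_REPO and MODEL_FILE variable names and the fallback repo are hypothetical, not part of this commit:

# Sketch only: resolve the GGUF file from environment variables with error handling.
import os
import sys
from huggingface_hub import hf_hub_download

repo_id = os.environ.get("MODEL_REPO", "TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF")
filename = os.environ.get("MODEL_FILE")

if not filename:
    sys.exit("MODEL_FILE is not set; refusing to guess which GGUF file to download.")

try:
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
except Exception as err:  # network errors, missing file, bad repo id, ...
    sys.exit(f"Could not download {filename} from {repo_id}: {err}")

# llm = Llama(model_path=model_path)  # then init exactly as app.py does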
requirements.txt CHANGED
@@ -1,2 +1,3 @@
 psutil
-gradio
+gradio
+llama_cpp
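The TASKS list also calls for updating all dependencies and then pinning them. A pinned requirements.txt would look like the following; the version numbers are placeholders, not versions tested with this Space. Note that the Python bindings imported as llama_cpp are published on PyPI as llama-cpp-python, so the unpinned llama_cpp line above may not resolve to the intended package:

psutil==5.9.6
gradio==4.7.1
llama-cpp-python==0.2.20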