Upload 2 files
- .gitattributes +1 -0
- README.md +143 -0
- numind.NuExtract-v1.5.Q5_K_M.llamafile +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+numind.NuExtract-v1.5.Q5_K_M.llamafile filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,143 @@
# llamafile

[![CI status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/>

**llamafile lets you distribute and run LLMs with a single file. ([announcement blog post](https://hacks.mozilla.org/2023/11/introducing-llamafile/))**

llamafile aims to make open LLMs much more
accessible to both developers and end users. It does this by
combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one
framework that collapses all the complexity of LLMs down to
a single-file executable (called a "llamafile") that runs
locally on most computers, with no installation.<br/><br/>

llamafile is a Mozilla Builders project.

## Quickstart
+
|
18 |
+
The easiest way to try it for yourself is to download the example
|
19 |
+
llamafile for the [numind.NuExtract](numind/NuExtract-1.5) model (license: [mit],
|
20 |
+
[OpenAI](https://openai.com/policies/terms-of-use)). With llamafile, this you can run this model locally while consuming comparitively less resources and having better performance in CPU alone.
|
21 |
+
|
22 |
+
1. Download [numind.NuExtract-v1.5.Q5_K_M.llamafile](https://huggingface.co/Devarui379/numind.NuExtract-v1.5-Q5_K_M-llamafile/resolve/main/numind.NuExtract-v1.5.Q5_K_M.llamafile?download=true) (2.78 GB).
|
23 |
+
|
24 |
+
2. Open your computer's terminal.
|
25 |
+
|
26 |
+
3. If you're using macOS, Linux, or BSD, you'll need to grant permission
|
27 |
+
for your computer to execute this new file. (You only need to do this
|
28 |
+
once.)
|
29 |
+
|
30 |
+
```sh
|
31 |
+
chmod +x numind.NuExtract-v1.5.Q5_K_M.llamafile
|
32 |
+
```
|
33 |
+
|
34 |
+
4. If you're on Windows, rename the file by adding ".exe" on the end.
|
35 |
+
|
36 |
+
5. Run the llamafile. e.g.:
|
37 |
+
|
38 |
+
```sh
|
39 |
+
./numind.NuExtract-v1.5.Q5_K_M.llamafile
|
40 |
+
```
|
41 |
+
|
42 |
+
6. Your browser should open automatically and display a chat interface.
|
43 |
+
(If it doesn't, just open your browser and point it at http://localhost:8080)
|
44 |
+
|
45 |
+
7. When you're done chatting, return to your terminal and hit
|
46 |
+
`Control-C` to shut down llamafile.
|
47 |
+
|
48 |
+
**Having trouble? See the "Gotchas" section in the official github page of [llamafile](https://github.com/Mozilla-Ocho/llamafile) **
|
49 |
+
|
50 |
+
## Distribution
|
51 |
+
|
52 |
+
One good way to share a llamafile with your friends is by posting it on
|
53 |
+
Hugging Face. If you do that, then it's recommended that you mention in
|
54 |
+
your Hugging Face commit message what git revision or released version
|
55 |
+
of llamafile you used when building your llamafile. That way everyone
|
56 |
+
online will be able verify the provenance of its executable content. If
|
57 |
+
you've made changes to the llama.cpp or cosmopolitan source code, then
|
58 |
+
the Apache 2.0 license requires you to explain what changed. One way you
|
59 |
+
can do that is by embedding a notice in your llamafile using `zipalign`
|
60 |
+
that describes the changes, and mention it in your Hugging Face commit.
|
61 |
+
|
62 |
+
## Documentation
|
63 |
+
|
64 |
+
There's a manual page for each of the llamafile programs installed when you
|
65 |
+
run `sudo make install`. The command manuals are also typeset as PDF
|
66 |
+
files that you can download from the GitHub releases page. Lastly, most
|
67 |
+
commands will display that information when passing the `--help` flag.
|
68 |
+
|
69 |
+
## Running llamafile with models downloaded by third-party applications
|
70 |
+
|
71 |
+
This section answers the question *"I already have a model downloaded locally by application X, can I use it with llamafile?"*. The general answer is "yes, as long as those models are locally stored in GGUF format" but its implementation can be more or less hacky depending on the application. A few examples (tested on a Mac) follow.
|
72 |
+
|
73 |
+
### LM Studio
|
74 |
+
[LM Studio](https://lmstudio.ai/) stores downloaded models in `~/.cache/lm-studio/models`, in subdirectories with the same name of the models (following HuggingFace's `account_name/model_name` format), with the same filename you saw when you chose to download the file.
|
75 |
+
|
76 |
+
So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows:
|
77 |
+
|
78 |
+
```
|
79 |
+
cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF
|
80 |
+
llamafile -m llama-2-7b.Q2_K.gguf
|
81 |
+
```
|
82 |
+
|
83 |
+
### Ollama
|
84 |
+
|
85 |
+
When you download a new model with [ollama](https://ollama.com), all its metadata will be stored in a manifest file under `~/.ollama/models/manifests/registry.ollama.ai/library/`. The directory and manifest file name are the model name as returned by `ollama list`. For instance, for `llama3:latest` the manifest file will be named `.ollama/models/manifests/registry.ollama.ai/library/llama3/latest`.
|
86 |
+
|
87 |
+
The manifest maps each file related to the model (e.g. GGUF weights, license, prompt template, etc) to a sha256 digest. The digest corresponding to the element whose `mediaType` is `application/vnd.ollama.image.model` is the one referring to the model's GGUF file.
|
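
To pull that digest out without reading the manifest by hand, here is a sketch
that assumes the manifest is OCI-style JSON with a `layers` array and that `jq`
is installed (neither is something llamafile itself requires):

```sh
# Print the digest of the layer holding the GGUF weights for llama3:latest.
jq -r '.layers[] | select(.mediaType == "application/vnd.ollama.image.model") | .digest' \
  ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/latest
# The matching blob in ~/.ollama/models/blobs carries the same digest in its sha256-* file name.
```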

Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows:

```
cd ~/.ollama/models/blobs
llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
```

## Security

llamafile adds pledge() and SECCOMP sandboxing to llama.cpp. This is
enabled by default. It can be turned off by passing the `--unsecure`
flag. Sandboxing is currently only supported on Linux and OpenBSD on
systems without GPUs; on other platforms it'll simply log a warning.

Our approach to security has these benefits:

1. After it starts up, your HTTP server isn't able to access the
filesystem at all. This is good, since it means if someone discovers
a bug in the llama.cpp server, then it's much less likely they'll be
able to access sensitive information on your machine or make changes
to its configuration. On Linux, we're able to sandbox things even
further; the only networking-related system call the HTTP server will
be allowed to use after starting up is accept(). That further limits an
attacker's ability to exfiltrate information, in the event that your
HTTP server is compromised.

2. The main CLI command won't be able to access the network at all. This
is enforced by the operating system kernel. It also won't be able to
write to the file system. This keeps your computer safe in the event
that a bug is ever discovered in the GGUF file format that lets
an attacker craft malicious weights files and post them online. The
only exception to this rule is if you pass the `--prompt-cache` flag
without also specifying `--prompt-cache-ro`. In that case, security
currently needs to be weakened to allow `cpath` and `wpath` access,
but network access will remain forbidden. (These flags are illustrated
in the sketch after this list.)
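
A minimal sketch of the flags mentioned above, using the llamafile from the
Quickstart (flag names come from the text; check `--help` for exact semantics):

```sh
# Run with sandboxing disabled (not recommended):
./numind.NuExtract-v1.5.Q5_K_M.llamafile --unsecure

# CLI completion with a prompt cache kept read-only, so the sandbox stays strict:
./numind.NuExtract-v1.5.Q5_K_M.llamafile -p "Extract the dates:" \
  --prompt-cache cache.bin --prompt-cache-ro
```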

Therefore your llamafile is able to protect itself against the outside
world, but that doesn't mean you're protected from llamafile. Sandboxing
is self-imposed. If you obtained your llamafile from an untrusted source
then its author could have simply modified it to not do that. In that
case, you can run the untrusted llamafile inside another sandbox, such
as a virtual machine, to make sure it behaves how you expect.

## Licensing

While the llamafile project is Apache 2.0-licensed, the changes
to llama.cpp are licensed under MIT (just like the llama.cpp project
itself) so as to remain compatible and upstreamable in the future,
should that be desired.

[![Star History Chart](https://api.star-history.com/svg?repos=Mozilla-Ocho/llamafile&type=Date)](https://star-history.com/#Mozilla-Ocho/llamafile&Date)
numind.NuExtract-v1.5.Q5_K_M.llamafile
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e60f226e3832221302892fd2b64ec9045c4c2b758a85a10bede9dd42fc3a116
size 2985572436