doberst committed
Commit 302d3c0 · verified · 1 Parent(s): c3a0616

Upload README.md

Files changed (1): README.md (+15 −34)

README.md CHANGED
@@ -1,53 +1,34 @@
  ---
  license: apache-2.0
  inference: false
- tags: [green, llmware-rag, p1, onnx]
  ---
 
- # bling-tiny-llama-onnx
 
- <!-- Provide a quick summary of what the model is/does. -->
-
- **bling-tiny-llama-onnx** is an ONNX int4 quantized version of BLING Tiny-Llama 1B, providing a very fast, very small inference implementation, optimized for AI PCs using Intel GPU, CPU and NPU.
-
- [**bling-tiny-llama**](https://huggingface.co/llmware/bling-tiny-llama-v0) is a fact-based question-answering model, optimized for complex business documents.
-
- Get started right away
-
- 1. Install dependencies
-
- ```
- pip3 install llmware
- pip3 install onnxruntime_genai
- ```
-
- 2. Hello World
-
- ```python
- from llmware.models import ModelCatalog
- model = ModelCatalog().load_model("bling-tiny-llama-onnx")
- response = model.inference("The stock price is $45.\nWhat is the stock price?")
- print("response: ", response)
- ```
-
- Looking for AI PC solutions and demos, contact us at [llmware](https://www.llmware.ai)
 
  ### Model Description
 
  - **Developed by:** llmware
  - **Model type:** tinyllama
- - **Parameters:** 1.1 billion
- - **Model Parent:** llmware/bling-tiny-llama-v0
  - **Language(s) (NLP):** English
  - **License:** Apache 2.0
- - **Uses:** Fact-based question-answering
  - **RAG Benchmark Accuracy Score:** 86.5
- - **Quantization:** int4
 
- ## Model Card Contact
 
- [llmware on hf](https://www.huggingface.co/llmware)
 
  [llmware website](https://www.llmware.ai)
 
  ---
  license: apache-2.0
  inference: false
+ tags: [green, llmware-rag, p1, ov]
  ---
 
+ # bling-tiny-llama-ov
 
+ **bling-tiny-llama-ov** is a very small, very fast fact-based question-answering model, designed for retrieval augmented generation (RAG) with complex business documents, quantized and packaged in OpenVino int4 for AI PCs using Intel GPU, CPU and NPU.
 
+ This model is one of the smallest and fastest in the series. For higher accuracy, look at larger models in the BLING/DRAGON series.
 
  ### Model Description
 
  - **Developed by:** llmware
  - **Model type:** tinyllama
+ - **Parameters:** 1.1 billion
+ - **Quantization:** int4
+ - **Model Parent:** [llmware/bling-tiny-llama-v0](https://www.huggingface.co/llmware/bling-tiny-llama-v0)
  - **Language(s) (NLP):** English
  - **License:** Apache 2.0
+ - **Uses:** Fact-based question-answering, RAG
  - **RAG Benchmark Accuracy Score:** 86.5
 
+ Get started right away with [OpenVino](https://github.com/openvinotoolkit/openvino)
+
+ Looking for AI PC solutions, contact us at [llmware](https://www.llmware.ai)
 
+ ## Model Card Contact
+ [llmware on github](https://www.github.com/llmware-ai/llmware)
+ [llmware on hf](https://www.huggingface.co/llmware)
  [llmware website](https://www.llmware.ai)
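
The new card points at OpenVino but no longer carries the quick-start that the old ONNX card had. A sketch of the equivalent quick start, adapted from the removed ONNX example: it assumes the llmware catalog exposes this variant under the name "bling-tiny-llama-ov" and that the OpenVino runtime dependency (here assumed to be `openvino_genai`, in place of the old `onnxruntime_genai`) is installed alongside `llmware`.

```python
# Sketch of a quick start for the OpenVino variant, mirroring the removed
# ONNX example. Assumptions: the catalog name "bling-tiny-llama-ov" and the
# openvino_genai dependency (pip3 install llmware openvino_genai).
from llmware.models import ModelCatalog

# Load the quantized model from the llmware catalog (downloads on first use)
model = ModelCatalog().load_model("bling-tiny-llama-ov")

# BLING models answer fact-based questions over a supplied context passage
response = model.inference("The stock price is $45.\nWhat is the stock price?")

print("response: ", response)
```

As in the old card, the context passage and the question are passed together in a single inference call, which matches the model's RAG-oriented, fact-based question-answering use.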