munish0838 committed on
Commit 27d73ad · verified · 1 Parent(s): 55fcd2a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +132 -0
README.md ADDED
@@ -0,0 +1,132 @@
+
+ ---
+
+ license: apache-2.0
+ datasets:
+ - nvidia/ChatQA-Training-Data
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.2-3B
+ pipeline_tag: text-generation
+ library_name: transformers
+
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/OneLLM-Doey-V1-Llama-3.2-3B-GGUF
+ This is a quantized version of [DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B](https://huggingface.co/DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B), created using llama.cpp.
+
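As a minimal sketch of running one of the GGUF files from this repository, the snippet below uses the `llama-cpp-python` bindings. That package is not mentioned in the original card, so treat it as an assumption. The `*Q4_K_M.gguf` filename pattern is also hypothetical: check the repository's file list for the quantizations actually published.

```python
def load_gguf(filename_pattern="*Q4_K_M.gguf", n_ctx=1024):
    """Download a GGUF file from the Hub and load it.

    Requires the llama-cpp-python and huggingface_hub packages.
    The default filename pattern is a hypothetical example.
    """
    # Imported lazily so that defining this function does not need the dependency.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id="QuantFactory/OneLLM-Doey-V1-Llama-3.2-3B-GGUF",
        filename=filename_pattern,
        n_ctx=n_ctx,  # the model card states support for up to 1024 tokens
        verbose=False,
    )


def ask(llm, question, max_tokens=256):
    """Send a single user message and return the assistant's reply text."""
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=max_tokens,
    )
    return result["choices"][0]["message"]["content"]
```

Calling `ask(load_gguf(), "Who are you?")` would download the matching file on first use and return the reply as a string.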
+ # Original Model Card
+
+ ## **Model Summary**
+ This model is a fine-tuned version of **Llama 3.2 3B**, trained with **LoRA (Low-Rank Adaptation)** on the [NVIDIA ChatQA-Training-Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) dataset. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences of up to 1024 tokens.
+
+ ---
+
+ ## **Key Features**
+ - **Base Model**: Llama 3.2 3B
+ - **Fine-Tuning Framework**: LoRA
+ - **Dataset**: NVIDIA ChatQA-Training-Data
+ - **Max Sequence Length**: 1024 tokens
+ - **Use Case**: Instruction-based tasks, question answering, conversational AI.
+
+ ## **Model Usage**
+ This fine-tuned model is suitable for:
+ - **Conversational AI**: Chatbots and dialogue agents with improved contextual understanding.
+ - **Question Answering**: Generating concise and accurate answers to user queries.
+ - **Instruction Following**: Responding to structured prompts.
+ - **Long-Context Tasks**: Processing sequences up to 1024 tokens for long-text reasoning.
+
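Because sequences are limited to 1024 tokens, a chat application has to trim older turns before each request. The sketch below is one possible approach, not part of the original card: it keeps system messages plus the most recent turns that fit a budget. The `trim_history` name, the `reserve` parameter, and the default whitespace-based token count are all assumptions made for illustration; in practice, pass a `count_tokens` function backed by the model's tokenizer.

```python
def trim_history(messages, max_tokens=1024, reserve=256, count_tokens=None):
    """Keep system messages and the newest turns that fit the token budget.

    `reserve` leaves room for the generated reply. The default counter
    splits on whitespace, which is only a crude stand-in for real tokens.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    budget = max_tokens - reserve
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    return system + list(reversed(kept))
```

Dropping whole turns from the oldest end keeps the remaining conversation coherent, at the cost of losing early context entirely.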
+ # **How to Use DoeyLLM / OneLLM-Doey-V1-Llama-3.2-3B-Instruct**
+
+ This guide explains how to use the **DoeyLLM** model in the OneLLM app on iOS and with Transformers on a PC.
+
+ ---
+
+ ## **App (iOS): Use with OneLLM**
+
+ OneLLM brings versatile large language models (LLMs) to your device, including Llama, Gemma, Qwen, Mistral, and more. Enjoy private, offline GPT and AI tools tailored to your needs.
+
+ With OneLLM, you can run these models directly on your device without an internet connection. Responses are generated locally, so your data stays on the device.
+
+ ### **Quick Start for iOS**
+
+ Follow these steps to run the **DoeyLLM** model in the OneLLM app:
+
+ 1. **Download OneLLM**
+    Get the app from the [App Store](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910) and install it on your iOS device.
+
+ 2. **Load the DoeyLLM Model**
+    Use the OneLLM interface to load the DoeyLLM model directly into the app:
+    - Navigate to the **Model Library**.
+    - Search for `DoeyLLM`.
+    - Select the model and tap **Download** to store it locally on your device.
+
+ 3. **Start Conversing**
+    Once the model is loaded, you can begin interacting with it through the app's chat interface. For example:
+    - Tap the **Chat** tab.
+    - Type your question or prompt, such as:
+      > "Explain the significance of AI in education."
+    - Receive real-time responses generated locally.
+
+ ### **Key Features of OneLLM**
+ - **Versatile Models**: Supports various LLMs, including Llama, Gemma, and Qwen.
+ - **Private & Secure**: All processing occurs locally on your device, ensuring data privacy.
+ - **Offline Capability**: Use the app without requiring an internet connection.
+ - **Fast Performance**: Optimized for mobile devices, delivering low-latency responses.
+
+ For more details or support, visit the [OneLLM App Store page](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910).
+
+ ## **PC: Use with Transformers**
+
+ The DoeyLLM model can also be used on PC platforms through the `transformers` library.
+
+ ### **Quick Start for PC**
+ Follow these steps to use the model with Transformers:
+
+ 1. **Install Transformers**
+    Ensure you have `transformers >= 4.43.0` installed. Update or install it via pip:
+
+    ```bash
+    pip install --upgrade transformers
+    ```
+
+ 2. **Load the Model**
+    With `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or the Auto classes with the `generate()` function. The example below uses `pipeline`, which loads both the model and the tokenizer:
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Full Hub repository ID (organization/model) of the original, unquantized model.
+ model_id = "DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B"
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+ # The last entry of "generated_text" is the assistant's reply message.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
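The text above also mentions the Auto classes with `generate()`. A minimal sketch of that route follows, assuming the tokenizer ships a chat template (the `pipeline` example with chat messages suggests it does, but this is not confirmed in the card). The `generate_reply` helper name is hypothetical.

```python
def generate_reply(messages, model_id="DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B", max_new_tokens=256):
    """Generate one assistant reply with the Auto classes and generate().

    Requires torch and transformers; loading the model downloads its weights.
    """
    # Imported lazily so that defining this function does not need the dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Render the chat messages with the tokenizer's chat template (assumed to exist).
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Unlike the `pipeline` call, this returns only the reply text. For repeated calls, load the tokenizer and model once outside the function instead of on every request.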
+ ## Responsibility & Safety
+
+ As part of our responsible release strategy, we adopted a three-pronged approach to managing trust and safety risks:
+
+ 1. Enable developers to deploy helpful, safe, and flexible experiences for their target audience and the use cases supported by the model.
+ 2. Protect developers from adversarial users attempting to exploit the model’s capabilities to potentially cause harm.
+ 3. Provide safeguards for the community to help prevent the misuse of the model.