tags:
- text-generation-inference
- safetensors
---

### Acrux-500M-o1-Journey Model Files

The **Acrux-500M-o1-Journey** is a lightweight, instruction-tuned language model fine-tuned from the **Qwen2.5-0.5B-Instruct** base model. With 500 million parameters, it is designed for **cost-effective deployment** and **fast text generation** while maintaining quality performance on instruction-following tasks.

| **File Name** | **Size** | **Description** | **Upload Status** |
|----------------------------|----------------|-------------------------------------------|--------------------|
| … | … | … | … |
| `tokenizer_config.json` | 7.73 kB | Additional tokenizer settings. | Uploaded |
| `vocab.json` | 2.78 MB | Vocabulary for the tokenizer. | Uploaded |

---

### **Key Features:**

1. **Compact Size with Efficient Performance:**
   The smaller parameter count (500M) ensures faster inference and reduced hardware requirements.

2. **Instruction Optimization:**
   Fine-tuned to follow prompts effectively, making it suitable for interactive applications and prompt-based tasks.

3. **Domain-Specific Training:**
   Trained on the **GAIR/o1-journey** dataset, providing tailored capabilities for specific use cases.
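The deployment advantage of the compact size above can be made concrete with a quick back-of-envelope estimate of weight memory per precision. This sketch counts parameter storage only; it deliberately ignores activation memory and the KV cache, which grow with batch size and sequence length.

```python
# Rough weight-memory footprint for a 500M-parameter model.
# Ignores activations and KV cache, which depend on batch and sequence length.
params = 500_000_000
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gb = params * nbytes / 1e9
    print(f"{dtype}: ~{gb:.1f} GB of weights")
```

At half precision the weights occupy roughly 1 GB, which is what makes inference on commodity GPUs (or even CPUs) practical.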

---

### **Training Details:**

- **Base Model:** [Qwen2.5-0.5B-Instruct](#)
- **Dataset Used for Fine-Tuning:** [GAIR/o1-journey](#)
  - A compact dataset of ~1.42k samples focused on instruction-driven generation.

---

### **Capabilities:**

1. **Instruction Following:**
   - Generates accurate and coherent responses to user instructions.
   - Handles summarization, question answering, and conversational tasks.

2. **Fast Inference:**
   - Ideal for real-time applications due to the reduced latency of its smaller size.

3. **Interactive AI Development:**
   - Suitable for chatbots, virtual assistants, and instructional interfaces.

---

### **Usage Instructions:**

1. **Setup:**
   Download all model files and ensure a recent version of the Hugging Face Transformers library is installed.

2. **Loading the Model:**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Acrux-500M-o1-Journey"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```

3. **Sample Text Generation:**

```python
input_text = "Explain the concept of machine learning in simple terms."
inputs = tokenizer(input_text, return_tensors="pt")
# do_sample=True is required for temperature to take effect;
# without it, generation is greedy and the temperature setting is ignored.
outputs = model.generate(**inputs, max_length=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

4. **Optimize Generation:**
   Adjust parameters in `generation_config.json` for better control of the output, such as:
   - `temperature` for randomness.
   - `top_p` for sampling diversity.
   - `max_length` for output size.
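As a reference point, a minimal `generation_config.json` along these lines might look like the following. The values are illustrative defaults chosen to match the example above, not the configuration actually shipped with this model.

```json
{
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9,
  "max_length": 100
}
```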

---
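To build intuition for what `top_p` controls, here is a small, self-contained sketch of nucleus (top-p) filtering over a toy next-token distribution. This is an illustration only, not the model's actual sampler, which lives inside `model.generate`.

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy distribution: with top_p=0.8, only "the" and "a" survive the cut.
probs = {"the": 0.5, "a": 0.3, "dog": 0.15, "ran": 0.05}
print(top_p_filter(probs, top_p=0.8))
```

Lowering `top_p` shrinks the candidate pool and makes output more predictable; raising it admits rarer tokens and increases diversity.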