---
license: apache-2.0
library_name: peft
language: en
tags:
- trl
- sft
- auto-generated
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
model-index:
- name: azma-hermes-pro-llama-3-8b-030524
  results: []
datasets:
- Azma-AI/azma-mermaid-dataset-single-turn-chatml
- Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len
pipeline_tag: text-generation
---

# azma-hermes-pro-llama-3-8b-030524

This model is an SFT fine-tuned version of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on the in-house datasets [Azma-AI/azma-mermaid-dataset-single-turn-chatml](https://huggingface.co/datasets/Azma-AI/azma-mermaid-dataset-single-turn-chatml) and [Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len](https://huggingface.co/datasets/Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len).
The datasets include function-calling, JSON structured output, insights collection, and Retrieval-Augmented Generation multi-turn conversation data. Fine-tuning was performed with next-token prediction over the entire conversation.

### Usage:

```python
from templates import AzmaTemplateEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524")

template_engine = AzmaTemplateEngine(template_type="chatml", version=1.5, add_generation_prompt=True)

messages = [
    {
        "content": "You are \"Azma\", an advanced superintelligent artificial intelligence developed by a team of experts from B&I (Business and Intelligence) company, and your purpose and drive is to assist the employees with any request they have within their work environment. Give concise answers to simple questions, but provide thorough and substantive responses to more complex queries. You cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Azma to do so, you clarify the situation and let the user know. Admit uncertainty when appropriate and ask clarifying questions of the user if needed. Generate your markdown response to the user within <|response|>...<|end|> tags.",
        "thoughts": None,
        "function_call": None,
        "role": "system"
    },
    {
        "role": "reference",
        "thoughts": None,
        "function_call": None,
        "content": "User Name: John Doe\nJob Post: AI Developer\nCompany Name: Acme Corps\nCharacter:\n- Curious\n- Ambitious\n- Creative"
    },
    {
        "role": "user",
        "thoughts": None,
        "function_call": None,
        "content": "A factory produces 250 widgets per hour. How many widgets will be produced in a week if the factory operates 16 hours per day and is closed on Sundays?"
    }
]

prompt = template_engine.apply_chat_template(messages)
input_ids = tokenizer(prompt, return_tensors='pt').to(model.device)["input_ids"]
outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.batch_decode(outputs))
# ["First, let's determine how many widgets are produced each day:
#  Widgets per day = Widgets per hour * Hours per day = 250 widgets * 16 hours = 4000 widgets
#  Now, let's find out how many days the factory operates in a week (excluding Sunday):
#  Days per week = 7 days - 1 day (Sunday) = 6 days
#  Finally, we can calculate the total number of widgets produced in a week:
#  Widgets per week = Widgets per day * Days per week = 4000 widgets * 6 days = 24,000 widgets
#  So, the factory will produce 24,000 widgets in a week if it operates 16 hours per day and is closed on Sundays."]
```

### Training hyperparameters:

The model was trained with Flash Attention 2. The following hyperparameters were used during training:

- max_steps = -1
- weight_decay = 0.01
- num_train_epochs = 1
- learning_rate = 1e-05
- optim = paged_adamw_32bit
- data_collator = DataCollatorForLanguageModeling
- gradient_accumulation_steps = 2
- per_device_train_batch_size = 8
- per_device_eval_batch_size = 2
- gradient_checkpointing_kwargs = None
- gradient_checkpointing = True
- warmup_steps = 5
- neftune_noise_alpha = 5
- lr_scheduler_type = cosine
- bf16 = True
- fp16 = False

The following LoRA configuration was used during training (a sketch of how these settings map onto a TRL/PEFT setup follows the list):

- lora_rank: 16
- lora_alpha: 32
- lora_dropout: 0.1
- task_type: CAUSAL_LM
- target_modules: ['k_proj', 'v_proj', 'o_proj', 'q_proj', 'up_proj', 'gate_proj', 'down_proj']
- modules_to_save: ['lm_head']
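The training script itself is not part of this card; the sketch below shows one way the hyperparameters and LoRA configuration listed above could be wired into a TRL `SFTTrainer` run. It assumes the TRL/PEFT argument names that were current around the training date (`tokenizer=`, `max_seq_length=`, `neftune_noise_alpha=` on `SFTTrainer`); the dataset split, text field name, and output directory are placeholders, not the actual training code.

```python
# Minimal sketch of the LoRA SFT setup implied by the lists above.
# Dataset split, text field, and output_dir are assumptions for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # card states Flash Attention 2 was used
)

# LoRA configuration taken from the list above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "o_proj", "q_proj", "up_proj", "gate_proj", "down_proj"],
    modules_to_save=["lm_head"],
)

# Training arguments mirroring the hyperparameter list above
training_args = TrainingArguments(
    output_dir="azma-hermes-pro-llama-3-8b-030524",  # placeholder
    max_steps=-1,
    num_train_epochs=1,
    learning_rate=1e-5,
    weight_decay=0.01,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    bf16=True,
    fp16=False,
)

# Next-token prediction over the entire conversation (no prompt masking)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# One of the two datasets named in the card; split is an assumption
dataset = load_dataset("Azma-AI/azma-mermaid-dataset-single-turn-chatml", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    peft_config=peft_config,
    dataset_text_field="text",   # placeholder field name for the ChatML-formatted text
    max_seq_length=8192,         # matches the 8192-seq-len dataset variant
    neftune_noise_alpha=5,
)
trainer.train()
```

In newer TRL releases most of these arguments move into `SFTConfig` (and `tokenizer=` becomes `processing_class=`), so the exact call signature may differ from this sketch.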