---
license: apache-2.0
library_name: peft
language: en
tags:
- trl
- sft
- auto-generated
base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
model-index:
- name: azma-hermes-pro-llama-3-8b-030524
  results: []
datasets:
- Azma-AI/azma-mermaid-dataset-single-turn-chatml
- Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len
pipeline_tag: text-generation
---

# azma-hermes-pro-llama-3-8b-030524

This model is an SFT fine-tuned version of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on the in-house datasets [Azma-AI/azma-mermaid-dataset-single-turn-chatml](https://huggingface.co/datasets/Azma-AI/azma-mermaid-dataset-single-turn-chatml) and [Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len](https://huggingface.co/datasets/Azma-AI/azma-dataset-v2-mermaid-without-thoughts-final-chatml-8192-seq-len).
The datasets include function-calling, JSON structured output, insights collection, and Retrieval-Augmented Generation multi-turn conversation data. Fine-tuning was performed with next-token prediction over the entire conversation.

### Usage:

```python
from templates import AzmaTemplateEngine
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Azma-AI/azma-hermes-pro-llama-3-8b-030524")

template_engine = AzmaTemplateEngine(template_type="chatml", version=1.5, add_generation_prompt=True)

messages = [
    {
        "content": "You are \"Azma\", an advanced superintelligent artificial intelligence developed by a team of experts from B&I (Business and Intelligence) company, and your purpose and drive is to assist the employees with any request they have within their work environment. Give concise answers to simple questions, but provide thorough and substantive responses to more complex queries. You cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Azma to do so, you clarify the situation and let the user know. Admit uncertainty when appropriate and ask clarifying questions of the user if needed. Generate your markdown response to the user within <|response|>...<|end|> tags.",
        "thoughts": None,
        "function_call": None,
        "role": "system"
    },
    {
        "role": "reference",
        "thoughts": None,
        "function_call": None,
        "content": "User Name: John Doe\nJob Post: AI Developer\nCompany Name: Acme Corps\nCharacter:\n- Curious\n- Ambitious\n- Creative"
    },
    {
        "role": "user",
        "thoughts": None,
        "function_call": None,
        "content": "A factory produces 250 widgets per hour. How many widgets will be produced in a week if the factory operates 16 hours per day and is closed on Sundays?"
    }
]

prompt = template_engine.apply_chat_template(messages)
input_ids = tokenizer(prompt, return_tensors='pt').to(model.device)["input_ids"]
outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.batch_decode(outputs))
# ["First, let's determine how many widgets are produced each day:
#  Widgets per day = Widgets per hour * Hours per day = 250 widgets * 16 hours = 4000 widgets
#  Now, let's find out how many days the factory operates in a week (excluding Sunday):
#  Days per week = 7 days - 1 day (Sunday) = 6 days
#  Finally, we can calculate the total number of widgets produced in a week:
#  Widgets per week = Widgets per day * Days per week = 4000 widgets * 6 days = 24,000 widgets
#  So, the factory will produce 24,000 widgets in a week if it operates 16 hours per day and is closed on Sundays."]
```

### Training hyperparameters:

The model was trained with Flash Attention 2. The following hyperparameters were used during training:

- max_steps = -1
- weight_decay = 0.01
- num_train_epochs = 1
- learning_rate = 1e-05
- optim = paged_adamw_32bit
- data_collator = DataCollatorForLanguageModeling
- gradient_accumulation_steps = 2
- per_device_train_batch_size = 8
- per_device_eval_batch_size = 2
- gradient_checkpointing_kwargs = None
- gradient_checkpointing = True
- warmup_steps = 5
- neftune_noise_alpha = 5
- lr_scheduler_type = cosine
- bf16 = True
- fp16 = False

The following LoRA configuration was used during training (a sketch of how these settings map onto a TRL/PEFT setup follows the list):

- lora_rank: 16
- lora_alpha: 32
- lora_dropout: 0.1
- task_type: CAUSAL_LM
- target_modules: ['k_proj', 'v_proj', 'o_proj', 'q_proj', 'up_proj', 'gate_proj', 'down_proj']
- modules_to_save: ['lm_head']
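The training script itself is not part of this card; the sketch below shows one way the hyperparameters and LoRA configuration listed above could be wired into a TRL `SFTTrainer` run. It assumes the TRL/PEFT argument names that were current around the training date (`tokenizer=`, `max_seq_length=`, `neftune_noise_alpha=` on `SFTTrainer`); the dataset split, text field name, and output directory are placeholders, not the actual training code.

```python
# Minimal sketch of the LoRA SFT setup implied by the lists above.
# Dataset split, text field, and output_dir are assumptions for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # card states Flash Attention 2 was used
)

# LoRA configuration taken from the list above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "o_proj", "q_proj", "up_proj", "gate_proj", "down_proj"],
    modules_to_save=["lm_head"],
)

# Training arguments mirroring the hyperparameter list above
training_args = TrainingArguments(
    output_dir="azma-hermes-pro-llama-3-8b-030524",  # placeholder
    max_steps=-1,
    num_train_epochs=1,
    learning_rate=1e-5,
    weight_decay=0.01,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    warmup_steps=5,
    lr_scheduler_type="cosine",
    bf16=True,
    fp16=False,
)

# Next-token prediction over the entire conversation (no prompt masking)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# One of the two datasets named in the card; split is an assumption
dataset = load_dataset("Azma-AI/azma-mermaid-dataset-single-turn-chatml", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    peft_config=peft_config,
    dataset_text_field="text",   # placeholder field name for the ChatML-formatted text
    max_seq_length=8192,         # matches the 8192-seq-len dataset variant
    neftune_noise_alpha=5,
)
trainer.train()
```

In newer TRL releases most of these arguments move into `SFTConfig` (and `tokenizer=` becomes `processing_class=`), so the exact call signature may differ from this sketch.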