---
license: apache-2.0
datasets:
- jeiku/Writing
- FourOhFour/RP_Phase
- anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
language:
- en
base_model:
- IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
---

## Aura-MoE-2x4B-v2

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/zyGqa-iH77dgU9D8WvoXY.png)

## Introduction

**Aura-MoE-2x4B-v2** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

The finetunes used in this merge were trained on several hundred million tokens of instruction data. The merge was then healed on 150 million tokens of roleplaying data, and Kahneman-Tversky Optimization (KTO) was applied to the healed model to give it a unique output style.

By the numbers, this should be a direct improvement over **[Aura-MoE-2x4B](https://huggingface.co/AuraIndustries/Aura-MoE-2x4B)**.

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-MoE-2x4B-v2
- **Base Model**: [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml)
- **Model Type**: Chat Completions
- **Prompt Format**: ChatML
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192+ tokens

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

[Static GGUF](https://huggingface.co/mradermacher/Aura-MoE-2x4B-v2-GGUF)

[Imatrix GGUF](https://huggingface.co/mradermacher/Aura-MoE-2x4B-v2-i1-GGUF)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Coming soon...

| Metric              | Value |
|---------------------|------:|
| Avg.                |   N/A |
| IFEval (0-Shot)     |   N/A |
| BBH (3-Shot)        |   N/A |
| MATH Lvl 5 (4-Shot) |   N/A |
| GPQA (0-shot)       |   N/A |
| MuSR (0-shot)       |   N/A |
| MMLU-PRO (5-shot)   |   N/A |
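## Usage

The model uses the ChatML prompt format, so the tokenizer's chat template can build prompts directly. The sketch below shows one way to run inference with Hugging Face Transformers; the repository id and the generation settings are illustrative assumptions, not official recommendations from this card.

```python
# Minimal inference sketch. The repo id and sampling settings below are
# assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AuraIndustries/Aura-MoE-2x4B-v2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern we just walked into."},
]

# The tokenizer carries a ChatML template, so this produces the
# <|im_start|>role ... <|im_end|> formatting the model was trained on.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```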
## Training Configuration

<details><summary>Click here for Mergekit and Axolotl configs</summary>

**MoE Merge**

```yaml
base_model: FourOhFour/Zenith_4B
gate_mode: random
dtype: bfloat16
experts_per_token: 1
experts:
  - source_model: FourOhFour/Luxe_4B
  - source_model: FourOhFour/Zenith_4B
```

**SFT**

```yaml
base_model: jeiku/MoEv2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: FourOhFour/RP_Phase
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: jeiku/Writing
    type: completion
    field: text

chat_template: chatml
shuffle_merged_datasets: true
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./output/out

hub_model_id: jeiku/Aura-MoEv2
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len:

wandb_project: Aura-MoEv2
wandb_entity:
wandb_watch:
wandb_name: Aura-MoEv2
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.05
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

**KTO**

```yaml
base_model: jeiku/Aura-MoEv2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/moekto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: chatml

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: chatml.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: moekto
wandb_entity:
wandb_watch:
wandb_name: moekto
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1
debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

</details>
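For reference, the SFT config above maps ShareGPT-style records (role in `from`, text in `value`, turns under `conversations`) onto ChatML, training only on `gpt` turns. The sketch below shows roughly what one such record looks like after rendering; the sample record and the helper function are illustrative assumptions, not part of the training code.

```python
# Illustrative sketch of the ShareGPT-to-ChatML mapping implied by the SFT
# config (field_messages: conversations, message_field_role: from,
# message_field_content: value). The helper and sample are assumptions.

ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def render_chatml(conversations):
    """Render a list of {"from": ..., "value": ...} turns as ChatML text."""
    parts = []
    for turn in conversations:
        role = ROLE_MAP.get(turn["from"], turn["from"])
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

sample = {
    "conversations": [
        {"from": "system", "value": "You are a roleplaying partner."},
        {"from": "human", "value": "We push open the tavern door."},
        {"from": "gpt", "value": "Warm lamplight and the smell of stew greet you."},
    ]
}

print(render_chatml(sample["conversations"]))
# With roles_to_train: ["gpt"], only the assistant turns contribute to the loss.
```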