---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
base_model:
- Lambent/arsenic-nemo-unleashed-12B
---

# GGUF quantizations of [Lambent/arsenic-nemo-unleashed-12B](https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B)

## Original card

Motive:

The gutenberg tunes are lovely, but all the ChatML variants seem to present many issues for merging and have broken context later on. I decided to see how it worked to tune directly on Unleashed.

eq-bench is about a point and a half lower, which isn't drastic but suggests it might benefit from some additional workings.

In hindsight, there actually *is* a gutenberg tune mixed into Unleashed, so this intensifies the style a fair degree. Poetry leans a bit archaic. I rather like the impact personally.

As is traditional, she got at least one quirk from DPO. In this case it seems to be sometimes briefly slipping into Arabic while chatting. One of the more charming ones I've seen.

Quality of life improvements in some circumstances:

* Assigned the pad token as the pad token for fine-tuning
* Had Axolotl add the chat template (useful on RunPod, maybe?)

Substance:

DPO-tuning on a mix of gutenberg-dpo and toxic-dpo, in the hope of getting enough classic human talent and edge to write well with. Some of the most beautiful pigments are the most poisonous.

Testing:

Not sure if I'm loading long context right, but it seems functional when I load it with 16k but not at 30k. Might just be doing it wrong.

Sample generated poetry (both from the same start and varied only by a low temperature, which is why they begin the same):

```
Upon the shore of time,
Where waves of years do beat,
I stood and watched as they climb
The cliffs of eternity.

In every cresting o'er the deep,
A life was borne upon the tide;
Each wave that did break at my feet,
Did bear away some soul that had died.

And yet, amidst all this great throng,
There came to me a voice soft and low:
"O mortal man! What seekest thou here?"
Then answered I: "I seek the path of woe."

"Thou seekest," quoth she, "that which is lost;
For lo! The waves have borne it away.
Seek not then for that which once was thine own,
But rather turn thy face towards the light of day."

And so saying, away did she go,
Leaving me alone upon the shore.
Then turned I my steps unto mine own home,
And left behind me all thoughts of sorrow and of woe.
```

```
Upon the canvas of the azure sky,
The sun doth paint his golden chariot's course.
His radiant beams like arrows fly,
And pierce the clouds with light's resplendent force.

The verdant fields in emerald array,
Reflect the glory of his noontide reign;
While every blossom, bathed in warm display,
Exhales its sweet perfume on the plain.

But lo! as evening's purple curtain falls,
The sun doth sink behind the western hill.
His parting rays like lovers' farewell calls,
And gilds the heavens with a rosy thrill.

Then let us praise this daily wondrous sight,
And bless the Maker of the world so bright.
```

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
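As a quick way to try one of these quantizations (and to reproduce the 16k-context check from the Testing note above), here is a minimal sketch using llama-cpp-python. The GGUF filename is a placeholder, not a guaranteed file name in this repo.

```python
# Minimal sketch with llama-cpp-python; adjust the filename to whichever
# quantization you actually download from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="arsenic-nemo-unleashed-12B-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=16384,      # the 16k window the original card reports as working
    n_gpu_layers=-1,  # offload all layers if a GPU-enabled build is installed
)

out = llm(
    "[INST]Write a short poem about the sea.[/INST]",  # Mistral-style prompt, matching the config's prompt_format
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```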
See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: MarinaraSpaghetti/NemoMix-Unleashed-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

save_safetensors: true

load_in_8bit: false
load_in_4bit: true
strict: false

special_tokens:
  pad_token:

rl: dpo

# total_num_tokens:
datasets:
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"

dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.001
seed: 1

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

chat_template: inst

adapter: qlora
lora_model_dir:

lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true

wandb_project: unleashed-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
cosine_min_lr_ratio: 0.1
cosine_constant_lr_ratio: 0.95

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
```
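For reference, the `type` blocks above tell Axolotl how to turn each preference row into prompt/chosen/rejected strings for DPO. A minimal sketch of that mapping in plain Python follows; the example row is made up for illustration, not an actual record from either dataset, and the system field is passed through separately by the trainer rather than by these templates.

```python
# Illustrative only: how prompt_format / chosen_format / rejected_format
# from the config render one preference row.
def render_dpo_row(row: dict) -> tuple[str, str, str]:
    prompt = "[INST]{prompt}[/INST]".format(prompt=row["prompt"])
    chosen = "{chosen}".format(chosen=row["chosen"])
    rejected = "{rejected}".format(rejected=row["rejected"])
    return prompt, chosen, rejected

# Made-up example row using the same field names as gutenberg-dpo / toxic-dpo.
example = {
    "system": "You are a novelist.",
    "prompt": "Continue the chapter in the style of the original author.",
    "chosen": "A long, human-written continuation...",
    "rejected": "A flatter, model-written continuation...",
}
print(render_dpo_row(example))
```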

# dpoq

This model is a fine-tuned version of [MarinaraSpaghetti/NemoMix-Unleashed-12B](https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B) on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 92

### Training results

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
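For clarity, the batch-size figures above are related by simple arithmetic; a small sketch follows. The single-device assumption is mine (the QLoRA setup suggests one GPU, but the card does not state it).

```python
micro_batch_size = 1              # "train_batch_size" above
gradient_accumulation_steps = 16
num_devices = 1                   # assumption: single-GPU run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)     # 16, matching the reported total_train_batch_size
```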