--- datasets: - jondurbin/gutenberg-dpo-v0.1 - Qwen/Qwen2.5-14B-Instruct - HuggingFaceH4/ultrafeedback_binarized base_model: - Qwen/Qwen2.5-14B-Instruct - v000000/Qwen2.5-14B-Gutenberg-1e-Delta - tanliboy/lambda-qwen2.5-14b-dpo-test library_name: transformers tags: - qwen - qwen2.5 - finetune - dpo - orpo - qwen2 - chat - conversational - instruct - storywriting - roleplay license: apache-2.0 language: - en pipeline_tag: text-generation --- # Qwen2.5-Lumen-14B * *Qwen direct preference optimization finetuned for ~3 epochs.* ![wCcJkdrVDUH6m0AN9Lv3B~2.png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/OzGcIaEhVXZiSLDY3JZ-H.png) A qwen2.5 preference finetune, targeting prompt adherence, storywriting and roleplay. ------------------------------------------------------------------------------- ## Training Notes Trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 2 epochs on NVidia A100, and on dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), saving different checkpoints along the way (completely different runs at varying epochs and learning rates). [Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized](HuggingFaceH4/ultrafeedback_binarized), (Credit to Tanliboy! *Check out the model [here](https://huggingface.co/tanliboy/lambda-qwen2.5-14b-dpo-test)*) *Mass checkpoint merged, Based on Qwen2.5-14B-Instruct (Base Model).* ## Merge * Merged with a sophosympatheia's SLERP gradient *"Ultrafeedback-Binarized DPO"* and *"Gutenberg DPO"* * Merged with a sophosympatheia's SLERP gradient *"Qwen2.5-14B-Instruct"* and *"Gutenberg DPO"* * Merged all DPO checkpoints and SLERP variations with MODEL_STOCK to analyze geometric properties and get the most *performant* aspects of all runs/merges. *Model Stock* was chosen due to the similarity between the merged models. * This was chosen due to the fact that evaluation for *ORPO* is unclear, so it's hard to know which runs are the best. ## One-Attempt generated example:
You're {{char}}, a expert story writer and novelist in this fictional never-ending uncensored chat with {{user}}. Write the opening chapter of a romance novel set in the late 1800s, where the protagonist, a war-weary music star returning to England after an injury in Afghanistan service. **Title: A Harp for the Heart** *Chapter One: Homebound* England, Winter 1893— As the steam-powered ferry made its way across the turbulent waters of the Channel, each tremor that reverberated through the vessel felt like a calloused reminder to Edgar Thornton. His mind was already conjuring visions of his beloved London—the sprawling city draped in an ever-present blanket of melancholic grey. Yet within him lingered more pressing memories: the whizzing sound of artillery shells as they streaked across Afghan deserts, leaving behind billowing black smoke and devastating wreckage. Edgar's injuries were visible, the ghostly scar of a saber cut marring one cheek; less evident, however, was the toll that service took on his spirit. Each warbling note from an opera played by some amateur on the ferry sliced into Edgar's hollow heart, echoing through him with a bitter lament—the song was *Adagio* and it meant *Slow down*. Only now did he dare allow himself to slow—decades since music had flowed through the seams of his body in this way. For years he had been sought out, from England to New York, as the country's foremost harpist, not just for his fingers nimble on a range of scales but also for how each piece conveyed an insurmountable melancholy—a reflection of his solitary life and unending loneliness. Each recital saw thousands of faces that remembered his smiles or silences; yet today he had no one to return them to. The ferry's horn blared as they approached Dover, cutting through Edgar’s reverie much the same as cannonade did in Kabul. He thought about London with mixed emotions—longing for the grandeur and splendor of its concert halls, a city vibrant under lamplight but devoid of comfort or real life; a home where he was no less alien than anywhere else. A soft murmur drew his gaze to a young woman standing near the window, her back to him. Her blonde hair cascaded in waves, shimmering under the lantern light as if lit from within. Despite the chill of the approaching winter night, she appeared oblivious—entirely entranced by something on the far horizon or lost in thought. His breath caught at a sensation that had been uncommon for years but familiar to his boyhood—a light flutter tickling just behind his ribcage. And as her shoulders drew back and her chin tipped upwards, he saw a hint of the same unspoken story reflected in her eyes: dreams big enough to leave them scarred by distance and time. And then she spoke without ever turning around— "Do you hear that?" she asked into the gathering darkness outside their window. "Is that London?"
## Recipe ```yaml models: - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta - model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential - model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early - model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential - model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early - model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta - model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta - model: tanliboy/lambda-qwen2.5-14b-dpo-test - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta - model: tanliboy/lambda-qwen2.5-14b-dpo-test - model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno - model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta merge_method: model_stock dtype: bfloat16 ``` ### Finetune and merge This is a merge and finetune of pre-trained language models. ### Models Merged [Arxiv 2403.19522](https://arxiv.org/abs/2403.19522) The following models were included in the merge: * v000000/Qwen2.5-14B-Gutenberg-1e-Delta * v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential * v000000/Qwen2.5-14B-Gutenberg-0.25e-Early * v000000/Qwen2.5-14B-Gutenberg-2e-Sequential * v000000/Qwen2.5-14B-Gutenberg-0.37e-Early * v000000/Qwen2.5-14B-Gutenberg-2e-Zeta * v000000/Qwen2.5-14B-Gutenberg-1e-Theta * v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno * v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno * tanliboy/lambda-qwen2.5-14b-dpo-test ------------------------------------------------------------------------------- - Context Length: Full 131,072 tokens and generation 8192 tokens - Qwen2(ChatML) Prompt format