gghfez/experimental_R1-8x22b

This is an attempt to train WizardLM2-8x22b on R1 outputs, particularly for creative writing.

The training didn't take very well: the model tends to ignore its CoT trace when generating the final response. For example, when prompted to draft the next chapter in a story (I omitted most of the output for brevity):

<think>
... I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.

Let me outline the next few chapters:

Chapter 5: Nightmares and Trust ...
</think>

And then the output includes the usual tropes:

... “I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.
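One way to check this mismatch on your own generations is to split the trace from the final response and read them side by side. This is a minimal sketch that assumes the output contains at most one `<think>...</think>` block, as in the example above. The helper name `split_reasoning` is hypothetical and not part of any library.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (trace, response) from a raw generation.

    Assumption: the trace is wrapped in a single <think>...</think> pair.
    If no such pair is found, the trace is empty and the whole text is
    treated as the response.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    # Everything outside the think block is treated as the final response.
    response = (text[:match.start()] + text[match.end():]).strip()
    return trace, response


if __name__ == "__main__":
    raw = "<think>\nKeep the tone gritty.\n</think>\n\nRain tapped on the roof."
    trace, response = split_reasoning(raw)
    print("TRACE:", trace)
    print("RESPONSE:", response)
```

With the two parts separated, it is easier to see whether the plan in the trace actually shows up in the prose.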

It uses the Vicuna template, the same as WizardLM2.
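For reference, a prompt in this format can be assembled by hand. The sketch below assumes the commonly published Vicuna-style layout used by WizardLM2 (a system sentence followed by `USER:` / `ASSISTANT:` turns, with `</s>` closing each completed assistant turn). The exact system text and spacing are assumptions, so check them against the chat template shipped with the tokenizer. `build_vicuna_prompt` is a hypothetical helper name.

```python
# Assumed default system prompt for the Vicuna-style format; verify against
# the model's own tokenizer/chat template before relying on it.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(turns, system=SYSTEM):
    """Build a Vicuna-style prompt string.

    turns: list of (user_message, assistant_message_or_None) pairs.
    Pass None as the assistant message of the last pair to leave the
    prompt open for the model to complete.
    """
    parts = [system + " "]
    for user, assistant in turns:
        parts.append(f"USER: {user} ASSISTANT:")
        if assistant is not None:
            # Completed assistant turns are closed with the end-of-sequence token.
            parts.append(f" {assistant}</s>")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_vicuna_prompt([("Draft the next chapter of the story.", None)])
    print(prompt)
```

The model should then begin its reply with the `<think>` trace, followed by the final response.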

