alac committed
Commit 5df80f3
1 Parent(s): 998415c
add lora and readme.md

Files changed:
- README.md +79 -0
- adapter_config.json +28 -0
- adapter_model.bin +3 -0
- dataset_tags.json +76 -0
- training_parameters.json +37 -0
README.md
ADDED
@@ -0,0 +1,79 @@
---
language:
- en
tags:
- llama-2
- instruct
- instruction
- writing
- story
pipeline_tag: text-generation
license: other
---

# Waxwing-Storytelling-70B-LoRA model card

Waxwing is a storytelling LoRA for Llama 2 70B.
- Guide the story with Waxwing's turn-based instruction system.
- Tailor the feel of your story using style tags.
- Experience storytelling free of ChatGPT's idiosyncrasies, thanks to a "human-generated" dataset of public domain writing. Waxwing avoids GPT-isms such as positivity bias, "bond" emphasis, rushed endings, and exaggerated stylistic tics.

Waxwing is available:
- LoRA: on this branch; it can be applied at runtime to any variant of the Llama 2 70B base model (see the sketch below).
- fp16: merged into the base Llama 2 model, in full precision, in the [16fp](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/16fp) branch.
- Quantized for use with Exllama 2:
  - [2.5bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/2.5bpw)
  - [3.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/3.0bpw)
  - [4.65bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/4.65bpw)
  - [6.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/6.0bpw)
  - [8.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/8.0bpw)

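A minimal sketch of applying the adapter at runtime with the `transformers` and `peft` libraries; the base-model id, dtype, and generation settings are illustrative assumptions, not anything prescribed by this card:

```
# Sketch only: load a Llama 2 70B base model and attach the Waxwing LoRA at runtime.
# Assumes `transformers` and `peft` are installed; the base model id is an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"           # any Llama 2 70B variant should work
lora_id = "alac/Waxwing-Storytelling-70B-LoRA"  # this repository, main branch

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, lora_id)  # applies the adapter weights

prompt = (
    "### System:\nA chat between a user and a writing assistant.\n\n"
    "### User:\nWrite a scene where: a storm rolls into the harbor.\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
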
By using this model, you take full responsibility for anything done with its outputs.

## Model Details

### Model Description

- **Developed by:** alac
- **Model Type:** QLoRA
- **Finetuned from model:** Llama-2 70B
- **Language(s):** English

### Dataset

Waxwing was trained with a small dataset gathered from public domain writing. The exact dataset will remain private, but the code used to generate prompts and metadata is available on [github](https://github.com/alac/txt_to_dataset).
Upstage's [SOLAR](https://huggingface.co/upstage/SOLAR-0-70b-16bit) model was used to tag the dataset.

### Prompt Template

```
### System:
A chat between a user and a writing assistant.
{context}

### User:
{style tags}
Write a scene where: {events that should happen in the next scene}

### Assistant:
{output}
```
`context` is an optional story synopsis.
`style tags` should be a string along the lines of:
```
Tone: {list of tones}. Writing style: {list of writing styles}.
Written with {slow|medium|fast} pacing, in moment to moment detail, in {abstract|selective|vivid sensory} detail, from a {First|Third Person (Character)} perspective.
```
The exact values it was trained on are in the `dataset_tags.json` file. Anecdotally, it works better with a subset of the style tags used (`Tone: tense`) or with tags that are complementary (`Tone: tense, mysterious. Writing style: dramatic. Written in abstract detail.`). It's unclear how well Waxwing responds to tags that it was not trained on (e.g. 'genre').
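As an illustration, a hypothetical helper that assembles a prompt in the template above (the function name and example values are not part of this card):

```
# Hypothetical helper that fills in the prompt template documented above.
def build_prompt(instruction: str, style_tags: str = "", context: str = "") -> str:
    system = "### System:\nA chat between a user and a writing assistant."
    if context:
        system += f"\n{context}"
    user = "### User:\n"
    if style_tags:
        user += f"{style_tags}\n"
    user += f"Write a scene where: {instruction}"
    return f"{system}\n\n{user}\n\n### Assistant:\n"

print(build_prompt(
    instruction="the detective finds the letter",
    style_tags="Tone: tense, mysterious. Writing style: dramatic. Written in abstract detail.",
    context="A noir mystery set in a rain-soaked port city.",
))
```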

For SillyTavern users, the `style tags` work well in the "Author's Note" field at depth 1. User messages should begin with `Write a scene where: `; to continue a scene, just type `continue`. Most testing was done using the [Genesis](https://github.com/SillyTavern/SillyTavern/blob/8e73882c9ba7301c9163befbe445686a79d4a9a8/public/TextGen%20Settings/NovelAI%20(Genesis).settings) preset.

### Training

Waxwing was trained on a single machine with 72GB of VRAM. The training parameters are available in the `training_parameters.json` file of the main branch. The software used to train was FartyPants' [Training_PRO](https://github.com/FartyPants/Training_PRO) extension for the Oobabooga Text Generation WebUI.
adapter_config.json
ADDED
@@ -0,0 +1,28 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "models\\Llama-2-70B-fp16",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 8,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "down_proj",
    "o_proj",
    "v_proj",
    "gate_proj",
    "k_proj",
    "up_proj"
  ],
  "task_type": "CAUSAL_LM"
}
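For reference, the adapter hyperparameters above correspond to a `peft` `LoraConfig` along these lines (a sketch for readers, not a file shipped in the repo):

```
# Sketch: the adapter_config.json above, reconstructed as a peft LoraConfig.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "down_proj", "o_proj", "v_proj",
                    "gate_proj", "k_proj", "up_proj"],
)
```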
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b9eadd7f9239cb51f29e3896bca94d02c7639b8ac39b35ba613f33d2a13653d4
size 828780162
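This is a Git LFS pointer; the actual ~829 MB adapter weights are fetched separately. A quick sketch for checking a downloaded copy against the recorded hash (assumes the file sits in the working directory):

```
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer's sha256.
import hashlib

expected = "b9eadd7f9239cb51f29e3896bca94d02c7639b8ac39b35ba613f33d2a13653d4"
h = hashlib.sha256()
with open("adapter_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print("OK" if h.hexdigest() == expected else "MISMATCH")
```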
dataset_tags.json
ADDED
@@ -0,0 +1,76 @@
{
  "tone": [
    "melancholic",
    "tense",
    "dramatic",
    "suspenseful",
    "mysterious",
    "humorous",
    "somber",
    "philosophical",
    "bittersweet",
    "ominous",
    "whimsical",
    "determined",
    "sarcastic",
    "intense",
    "nostalgic",
    "dark",
    "adventurous",
    "serious",
    "emotional",
    "surreal",
    "thoughtful",
    "ironic",
    "cynical",
    "desperate",
    "absurd",
    "uncertain",
    "wry",
    "resigned",
    "intriguing",
    "curious",
    "anxious",
    "hopeful",
    "eerie",
    "romantic",
    "comedic",
    "thrilling",
    "action-packed"
  ],
  "writing style": [
    "descriptive",
    "detailed",
    "poetic",
    "vivid",
    "imaginative",
    "introspective",
    "emotional",
    "flowery",
    "philosophical",
    "conversational",
    "formal",
    "immersive",
    "character-driven",
    "fast-paced",
    "evocative",
    "dramatic",
    "present tense",
    "witty",
    "dialogue-heavy",
    "lyrical",
    "narrative",
    "atmospheric",
    "analytical"
  ],
  "pacing": [
    "medium",
    "slow",
    "fast"
  ],
  "sensory detail": [
    "abstract",
    "selective",
    "vivid sensory"
  ]
}
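As an illustration, a sketch of composing a style-tag string from these lists in the format the model card describes (the sampling choices and the perspective value are arbitrary examples):

```
# Sketch: compose a style-tag string from dataset_tags.json in the documented format.
import json
import random

with open("dataset_tags.json") as f:
    tags = json.load(f)

tones = random.sample(tags["tone"], 2)
styles = random.sample(tags["writing style"], 1)
pacing = random.choice(tags["pacing"])
detail = random.choice(tags["sensory detail"])

style_tags = (
    f"Tone: {', '.join(tones)}. Writing style: {', '.join(styles)}. "
    f"Written with {pacing} pacing, in moment to moment detail, "
    f"in {detail} detail, from a Third Person (Character) perspective."
)
print(style_tags)
```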
training_parameters.json
ADDED
@@ -0,0 +1,37 @@
{
  "lora_name": "Waxwing",
  "always_override": false,
  "save_steps": 0.0,
  "micro_batch_size": 3,
  "batch_size": 0,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "lora_rank": 16,
  "lora_alpha": 8,
  "lora_dropout": 0.05,
  "cutoff_len": 1280,
  "dataset": "dataset_11.27.23",
  "eval_dataset": "None",
  "format": "t2d_oobabooga_training_format",
  "eval_steps": 100.0,
  "raw_text_file": "None",
  "higher_rank_limit": false,
  "warmup_steps": 100.0,
  "optimizer": "adamw_torch_fused",
  "hard_cut_string": "\\n\\n\\n",
  "train_only_after": "",
  "stop_at_loss": 0,
  "add_eos_token": false,
  "min_chars": 0.0,
  "report_to": "None",
  "precize_slicing_overlap": true,
  "add_eos_token_type": "Every Block",
  "save_steps_under_loss": 1.8,
  "add_bos_token": false,
  "training_projection": "all",
  "sliding_window": false,
  "warmup_ratio": 0,
  "grad_accumulation": 4,
  "neft_noise_alpha": 3
}
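These are Training_PRO fields rather than raw `transformers` arguments, but the core optimizer settings roughly correspond to a `TrainingArguments` like the sketch below (effective batch size is micro_batch_size × grad_accumulation = 3 × 4 = 12; treat this as an approximation, not the exact training script):

```
# Sketch: approximate transformers.TrainingArguments implied by the values above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="waxwing-lora",        # hypothetical output path
    per_device_train_batch_size=3,    # micro_batch_size
    gradient_accumulation_steps=4,    # grad_accumulation -> effective batch of 12
    num_train_epochs=1,
    learning_rate=3e-4,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
)
```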