alac committed
Commit 5df80f3
1 Parent(s): 998415c
add lora and readme.md

Files changed:
- README.md +79 -0
- adapter_config.json +28 -0
- adapter_model.bin +3 -0
- dataset_tags.json +76 -0
- training_parameters.json +37 -0
README.md
ADDED
@@ -0,0 +1,79 @@
---
language:
- en
tags:
- llama-2
- instruct
- instruction
- writing
- story
pipeline_tag: text-generation
license: other
---

# Waxwing-Storytelling-70B-LoRA model card

Waxwing is a storytelling LoRA for Llama 2 70B.
- Guide the story with Waxwing's turn-based instruction system.
- Tailor the feel of your story using style tags.
- Experience storytelling free of ChatGPT's idiosyncrasies, thanks to a "human-generated" dataset of public domain writing. Waxwing avoids GPT-isms such as positivity bias, "bond" emphasis, rushed endings, and exaggerated stylistic tics.

Waxwing is available:
- LoRA: on this branch; it can be applied at runtime to any variant of the Llama 2 70B base model (see the sketch below).
- fp16: merged into the base Llama 2 model, in full precision, in the [16fp](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/16fp) branch.
- Quantized for use with Exllama 2:
  - [2.5bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/2.5bpw)
  - [3.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/3.0bpw)
  - [4.65bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/4.65bpw)
  - [6.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/6.0bpw)
  - [8.0bpw](https://huggingface.co/alac/Waxwing-Storytelling-70B-LoRA/tree/8.0bpw)

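A minimal sketch of applying the adapter at runtime with the `transformers` and `peft` libraries; the base-model id, dtype, and generation settings are illustrative assumptions, not anything prescribed by this card:

```
# Sketch only: load a Llama 2 70B base model and attach the Waxwing LoRA at runtime.
# Assumes `transformers` and `peft` are installed; the base model id is an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"           # any Llama 2 70B variant should work
lora_id = "alac/Waxwing-Storytelling-70B-LoRA"  # this repository, main branch

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, lora_id)  # applies the adapter weights

prompt = (
    "### System:\nA chat between a user and a writing assistant.\n\n"
    "### User:\nWrite a scene where: a storm rolls into the harbor.\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
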
By using this model, you take full responsibility for anything done with its outputs.

## Model Details

### Model Description

- **Developed by:** alac
- **Model Type:** QLoRA
- **Finetuned from model:** Llama-2 70B
- **Language(s):** English

### Dataset

Waxwing was trained with a small dataset gathered from public domain writing. The exact dataset will remain private, but the code used to generate prompts and metadata is available on [github](https://github.com/alac/txt_to_dataset).
Upstage's [SOLAR](https://huggingface.co/upstage/SOLAR-0-70b-16bit) model was used to tag the dataset.

### Prompt Template

```
### System:
A chat between a user and a writing assistant.
{context}

### User:
{style tags}
Write a scene where: {events that should happen in the next scene}

### Assistant:
{output}
```
`context` is an optional story synopsis.
`style tags` should be a string along the lines of:
```
Tone: {list of tones}. Writing style: {list of writing styles}.
Written with {slow|medium|fast} pacing, in moment to moment detail, in {abstract|selective|vivid sensory} detail, from a {First|Third Person (Character)} perspective.
```
The exact values it was trained on are in the `dataset_tags.json` file. Anecdotally, it works better with a subset of the style tags used (`Tone: tense`) or with tags that are complementary (`Tone: tense, mysterious. Writing style: dramatic. Written in abstract detail.`). It's unclear how well Waxwing responds to tags that it was not trained on (e.g. 'genre').
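As an illustration, a hypothetical helper that assembles a prompt in the template above (the function name and example values are not part of this card):

```
# Hypothetical helper that fills in the prompt template documented above.
def build_prompt(instruction: str, style_tags: str = "", context: str = "") -> str:
    system = "### System:\nA chat between a user and a writing assistant."
    if context:
        system += f"\n{context}"
    user = "### User:\n"
    if style_tags:
        user += f"{style_tags}\n"
    user += f"Write a scene where: {instruction}"
    return f"{system}\n\n{user}\n\n### Assistant:\n"

print(build_prompt(
    instruction="the detective finds the letter",
    style_tags="Tone: tense, mysterious. Writing style: dramatic. Written in abstract detail.",
    context="A noir mystery set in a rain-soaked port city.",
))
```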

For SillyTavern users, the `style tags` work well in the "Author's Note" field at depth 1. User messages should begin with `Write a scene where: `; to continue a scene, just type `continue`. Most testing was done using the [Genesis](https://github.com/SillyTavern/SillyTavern/blob/8e73882c9ba7301c9163befbe445686a79d4a9a8/public/TextGen%20Settings/NovelAI%20(Genesis).settings) preset.

### Training

Waxwing was trained on a single machine with 72GB of VRAM. The training parameters are available in the `training_parameters.json` file of the main branch. The software used to train was FartyPants' [Training_PRO](https://github.com/FartyPants/Training_PRO) extension for the Oobabooga Text Generation WebUI.
adapter_config.json
ADDED
@@ -0,0 +1,28 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "models\\Llama-2-70B-fp16",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 8,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "down_proj",
    "o_proj",
    "v_proj",
    "gate_proj",
    "k_proj",
    "up_proj"
  ],
  "task_type": "CAUSAL_LM"
}
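For reference, the adapter hyperparameters above correspond to a `peft` `LoraConfig` along these lines (a sketch for readers, not a file shipped in the repo):

```
# Sketch: the adapter_config.json above, reconstructed as a peft LoraConfig.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "down_proj", "o_proj", "v_proj",
                    "gate_proj", "k_proj", "up_proj"],
)
```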
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b9eadd7f9239cb51f29e3896bca94d02c7639b8ac39b35ba613f33d2a13653d4
size 828780162
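This is a Git LFS pointer; the actual ~829 MB adapter weights are fetched separately. A quick sketch for checking a downloaded copy against the recorded hash (assumes the file sits in the working directory):

```
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer's sha256.
import hashlib

expected = "b9eadd7f9239cb51f29e3896bca94d02c7639b8ac39b35ba613f33d2a13653d4"
h = hashlib.sha256()
with open("adapter_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print("OK" if h.hexdigest() == expected else "MISMATCH")
```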
dataset_tags.json
ADDED
@@ -0,0 +1,76 @@
{
  "tone": [
    "melancholic",
    "tense",
    "dramatic",
    "suspenseful",
    "mysterious",
    "humorous",
    "somber",
    "philosophical",
    "bittersweet",
    "ominous",
    "whimsical",
    "determined",
    "sarcastic",
    "intense",
    "nostalgic",
    "dark",
    "adventurous",
    "serious",
    "emotional",
    "surreal",
    "thoughtful",
    "ironic",
    "cynical",
    "desperate",
    "absurd",
    "uncertain",
    "wry",
    "resigned",
    "intriguing",
    "curious",
    "anxious",
    "hopeful",
    "eerie",
    "romantic",
    "comedic",
    "thrilling",
    "action-packed"
  ],
  "writing style": [
    "descriptive",
    "detailed",
    "poetic",
    "vivid",
    "imaginative",
    "introspective",
    "emotional",
    "flowery",
    "philosophical",
    "conversational",
    "formal",
    "immersive",
    "character-driven",
    "fast-paced",
    "evocative",
    "dramatic",
    "present tense",
    "witty",
    "dialogue-heavy",
    "lyrical",
    "narrative",
    "atmospheric",
    "analytical"
  ],
  "pacing": [
    "medium",
    "slow",
    "fast"
  ],
  "sensory detail": [
    "abstract",
    "selective",
    "vivid sensory"
  ]
}
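As an illustration, a sketch of composing a style-tag string from these lists in the format the model card describes (the sampling choices and the perspective value are arbitrary examples):

```
# Sketch: compose a style-tag string from dataset_tags.json in the documented format.
import json
import random

with open("dataset_tags.json") as f:
    tags = json.load(f)

tones = random.sample(tags["tone"], 2)
styles = random.sample(tags["writing style"], 1)
pacing = random.choice(tags["pacing"])
detail = random.choice(tags["sensory detail"])

style_tags = (
    f"Tone: {', '.join(tones)}. Writing style: {', '.join(styles)}. "
    f"Written with {pacing} pacing, in moment to moment detail, "
    f"in {detail} detail, from a Third Person (Character) perspective."
)
print(style_tags)
```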
training_parameters.json
ADDED
@@ -0,0 +1,37 @@
{
  "lora_name": "Waxwing",
  "always_override": false,
  "save_steps": 0.0,
  "micro_batch_size": 3,
  "batch_size": 0,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "lora_rank": 16,
  "lora_alpha": 8,
  "lora_dropout": 0.05,
  "cutoff_len": 1280,
  "dataset": "dataset_11.27.23",
  "eval_dataset": "None",
  "format": "t2d_oobabooga_training_format",
  "eval_steps": 100.0,
  "raw_text_file": "None",
  "higher_rank_limit": false,
  "warmup_steps": 100.0,
  "optimizer": "adamw_torch_fused",
  "hard_cut_string": "\\n\\n\\n",
  "train_only_after": "",
  "stop_at_loss": 0,
  "add_eos_token": false,
  "min_chars": 0.0,
  "report_to": "None",
  "precize_slicing_overlap": true,
  "add_eos_token_type": "Every Block",
  "save_steps_under_loss": 1.8,
  "add_bos_token": false,
  "training_projection": "all",
  "sliding_window": false,
  "warmup_ratio": 0,
  "grad_accumulation": 4,
  "neft_noise_alpha": 3
}
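These are Training_PRO fields rather than raw `transformers` arguments, but the core optimizer settings roughly correspond to a `TrainingArguments` like the sketch below (effective batch size is micro_batch_size × grad_accumulation = 3 × 4 = 12; treat this as an approximation, not the exact training script):

```
# Sketch: approximate transformers.TrainingArguments implied by the values above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="waxwing-lora",        # hypothetical output path
    per_device_train_batch_size=3,    # micro_batch_size
    gradient_accumulation_steps=4,    # grad_accumulation -> effective batch of 12
    num_train_epochs=1,
    learning_rate=3e-4,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
)
```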