bhenrym14 committed on
Commit c677cd6
1 Parent(s): 222e6a4

Create README.md

Files changed (1): README.md +20 -0
README.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ datasets:
+ - jondurbin/airoboros-gpt4-1.4.1
+ ---
+ # NTK-Aware Scaled RoPE QLoRA Finetune of airoboros-33b-gpt4-1.4.1 (LoRA)
+
+ GPTQ quantized weights can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-NTK-16384-GPTQ
+
+ fp16 weights can be found here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-NTK-16384-fp16
+
+ Analogue using the RoPE Position Interpolation (PI) technique: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-LoRA
+
+ ## Overview
+
+ This is [Jon Durbin's Airoboros 33B GPT4 1.4](https://huggingface.co/jondurbin/airoboros-33b-gpt4-1.4) (LoRA) with several key modifications:
+ - Context length extended to 16384 via NTK-Aware Scaled RoPE embeddings, NOT via the SuperHOT LoRA; I started from base Llama-33b (a minimal sketch of the scaling follows this list).
+ - Training sequences beyond 2048 have the target truncated to equal 2048.
+ - Used the airoboros-gpt4-1.4.1 dataset instead of airoboros-gpt4-1.4.
+
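+ For reference, NTK-aware scaling rescales the RoPE base rather than interpolating positions (as the PI analogue above does), so the highest-frequency components are barely changed while the low frequencies stretch to cover the longer context. The sketch below is illustrative only; the helper names and the alpha value are assumptions, not the exact code used for this finetune.
+
+ ```python
+ # Illustrative sketch of NTK-aware scaled RoPE frequencies.
+ # Not the exact training code; alpha is an assumed example value.
+ import torch
+
+ def ntk_scaled_inv_freq(dim: int, base: float = 10000.0, alpha: float = 8.0) -> torch.Tensor:
+     # Rescale the RoPE base so low frequencies stretch to cover a longer
+     # context while high frequencies stay close to their original values.
+     scaled_base = base * alpha ** (dim / (dim - 2))
+     return 1.0 / (scaled_base ** (torch.arange(0, dim, 2).float() / dim))
+
+ def rope_cos_sin(seq_len: int, dim: int, alpha: float = 8.0):
+     # Precompute cos/sin tables for positions 0..seq_len-1.
+     inv_freq = ntk_scaled_inv_freq(dim, alpha=alpha)
+     t = torch.arange(seq_len).float()
+     freqs = torch.outer(t, inv_freq)          # (seq_len, dim/2)
+     emb = torch.cat((freqs, freqs), dim=-1)   # (seq_len, dim)
+     return emb.cos(), emb.sin()
+
+ # Example: head_dim=128 (Llama-33b), extended to a 16384-token context.
+ cos, sin = rope_cos_sin(seq_len=16384, dim=128, alpha=8.0)
+ ```
+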
+ Otherwise, I emulated the training process as closely as possible (rank 64 QLoRA). It was trained on 1x RTX 6000 Ada for ~43 hours.
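+
+ For readers who want a concrete picture of what "rank 64 QLoRA" entails, the following is a hedged sketch of such a setup using Hugging Face transformers + peft + bitsandbytes. The base-model identifier, target modules, and all hyperparameters other than the rank are assumptions for illustration, not the exact training configuration.
+
+ ```python
+ # Hedged sketch of a rank-64 QLoRA setup (4-bit base model + LoRA adapters).
+ # Everything except r=64 is an assumed/illustrative choice.
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,                      # QLoRA: base weights quantized to 4-bit
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "huggyllama/llama-30b",                 # assumed identifier for base Llama-33b
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ model = prepare_model_for_kbit_training(model)
+
+ lora_config = LoraConfig(
+     r=64,                                   # rank 64, as stated above
+     lora_alpha=16,                          # assumed
+     lora_dropout=0.05,                      # assumed
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed set
+     bias="none",
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+ ```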