bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ

bhenrym14 committed on Jul 3, 2023

Commit 68c45b2 • 1 Parent(s): 2453b75

Update README.md

Files changed (1)

README.md +2 -2

@@ -20,10 +20,10 @@ The easiest way is to use [oobabooga text-generation-webui](https://github.com/o
 Recent advancements in extending context by RoPE scaling ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k) and [meta AI)](https://arxiv.org/abs/2306.15595)) demonstrate the ability to extend the context window without (total) retraining. Finetuning has shown to be necessary to properly leverage the longer context. The superHOT LoRA is an adapter that has been finetuned on longer context (8192 tokens); even when applied to models trained on dissimilar datasets, it successfully extends the context window to which the model can attend. While it's impressive this adapter is so flexible, how much does performance suffer relative to a model that has been finetuned with the scaled embeddings from the start? This is an experiment to explore this.
 ## Relative Performance (perplexity)
-| Model                                                | Context     | Perplexity |
+| Model                                                | Context (tokens)     | Perplexity |
 | ---------------------------------------------------- | ----------- | ---------- |
 | TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-GPTQ     | 2048        | 5.15       |
-| TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-GPTQ     | 8192        | 5.04       |
+| TheBloke/airoboros-33B-gpt4-1-4-SuperHOT-8K-GPTQ     | 3072        | 5.04       |
 | **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ**    | **2048**    | **4.32**   |
 | **bhenrym14/airoboros-33b-gpt4-1.4.1-PI-8192-GPTQ**    | **3072**    | **4.26**   |
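
The RoPE scaling mentioned in the README text above can be illustrated with a minimal sketch of position interpolation: token positions are multiplied by a factor below 1 so that an 8192-token window maps back into the position range the base model saw during pretraining. This is not the code used to train this model. The function names, the default `base=10000.0`, and the 2048-token original context are assumptions made for illustration only.

```python
def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    scale < 1 compresses positions (position interpolation);
    scale == 1 gives unscaled RoPE. `base` is an assumed default.
    """
    pos = position * scale
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]


ORIGINAL_CONTEXT = 2048   # assumption: context length of the base model
EXTENDED_CONTEXT = 8192   # target length, taken from the model name
SCALE = ORIGINAL_CONTEXT / EXTENDED_CONTEXT


def interpolated_angles(position, dim):
    """Angles with positions squeezed into the original range."""
    return rope_angles(position, dim, scale=SCALE)


if __name__ == "__main__":
    # The last position of the extended window lands on the angles
    # that position 2048 would have had without scaling.
    print(interpolated_angles(EXTENDED_CONTEXT, 8) == rope_angles(ORIGINAL_CONTEXT, 8))
```

Under these assumptions, every position in the extended window falls inside the range of angles the base model already handles, which is why a comparatively short finetune may be enough to adapt to the longer context.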