Update README.md
README.md CHANGED
@@ -32,6 +32,8 @@ Being a Yi model, run a lower temperature with 0.05 or higher MinP, a little rep
I recommend exl2 quantizations profiled on data similar to the desired task. It is especially sensitive to the quantization data at low bpw. I've uploaded my own fiction-oriented quantizations here: https://huggingface.co/collections/brucethemoose/most-recent-merge-65742644ca03b6c514afa204

+LoneStriker has also uploaded more general-purpose quantizations here: https://huggingface.co/models?sort=trending&search=LoneStriker+Yi-34B-200K-DARE-megamerge-v8
+
To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a lower value than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2, litellm or unsloth.
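For reference, a minimal sketch of making that `max_position_embeddings` change at load time with transformers instead of editing config.json by hand. The 32768 value, dtype, and device_map settings are assumptions for illustration, not part of the model card; lowering the value directly in config.json has the same effect.

```python
# Minimal sketch (not from the model card): override max_position_embeddings
# at load time so transformers does not use the full 200K context limit.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"  # example repo ID

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 32768  # assumed value, well below 200,000

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,               # load with the reduced context limit
    torch_dtype=torch.bfloat16,  # assumed dtype
    device_map="auto",
)
```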