---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- NousResearch/Nous-Hermes-2-Yi-34B
- jondurbin/bagel-dpo-34b-v0.2
---

# HermesBagel-34B-v0.1

HermesBagel-34B-v0.1 is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):

* [NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)
* [jondurbin/bagel-dpo-34b-v0.2](https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: NousResearch/Nous-Hermes-2-Yi-34B
        layer_range: [0, 60]
      - model: jondurbin/bagel-dpo-34b-v0.2
        layer_range: [0, 60]
merge_method: slerp
base_model: NousResearch/Nous-Hermes-2-Yi-34B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

## Basic Usage

Setup

```python
# Notebook-style install; in a plain shell, run the same command without the leading "!".
!pip install -qU transformers accelerate bitsandbytes

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig
)
import torch

model_id = "dfurman/HermesBagel-34B-v0.1"

# 4-bit NF4 quantization with double quantization; matmuls run in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate place the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=nf4_config,
)
```

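As a rough sanity check on why the setup loads the model in 4-bit, the sketch below estimates the memory needed for the weights alone. The parameter count of 34e9 is an assumption taken from the "34B" in the model name, and the helper `weight_memory_gb` is hypothetical. The estimate ignores quantization constants, layers that stay unquantized, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count inferred from the "34B" model name.
NUM_PARAMS = 34e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit NF4 weights, overhead ignored

print(f"bf16 weights: ~{bf16_gb:.0f} GB")
print(f"nf4 weights:  ~{nf4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the bfloat16 footprint, which is what makes a single large GPU plausible for inference.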
```python
messages = [
    {"role": "user", "content": "What is a large language model?"},
]

print("\n\n*** Prompt:")
# add_generation_prompt=True appends the assistant turn header so the model answers the user.
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
print(tokenizer.decode(input_ids[0]))

print("\n\n*** Generate:")
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids.to("cuda"),
        max_new_tokens=256,
        return_dict_in_generate=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True
)
print(response)
```
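
For readers curious about the `merge_method: slerp` entry in the configuration, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors, plus one way a gradient list such as `[0, 0.5, 0.3, 0.7, 1]` could be turned into a per-layer `t`. This is not mergekit's code. The helpers `slerp` and `t_for_layer` are hypothetical, and the sketch assumes that `t = 0` returns the base model's weights, `t = 1` returns the other model's, and that the anchor values are spread evenly over layer depth and linearly interpolated.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Degenerate inputs (zero vector): fall back to linear interpolation.
    if norm0 < eps or norm1 < eps:
        return lerp
    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Nearly parallel vectors: slerp is ill-conditioned, so use lerp instead.
    if abs(sin_theta) < eps:
        return lerp
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def t_for_layer(anchors, layer_idx, num_layers):
    """Assumed behavior: interpolate anchor values linearly across layer depth."""
    if num_layers == 1:
        return anchors[0]
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# Illustrative use with the self_attn gradient from the configuration.
self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]
t_first = t_for_layer(self_attn_anchors, 0, 60)   # first layer
t_last = t_for_layer(self_attn_anchors, 59, 60)   # last layer
merged = slerp(0.5, [1.0, 0.0], [0.0, 1.0])       # toy 2-D "weights"
print(t_first, t_last, merged)
```

Under these assumptions, the `self_attn` weights would start at the base model in the first layer and end at the second model in the last layer, while the `mlp` gradient runs in the opposite direction and all remaining tensors use a fixed `t` of 0.5.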