jordiclive/lora-llama-33B-alpaca_gpt4-dolly_15k-vicuna-r64


jordiclive committed on Jun 1, 2023

Commit

1ea9dde

1 Parent(s): 6ba3710

Create README.md


README.md +169 -0


	@@ -0,0 +1,169 @@

+---
+license: mit
+datasets:
+- sahil2801/CodeAlpaca-20k
+- yahma/alpaca-cleaned
+- databricks/databricks-dolly-15k
+- OpenAssistant/oasst1
+- jeffwan/sharegpt_vicuna
+- qwedsacf/grade-school-math-instructions
+- vicgalle/alpaca-gpt4
+language:
+- en
+tags:
+- sft
+pipeline_tag: text-generation
+widget:
+- text: >-
+    <|prompter|>What is a meme, and what's the history behind this
+    word?</s><|assistant|>
+- text: <|prompter|>What's the Earth total population</s><|assistant|>
+- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
+---
+# LoRA Adapter for LLaMA 33B 'pre-trained' on several datasets that are part of the OpenAssistant project
+This repo contains a low-rank adapter for **LLaMA 33B**, fit on datasets that are part of the OpenAssistant project.
+The model was trained with flash attention, gradient checkpointing, and DeepSpeed stage 2 on 8 x A100 80GB GPUs.
+## Dataset Details
+- sahil2801/CodeAlpaca-20k
+- yahma/alpaca-cleaned
+- databricks/databricks-dolly-15k
+- OpenAssistant/oasst1
+- jeffwan/sharegpt_vicuna
+- qwedsacf/grade-school-math-instructions
+- vicgalle/alpaca-gpt4
+## Model Details
+- **Developed** as part of the OpenAssistant Project
+- **Model type:** PEFT Adapter for frozen LLaMA
+- **Language:** English
+- Epochs: 1
+- Batch size: 128
+- Max Length: 2048
+- Learning rate: 5e-5
+- LoRA _r_: 16
+- LoRA alpha: 32
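The rank and alpha listed above determine how strongly the adapter update is scaled and how many weights it adds per adapted matrix. The sketch below works through that arithmetic for the standard LoRA formulation (update scaled by alpha / r, with r * (d_in + d_out) added parameters per matrix). The helper name `lora_stats` and the 6656 x 6656 example shape (the hidden size of LLaMA 33B) are assumptions for illustration; the card does not state which modules were adapted.

```python
def lora_stats(d_in, d_out, r, alpha):
    """Return (scaling, added_params, full_params) for one LoRA-adapted weight matrix.

    Hypothetical helper for illustration; not part of the original README.
    """
    scaling = alpha / r                # factor applied to the low-rank update B @ A
    added_params = r * (d_in + d_out)  # A has shape (r, d_in), B has shape (d_out, r)
    full_params = d_in * d_out         # size of the frozen base matrix
    return scaling, added_params, full_params


# r and alpha as listed above; 6656 is an assumed projection width.
scaling, added, full = lora_stats(6656, 6656, r=16, alpha=32)
print(f"scaling={scaling}, added={added:,}, full={full:,}")
```

With these values, the adapter adds well under 1% of the parameters of each adapted matrix, which is why only the adapter and a few extra embeddings need to be stored in this repo.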
+## Prompting
+Two special tokens mark the beginning of user and assistant turns:
+`<|prompter|>` and `<|assistant|>`. Each turn ends with the end-of-sequence token `</s>`.
+Input prompt example:
+```
+<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
+```
+The input ends with the `<|assistant|>` token to signal that the model should
+start generating the assistant reply.
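The same format can be assembled programmatically. The sketch below is a minimal, hypothetical helper (`build_prompt` is not part of this repo) that joins conversation turns with the special tokens and leaves the prompt open for the assistant. It assumes `</s>` as the end-of-turn token, as in the example prompt above, and assumes that multi-turn conversations simply repeat the single-turn pattern.

```python
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "</s>"  # assumed end-of-turn token, taken from the example prompt

ROLE_TOKENS = {"prompter": PROMPTER, "assistant": ASSISTANT}


def build_prompt(turns):
    """Build a prompt from a list of (role, text) pairs.

    The last turn must come from the prompter so that the returned string
    ends with the assistant token and the model generates the reply.
    """
    if not turns or turns[-1][0] != "prompter":
        raise ValueError("the conversation must end with a prompter turn")
    parts = []
    for role, text in turns:
        if role not in ROLE_TOKENS:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{ROLE_TOKENS[role]}{text}{EOS}")
    parts.append(ASSISTANT)  # open the assistant turn
    return "".join(parts)


print(build_prompt([("prompter", "What is a meme, and what's the history behind this word?")]))
```

For a single user message this reproduces the example prompt shown above.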
+## Example Inference Code
+Note: several extra embeddings need to be loaded along with the LoRA weights.
+```
+from pathlib import Path
+import torch
+import transformers
+from huggingface_hub import hf_hub_download
+from peft import PeftModel
+from transformers import GenerationConfig
+device = "cuda" if torch.cuda.is_available() else "cpu"
+dtype = torch.float16
+repo_id = "jordiclive/alpaca_gpt4-dolly_15k-vicuna-lora-30b-r64"
+base_model = "decapoda-research/llama-30b-hf"
+# Model Loading
+def transfer_embeddings(model, embed_path, tokenizer):
+    old_embeddings = model.get_input_embeddings()
+    old_num_tokens, old_embedding_dim = old_embeddings.weight.size()
+    new_embeddings = torch.nn.Embedding(old_num_tokens, old_embedding_dim)
+    new_embeddings.to(old_embeddings.weight.device, dtype=old_embeddings.weight.dtype)
+    model._init_weights(new_embeddings)
+    embed_weights = torch.load(embed_path, map_location=old_embeddings.weight.device)
+    vocab_size = tokenizer.vocab_size
+    new_embeddings.weight.data[:vocab_size, :] = old_embeddings.weight.data[:vocab_size, :]
+    # embed_weights is the tensor of extra token embeddings; copy its rows in after the base vocabulary.
+    new_embeddings.weight.data[vocab_size : vocab_size + embed_weights.shape[0], :] = embed_weights.to(
+        new_embeddings.weight.dtype
+    ).to(new_embeddings.weight.device)
+    model.set_input_embeddings(new_embeddings)
+    model.tie_weights()
+def load_peft_model(model, peft_model_path, tokenizer):
+    embed_weights = hf_hub_download(peft_model_path, "extra_embeddings.pt")
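+    # hf_hub_download returns a local file path, so load the tensor to read its row count.
+    model.resize_token_embeddings(tokenizer.vocab_size + torch.load(embed_weights, map_location="cpu").shape[0])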
+    model.config.eos_token_id = tokenizer.eos_token_id
+    model.config.bos_token_id = tokenizer.bos_token_id
+    model.config.pad_token_id = tokenizer.pad_token_id
+    model = PeftModel.from_pretrained(
+        model,
+        model_id=peft_model_path,
+        torch_dtype=model.dtype,
+    )
+    model.eos_token_id = tokenizer.eos_token_id
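+    # peft_model_path is a Hub repo id, not a local directory, so resolve the file through the Hub cache.
+    transfer_embeddings(model, hf_hub_download(peft_model_path, "extra_embeddings.pt"), tokenizer)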
+    return model
+tokenizer = transformers.AutoTokenizer.from_pretrained(repo_id)
+# Pass cache_dir=... here to control where the base model weights are stored.
+model = transformers.AutoModelForCausalLM.from_pretrained(
+    base_model, torch_dtype=dtype, trust_remote_code=True
+)
+model = load_peft_model(model, repo_id, tokenizer)
+# Device configuration
+model = model.to(device)
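+# Choose generation parameters.
+# Note: temperature, top_p and top_k only take effect when do_sample=True;
+# as written, decoding uses beam search with num_beams=4.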
+generation_config = GenerationConfig(
+    temperature=0.1,
+    top_p=0.75,
+    top_k=40,
+    num_beams=4,
+)
+def format_system_prompt(prompt, eos_token="</s>"):
+    return "{}{}{}{}".format("<|prompter|>", prompt, eos_token, "<|assistant|>")
+def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
+    prompt = format_system_prompt(prompt)  # OpenAssistant Prompt Format expected
+    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
+    with torch.no_grad():
+        generation_output = model.generate(
+            input_ids=input_ids,
+            generation_config=generation_config,
+            return_dict_in_generate=True,
+            output_scores=True,
+            max_new_tokens=max_new_tokens,
+            eos_token_id=model.eos_token_id,
+        )
+    s = generation_output.sequences[0]
+    output = tokenizer.decode(s)
+    print("Text generated:")
+    print(output)
+    return output
+generate("What is a meme, and what's the history behind this word?")
+generate("What's the Earth total population")
+generate("Write a story about future of AI development")
+```
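Because `generate` above decodes the full output sequence, the returned string contains the prompt as well as the reply. The sketch below shows one way to keep only the assistant's answer. The helper `extract_reply` is hypothetical, and it assumes the decoded text contains the `<|assistant|>` and `</s>` tokens verbatim, which depends on the tokenizer and on calling `tokenizer.decode` without `skip_special_tokens=True`.

```python
ASSISTANT_TOKEN = "<|assistant|>"
EOS_TOKEN = "</s>"  # assumed end-of-turn token, as used in the prompt format


def extract_reply(decoded):
    """Return the text after the last <|assistant|> marker, without a trailing EOS token."""
    _, marker, reply = decoded.rpartition(ASSISTANT_TOKEN)
    if not marker:
        raise ValueError("no <|assistant|> marker found in decoded text")
    reply = reply.strip()
    if reply.endswith(EOS_TOKEN):
        reply = reply[: -len(EOS_TOKEN)].rstrip()
    return reply


# Example with a hand-written string standing in for real model output.
sample = "<|prompter|>Say hello</s><|assistant|>Hello there!</s>"
print(extract_reply(sample))
```

Splitting on the last marker rather than the first keeps the helper usable when the prompt itself contains earlier assistant turns.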