Question Answering
PEFT
English
Marcus Cedric R. Idia committed on
Commit 75616c9
1 Parent(s): 123314f

Update README.md

Files changed (1)
  1. README.md +71 -21
README.md CHANGED
@@ -1,34 +1,84 @@
  ---
  library_name: peft
  datasets:
- - harpyerr/re-merged-pf-2
  - tatsu-lab/alpaca
  - BI55/MedText
- - databricks/databricks-dolly-15k
- - timdettmers/openassistant-guanaco
  language:
  - en
  pipeline_tag: question-answering
- license: mit
- tags:
- - language
- - conversational
- - questionanswering
  ---
- ## Training procedure
-
-
- The following `bitsandbytes` quantization config was used during training:
- - load_in_8bit: False
- - load_in_4bit: True
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: nf4
- - bnb_4bit_use_double_quant: False
- - bnb_4bit_compute_dtype: float16
- ### Framework versions
-
-
- - PEFT 0.5.0.dev0
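For reference, the quantization settings listed in the removed training-procedure section map one-to-one onto `BitsAndBytesConfig` arguments in `transformers`. A minimal sketch that rebuilds that training-time config from the values above (reconstructed from the card, not taken from any released training code):

```python
import torch
from transformers import BitsAndBytesConfig

# Training-time quantization config, rebuilt from the values listed in the old card.
training_bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,  # "float16" in the card
)
```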
 
  ---
  library_name: peft
+ license: mit
  datasets:
+ - timdettmers/openassistant-guanaco
  - tatsu-lab/alpaca
  - BI55/MedText
  language:
  - en
  pipeline_tag: question-answering
  ---
+ Here is a README.md explaining how to run the Archimedes model locally:
+
+ # Archimedes Model
+
+ This README provides instructions for running the Archimedes conversational AI assistant locally.
+
+ ## Requirements
+
+ - Python 3.6+
+ - [Transformers](https://huggingface.co/docs/transformers/installation)
+ - [PEFT](https://github.com/huggingface/peft)
+ - PyTorch
+ - Access to the Llama 2 model files or a cloned public model
+
+ Install requirements:
+
+ ```bash
+ pip install transformers
+ pip install peft
+ pip install torch
+ pip install datasets
+ pip install bitsandbytes
+ ```
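4-bit loading through `bitsandbytes` requires a CUDA-capable GPU. A small pre-flight check before loading the model might look like the following sketch (the ~6 GB free-memory figure is a rough assumption for a 4-bit 7B model, not an official requirement):

```python
import torch

# bitsandbytes 4-bit inference requires a CUDA GPU; fail early if none is visible.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; 4-bit loading with bitsandbytes will not work.")

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"GPU: {torch.cuda.get_device_name(0)}, "
      f"{free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")
# Roughly 6 GB of free VRAM is assumed for a 4-bit 7B model plus activations.
```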
+
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import login
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ login()  # Needed for access to the gated Llama 2 weights.
+
+ # Base Llama 2 chat model
+ model_name = "meta-llama/Llama-2-7b-chat-hf"
+
+ # Quantization configuration
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ # Load the base model in 4-bit precision
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Load the trained Archimedes LoRA adapter on top of the base model
+ model = PeftModel.from_pretrained(model, 'harpyerr/archimedes-300s-7b-chat')
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Define prompt
+ text = "Can you tell me who made Space-X?"
+ prompt = "You are a helpful assistant. Please provide an informative response. \n\n" + text
+
+ # Generate response
+ device = "cuda:0"
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ This loads the Llama 2 base model with 4-bit quantization, applies the Archimedes LoRA adapter, constructs a prompt, and generates a response.
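The snippet above prompts the chat model with a plain string. Llama 2 chat checkpoints are normally prompted through their chat template, and sampling can be enabled for less deterministic answers. A minimal sketch that reuses `model` and `tokenizer` from above (it assumes a `transformers` version that provides `apply_chat_template`; the temperature/top_p values are illustrative, not tuned for Archimedes):

```python
# Reuses `model` and `tokenizer` from the Usage snippet above.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Please provide an informative response."},
    {"role": "user", "content": "Can you tell me who made Space-X?"},
]

# Format the conversation with the tokenizer's built-in Llama 2 chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# Sample instead of greedy decoding; the values below are illustrative.
outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```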
+
+ See the [docs](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) for more details.
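If the adapter will only ever be used for inference, PEFT can also merge the LoRA weights into the base model so no adapter computation remains at generation time. A minimal sketch, assuming the machine can hold the 7B base model in fp16 (the merge is done here without 4-bit quantization, since merging into quantized weights is not generally supported):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumes enough memory to hold the base model in fp16 (no 4-bit quantization here).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "harpyerr/archimedes-300s-7b-chat")
merged = merged.merge_and_unload()  # folds the LoRA deltas into the base weights

# Optionally save the merged model for plain `transformers` loading later.
merged.save_pretrained("archimedes-merged")
```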