Question Answering
PEFT
English
Marcus Cedric R. Idia committed on
Commit 75616c9
1 Parent(s): 123314f

Update README.md

Files changed (1)
  1. README.md +71 -21
README.md CHANGED
@@ -1,34 +1,84 @@
  ---
  library_name: peft
  datasets:
- - harpyerr/re-merged-pf-2
  - tatsu-lab/alpaca
  - BI55/MedText
- - databricks/databricks-dolly-15k
- - timdettmers/openassistant-guanaco
  language:
  - en
  pipeline_tag: question-answering
- license: mit
- tags:
- - language
- - conversational
- - questionanswering
  ---
- ## Training procedure
-
-
- The following `bitsandbytes` quantization config was used during training:
- - load_in_8bit: False
- - load_in_4bit: True
- - llm_int8_threshold: 6.0
- - llm_int8_skip_modules: None
- - llm_int8_enable_fp32_cpu_offload: False
- - llm_int8_has_fp16_weight: False
- - bnb_4bit_quant_type: nf4
- - bnb_4bit_use_double_quant: False
- - bnb_4bit_compute_dtype: float16
- ### Framework versions
-
-
- - PEFT 0.5.0.dev0
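For reference, the quantization settings listed in the removed training-procedure section map one-to-one onto `BitsAndBytesConfig` arguments in `transformers`. A minimal sketch that rebuilds that training-time config from the values above (reconstructed from the card, not taken from any released training code):

```python
import torch
from transformers import BitsAndBytesConfig

# Training-time quantization config, rebuilt from the values listed in the old card.
training_bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,  # "float16" in the card
)
```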
 
  ---
  library_name: peft
+ license: mit
  datasets:
+ - timdettmers/openassistant-guanaco
  - tatsu-lab/alpaca
  - BI55/MedText
  language:
  - en
  pipeline_tag: question-answering
  ---
+ Here is a README.md explaining how to run the Archimedes model locally:
+
+ # Archimedes Model
+
+ This README provides instructions for running the Archimedes conversational AI assistant locally.
+
+ ## Requirements
+
+ - Python 3.6+
+ - [Transformers](https://huggingface.co/docs/transformers/installation)
+ - [PEFT](https://github.com/huggingface/peft)
+ - PyTorch
+ - Access to the Llama 2 model files or a cloned public model
+
+ Install requirements:
+
+ ```bash
+ pip install transformers
+ pip install peft
+ pip install torch
+ pip install datasets
+ pip install bitsandbytes
+ ```
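4-bit loading through `bitsandbytes` requires a CUDA-capable GPU. A small pre-flight check before loading the model might look like the following sketch (the ~6 GB free-memory figure is a rough assumption for a 4-bit 7B model, not an official requirement):

```python
import torch

# bitsandbytes 4-bit inference requires a CUDA GPU; fail early if none is visible.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; 4-bit loading with bitsandbytes will not work.")

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"GPU: {torch.cuda.get_device_name(0)}, "
      f"{free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")
# Roughly 6 GB of free VRAM is assumed for a 4-bit 7B model plus activations.
```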
+
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import login
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ login()  # Needed for access to the gated Llama 2 weights.
+
+ # Base Llama 2 chat model
+ model_name = "meta-llama/Llama-2-7b-chat-hf"
+
+ # Quantization configuration
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ # Load the base model in 4-bit precision
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Load the trained Archimedes LoRA adapter on top of the base model
+ model = PeftModel.from_pretrained(model, 'harpyerr/archimedes-300s-7b-chat')
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Define prompt
+ text = "Can you tell me who made Space-X?"
+ prompt = "You are a helpful assistant. Please provide an informative response. \n\n" + text
+
+ # Generate response
+ device = "cuda:0"
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ This loads the Llama 2 base model with 4-bit quantization, applies the Archimedes LoRA adapter, constructs a prompt, and generates a response.
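The snippet above prompts the chat model with a plain string. Llama 2 chat checkpoints are normally prompted through their chat template, and sampling can be enabled for less deterministic answers. A minimal sketch that reuses `model` and `tokenizer` from above (it assumes a `transformers` version that provides `apply_chat_template`; the temperature/top_p values are illustrative, not tuned for Archimedes):

```python
# Reuses `model` and `tokenizer` from the Usage snippet above.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Please provide an informative response."},
    {"role": "user", "content": "Can you tell me who made Space-X?"},
]

# Format the conversation with the tokenizer's built-in Llama 2 chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# Sample instead of greedy decoding; the values below are illustrative.
outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```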
+
+ See the [docs](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) for more details.
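If the adapter will only ever be used for inference, PEFT can also merge the LoRA weights into the base model so no adapter computation remains at generation time. A minimal sketch, assuming the machine can hold the 7B base model in fp16 (the merge is done here without 4-bit quantization, since merging into quantized weights is not generally supported):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumes enough memory to hold the base model in fp16 (no 4-bit quantization here).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "harpyerr/archimedes-300s-7b-chat")
merged = merged.merge_and_unload()  # folds the LoRA deltas into the base weights

# Optionally save the merged model for plain `transformers` loading later.
merged.save_pretrained("archimedes-merged")
```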