Spaces: Runtime error
zetavg committed: update instructions for SkyPilot
README.md
CHANGED
After following the [SkyPilot installation guide](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), create a `.yaml` file that defines a task for running the app:

```yaml
# llm-tuner.yaml

resources:
  accelerators: A10:1  # 1x NVIDIA A10 GPU, about US$0.60/hr on Lambda Cloud. Run `sky show-gpus` to list supported GPU types, and `sky show-gpus [GPU_NAME]` for detailed information on a GPU type.
  cloud: lambda  # Optional; if left out, SkyPilot will automatically pick the cheapest cloud.

file_mounts:
  # (to store training datasets and trained models)
  # See https://skypilot.readthedocs.io/en/latest/reference/storage.html for details.
  /data:
    name: llm-tuner-data  # Make sure this name is unique or that you own this bucket. If it does not exist, SkyPilot will try to create a bucket with this name.
    store: s3  # Either s3 or gcs
    mode: MOUNT

# Clone the LLaMA-LoRA Tuner repo and install its dependencies.
setup: |
  conda create -q python=3.8 -n llm-tuner -y
  conda activate llm-tuner
  # Clone the LLaMA-LoRA Tuner repo and install its dependencies
  [ ! -d llm_tuner ] && git clone https://github.com/zetavg/LLaMA-LoRA-Tuner.git llm_tuner
  echo 'Installing dependencies...'
  pip install -r llm_tuner/requirements.lock.txt
  # Optional: install wandb to enable logging to Weights & Biases
  pip install wandb
  # Optional: patch bitsandbytes to work around the error "libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats"
  BITSANDBYTES_LOCATION="$(pip show bitsandbytes | grep 'Location' | awk '{print $2}')/bitsandbytes"
  [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" ] && [ ! -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" ] && [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" ] && echo 'Patching bitsandbytes for GPU support...' && mv "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" && cp "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so"
  conda install -n llm-tuner cudatoolkit -y
  echo 'Dependencies installed.'
  # Optional: pre-download models
  echo "Pre-downloading base models so that you won't have to wait for long once the app is ready..."
  python llm_tuner/download_base_model.py --base_model_names='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j'

# Start the app. `wandb_api_key` and `wandb_project_name` are optional.
run: |
  conda activate llm-tuner
  python llm_tuner/app.py \
    --data_dir='/data' \
    --wandb_api_key="$([ -f /data/secrets/wandb_api_key.txt ] && cat /data/secrets/wandb_api_key.txt | tr -d '\n')" \
    --wandb_project_name='llm-tuner' \
    --timezone='Atlantic/Reykjavik' \
    --base_model='decapoda-research/llama-7b-hf' \
    --base_model_choices='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b' \
    --share
```

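The bitsandbytes patch in `setup` is a chain of `&&` tests, so it only swaps the libraries when the CPU library exists, no `.bak` backup exists yet, and the CUDA 12.1 library is present. If setup runs again on the same cluster, an already-patched install is left alone. A minimal sketch of that guard, restated as a hypothetical `patch_bitsandbytes` function and exercised against dummy files in a temporary directory that stands in for `$BITSANDBYTES_LOCATION`:

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the guarded bitsandbytes patch from `setup`.
# $1 stands in for $BITSANDBYTES_LOCATION (the installed bitsandbytes package dir).
patch_bitsandbytes() {
  local dir="$1"
  if [ -f "$dir/libbitsandbytes_cpu.so" ] \
    && [ ! -f "$dir/libbitsandbytes_cpu.so.bak" ] \
    && [ -f "$dir/libbitsandbytes_cuda121.so" ]; then
    echo 'Patching bitsandbytes for GPU support...'
    # Keep the original CPU library as a backup, then put the CUDA 12.1 build in its place.
    mv "$dir/libbitsandbytes_cpu.so" "$dir/libbitsandbytes_cpu.so.bak" \
      && cp "$dir/libbitsandbytes_cuda121.so" "$dir/libbitsandbytes_cpu.so"
  else
    # The README's one-liner prints nothing in this case; this message is only for the demo.
    echo 'Skipping bitsandbytes patch.'
  fi
}

# Dummy "libraries" whose contents identify which build they are.
lib_dir="$(mktemp -d)"
echo cpu > "$lib_dir/libbitsandbytes_cpu.so"
echo cuda121 > "$lib_dir/libbitsandbytes_cuda121.so"

patch_bitsandbytes "$lib_dir"   # first run: backs up and replaces the CPU library
patch_bitsandbytes "$lib_dir"   # second run: the .bak file exists, so it skips
cat "$lib_dir/libbitsandbytes_cpu.so"

rm -rf "$lib_dir"
```

The `.bak` check is what makes the step safe to repeat: without it, a second run would overwrite the backup with the already-copied CUDA library.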
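The `--wandb_api_key` argument in `run` is optional because of how its command substitution behaves: if `/data/secrets/wandb_api_key.txt` does not exist, the `[ -f ... ]` test fails, nothing is printed, and the flag receives an empty string. A minimal sketch of that behavior, assuming a temporary directory as a stand-in for `/data`; the `read_wandb_key` function and the placeholder key are hypothetical and not part of LLaMA-LoRA Tuner:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the expression passed to --wandb_api_key in `run`.
# $1 stands in for the /data mount on the cluster.
read_wandb_key() {
  local key_file="$1/secrets/wandb_api_key.txt"
  # When the file is missing, the test fails and nothing is printed, so the
  # surrounding "$(...)" expands to an empty string. `tr -d '\n'` removes every
  # newline; reading the file with `<` is equivalent to the README's `cat ... |`.
  [ -f "$key_file" ] && tr -d '\n' < "$key_file"
}

data_dir="$(mktemp -d)"

# No key file: the app would receive an empty --wandb_api_key value.
echo "without key file: '$(read_wandb_key "$data_dir")'"

# A placeholder key saved with a trailing newline, as most editors do.
mkdir -p "$data_dir/secrets"
printf 'placeholder-key\n' > "$data_dir/secrets/wandb_api_key.txt"
echo "with key file: '$(read_wandb_key "$data_dir")'"

rm -rf "$data_dir"
```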
Then launch a cluster to run the task:

```
sky launch -c llm-tuner llm-tuner.yaml
```

`-c ...` is an optional flag to specify a cluster name. If not specified, SkyPilot will automatically generate one.

When you are done, run `sky stop <cluster_name>` to stop the cluster. To terminate a cluster instead, run `sky down <cluster_name>`.

**Remember to stop or shut down the cluster when you are done to avoid unexpected charges.** Run `sky cost-report` to see the cost of your clusters.

<details>
<summary>Log into the cloud machine or mount the filesystem of the cloud machine on your local computer</summary>

To log into the cloud machine, run `ssh <cluster_name>`, for example `ssh llm-tuner`.

If you have `sshfs` installed on your local machine, you can mount the filesystem of the cloud machine on your local computer with a command like the following:

```bash
mkdir -p /tmp/llm_tuner_server && umount /tmp/llm_tuner_server || : && sshfs llm-tuner:/ /tmp/llm_tuner_server
```
</details>

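In the `sshfs` mount command, `&&` and `||` have equal precedence and are evaluated left to right: `mkdir -p` runs, then `umount`; if either fails, `:` (a no-op that always succeeds) takes over, so `sshfs` runs in every case. That is why the command works both for a first mount, where `umount` fails because nothing is mounted, and for a remount. The sketch below checks this control flow with stub functions standing in for `mkdir`, `umount`, and `sshfs`; the stubs and the `run_mount_oneliner` / `run_without_fallback` wrappers are hypothetical and only record which commands ran:

```bash
#!/usr/bin/env bash
# Stubs that record which commands ran instead of touching the filesystem.
# They shadow the real mkdir/umount/sshfs for the rest of this script only.
log=''
umount_status=0
mkdir()  { log+='mkdir '; }
umount() { log+='umount '; return "$umount_status"; }
sshfs()  { log+='sshfs '; }

# $1: exit status the umount stub should return (1 = nothing was mounted).
run_mount_oneliner() {
  log=''
  umount_status="$1"
  # Same operators as the README command, grouped as (mkdir && umount) || :, then && sshfs
  mkdir -p /tmp/llm_tuner_server && umount /tmp/llm_tuner_server || : && sshfs llm-tuner:/ /tmp/llm_tuner_server
}

# For comparison: the same chain without the `|| :` fallback.
run_without_fallback() {
  log=''
  umount_status="$1"
  mkdir -p /tmp/llm_tuner_server && umount /tmp/llm_tuner_server && sshfs llm-tuner:/ /tmp/llm_tuner_server
}

run_mount_oneliner 1
echo "first mount: $log"   # umount fails, `:` succeeds, sshfs still runs
run_mount_oneliner 0
echo "remount:     $log"   # umount succeeds, sshfs mounts again
run_without_fallback 1 || true
echo "no fallback: $log"   # sshfs is skipped because umount failed
```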

### Run locally

<details>
<summary>Prepare environment with conda</summary>

```bash
conda create -y python=3.8 -n llm-tuner
conda activate llm-tuner
```
</details>
