Ayush Chaurasia and glenn-jocher committed 75d323d (unverified, 1 parent: 11e4aeb)

W&B: Add advanced features tutorial (#4384)


* Improve docstrings and run names

* default wandb login prompt with timeout

* return key

* Update api_key check logic

* Properly support zipped dataset feature

* update docstring

* Revert tutorial change

* extend changes to log_dataset

* add run name

* bug fix

* bug fix

* Update comment

* fix import check

* remove unused import

* Hardcode .yaml file extension

* reduce code

* Reformat using pycharm

* Remove redundant try catch

* More refactoring and bug fixes

* retry

* Reformat using pycharm

* respect LOGGERS include list

* Initial readme update

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Glenn Jocher <[email protected]>

Files changed (1):

1. utils/loggers/wandb/README.md (ADDED, +140 -0)
📚 This guide explains how to use **Weights & Biases** (W&B) with YOLOv5 🚀.

* [About Weights & Biases](#about-weights--biases)
* [First-Time Setup](#first-time-setup)
* [Viewing Runs](#viewing-runs)
* [Advanced Usage: Dataset Versioning and Evaluation](#advanced-usage)
* [Reports: Share your work with the world!](#reports)

## About Weights & Biases

Think of [W&B](https://wandb.ai/site?utm_campaign=repo_yolo_wandbtutorial) like GitHub for machine learning models. With a few lines of code, save everything you need to debug, compare, and reproduce your models: architecture, hyperparameters, git commits, model weights, GPU usage, and even datasets and predictions.

Used by top researchers including teams at OpenAI, Lyft, GitHub, and MILA, W&B is part of the new standard of best practices for machine learning. Here is how W&B can help you optimize your machine learning workflows:

* [Debug](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#Free-2) model performance in real time
* [GPU usage](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#System-4), visualized automatically
* [Custom charts](https://wandb.ai/wandb/customizable-charts/reports/Powerful-Custom-Charts-To-Debug-Model-Peformance--VmlldzoyNzY4ODI) for powerful, extensible visualization
* [Share insights](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#Share-8) interactively with collaborators
* [Optimize hyperparameters](https://docs.wandb.com/sweeps) efficiently
* [Track](https://docs.wandb.com/artifacts) datasets, pipelines, and production models

## First-Time Setup

<details open>
<summary> Toggle Details </summary>
When you train for the first time, W&B will prompt you to create a new account and will generate an **API key** for you. If you are an existing user, you can retrieve your key from https://wandb.ai/authorize. This key tells W&B where to log your data. You only need to supply it once; it is then remembered on the same device.

W&B will create a cloud **project** (default is 'YOLOv5') for your training runs, and each new training run will be given a unique run **name** within that project as project/name. You can also manually set your project and run name:

```shell
$ python train.py --project ... --name ...
```
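
If you prefer to authenticate before training, a minimal setup (assuming the `wandb` package is installed from PyPI) looks like this:

```shell
# install the W&B client
$ pip install wandb

# authenticate once per machine; paste your API key from https://wandb.ai/authorize when prompted
$ wandb login
```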

<img alt="" width="800" src="https://user-images.githubusercontent.com/26833433/98183367-4acbc600-1f08-11eb-9a23-7266a4192355.jpg">
</details>

## Viewing Runs

<details open>
<summary> Toggle Details </summary>
Run information streams from your environment to the W&B cloud console as you train. This allows you to monitor and even cancel runs in <b>real time</b>. All important information is logged:

* Training & validation losses
* Metrics: Precision, Recall, mAP@0.5, mAP@0.5:0.95
* Learning rate over time
* A bounding box debugging panel, showing the training progress over time
* GPU: type, **GPU utilization**, power, temperature, **CUDA memory usage**
* System: disk I/O, CPU utilization, RAM usage
* Your trained model as a W&B Artifact
* Environment: OS and Python versions, Git repository and state, **training command**

<img alt="" width="800" src="https://user-images.githubusercontent.com/26833433/98184457-bd3da580-1f0a-11eb-8461-95d908a71893.jpg">
</details>

## Advanced Usage

You can leverage the W&B Artifacts and Tables integration to easily visualize and manage your datasets, models, and training evaluations. Here are some quick examples to get you started.
<details open>
<h3>1: Visualize and Version Datasets</h3>
Log, visualize, dynamically query, and understand your data with <a href='https://docs.wandb.ai/guides/data-vis/tables'>W&B Tables</a>. You can use the following command to log your dataset as a W&B Table. This will generate a <code>{dataset}_wandb.yaml</code> file which can be used to train from the dataset artifact.
<details>
<summary> <b>Usage</b> </summary>
<b>Code</b> <code> $ python utils/loggers/wandb/log_dataset.py --project ... --name ... --data .. </code>

![Screenshot (64)](https://user-images.githubusercontent.com/15766192/128486078-d8433890-98a3-4d12-8986-b6c0e3fc64b9.png)
</details>
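
For example, logging the COCO128 dataset that ships with the repository might look like this (the project and run names below are placeholders of our choosing):

```shell
# log COCO128 as a W&B Table; also generates coco128_wandb.yaml for artifact training
$ python utils/loggers/wandb/log_dataset.py --project YOLOv5 --name coco128-upload --data data/coco128.yaml
```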

<h3> 2: Train and Log Evaluation Simultaneously </h3>
This is an extension of the previous section, but it will also start training after uploading the dataset. <b>This also logs an evaluation Table.</b>
The evaluation Table compares your predictions and ground truths across the validation set for each epoch. It uses references to the already uploaded datasets,
so no images will be uploaded from your system more than once.
<details>
<summary> <b>Usage</b> </summary>
<b>Code</b> <code> $ python train.py --upload_dataset </code>

![Screenshot (72)](https://user-images.githubusercontent.com/15766192/128979739-4cf63aeb-a76f-483f-8861-1c0100b938a5.png)
</details>
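
For example, a short run that uploads the dataset and logs the evaluation Table might look like this (the weights, data, and epoch count are illustrative):

```shell
# train while uploading the dataset and logging per-epoch evaluation
$ python train.py --data data/coco128.yaml --weights yolov5s.pt --epochs 3 --upload_dataset
```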

<h3> 3: Train using dataset artifact </h3>
When you upload a dataset as described in the first section, you get a new config file with <code>_wandb</code> appended to its name. This file contains the information
needed to train a model directly from the dataset artifact. <b>This also logs evaluation.</b>
<details>
<summary> <b>Usage</b> </summary>
<b>Code</b> <code> $ python train.py --data {data}_wandb.yaml </code>

![Screenshot (72)](https://user-images.githubusercontent.com/15766192/128979739-4cf63aeb-a76f-483f-8861-1c0100b938a5.png)
</details>
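
Assuming the upload step produced a <code>coco128_wandb.yaml</code> file, training from the artifact might look like this:

```shell
# train directly from the dataset artifact referenced by the generated config
$ python train.py --data coco128_wandb.yaml --weights yolov5s.pt
```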

<h3> 4: Save model checkpoints as artifacts </h3>
To enable saving and versioning checkpoints of your experiment, pass `--save_period n` with the base command, where `n` represents the checkpoint interval in epochs.
You can also log both the dataset and model checkpoints simultaneously. If not passed, only the final model will be logged.

<details>
<summary> <b>Usage</b> </summary>
<b>Code</b> <code> $ python train.py --save_period 1 </code>

![Screenshot (68)](https://user-images.githubusercontent.com/15766192/128726138-ec6c1f60-639d-437d-b4ee-3acd9de47ef3.png)
</details>
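
For example, to version a checkpoint every other epoch on a short run (the data, weights, and epoch count are illustrative):

```shell
# log a checkpoint artifact every 2 epochs; the final model is logged regardless
$ python train.py --data data/coco128.yaml --weights yolov5s.pt --epochs 10 --save_period 2
```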

<h3> 5: Resume runs from checkpoint artifacts </h3>
Any run can be resumed using artifacts if the <code>--resume</code> argument starts with the <code>wandb-artifact://</code> prefix followed by the run path, i.e. <code>wandb-artifact://username/project/runid</code>. This doesn't require the model checkpoint to be present on the local system.

<details>
<summary> <b>Usage</b> </summary>
<b>Code</b> <code> $ python train.py --resume wandb-artifact://{run_path} </code>

![Screenshot (70)](https://user-images.githubusercontent.com/15766192/128728988-4e84b355-6c87-41ae-a591-14aecf45343e.png)
</details>
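
With a hypothetical run path of <code>your-username/YOLOv5/1abcd2ef</code>, resuming looks like this:

```shell
# resume from cloud checkpoint artifacts; replace the path with your own username/project/run-id
$ python train.py --resume wandb-artifact://your-username/YOLOv5/1abcd2ef
```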
106
+
107
+ <h3> 6: Resume runs from dataset artifact & checkpoint artifacts. </h3>
108
+ <b> Local dataset or model checkpoints are not required. This can be used to resume runs directly on a different device </b>
109
+ The syntax is same as the previous section, but you'll need to lof both the dataset and model checkpoints as artifacts, i.e, set bot <code>--upload_dataset</code> or
110
+ train from <code>_wandb.yaml</code> file and set <code>--save_period</code>
111
+
112
+ <details>
113
+ <summary> <b>Usage</b> </summary>
114
+ <b>Code</b> <code> $ python train.py --resume wandb-artifact://{run_path} </code>
115
+
116
+ ![Screenshot (70)](https://user-images.githubusercontent.com/15766192/128728988-4e84b355-6c87-41ae-a591-14aecf45343e.png)
117
+ </details>
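
Putting both pieces together, a sketch of the cross-device flow (all names and paths here are illustrative):

```shell
# machine A: log the dataset and checkpoints as artifacts while training
$ python train.py --data data/coco128.yaml --weights yolov5s.pt --upload_dataset --save_period 1

# machine B: resume the same run from cloud artifacts alone; no local files required
$ python train.py --resume wandb-artifact://your-username/YOLOv5/1abcd2ef
```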
118
+
119
+ </details>
120
+
121
+
122
+
123
+ <h3> Reports </h3>
124
+ W&B Reports can be created from your saved runs for sharing online. Once a report is created you will receive a link you can use to publically share your results. Here is an example report created from the COCO128 tutorial trainings of all four YOLOv5 models ([link](https://wandb.ai/glenn-jocher/yolov5_tutorial/reports/YOLOv5-COCO128-Tutorial-Results--VmlldzozMDI5OTY)).
125
+
126
+ <img alt="" width="800" src="https://user-images.githubusercontent.com/26833433/98185222-794ba000-1f0c-11eb-850f-3e9c45ad6949.jpg">
127
+
128
+ ## Environments
129
+ YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including [CUDA](https://developer.nvidia.com/cuda)/[CUDNN](https://developer.nvidia.com/cudnn), [Python](https://www.python.org/) and [PyTorch](https://pytorch.org/) preinstalled):
130
+
131
+ * **Google Colab and Kaggle** notebooks with free GPU: [![Open In Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb) [![Open In Kaggle](https://camo.githubusercontent.com/a08ca511178e691ace596a95d334f73cf4ce06e83a5c4a5169b8bb68cac27bef/68747470733a2f2f6b6167676c652e636f6d2f7374617469632f696d616765732f6f70656e2d696e2d6b6167676c652e737667)](https://www.kaggle.com/ultralytics/yolov5)
132
+ * **Google Cloud** Deep Learning VM. See [GCP Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart)
133
+ * **Amazon** Deep Learning AMI. See [AWS Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart)
134
+ * **Docker Image**. See [Docker Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/Docker-Quickstart) [![Docker Pulls](https://camo.githubusercontent.com/280faedaf431e4c0c24fdb30ec00a66d627404e5c4c498210d3f014dd58c2c7e/68747470733a2f2f696d672e736869656c64732e696f2f646f636b65722f70756c6c732f756c7472616c79746963732f796f6c6f76353f6c6f676f3d646f636b6572)](https://hub.docker.com/r/ultralytics/yolov5)
135
+
136
+ ## Status
137
+ ![CI CPU testing](https://github.com/ultralytics/yolov5/workflows/CI%20CPU%20testing/badge.svg)
138
+
139
+ If this badge is green, all [YOLOv5 GitHub Actions](https://github.com/ultralytics/yolov5/actions) Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training ([train.py](https://github.com/ultralytics/yolov5/blob/master/train.py)), validation ([val.py](https://github.com/ultralytics/yolov5/blob/master/val.py)), inference ([detect.py](https://github.com/ultralytics/yolov5/blob/master/detect.py)) and export ([export.py](https://github.com/ultralytics/yolov5/blob/master/export.py)) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
140
+