ahsanMah committed
Commit 133982f · 1 Parent(s): c98586f

added training details to readme

Files changed (1): README.md (+82 -4)
README.md CHANGED
@@ -42,16 +42,94 @@ The model will work without a GPU but may take 15-30 seconds given your resource

## Usage
```bash
- python app.py
```
Then go to [http://localhost:7860](http://localhost:7860)

### Notes
- - As the underlying models are trained on Imagenet-1k, you may have better success when the subject belongs to one of the 1000 classes it was trianed on
- - It helps to have the subject centered in the middle of the image. This is due to the resize-center-crop that was used to train the score models

- ### Acknowledgements

Special thanks to NVLabs for their [EDM2 repository](https://github.com/NVlabs/edm2).
 

## Usage
```bash
+ DNNLIB_CACHE_DIR=/tmp/ python app.py
```
Then go to [http://localhost:7860](http://localhost:7860)

+ Note that running this app will download a pickle of the EDM models by NVLabs, available [here](https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/). The models will be saved in the `DNNLIB_CACHE_DIR` directory you specify, defaulting to `/tmp/` if it is not set.
+
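+ If you prefer to call the running app programmatically, a client sketch like the one below should work. This is a minimal example using `gradio_client`; the endpoint name and inputs are assumptions here, so confirm the real signature with `client.view_api()`:
+
+ ```python
+ from gradio_client import Client, handle_file
+
+ # Connect to the locally running Gradio app
+ client = Client("http://localhost:7860")
+
+ # Inspect the exposed endpoints and their parameters
+ client.view_api()
+
+ # Hypothetical call: the endpoint name and inputs depend on how app.py
+ # defines its interface, so adjust after checking view_api()
+ result = client.predict(handle_file("my_image.png"), api_name="/predict")
+ print(result)
+ ```
+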
+ ## Best Practices and Usage Tips
+
+ To get the best results from our anomaly localization model, consider the following recommendations:
+
+ - **Image Content**: As the underlying models are trained on ImageNet-1k, you may have better success when the subject belongs to one of the 1000 classes it was trained on.
+
+ - **Subject Positioning**: It helps to have the subject centered in the middle of the image. This is due to the resize-center-crop that was used to train the score models (see the preprocessing sketch after this list).
+
+ - **Image Aspect**: For optimal performance, try to use square aspect ratios. In my testing, this matters less than subject positioning.
+
+ - **Model Selection**: Choose the small, medium, or large model preset (e.g., `edm2-img64-s-fid`) based on your available computational resources.
+ - **Fine-tuning**: For domain-specific applications, consider tuning the model on a dataset more closely related to your target domain using the information below.
+
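+ The snippet below approximates that resize-center-crop preprocessing so you can prepare inputs yourself. It is a sketch using PIL, not the exact transform used in training:
+
+ ```python
+ from PIL import Image
+
+ def resize_center_crop(path: str, size: int = 64) -> Image.Image:
+     """Resize the shorter side to `size`, then crop the center square."""
+     img = Image.open(path).convert("RGB")
+     w, h = img.size
+     scale = size / min(w, h)
+     img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
+     w, h = img.size
+     left, top = (w - size) // 2, (h - size) // 2
+     return img.crop((left, top, left + size, top + size))
+
+ resize_center_crop("my_photo.jpg").save("my_photo_64.png")
+ ```
+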
+ ## Training the Model
+
+ This section outlines the steps to train the model locally. Make sure you have the required dependencies installed and a GPU with CUDA support for optimal performance.
+
+ ### 1. Prepare the Dataset
+
+ First, download a dataset such as Imagenette and unzip it into a folder. Then, prepare the dataset using the `dataset_tool.py` script:
+
+ ```bash
+ python dataset_tool.py convert --source=/path/to/imagenette/ --dest=/path/to/output/img64/ --resolution=64x64 --transform=center-crop-dhariwal
+ ```
+
+ This command will process the images, resizing them to 64x64 pixels and applying the center-crop-dhariwal transform that was used to train the backbone score-based diffusion model.
+
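+ A quick sanity check on the output can save a failed training run. This sketch assumes the tool wrote PNGs into nested folders under the destination directory, as EDM2-style dataset tools do; adjust the glob if your layout differs:
+
+ ```python
+ import glob
+ from PIL import Image
+
+ # Collect all prepared images and verify the expected resolution
+ paths = glob.glob("/path/to/output/img64/**/*.png", recursive=True)
+ print(f"{len(paths)} images found")
+ assert Image.open(paths[0]).size == (64, 64)
+ ```
+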
+ ### 2. Train the Flow Model
+ To train the flow model, use the following command:
+ ```bash
+ DNNLIB_CACHE_DIR=/path/to/model/cache CUDA_VISIBLE_DEVICES=0 python msma.py train-flow --outdir models/ --dataset_path /path/to/prepared/dataset --preset edm2-img64-s-fid --num_flows 8 --epochs 20 --batch_size 32
+ ```
+ Options:
+
+ - --outdir: Directory to save the trained model and logs
+ - --dataset_path: Path to the prepared dataset
+ - --preset: Configuration preset (default: "edm2-img64-s-fid")
+ - --num_flows: Number of normalizing flow functions in the PatchFlow model (default: 4)
+ - --epochs: Number of training epochs (default: 10)
+ - --batch_size: Batch size for training (default: 128)
+
+ Note that the evaluation step will use a batch size twice that of training.
+
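+ For intuition about what `--num_flows` controls, here is a toy sketch of how normalizing flows compose: each invertible transform contributes a log-determinant term to the change-of-variables formula, and stacking more of them gives a more expressive density. This is a generic illustration, not the repository's PatchFlow implementation:
+
+ ```python
+ import numpy as np
+
+ class AffineFlow:
+     """Toy invertible transform z = a * x + b with a tractable log-det."""
+     def __init__(self, a: float, b: float):
+         self.a, self.b = a, b
+
+     def forward(self, x):
+         # log|det J| of an elementwise affine map is D * log|a|
+         return self.a * x + self.b, np.log(np.abs(self.a)) * x.shape[-1]
+
+ def log_prob(x, flows):
+     """Change of variables: log p(x) = log N(z; 0, I) + sum of log-dets."""
+     z, total_logdet = x, 0.0
+     for f in flows:
+         z, logdet = f.forward(z)
+         total_logdet += logdet
+     base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(axis=-1)
+     return base + total_logdet
+
+ flows = [AffineFlow(0.5, 0.1) for _ in range(8)]  # cf. --num_flows 8
+ x = np.random.randn(4, 16)                        # a batch of 16-dim features
+ print(log_prob(x, flows))
+ ```
+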
+ ### 3. Cache Score Norms
+ While the flow model is training (or after, if GPU resources are limited), cache the score norms:
+ ```bash
+ DNNLIB_CACHE_DIR=/path/to/model/cache CUDA_VISIBLE_DEVICES=1 python msma.py cache-scores --outdir models/ --dataset_path /path/to/prepared/dataset --preset edm2-img64-s-fid --batch_size=128
+ ```
+ Options:
+ - --outdir: Directory to save the cached scores
+ - --dataset_path: Path to the prepared dataset
+ - --preset: Configuration preset (should match the flow training preset)
+ - --batch_size: Number of samples per batch (default: 64)
+
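+ Conceptually, the quantity being cached is a vector of score norms per image: perturb the image at several noise levels, estimate the score with the pretrained denoiser, and record the L2 norm at each level. The sketch below illustrates the idea; `denoiser` is a hypothetical callable standing in for the EDM2 model, and the real script additionally handles model loading, noise schedules, and storage:
+
+ ```python
+ import torch
+
+ @torch.no_grad()
+ def score_norms(denoiser, x, sigmas=(0.1, 0.5, 1.0, 2.0)):
+     """Conceptual multi-scale score norms: one feature per noise level."""
+     feats = []
+     for sigma in sigmas:
+         x_noisy = x + sigma * torch.randn_like(x)
+         # EDM-style relation: score(x; sigma) ~ (D(x; sigma) - x) / sigma**2
+         score = (denoiser(x_noisy, sigma) - x_noisy) / sigma**2
+         feats.append(score.flatten(1).norm(dim=1))  # L2 norm per sample
+     return torch.stack(feats, dim=1)  # shape: [batch, len(sigmas)]
+ ```
+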
+ ### 4. Train the Gaussian Mixture Model (GMM)
+ Finally, train the GMM using the cached score norms:
+
+ ```bash
+ DNNLIB_CACHE_DIR=/path/to/model/cache python msma.py train-gmm --outdir models/ --preset edm2-img64-s-fid
+ ```
+
+ Options:
+
+ - --outdir: Directory to load the cached scores from and save the trained GMM to (should match the previous step)
+ - --preset: Configuration preset (should match previous steps)
+ - --gridsearch: (Optional) Use grid search to find the best number of components; otherwise 7 components are used (default: False)
+
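+ To see what such a grid search might look like, here is a small sketch with scikit-learn that selects the number of components by BIC. The features are random stand-ins for the cached score norms, and this illustrates the idea rather than the script's actual implementation:
+
+ ```python
+ import numpy as np
+ from sklearn.mixture import GaussianMixture
+
+ # Random stand-in for the cached per-sample score-norm features
+ feats = np.random.randn(1000, 4)
+
+ # Fit GMMs with varying component counts and keep the lowest-BIC model
+ best = min(
+     (GaussianMixture(n_components=k, random_state=0).fit(feats) for k in range(2, 12)),
+     key=lambda gm: gm.bic(feats),
+ )
+ print(best.n_components, best.bic(feats))
+ ```
+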

### Notes
+
+ - Adjust the paths in the commands according to your directory structure.
+ - The `DNNLIB_CACHE_DIR` environment variable sets the cache directory for pre-trained models.
+ - `CUDA_VISIBLE_DEVICES` allows you to specify which GPU to use for training.
+ - You can run the flow training and score norm caching concurrently if you have multiple GPUs available (see the pipeline sketch below).
+ - The `--preset` option should be consistent across all steps to ensure compatibility.
+
+ For more detailed information on each command and its options, refer to the script documentation or run `python msma.py [command] --help`.
+

+ ## Acknowledgements

Special thanks to NVLabs for their [EDM2 repository](https://github.com/NVlabs/edm2).