crrrr30 committed on
Commit
da716ed
·
1 Parent(s): 603d25e

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitignore +1 -0
  2. .ipynb_checkpoints/0,0,0-checkpoint.png +0 -0
  3. .ipynb_checkpoints/tmp-checkpoint.jpg +0 -0
  4. 0,0,0.png +0 -0
  5. CONTRIBUTING.md +106 -0
  6. LICENSE +201 -0
  7. MANIFEST.in +3 -0
  8. README.md +781 -8
  9. avg_checkpoints.py +152 -0
  10. benchmark.py +696 -0
  11. bulk_runner.py +184 -0
  12. clean_checkpoint.py +115 -0
  13. convert/convert_from_mxnet.py +107 -0
  14. convert/convert_nest_flax.py +109 -0
  15. demo.py +120 -0
  16. distributed_train.sh +5 -0
  17. docs/archived_changes.md +406 -0
  18. docs/changes.md +314 -0
  19. docs/feature_extraction.md +174 -0
  20. docs/index.md +80 -0
  21. docs/javascripts/tables.js +6 -0
  22. docs/models.md +171 -0
  23. docs/models/.pages +1 -0
  24. docs/models/.templates/code_snippets.md +62 -0
  25. docs/models/.templates/generate_readmes.py +64 -0
  26. docs/models/.templates/models/adversarial-inception-v3.md +98 -0
  27. docs/models/.templates/models/advprop.md +457 -0
  28. docs/models/.templates/models/big-transfer.md +295 -0
  29. docs/models/.templates/models/csp-darknet.md +81 -0
  30. docs/models/.templates/models/csp-resnet.md +76 -0
  31. docs/models/.templates/models/csp-resnext.md +77 -0
  32. docs/models/.templates/models/densenet.md +305 -0
  33. docs/models/.templates/models/dla.md +545 -0
  34. docs/models/.templates/models/dpn.md +256 -0
  35. docs/models/.templates/models/ecaresnet.md +236 -0
  36. docs/models/.templates/models/efficientnet-pruned.md +145 -0
  37. docs/models/.templates/models/efficientnet.md +325 -0
  38. docs/models/.templates/models/ensemble-adversarial.md +98 -0
  39. docs/models/.templates/models/ese-vovnet.md +92 -0
  40. docs/models/.templates/models/fbnet.md +76 -0
  41. docs/models/.templates/models/gloun-inception-v3.md +78 -0
  42. docs/models/.templates/models/gloun-resnet.md +504 -0
  43. docs/models/.templates/models/gloun-resnext.md +142 -0
  44. docs/models/.templates/models/gloun-senet.md +63 -0
  45. docs/models/.templates/models/gloun-seresnext.md +136 -0
  46. docs/models/.templates/models/gloun-xception.md +66 -0
  47. docs/models/.templates/models/hrnet.md +358 -0
  48. docs/models/.templates/models/ig-resnext.md +209 -0
  49. docs/models/.templates/models/inception-resnet-v2.md +72 -0
  50. docs/models/.templates/models/inception-v3.md +85 -0
.gitignore ADDED
@@ -0,0 +1 @@
1
+ **/__pycache__/
.ipynb_checkpoints/0,0,0-checkpoint.png ADDED
.ipynb_checkpoints/tmp-checkpoint.jpg ADDED
0,0,0.png ADDED
CONTRIBUTING.md ADDED
@@ -0,0 +1,106 @@
1
+ *This guideline is very much a work-in-progress.*
2
+
3
+ Contributions to `timm` for code, documentation, and tests are more than welcome!
4
+
5
+ There haven't been any formal guidelines to date so please bear with me, and feel free to add to this guide.
6
+
7
+ # Coding style
8
+
9
+ Code linting and auto-format (black) are not currently in place but open to consideration. In the meantime, the style to follow is (mostly) aligned with Google's guide: https://google.github.io/styleguide/pyguide.html.
10
+
11
+ A few specific differences from Google style (or black)
12
+ 1. Line length is 120 char. Going over is okay in some cases (e.g. I prefer not to break URL across lines).
13
+ 2. Hanging indents are always preferred; please avoid aligning arguments with closing brackets or braces.
14
+
15
+ Example, from Google guide, but this is a NO here:
16
+ ```
17
+ # Aligned with opening delimiter.
18
+ foo = long_function_name(var_one, var_two,
19
+                          var_three, var_four)
20
+ meal = (spam,
21
+         beans)
22
+
23
+ # Aligned with opening delimiter in a dictionary.
24
+ foo = {
25
+     'long_dictionary_key': value1 +
26
+                            value2,
27
+     ...
28
+ }
29
+ ```
30
+ This is YES:
31
+
32
+ ```
33
+ # 4-space hanging indent; nothing on first line,
34
+ # closing parenthesis on a new line.
35
+ foo = long_function_name(
36
+     var_one, var_two, var_three,
37
+     var_four
38
+ )
39
+ meal = (
40
+     spam,
41
+     beans,
42
+ )
43
+
44
+ # 4-space hanging indent in a dictionary.
45
+ foo = {
46
+     'long_dictionary_key':
47
+         long_dictionary_value,
48
+     ...
49
+ }
50
+ ```
51
+
52
+ When there is a discrepancy in a given source file (the code has many origins and not all of it has been updated to what I consider the current goal), please follow the style already used in that file.
53
+
54
+ In general, if you add new code, formatting it with black using the following options should result in a style that is compatible with the rest of the code base:
55
+
56
+ ```
57
+ black --skip-string-normalization --line-length 120 <path-to-file>
58
+ ```
59
+
60
+ Avoid formatting code that is unrelated to your PR though.
61
+
62
+ PRs with pure formatting / style fixes will be accepted, but only in isolation from functional changes; it is best to ask before starting such a change.
63
+
64
+ # Documentation
65
+
66
+ As with code style, docstring style is based on the Google guide: https://google.github.io/styleguide/pyguide.html
67
+
68
+ The goal for the code is to eventually move to have all major functions and `__init__` methods use PEP484 type annotations.
69
+
70
+ When type annotations are used for a function, as per the Google pyguide, they should **NOT** be duplicated in the docstrings, please leave annotations as the one source of truth re typing.
71
+
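As a rough illustration of the rule above: the function below is a made-up helper (`scale_values` is an assumption for this sketch, not part of `timm`). The types appear only in the PEP 484 annotations, while the Google-style docstring describes meaning and behavior without repeating them.

```python
from typing import List, Optional, Sequence


def scale_values(
        values: Sequence[float],
        factor: float = 1.0,
        offset: Optional[float] = None,
) -> List[float]:
    """Scale each value and optionally shift it.

    Args:
        values: Numbers to transform.
        factor: Multiplier applied to every value.
        offset: Constant added after scaling; skipped when None.

    Returns:
        The transformed values, in input order.
    """
    # Types are stated once, in the signature; the docstring only explains intent.
    shift = 0.0 if offset is None else offset
    return [v * factor + shift for v in values]


print(scale_values([1.0, 2.0], factor=2.0, offset=1.0))
```

The signature also follows the hanging-indent preference described under "Coding style".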
72
+ There are a LOT of gaps in current documentation relative to the functionality in timm, please, document away!
73
+
74
+ # Installation
75
+
76
+ Create a Python virtual environment using Python 3.10. Inside the environment, install `torch` and `torchvision` using the instructions matching your system as listed on the [PyTorch website](https://pytorch.org/).
77
+
78
+ Then install the remaining dependencies:
79
+
80
+ ```
81
+ python -m pip install -r requirements.txt
82
+ python -m pip install -r requirements-dev.txt # for testing
83
+ python -m pip install -e .
84
+ ```
85
+
86
+ ## Unit tests
87
+
88
+ Run the tests using:
89
+
90
+ ```
91
+ pytest tests/
92
+ ```
93
+
94
+ Since the whole test suite takes a lot of time to run locally (a few hours), you may want to select a subset of tests relating to the changes you made by using the `-k` option of [`pytest`](https://docs.pytest.org/en/7.1.x/example/markers.html#using-k-expr-to-select-tests-based-on-their-name). Moreover, running tests in parallel (in this example 4 processes) with the `-n` option may help:
95
+
96
+ ```
97
+ pytest -k "substring-to-match" -n 4 tests/
98
+ ```
99
+
100
+ ## Building documentation
101
+
102
+ Please refer to [this document](https://github.com/huggingface/pytorch-image-models/tree/main/hfdocs).
103
+
104
+ # Questions
105
+
106
+ If you have any questions about contribution, where / how to contribute, please ask in the [Discussions](https://github.com/huggingface/pytorch-image-models/discussions/categories/contributing) (there is a `Contributing` topic).
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "{}"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright 2019 Ross Wightman
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
MANIFEST.in ADDED
@@ -0,0 +1,3 @@
1
+ include timm/models/_pruned/*.txt
2
+ include timm/data/_info/*.txt
3
+ include timm/data/_info/*.json
README.md CHANGED
@@ -1,12 +1,785 @@
1
  ---
2
- title: Cs Mixer
3
- emoji: 📉
4
- colorFrom: red
5
- colorTo: green
6
  sdk: gradio
7
- sdk_version: 3.38.0
8
- app_file: app.py
9
- pinned: false
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: cs-mixer
3
+ app_file: demo.py
4
  sdk: gradio
5
+ sdk_version: 3.37.0
6
  ---
7
+ # PyTorch Image Models
8
+ - [Sponsors](#sponsors)
9
+ - [What's New](#whats-new)
10
+ - [Introduction](#introduction)
11
+ - [Models](#models)
12
+ - [Features](#features)
13
+ - [Results](#results)
14
+ - [Getting Started (Documentation)](#getting-started-documentation)
15
+ - [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
16
+ - [Awesome PyTorch Resources](#awesome-pytorch-resources)
17
+ - [Licenses](#licenses)
18
+ - [Citing](#citing)
19
 
20
+ ## Sponsors
21
+
22
+ Thanks to the following for hardware support:
23
+ * TPU Research Cloud (TRC) (https://sites.research.google/trc/about/)
24
+ * Nvidia (https://www.nvidia.com/en-us/)
25
+
26
+ And a big thanks to all GitHub sponsors who helped with some of my costs before I joined Hugging Face.
27
+
28
+ ## What's New
29
+
30
+ ❗Updates after Oct 10, 2022 are available in version >= 0.9❗
31
+ * Many changes since the last 0.6.x stable releases. They were previewed in 0.8.x dev releases but not everyone transitioned.
32
+ * `timm.models.layers` moved to `timm.layers`:
33
+ * `from timm.models.layers import name` will still work via deprecation mapping (but please transition to `timm.layers`).
34
+ * `import timm.models.layers.module` or `from timm.models.layers.module import name` needs to be changed now.
35
+ * Builder, helper, non-model modules in `timm.models` have a `_` prefix added, ie `timm.models.helpers` -> `timm.models._helpers`, there are temporary deprecation mapping files but those will be removed.
36
+ * All models now support `architecture.pretrained_tag` naming (ex `resnet50.rsb_a1`).
37
+ * The pretrained_tag is the specific weight variant (different head) for the architecture.
38
+ * Using only `architecture` defaults to the first weights in the default_cfgs for that model architecture.
39
+ * In adding pretrained tags, many model names that existed to differentiate were renamed to use the tag (ex: `vit_base_patch16_224_in21k` -> `vit_base_patch16_224.augreg_in21k`). There are deprecation mappings for these.
40
+ * A number of models had their checkpoints remapped to match architecture changes needed to better support `features_only=True`; there are `checkpoint_filter_fn` methods in any model module that was remapped. These can be passed to `timm.models.load_checkpoint(..., filter_fn=timm.models.swin_transformer_v2.checkpoint_filter_fn)` to remap your existing checkpoint.
41
+ * The Hugging Face Hub (https://huggingface.co/timm) is now the primary source for `timm` weights. Model cards include links to papers, original source, and license.
42
+ * Previous 0.6.x can be cloned from the [0.6.x](https://github.com/rwightman/pytorch-image-models/tree/0.6.x) branch or installed via pip by pinning the version.
43
+
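The `architecture.pretrained_tag` naming described above can be pictured with a small, library-free sketch. The helper `split_name_and_tag` is an assumption made for illustration and is not the `timm` API; it only shows how a name such as `resnet50.rsb_a1` separates into an architecture and a weight tag.

```python
from typing import Tuple


def split_name_and_tag(model_name: str) -> Tuple[str, str]:
    """Split 'architecture.pretrained_tag' into its two parts.

    Illustrative helper only (an assumption for this sketch, not the timm API).
    A name without a '.' yields an empty tag, which corresponds to asking for
    the architecture's default weights.
    """
    # partition() splits on the first '.', so dots are never expected in the architecture part.
    architecture, _, tag = model_name.partition('.')
    return architecture, tag


print(split_name_and_tag('resnet50.rsb_a1'))
print(split_name_and_tag('vit_base_patch16_224.augreg_in21k'))
print(split_name_and_tag('resnet50'))
```

With `timm` >= 0.9 installed, the same names are passed straight to the library, for example `timm.create_model('resnet50.rsb_a1', pretrained=True)`.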
44
+ ### May 11, 2023
45
+ * `timm` 0.9 released, transition from 0.8.xdev releases
46
+
47
+ ### May 10, 2023
48
+ * Hugging Face Hub downloading is now default, 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
49
+ * DINOv2 vit feature backbone weights added thanks to [Leng Yue](https://github.com/leng-yue)
50
+ * FB MAE vit feature backbone weights added
51
+ * OpenCLIP DataComp-XL L/14 feat backbone weights added
52
+ * MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by [Fredo Guan](https://github.com/fffffgggg54)
53
+ * Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
54
+ * Model creation throws error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
55
+ * Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
56
+ * bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to factory, use `bnb` prefix, ie `bnbadam8bit`
57
+ * Misc cleanup and fixes
58
+ * Final testing before switching to a 0.9 and bringing `timm` out of pre-release state
59
+
60
+ ### April 27, 2023
61
+ * 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
62
+ * Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
63
+
64
+ ### April 21, 2023
65
+ * Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks [Taeksang Kim](https://github.com/voidbag)
66
+ * More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
67
+ * Added `--head-init-scale` and `--head-init-bias` to train.py to scale classifier head and set fixed bias for fine-tune
68
+ * Remove all InplaceABN (`inplace_abn`) use, replaced use in tresnet with standard BatchNorm (modified weights accordingly).
69
+
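For readers unfamiliar with gradient accumulation (the idea behind `--grad-accum-steps` above), here is a minimal plain-Python sketch on a one-parameter least-squares problem. The data, the `grad_mse` helper, and the 1/`accum_steps` scaling are assumptions for this illustration; the changelog entry does not say how the train script implements it.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


# Toy data, made up for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5

accum_steps = 4                   # plays the role of --grad-accum-steps
micro = len(xs) // accum_steps    # micro-batch size (2 samples each)

# Reference: gradient computed on the whole batch at once.
full_grad = grad_mse(w, xs, ys)

# Accumulation: sum micro-batch gradients, each scaled by 1 / accum_steps,
# and only then take a single optimizer step.
accum_grad = 0.0
for step in range(accum_steps):
    lo, hi = step * micro, (step + 1) * micro
    accum_grad += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps

lr = 0.01
w_new = w - lr * accum_grad       # one update per accum_steps micro-batches

print(abs(full_grad - accum_grad) < 1e-9)
```

With equal-sized micro-batches the accumulated gradient matches the full-batch gradient up to floating-point rounding, which is why accumulation can stand in for a larger batch when memory is limited.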
70
+ ### April 12, 2023
71
+ * Add ONNX export script, validate script, helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + pytorch.
72
+ * Refactor dropout args for vit and vit-like models, separate drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
73
+ * fused F.scaled_dot_product_attention support to more vit models, add env var (TIMM_FUSED_ATTN) to control, and config interface to enable/disable
74
+ * Add EVA-CLIP backbones w/ image tower weights, all the way up to 4B param 'enormous' model, and 336x336 OpenAI ViT mode that was missed.
75
+
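The `TIMM_FUSED_ATTN` environment variable mentioned above can be set from Python as in the sketch below. The value convention (an integer where `0` means off) and the `fused_attn_requested` reader are assumptions made for this illustration; they are not taken from the changelog and are not the function `timm` itself uses.

```python
import os

# Assumption for this sketch: the variable is read as an integer flag and 0 means "off".
os.environ['TIMM_FUSED_ATTN'] = '0'


def fused_attn_requested(default: str = '1') -> bool:
    """Hypothetical reader for the flag; not the function timm itself uses."""
    raw = os.environ.get('TIMM_FUSED_ATTN', default)
    try:
        return int(raw) > 0
    except ValueError:
        # Treat unparseable values as "off" in this sketch.
        return False


print(fused_attn_requested())
```

Environment variables of this kind are usually read when a library is imported, so it is safest to set the variable before `timm` is imported, or in the shell that launches the script.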
76
+ ### April 5, 2023
77
+ * ALL ResNet models pushed to Hugging Face Hub with multi-weight support
78
+ * All past `timm` trained weights added with recipe based tags to differentiate
79
+ * All ResNet strikes back A1/A2/A3 (seed 0) and R50 example B/C1/C2/D weights available
80
+ * Add torchvision v2 recipe weights to existing torchvision originals
81
+ * See comparison table in https://huggingface.co/timm/seresnextaa101d_32x8d.sw_in12k_ft_in1k_288#model-comparison
82
+ * New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
83
+ * `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
84
+ * `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
85
+ * `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
86
+ * `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
87
+
88
+ ### March 31, 2023
89
+ * Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
90
+
91
+ | model |top1 |top5 |img_size|param_count|gmacs |macts |
92
+ |----------------------------------------------------------------------------------------------------------------------|------|------|--------|-----------|------|------|
93
+ | [convnext_xxlarge.clip_laion2b_soup_ft_in1k](https://huggingface.co/timm/convnext_xxlarge.clip_laion2b_soup_ft_in1k) |88.612|98.704|256 |846.47 |198.09|124.45|
94
+ | convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 |88.312|98.578|384 |200.13 |101.11|126.74|
95
+ | convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 |87.968|98.47 |320 |200.13 |70.21 |88.02 |
96
+ | convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 |87.138|98.212|384 |88.59 |45.21 |84.49 |
97
+ | convnext_base.clip_laion2b_augreg_ft_in12k_in1k |86.344|97.97 |256 |88.59 |20.09 |37.55 |
98
+
99
+ * Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
100
+
101
+ | model |top1 |top5 |param_count|img_size|
102
+ |----------------------------------------------------|------|------|-----------|--------|
103
+ | [eva02_large_patch14_448.mim_m38m_ft_in22k_in1k](https://huggingface.co/timm/eva02_large_patch14_448.mim_m38m_ft_in1k) |90.054|99.042|305.08 |448 |
104
+ | eva02_large_patch14_448.mim_in22k_ft_in22k_in1k |89.946|99.01 |305.08 |448 |
105
+ | eva_giant_patch14_560.m30m_ft_in22k_in1k |89.792|98.992|1014.45 |560 |
106
+ | eva02_large_patch14_448.mim_in22k_ft_in1k |89.626|98.954|305.08 |448 |
107
+ | eva02_large_patch14_448.mim_m38m_ft_in1k |89.57 |98.918|305.08 |448 |
108
+ | eva_giant_patch14_336.m30m_ft_in22k_in1k |89.56 |98.956|1013.01 |336 |
109
+ | eva_giant_patch14_336.clip_ft_in1k |89.466|98.82 |1013.01 |336 |
110
+ | eva_large_patch14_336.in22k_ft_in22k_in1k |89.214|98.854|304.53 |336 |
111
+ | eva_giant_patch14_224.clip_ft_in1k |88.882|98.678|1012.56 |224 |
112
+ | eva02_base_patch14_448.mim_in22k_ft_in22k_in1k |88.692|98.722|87.12 |448 |
113
+ | eva_large_patch14_336.in22k_ft_in1k |88.652|98.722|304.53 |336 |
114
+ | eva_large_patch14_196.in22k_ft_in22k_in1k |88.592|98.656|304.14 |196 |
115
+ | eva02_base_patch14_448.mim_in22k_ft_in1k |88.23 |98.564|87.12 |448 |
116
+ | eva_large_patch14_196.in22k_ft_in1k |87.934|98.504|304.14 |196 |
117
+ | eva02_small_patch14_336.mim_in22k_ft_in1k |85.74 |97.614|22.13 |336 |
118
+ | eva02_tiny_patch14_336.mim_in22k_ft_in1k |80.658|95.524|5.76 |336 |
119
+
120
+ * Multi-weight and HF hub for DeiT and MLP-Mixer based models
121
+
122
+ ### March 22, 2023
123
+ * More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
124
+ * Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
125
+ * FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
126
+ * RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor wrt performance for model size, but possibly useful.
127
+ * More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
128
+ * `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
129
+ * `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
130
+ * `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
131
+ * `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
132
+ * `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compare to SWAG PT + 1k FT this is same BUT much lower res, blows SEER FT away)
133
+ * Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
134
+ * Minor bug fixes and improvements.
135
+
136
+ ### Feb 26, 2023
137
+ * Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see [model card](https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup)
138
+ * Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
139
+ * 0.8.15dev0
140
+
141
+ ### Feb 20, 2023
142
+ * Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
143
+ * 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
144
+
145
+ ### Feb 16, 2023
146
+ * `safetensor` checkpoint support added
147
+ * Add ideas from 'Scaling Vision Transformers to 22 Billion Parameters' (https://arxiv.org/abs/2302.05442) -- qk norm, RmsNorm, parallel block
148
+ * Add F.scaled_dot_product_attention support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
149
+ * Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
150
+ * gradient checkpointing works with `features_only=True`
151
+
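For readers unfamiliar with the Lion optimizer listed above, its update rule (as described in the linked paper) takes the sign of an interpolation between the momentum and the current gradient, then updates the momentum with a second coefficient. Below is a dependency-free sketch on plain Python floats; `lion_step` is a hypothetical helper written for illustration and is not the `timm` implementation (which operates on tensors and offers a multi-tensor path).

```python
def lion_step(params, grads, momenta, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update over lists of scalar parameters (illustrative only)."""
    new_params, new_momenta = [], []
    for p, g, m in zip(params, grads, momenta):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        sign = (c > 0) - (c < 0)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (sign + weight_decay * p))
        # Momentum is tracked with a separate coefficient, beta2.
        new_momenta.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momenta


params, momenta = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0], lr=0.1)
```

Because only the sign of the update is used, every parameter moves by the same magnitude `lr` per step (before weight decay), which is why Lion is typically run with a smaller learning rate than AdamW.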
152
+ ### Feb 7, 2023
153
+ * New inference benchmark numbers added in [results](results/) folder.
154
+ * Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
155
+ * `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
156
+ * `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
157
+ * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
158
+ * `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
159
+ * Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by [Fredo](https://github.com/fffffgggg54).
160
+ * Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
161
+ * Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
162
+ * New EfficientFormer-V2 arch, significant refactor from original at (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
163
+ * Minor updates to EfficientFormer.
164
+ * Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
165
+ * Move ImageNet meta-data (synsets, indices) from `/results` to [`timm/data/_info`](timm/data/_info/).
166
+ * Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
167
+ * Update `inference.py` to use these classes, try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
168
+ * Ready for 0.8.10 pypi pre-release (final testing).
169
+
170
+ ### Jan 20, 2023
171
+ * Add two convnext 12k -> 1k fine-tunes at 384x384
172
+ * `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
173
+ * `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
174
+
175
+ * Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models
176
+
177
+ |model |top1 |top5 |samples / sec |Params (M) |GMAC |Act (M)|
178
+ |------------------------------------------------------------------------------------------------------------------------|----:|----:|--------------:|--------------:|-----:|------:|
179
+ |[maxvit_xlarge_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |88.53|98.64| 21.76| 475.77|534.14|1413.22|
180
+ |[maxvit_xlarge_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |88.32|98.54| 42.53| 475.32|292.78| 668.76|
181
+ |[maxvit_base_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |88.20|98.53| 50.87| 119.88|138.02| 703.99|
182
+ |[maxvit_large_tf_512.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |88.04|98.40| 36.42| 212.33|244.75| 942.15|
183
+ |[maxvit_large_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |87.98|98.56| 71.75| 212.03|132.55| 445.84|
184
+ |[maxvit_base_tf_384.in21k_ft_in1k](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |87.92|98.54| 104.71| 119.65| 73.80| 332.90|
185
+ |[maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_384.sw_in12k_ft_in1k) |87.81|98.37| 106.55| 116.14| 70.97| 318.95|
186
+ |[maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k) |87.47|98.37| 149.49| 116.09| 72.98| 213.74|
187
+ |[coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k) |87.39|98.31| 160.80| 73.88| 47.69| 209.43|
188
+ |[maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxvit_rmlp_base_rw_224.sw_in12k_ft_in1k) |86.89|98.02| 375.86| 116.14| 23.15| 92.64|
189
+ |[maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/maxxvitv2_rmlp_base_rw_224.sw_in12k_ft_in1k) |86.64|98.02| 501.03| 116.09| 24.20| 62.77|
190
+ |[maxvit_base_tf_512.in1k](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |86.60|97.92| 50.75| 119.88|138.02| 703.99|
191
+ |[coatnet_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_2_rw_224.sw_in12k_ft_in1k) |86.57|97.89| 631.88| 73.87| 15.09| 49.22|
192
+ |[maxvit_large_tf_512.in1k](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |86.52|97.88| 36.04| 212.33|244.75| 942.15|
193
+ |[coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in12k_ft_in1k) |86.49|97.90| 620.58| 73.88| 15.18| 54.78|
194
+ |[maxvit_base_tf_384.in1k](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |86.29|97.80| 101.09| 119.65| 73.80| 332.90|
195
+ |[maxvit_large_tf_384.in1k](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |86.23|97.69| 70.56| 212.03|132.55| 445.84|
196
+ |[maxvit_small_tf_512.in1k](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |86.10|97.76| 88.63| 69.13| 67.26| 383.77|
197
+ |[maxvit_tiny_tf_512.in1k](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |85.67|97.58| 144.25| 31.05| 33.49| 257.59|
198
+ |[maxvit_small_tf_384.in1k](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |85.54|97.46| 188.35| 69.02| 35.87| 183.65|
199
+ |[maxvit_tiny_tf_384.in1k](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |85.11|97.38| 293.46| 30.98| 17.53| 123.42|
200
+ |[maxvit_large_tf_224.in1k](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |84.93|96.97| 247.71| 211.79| 43.68| 127.35|
201
+ |[coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw2_224.sw_in12k_ft_in1k) |84.90|96.96| 1025.45| 41.72| 8.11| 40.13|
202
+ |[maxvit_base_tf_224.in1k](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |84.85|96.99| 358.25| 119.47| 24.04| 95.01|
203
+ |[maxxvit_rmlp_small_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_small_rw_256.sw_in1k) |84.63|97.06| 575.53| 66.01| 14.67| 58.38|
204
+ |[coatnet_rmlp_2_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_2_rw_224.sw_in1k) |84.61|96.74| 625.81| 73.88| 15.18| 54.78|
205
+ |[maxvit_rmlp_small_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_small_rw_224.sw_in1k) |84.49|96.76| 693.82| 64.90| 10.75| 49.30|
206
+ |[maxvit_small_tf_224.in1k](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |84.43|96.83| 647.96| 68.93| 11.66| 53.17|
207
+ |[maxvit_rmlp_tiny_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_tiny_rw_256.sw_in1k) |84.23|96.78| 807.21| 29.15| 6.77| 46.92|
208
+ |[coatnet_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_1_rw_224.sw_in1k) |83.62|96.38| 989.59| 41.72| 8.04| 34.60|
209
+ |[maxvit_tiny_rw_224.sw_in1k](https://huggingface.co/timm/maxvit_tiny_rw_224.sw_in1k) |83.50|96.50| 1100.53| 29.06| 5.11| 33.11|
210
+ |[maxvit_tiny_tf_224.in1k](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |83.41|96.59| 1004.94| 30.92| 5.60| 35.78|
211
+ |[coatnet_rmlp_1_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_1_rw_224.sw_in1k) |83.36|96.45| 1093.03| 41.69| 7.85| 35.47|
212
+ |[maxxvitv2_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvitv2_nano_rw_256.sw_in1k) |83.11|96.33| 1276.88| 23.70| 6.26| 23.05|
213
+ |[maxxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxxvit_rmlp_nano_rw_256.sw_in1k) |83.03|96.34| 1341.24| 16.78| 4.37| 26.05|
214
+ |[maxvit_rmlp_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_nano_rw_256.sw_in1k) |82.96|96.26| 1283.24| 15.50| 4.47| 31.92|
215
+ |[maxvit_nano_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_nano_rw_256.sw_in1k) |82.93|96.23| 1218.17| 15.45| 4.46| 30.28|
216
+ |[coatnet_bn_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_bn_0_rw_224.sw_in1k) |82.39|96.19| 1600.14| 27.44| 4.67| 22.04|
217
+ |[coatnet_0_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_0_rw_224.sw_in1k) |82.39|95.84| 1831.21| 27.44| 4.43| 18.73|
218
+ |[coatnet_rmlp_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_rmlp_nano_rw_224.sw_in1k) |82.05|95.87| 2109.09| 15.15| 2.62| 20.34|
219
+ |[coatnext_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnext_nano_rw_224.sw_in1k) |81.95|95.92| 2525.52| 14.70| 2.47| 12.80|
220
+ |[coatnet_nano_rw_224.sw_in1k](https://huggingface.co/timm/coatnet_nano_rw_224.sw_in1k) |81.70|95.64| 2344.52| 15.14| 2.41| 15.41|
221
+ |[maxvit_rmlp_pico_rw_256.sw_in1k](https://huggingface.co/timm/maxvit_rmlp_pico_rw_256.sw_in1k) |80.53|95.21| 1594.71| 7.52| 1.85| 24.86|
222
+
223
+ ### Jan 11, 2023
224
+ * Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
225
+ * `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
226
+ * `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
227
+ * `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
228
+
229
+ ### Jan 6, 2023
230
+ * Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass through rare args directly to model classes from cmd line
231
+ * `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
232
+ * `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
233
+ * Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
234
+
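As a rough picture of what passing `key=value` pairs through `--model-kwargs` involves, the sketch below turns such pairs into a Python dict, converting values that look like literals (ints, floats, tuples) and leaving the rest as strings. `parse_kwargs` is a hypothetical helper written for this note, assuming literal-style parsing; it is not the code used by the `timm` scripts.

```python
import ast


def parse_kwargs(pairs):
    """Turn ['key=value', ...] into a dict (illustrative helper)."""
    kwargs = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            # Numbers, tuples, booleans, etc. become Python objects.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. `silu`) is kept as a plain string.
            value = raw
        kwargs[key] = value
    return kwargs


model_kwargs = parse_kwargs(["output_stride=16", "act_layer=silu"])
```

The resulting dict could then be splatted into a model constructor, which is the sense in which rare args "pass through" from the command line.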
235
+ ### Jan 5, 2023
236
+ * ConvNeXt-V2 models and weights added to existing `convnext.py`
237
+ * Paper: [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](http://arxiv.org/abs/2301.00808)
238
+ * Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
239
+
240
+ ### Dec 23, 2022 🎄☃
241
+ * Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
242
+ * NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
243
+ * Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
244
+ * More model pretrained tags and adjustments, some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
245
+ * More ImageNet-12k (subset of 22k) pretrain models popping up:
246
+ * `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
247
+ * `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
248
+ * `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
249
+ * `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
250
+
251
+ ### Dec 8, 2022
252
+ * Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
253
+ * original source: https://github.com/baaivision/EVA
254
+
255
+ | model | top1 | param_count | gmac | macts | hub |
256
+ |:------------------------------------------|-----:|------------:|------:|------:|:----------------------------------------|
257
+ | eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
258
+ | eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
259
+ | eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
260
+ | eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
261
+
262
+ ### Dec 6, 2022
263
+ * Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
264
+ * original source: https://github.com/baaivision/EVA
265
+ * paper: https://arxiv.org/abs/2211.07636
266
+
267
+ | model | top1 | param_count | gmac | macts | hub |
268
+ |:-----------------------------------------|-------:|--------------:|-------:|--------:|:----------------------------------------|
269
+ | eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
270
+ | eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
271
+ | eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
272
+ | eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | [link](https://huggingface.co/BAAI/EVA) |
273
+
274
+ ### Dec 5, 2022
275
+
276
+ * Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
277
+ * vision_transformer, maxvit, convnext are the first three model impl w/ support
278
+ * model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
279
+ * bugs are likely, but I need feedback so please try it out
280
+ * if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
281
+ * Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
282
+ * Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
283
+ * Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
284
+
285
+ | model | top1 | param_count | gmac | macts | hub |
286
+ |:-------------------------------------------------|-------:|--------------:|-------:|--------:|:-------------------------------------------------------------------------------------|
287
+ | vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k) |
288
+ | vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k) |
289
+ | vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k) |
290
+ | vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k) |
291
+ | vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k) |
292
+ | vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k) |
293
+ | vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k) |
294
+ | vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k) |
295
+ | vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k) |
296
+ | vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k) |
297
+ | vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k) |
298
+ | vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k) |
299
+ | vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k) |
300
+ | vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k) |
301
+ | vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k) |
302
+ | vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k) |
303
+ | vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k) |
304
+ | vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k) |
305
+ | vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k) |
306
+ | vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k) |
307
+ | vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k) |
308
+ | vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k) |
309
+ | vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k) |
310
+ | vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k) |
311
+
312
+ * Port of MaxViT TensorFlow weights from official impl at https://github.com/google-research/maxvit
313
+ * There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights; a detail is possibly missing, but the 21k FT did seem sensitive to small preprocessing changes
314
+
315
+ | model | top1 | param_count | gmac | macts | hub |
316
+ |:-----------------------------------|-------:|--------------:|-------:|--------:|:-----------------------------------------------------------------------|
317
+ | maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |
318
+ | maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |
319
+ | maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |
320
+ | maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |
321
+ | maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |
322
+ | maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |
323
+ | maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |
324
+ | maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |
325
+ | maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |
326
+ | maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |
327
+ | maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |
328
+ | maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |
329
+ | maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |
330
+ | maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |
331
+ | maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |
332
+ | maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |
333
+ | maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |
334
+ | maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |
335
+
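The `model_arch.pretrained_tag` convention from the Dec 5, 2022 notes above means every name in these tables has two parts separated by the first dot: the architecture and the pretrained weight tag. A minimal sketch of that split follows; `split_model_name` is a hypothetical helper for illustration, not a documented `timm` function.

```python
def split_model_name(name):
    """Split 'arch.tag' into (arch, tag); tag is '' when no tag is given."""
    arch, _, tag = name.partition(".")
    return arch, tag


arch, tag = split_model_name("vit_base_patch16_clip_224.laion2b_ft_in12k_in1k")
# arch is the architecture; tag describes the pretraining / fine-tune chain.
```

Read this way, a tag such as `laion2b_ft_in12k_in1k` describes LAION-2B pretraining followed by fine-tuning on ImageNet-12k and then ImageNet-1k.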
336
+ ### Oct 15, 2022
337
+ * Train and validation script enhancements
338
+ * Non-GPU (ie CPU) device support
339
+ * SLURM compatibility for train script
340
+ * HF datasets support (via ReaderHfds)
341
+ * TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
342
+ * in_chans !=3 support for scripts / loader
343
+ * Adan optimizer
344
+ * Can enable per-step LR scheduling via args
345
+ * Dataset 'parsers' renamed to 'readers', more descriptive of purpose
346
+ * AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
347
+ * main branch switched to 0.7.x version, 0.6.x forked for stable release of weight only adds
348
+ * master -> main branch rename
349
+
350
+ ### Oct 10, 2022
351
+ * More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
352
+ * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
353
+ * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
354
+ * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
355
+ * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
356
+ * `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
357
+ * NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.
358
+
359
+ ### Sept 23, 2022
360
+ * LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
361
+ * vit_base_patch32_224_clip_laion2b
362
+ * vit_large_patch14_224_clip_laion2b
363
+ * vit_huge_patch14_224_clip_laion2b
364
+ * vit_giant_patch14_224_clip_laion2b
365
+
366
+ ### Sept 7, 2022
367
+ * Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
368
+ * Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
369
+ * Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
370
+ * `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
371
+ * `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
372
+ * `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)
373
+
374
+ ### Aug 29, 2022
375
+ * MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
376
+ * `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)
377
+
378
+ ### Aug 26, 2022
379
+ * CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
380
+ * both found in [`maxxvit.py`](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py) model def, contains numerous experiments outside scope of original papers
381
+ * an unfinished TensorFlow version from MaxVit authors can be found at https://github.com/google-research/maxvit
382
+ * Initial CoAtNet and MaxVit timm pretrained weights (working on more):
383
+ * `coatnet_nano_rw_224` - 81.7 @ 224 (T)
384
+ * `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
385
+ * `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
386
+ * `coatnet_bn_0_rw_224` - 82.4 (T)
387
+ * `maxvit_nano_rw_256` - 82.9 @ 256 (T)
388
+ * `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
389
+ * `coatnet_1_rw_224` - 83.6 @ 224 (G)
390
+ * (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
391
+ * GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
392
+ * MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
393
+ * EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
394
+ * PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
395
+ * 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)
396
+
397
+
398
+ ### Aug 15, 2022
399
+ * ConvNeXt atto weights added
400
+ * `convnext_atto` - 75.7 @ 224, 77.0 @ 288
401
+ * `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288
402
+
403
+ ### Aug 5, 2022
404
+ * More custom ConvNeXt smaller model defs with weights
405
+ * `convnext_femto` - 77.5 @ 224, 78.7 @ 288
406
+ * `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
407
+ * `convnext_pico` - 79.5 @ 224, 80.4 @ 288
408
+ * `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
409
+ * `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
410
+ * Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)
411
+
412
+ ### July 28, 2022
413
+ * Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks [Hugo Touvron](https://github.com/TouvronHugo)!
414
+
415
+ ### July 27, 2022
416
+ * All runtime benchmark and validation result csv files are finally up-to-date!
417
+ * A few more weights & model defs added:
418
+ * `darknetaa53` - 79.8 @ 256, 80.5 @ 288
419
+ * `convnext_nano` - 80.8 @ 224, 81.5 @ 288
420
+ * `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288
421
+ * `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288
422
+ * `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288
423
+ * `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288
424
+ * `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320
425
+ * `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC program!
426
+ * Add output_stride=8 and 16 support to ConvNeXt (dilation)
427
+ * deit3 models not being able to resize pos_emb fixed
428
+ * Version 0.6.7 PyPi release (w/ above bug fixes and new weights since 0.6.5)
429
+
430
+ ### July 8, 2022
431
+ More models, more fixes
432
+ * Official research models (w/ weights) added:
433
+ * EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
434
+ * MobileViT-V2 from (https://github.com/apple/ml-cvnets)
435
+ * DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
436
+ * My own models:
437
+ * Small `ResNet` defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
438
+ * `CspNet` refactored with dataclass config, simplified CrossStage3 (`cs3`) option. These are closer to YOLO-v5+ backbone defs.
439
+ * More relative position vit fiddling. Two `srelpos` (shared relative position) models trained, and a medium w/ class token.
440
+ * Add an alternate downsample mode to EdgeNeXt and train a `small` model. Better than original small, but not their new USI trained weights.
441
+ * My own model weight results (all ImageNet-1k training)
442
+ * `resnet10t` - 66.5 @ 176, 68.3 @ 224
443
+ * `resnet14t` - 71.3 @ 176, 72.3 @ 224
444
+ * `resnetaa50` - 80.6 @ 224, 81.6 @ 288
445
+ * `darknet53` - 80.0 @ 256, 80.5 @ 288
446
+ * `cs3darknet_m` - 77.0 @ 256, 77.6 @ 288
447
+ * `cs3darknet_focus_m` - 76.7 @ 256, 77.3 @ 288
448
+ * `cs3darknet_l` - 80.4 @ 256, 80.9 @ 288
449
+ * `cs3darknet_focus_l` - 80.3 @ 256, 80.9 @ 288
450
+ * `vit_srelpos_small_patch16_224` - 81.1 @ 224, 82.1 @ 320
451
+ * `vit_srelpos_medium_patch16_224` - 82.3 @ 224, 83.1 @ 320
452
+ * `vit_relpos_small_patch16_cls_224` - 82.6 @ 224, 83.6 @ 320
453
+ * `edgenext_small_rw` - 79.6 @ 224, 80.4 @ 320
454
+ * `cs3`, `darknet`, and `vit_*relpos` weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
455
+ * Hugging Face Hub support fixes verified, demo notebook TBA
456
+ * Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
457
+ * Add support to change image extensions scanned by `timm` datasets/readers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
458
+ * Default ConvNeXt LayerNorm impl to use `F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2)` via `LayerNorm2d` in all cases.
459
+ * a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
460
+ * previous impl exists as `LayerNormExp2d` in `models/layers/norm.py`
461
+ * Numerous bug fixes
462
+ * Currently testing for imminent PyPi 0.6.x release
463
+ * LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
464
+ * ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...
465
+
466
+ ### May 13, 2022
467
+ * Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
468
+ * Some refactoring for existing `timm` Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
469
+ * More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
470
+ * `vit_relpos_small_patch16_224` - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
471
+ * `vit_relpos_medium_patch16_rpn_224` - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
472
+ * `vit_relpos_medium_patch16_224` - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
473
+ * `vit_relpos_base_patch16_gapcls_224` - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
474
+ * Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using in a pre DeiT 'small' model for first ViT impl back in 2020)
475
+ * Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
476
+ * Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)
477
+
478
+ ### May 2, 2022
479
+ * Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (`vision_transformer_relpos.py`) and Residual Post-Norm branches (from Swin-V2) (`vision_transformer*.py`)
480
+ * `vit_relpos_base_patch32_plus_rpn_256` - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
481
+ * `vit_relpos_base_patch16_224` - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
482
+ * `vit_base_patch16_rpn_224` - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
483
+ * Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie `How to Train Your ViT`)
484
+ * `vit_*` models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).
485
+
486
+ ### April 22, 2022
487
+ * `timm` models are now officially supported in [fast.ai](https://www.fast.ai/)! Just in time for the new Practical Deep Learning course. `timmdocs` documentation link updated to [timm.fast.ai](http://timm.fast.ai/).
488
+ * Two more model weights added in the TPU trained [series](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights). Some In22k pretrain still in progress.
489
+ * `seresnext101d_32x8d` - 83.69 @ 224, 84.35 @ 288
490
+ * `seresnextaa101d_32x8d` (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288
491
+
492
+ ### March 23, 2022
493
+ * Add `ParallelBlock` and `LayerScale` option to base vit models to support model configs in [Three things everyone should know about ViT](https://arxiv.org/abs/2203.09795)
494
+ * `convnext_tiny_hnf` (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.
495
+
496
+ ### March 21, 2022
497
+ * Merge `norm_norm_norm`. **IMPORTANT** this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch [`0.5.x`](https://github.com/rwightman/pytorch-image-models/tree/0.5.x) or a previous 0.5.x release can be used if stability is required.
498
+ * Significant weights update (all TPU trained) as described in this [release](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights)
499
+ * `regnety_040` - 82.3 @ 224, 82.96 @ 288
500
+ * `regnety_064` - 83.0 @ 224, 83.65 @ 288
501
+ * `regnety_080` - 83.17 @ 224, 83.86 @ 288
502
+ * `regnetv_040` - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
503
+ * `regnetv_064` - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
504
+ * `regnetz_040` - 83.67 @ 256, 84.25 @ 320
505
+ * `regnetz_040h` - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
506
+ * `resnetv2_50d_gn` - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
507
+ * `resnetv2_50d_evos` 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
508
+ * `regnetz_c16_evos` - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
509
+ * `regnetz_d8_evos` - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
510
+ * `xception41p` - 82 @ 299 (timm pre-act)
511
+ * `xception65` - 83.17 @ 299
512
+ * `xception65p` - 83.14 @ 299 (timm pre-act)
513
+ * `resnext101_64x4d` - 82.46 @ 224, 83.16 @ 288
514
+ * `seresnext101_32x8d` - 83.57 @ 224, 84.270 @ 288
515
+ * `resnetrs200` - 83.85 @ 256, 84.44 @ 320
516
+ * HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
517
+ * SwinTransformer-V2 implementation added. Submitted by [Christoph Reich](https://github.com/ChristophReich1996). Training experiments and model changes by myself are ongoing so expect compat breaks.
518
+ * Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
519
+ * MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
520
+ * PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
521
+ * VOLO models w/ weights adapted from https://github.com/sail-sg/volo
522
+ * Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
523
+ * Enhanced support for alternate norm + act ('NormAct') layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
524
+ * Grouped conv support added to EfficientNet family
525
+ * Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
526
+ * Gradient checkpointing support added to many models
527
+ * `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
528
+ * All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models; token selection or pooling is now applied in `forward_head`
529
+
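The 'group matching' item above exists so that parameters can be bucketed by depth and given a 'layer-wise' decayed learning rate. The sketch below shows only the scaling arithmetic, not `timm`'s implementation. It assumes every parameter group has already been assigned an integer layer id (0 for the stem, `num_layers` for the head); the function name `layer_lr_scales` and the decay values are hypothetical and chosen for illustration.

```python
def layer_lr_scales(num_layers: int, decay: float) -> list[float]:
    """Return one LR multiplier per layer id in [0, num_layers].

    The highest layer id (assumed to be the head) keeps scale 1.0; every
    step toward the input multiplies the scale by `decay` once more.
    """
    return [decay ** (num_layers - layer_id) for layer_id in range(num_layers + 1)]


if __name__ == "__main__":
    base_lr = 1e-3
    # Earlier layers receive a smaller effective learning rate.
    for layer_id, scale in enumerate(layer_lr_scales(num_layers=4, decay=0.75)):
        print(f"layer {layer_id}: lr = {base_lr * scale:.6f}")
```

Each multiplier would then be applied to the base learning rate of the matching parameter group, which is why an 'lr scale' hook in the scheduler is needed.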
530
+ ### Feb 2, 2022
531
+ * [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
532
+ * I'm currently prepping to merge the `norm_norm_norm` branch back to master (ver 0.6.x) in the next week or so.
533
+ * The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware `pip install git+https://github.com/rwightman/pytorch-image-models` installs!
534
+ * `0.5.x` releases and a `0.5.x` branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.
535
+
536
+ ### Jan 14, 2022
537
+ * Version 0.5.4 w/ release to be pushed to pypi. It's been a while since last pypi update and riskier changes will be merged to main branch soon....
538
+ * Add ConvNeXT models w/ weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
539
+ * Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...
540
+ * `mnasnet_small` - 65.6 top-1
541
+ * `mobilenetv2_050` - 65.9
542
+ * `lcnet_100/075/050` - 72.1 / 68.8 / 63.1
543
+ * `semnasnet_075` - 73
544
+ * `fbnetv3_b/d/g` - 79.1 / 79.7 / 82.0
545
+ * TinyNet models added by [rsomani95](https://github.com/rsomani95)
546
+ * LCNet added via MobileNetV3 architecture
547
+
548
+ ## Introduction
549
+
550
+ Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.
551
+
552
+ The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc in the README, documentation, and code docstrings. Please let me know if I missed anything.
553
+
554
+ ## Models
555
+
556
+ All model architecture families include variants with pretrained weights. Some specific model variants have no weights; this is NOT a bug. Help training new or better weights is always appreciated.
557
+
558
+ * Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
559
+ * BEiT - https://arxiv.org/abs/2106.08254
560
+ * Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
561
+ * Bottleneck Transformers - https://arxiv.org/abs/2101.11605
562
+ * CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
563
+ * CoaT (Co-Scale Conv-Attentional Image Transformers) - https://arxiv.org/abs/2104.06399
564
+ * CoAtNet (Convolution and Attention) - https://arxiv.org/abs/2106.04803
565
+ * ConvNeXt - https://arxiv.org/abs/2201.03545
566
+ * ConvNeXt-V2 - http://arxiv.org/abs/2301.00808
567
+ * ConViT (Soft Convolutional Inductive Biases Vision Transformers)- https://arxiv.org/abs/2103.10697
568
+ * CspNet (Cross-Stage Partial Networks) - https://arxiv.org/abs/1911.11929
569
+ * DeiT - https://arxiv.org/abs/2012.12877
570
+ * DeiT-III - https://arxiv.org/pdf/2204.07118.pdf
571
+ * DenseNet - https://arxiv.org/abs/1608.06993
572
+ * DLA - https://arxiv.org/abs/1707.06484
573
+ * DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
574
+ * EdgeNeXt - https://arxiv.org/abs/2206.10589
575
+ * EfficientFormer - https://arxiv.org/abs/2206.01191
576
+ * EfficientNet (MBConvNet Family)
577
+ * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
578
+ * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
579
+ * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
580
+ * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
581
+ * EfficientNet V2 - https://arxiv.org/abs/2104.00298
582
+ * FBNet-C - https://arxiv.org/abs/1812.03443
583
+ * MixNet - https://arxiv.org/abs/1907.09595
584
+ * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
585
+ * MobileNet-V2 - https://arxiv.org/abs/1801.04381
586
+ * Single-Path NAS - https://arxiv.org/abs/1904.02877
587
+ * TinyNet - https://arxiv.org/abs/2010.14819
588
+ * EVA - https://arxiv.org/abs/2211.07636
589
+ * EVA-02 - https://arxiv.org/abs/2303.11331
590
+ * FlexiViT - https://arxiv.org/abs/2212.08013
591
+ * FocalNet (Focal Modulation Networks) - https://arxiv.org/abs/2203.11926
592
+ * GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
593
+ * GhostNet - https://arxiv.org/abs/1911.11907
594
+ * gMLP - https://arxiv.org/abs/2105.08050
595
+ * GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
596
+ * Halo Nets - https://arxiv.org/abs/2103.12731
597
+ * HRNet - https://arxiv.org/abs/1908.07919
598
+ * Inception-V3 - https://arxiv.org/abs/1512.00567
599
+ * Inception-ResNet-V2 and Inception-V4 - https://arxiv.org/abs/1602.07261
600
+ * Lambda Networks - https://arxiv.org/abs/2102.08602
601
+ * LeViT (Vision Transformer in ConvNet's Clothing) - https://arxiv.org/abs/2104.01136
602
+ * MaxViT (Multi-Axis Vision Transformer) - https://arxiv.org/abs/2204.01697
603
+ * MLP-Mixer - https://arxiv.org/abs/2105.01601
604
+ * MobileNet-V3 (MBConvNet w/ Efficient Head) - https://arxiv.org/abs/1905.02244
605
+ * FBNet-V3 - https://arxiv.org/abs/2006.02049
606
+ * HardCoRe-NAS - https://arxiv.org/abs/2102.11646
607
+ * LCNet - https://arxiv.org/abs/2109.15099
608
+ * MobileViT - https://arxiv.org/abs/2110.02178
609
+ * MobileViT-V2 - https://arxiv.org/abs/2206.02680
610
+ * MViT-V2 (Improved Multiscale Vision Transformer) - https://arxiv.org/abs/2112.01526
611
+ * NASNet-A - https://arxiv.org/abs/1707.07012
612
+ * NesT - https://arxiv.org/abs/2105.12723
613
+ * NFNet-F - https://arxiv.org/abs/2102.06171
614
+ * NF-RegNet / NF-ResNet - https://arxiv.org/abs/2101.08692
615
+ * PNasNet - https://arxiv.org/abs/1712.00559
616
+ * PoolFormer (MetaFormer) - https://arxiv.org/abs/2111.11418
617
+ * Pooling-based Vision Transformer (PiT) - https://arxiv.org/abs/2103.16302
618
+ * PVT-V2 (Improved Pyramid Vision Transformer) - https://arxiv.org/abs/2106.13797
619
+ * RegNet - https://arxiv.org/abs/2003.13678
620
+ * RegNetZ - https://arxiv.org/abs/2103.06877
621
+ * RepVGG - https://arxiv.org/abs/2101.03697
622
+ * ResMLP - https://arxiv.org/abs/2105.03404
623
+ * ResNet/ResNeXt
624
+ * ResNet (v1b/v1.5) - https://arxiv.org/abs/1512.03385
625
+ * ResNeXt - https://arxiv.org/abs/1611.05431
626
+ * 'Bag of Tricks' / Gluon C, D, E, S variations - https://arxiv.org/abs/1812.01187
627
+ * Weakly-supervised (WSL) Instagram pretrained / ImageNet tuned ResNeXt101 - https://arxiv.org/abs/1805.00932
628
+ * Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet/ResNeXts - https://arxiv.org/abs/1905.00546
629
+ * ECA-Net (ECAResNet) - https://arxiv.org/abs/1910.03151v4
630
+ * Squeeze-and-Excitation Networks (SEResNet) - https://arxiv.org/abs/1709.01507
631
+ * ResNet-RS - https://arxiv.org/abs/2103.07579
632
+ * Res2Net - https://arxiv.org/abs/1904.01169
633
+ * ResNeSt - https://arxiv.org/abs/2004.08955
634
+ * ReXNet - https://arxiv.org/abs/2007.00992
635
+ * SelecSLS - https://arxiv.org/abs/1907.00837
636
+ * Selective Kernel Networks - https://arxiv.org/abs/1903.06586
637
+ * Sequencer2D - https://arxiv.org/abs/2205.01972
638
+ * Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
639
+ * Swin Transformer - https://arxiv.org/abs/2103.14030
640
+ * Swin Transformer V2 - https://arxiv.org/abs/2111.09883
641
+ * Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
642
+ * TResNet - https://arxiv.org/abs/2003.13630
643
+ * Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
644
+ * Visformer - https://arxiv.org/abs/2104.12533
645
+ * Vision Transformer - https://arxiv.org/abs/2010.11929
646
+ * VOLO (Vision Outlooker) - https://arxiv.org/abs/2106.13112
647
+ * VovNet V2 and V1 - https://arxiv.org/abs/1911.06667
648
+ * Xception - https://arxiv.org/abs/1610.02357
649
+ * Xception (Modified Aligned, Gluon) - https://arxiv.org/abs/1802.02611
650
+ * Xception (Modified Aligned, TF) - https://arxiv.org/abs/1802.02611
651
+ * XCiT (Cross-Covariance Image Transformers) - https://arxiv.org/abs/2106.09681
652
+
653
+ ## Features
654
+
655
+ Several (less common) features that I often utilize in my projects are included. Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:
656
+
657
+ * All models have a common default configuration interface and API for
658
+ * accessing/changing the classifier - `get_classifier` and `reset_classifier`
659
+ * doing a forward pass on just the features - `forward_features` (see [documentation](https://huggingface.co/docs/timm/feature_extraction))
660
+ * these make it easy to write consistent network wrappers that work with any of the models
661
+ * All models support multi-scale feature map extraction (feature pyramids) via create_model (see [documentation](https://huggingface.co/docs/timm/feature_extraction))
662
+ * `create_model(name, features_only=True, out_indices=..., output_stride=...)`
663
+ * `out_indices` creation arg specifies which feature maps to return, these indices are 0 based and generally correspond to the `C(i + 1)` feature level.
664
+ * `output_stride` creation arg controls output stride of the network by using dilated convolutions. Most networks are stride 32 by default. Not all networks support this.
665
+ * feature map channel counts, reduction level (stride) can be queried AFTER model creation via the `.feature_info` member
666
+ * All models have a consistent pretrained weight loader that adapts last linear if necessary, and from 3 to 1 channel input if desired
667
+ * High performance [reference training, validation, and inference scripts](https://huggingface.co/docs/timm/training_script) that work in several process/GPU modes:
668
+ * NVIDIA DDP w/ a single GPU per process, multiple processes with APEX present (AMP mixed-precision optional)
669
+ * PyTorch DistributedDataParallel w/ multi-gpu, single process (AMP disabled as it crashes when enabled)
670
+ * PyTorch w/ single GPU single process (AMP optional)
671
+ * A dynamic global pool implementation that allows selecting from average pooling, max pooling, average + max, or concat([average, max]) at model creation. All global pooling is adaptive average by default and compatible with pretrained weights.
672
+ * A 'Test Time Pool' wrapper that can wrap any of the included models and usually provides improved performance doing inference with input images larger than the training size. Idea adapted from original DPN implementation when I ported (https://github.com/cypw/DPNs)
673
+ * Learning rate schedulers
674
+ * Ideas adopted from
675
+ * [AllenNLP schedulers](https://github.com/allenai/allennlp/tree/master/allennlp/training/learning_rate_schedulers)
676
+ * [FAIRseq lr_scheduler](https://github.com/pytorch/fairseq/tree/master/fairseq/optim/lr_scheduler)
677
+ * SGDR: Stochastic Gradient Descent with Warm Restarts (https://arxiv.org/abs/1608.03983)
678
+ * Schedulers include `step`, `cosine` w/ restarts, `tanh` w/ restarts, `plateau`
679
+ * Optimizers:
680
+ * `rmsprop_tf` adapted from PyTorch RMSProp by myself. Reproduces much improved Tensorflow RMSProp behaviour.
681
+ * `radam` by [Liyuan Liu](https://github.com/LiyuanLucasLiu/RAdam) (https://arxiv.org/abs/1908.03265)
682
+ * `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) (https://arxiv.org/abs/1905.11286)
683
+ * `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) (https://arxiv.org/abs/1907.08610)
684
+ * `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
685
+ * `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) (https://arxiv.org/abs/2006.08217)
686
+ * `adafactor` adapted from [FAIRSeq impl](https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py) (https://arxiv.org/abs/1804.04235)
687
+ * `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) (https://arxiv.org/abs/2006.00719)
688
+ * Random Erasing from [Zhun Zhong](https://github.com/zhunzhong07/Random-Erasing/blob/master/transforms.py) (https://arxiv.org/abs/1708.04896)
689
+ * Mixup (https://arxiv.org/abs/1710.09412)
690
+ * CutMix (https://arxiv.org/abs/1905.04899)
691
+ * AutoAugment (https://arxiv.org/abs/1805.09501) and RandAugment (https://arxiv.org/abs/1909.13719) ImageNet configurations modeled after impl for EfficientNet training (https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py)
692
+ * AugMix w/ JSD loss (https://arxiv.org/abs/1912.02781), JSD w/ clean + augmented mixing support works with AutoAugment and RandAugment as well
693
+ * SplitBatchNorm - allows splitting batch norm layers between clean and augmented (auxiliary batch norm) data
694
+ * DropPath aka "Stochastic Depth" (https://arxiv.org/abs/1603.09382)
695
+ * DropBlock (https://arxiv.org/abs/1810.12890)
696
+ * Blur Pooling (https://arxiv.org/abs/1904.11486)
697
+ * Space-to-Depth by [mrT23](https://github.com/mrT23/TResNet/blob/master/src/models/tresnet/layers/space_to_depth.py) (https://arxiv.org/abs/1801.04590) -- original paper?
698
+ * Adaptive Gradient Clipping (https://arxiv.org/abs/2102.06171, https://github.com/deepmind/deepmind-research/tree/master/nfnets)
699
+ * An extensive selection of channel and/or spatial attention modules:
700
+ * Bottleneck Transformer - https://arxiv.org/abs/2101.11605
701
+ * CBAM - https://arxiv.org/abs/1807.06521
702
+ * Effective Squeeze-Excitation (ESE) - https://arxiv.org/abs/1911.06667
703
+ * Efficient Channel Attention (ECA) - https://arxiv.org/abs/1910.03151
704
+ * Gather-Excite (GE) - https://arxiv.org/abs/1810.12348
705
+ * Global Context (GC) - https://arxiv.org/abs/1904.11492
706
+ * Halo - https://arxiv.org/abs/2103.12731
707
+ * Involution - https://arxiv.org/abs/2103.06255
708
+ * Lambda Layer - https://arxiv.org/abs/2102.08602
709
+ * Non-Local (NL) - https://arxiv.org/abs/1711.07971
710
+ * Squeeze-and-Excitation (SE) - https://arxiv.org/abs/1709.01507
711
+ * Selective Kernel (SK) - https://arxiv.org/abs/1903.06586
712
+ * Split (SPLAT) - https://arxiv.org/abs/2004.08955
713
+ * Shifted Window (SWIN) - https://arxiv.org/abs/2103.14030
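The feature extraction bullets above say that `out_indices` are 0-based and generally correspond to the `C(i + 1)` feature level, with most networks at stride 32 by default. The arithmetic can be sketched as follows. This is a sketch under stated assumptions: it assumes the common convention that level `C(k)` is downsampled by `2**k`, which not every network follows, and the helper names `feature_level` and `feature_map_size` are hypothetical. The real per-model values should be read from `.feature_info` after model creation.

```python
def feature_level(out_index: int) -> tuple[str, int]:
    """Map a 0-based out_index to its (level name, reduction) pair."""
    level = out_index + 1            # index i -> C(i + 1)
    return f"C{level}", 2 ** level   # assumption: C(k) has reduction 2**k


def feature_map_size(input_size: int, out_index: int) -> int:
    """Spatial size of the feature map, assuming input_size divides evenly."""
    _, reduction = feature_level(out_index)
    return input_size // reduction


if __name__ == "__main__":
    for idx in (1, 2, 3, 4):
        name, reduction = feature_level(idx)
        size = feature_map_size(224, idx)
        print(f"out_index={idx}: {name}, stride {reduction}, {size}x{size} map")
```

Under this assumption, `out_indices=(1, 2, 3, 4)` on a 224x224 input selects the stride 4, 8, 16, and 32 maps that a feature pyramid typically consumes.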
714
+
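The scheduler entries in the feature list above cite SGDR (cosine annealing with warm restarts). A minimal sketch of the schedule from that paper follows. It is not the `timm` scheduler itself: it assumes a fixed cycle length (no cycle growth, no warmup, no per-cycle decay), and the function name `sgdr_lr` and its parameters are hypothetical.

```python
import math


def sgdr_lr(step: int, cycle_len: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate that restarts every `cycle_len` steps."""
    t_cur = step % cycle_len  # position inside the current cycle
    cosine = 1.0 + math.cos(math.pi * t_cur / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * cosine


if __name__ == "__main__":
    # The rate decays over each 10-step cycle, then jumps back to lr_max.
    for step in range(0, 25, 5):
        print(f"step {step:2d}: lr = {sgdr_lr(step, cycle_len=10, lr_max=0.1):.4f}")
```

The restart back to `lr_max` at the start of each cycle is the "warm restart" part of the method.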
715
+ ## Results
716
+
717
+ Model validation results can be found in the [results tables](results/README.md)
718
+
719
+ ## Getting Started (Documentation)
720
+
721
+ The official documentation can be found at https://huggingface.co/docs/hub/timm. Documentation contributions are welcome.
722
+
723
+ [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055) by [Chris Hughes](https://github.com/Chris-hughes10) is an extensive blog post covering many aspects of `timm` in detail.
724
+
725
+ [timmdocs](http://timm.fast.ai/) is an alternate set of documentation for `timm`. A big thanks to [Aman Arora](https://github.com/amaarora) for his efforts creating timmdocs.
726
+
727
+ [paperswithcode](https://paperswithcode.com/lib/timm) is a good resource for browsing the models within `timm`.
728
+
729
+ ## Train, Validation, Inference Scripts
730
+
731
+ The root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. They are adaptable for other datasets and use cases with a little hacking. See [documentation](https://huggingface.co/docs/timm/training_script).
732
+
733
+ ## Awesome PyTorch Resources
734
+
735
+ One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.
736
+
737
+ ### Object Detection, Instance and Semantic Segmentation
738
+ * Detectron2 - https://github.com/facebookresearch/detectron2
739
+ * Segmentation Models (Semantic) - https://github.com/qubvel/segmentation_models.pytorch
740
+ * EfficientDet (Obj Det, Semantic soon) - https://github.com/rwightman/efficientdet-pytorch
741
+
742
+ ### Computer Vision / Image Augmentation
743
+ * Albumentations - https://github.com/albumentations-team/albumentations
744
+ * Kornia - https://github.com/kornia/kornia
745
+
746
+ ### Knowledge Distillation
747
+ * RepDistiller - https://github.com/HobbitLong/RepDistiller
748
+ * torchdistill - https://github.com/yoshitomo-matsubara/torchdistill
749
+
750
+ ### Metric Learning
751
+ * PyTorch Metric Learning - https://github.com/KevinMusgrave/pytorch-metric-learning
752
+
753
+ ### Training / Frameworks
754
+ * fastai - https://github.com/fastai/fastai
755
+
756
+ ## Licenses
757
+
758
+ ### Code
759
+ The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with licenses here and conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.
760
+
761
+ ### Pretrained Weights
762
+ So far all of the pretrained weights available here are pretrained on ImageNet with a select few that have some additional pretraining (see extra note below). ImageNet was released for non-commercial research purposes only (https://image-net.org/download). It's not clear what the implications of that are for the use of pretrained weights from that dataset. Any models I have trained with ImageNet are done for research purposes and one should assume that the original dataset license applies to the weights. It's best to seek legal advice if you intend to use the pretrained weights in a commercial product.
763
+
764
+ #### Pretrained on more than ImageNet
765
+ Several weights included or referenced here were pretrained with proprietary datasets that I do not have access to. These include the Facebook WSL, SSL, SWSL ResNe(Xt) and the Google Noisy Student EfficientNet models. The Facebook models have an explicit non-commercial license (CC-BY-NC 4.0, https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, https://github.com/facebookresearch/WSL-Images). The Google models do not appear to have any restriction beyond the Apache 2.0 license (and ImageNet concerns). In either case, you should contact Facebook or Google with any questions.
766
+
767
+ ## Citing
768
+
769
+ ### BibTeX
770
+
771
+ ```bibtex
772
+ @misc{rw2019timm,
773
+ author = {Ross Wightman},
774
+ title = {PyTorch Image Models},
775
+ year = {2019},
776
+ publisher = {GitHub},
777
+ journal = {GitHub repository},
778
+ doi = {10.5281/zenodo.4414861},
779
+ howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
780
+ }
781
+ ```
782
+
783
+ ### Latest DOI
784
+
785
+ [![DOI](https://zenodo.org/badge/168799526.svg)](https://zenodo.org/badge/latestdoi/168799526)
avg_checkpoints.py ADDED
@@ -0,0 +1,152 @@
+ #!/usr/bin/env python3
+ """ Checkpoint Averaging Script
+
+ This script averages all model weights for checkpoints in specified path that match
+ the specified filter wildcard. All checkpoints must be from the exact same model.
+
+ For any hope of decent results, the checkpoints should be from the same or child
+ (via resumes) training session. This can be viewed as similar to maintaining running
+ EMA (exponential moving average) of the model weights or performing SWA (stochastic
+ weight averaging), but post-training.
+
+ Hacked together by / Copyright 2020 Ross Wightman (https://github.com/rwightman)
+ """
+ import torch
+ import argparse
+ import os
+ import glob
+ import hashlib
+ from timm.models import load_state_dict
+ try:
+     import safetensors.torch
+     _has_safetensors = True
+ except ImportError:
+     _has_safetensors = False
+
+ DEFAULT_OUTPUT = "./averaged.pth"
+ DEFAULT_SAFE_OUTPUT = "./averaged.safetensors"
+
+ parser = argparse.ArgumentParser(description='PyTorch Checkpoint Averager')
+ parser.add_argument('--input', default='', type=str, metavar='PATH',
+                     help='path to base input folder containing checkpoints')
+ parser.add_argument('--filter', default='*.pth.tar', type=str, metavar='WILDCARD',
+                     help='checkpoint filter (path wildcard)')
+ parser.add_argument('--output', default=DEFAULT_OUTPUT, type=str, metavar='PATH',
+                     help=f'Output filename. Defaults to {DEFAULT_SAFE_OUTPUT} when passing --safetensors.')
+ parser.add_argument('--no-use-ema', dest='no_use_ema', action='store_true',
+                     help='Force not using ema version of weights (if present)')
+ parser.add_argument('--no-sort', dest='no_sort', action='store_true',
+                     help='Do not sort and select by checkpoint metric, also makes "n" argument irrelevant')
+ parser.add_argument('-n', type=int, default=10, metavar='N',
+                     help='Number of checkpoints to average')
+ parser.add_argument('--safetensors', action='store_true',
+                     help='Save weights using safetensors instead of the default torch way (pickle).')
+
+
+ def checkpoint_metric(checkpoint_path):
+     # Return None (not {}) for a missing file so the caller's
+     # `metric is not None` check skips it instead of trying to sort a dict.
+     if not checkpoint_path or not os.path.isfile(checkpoint_path):
+         return None
+     print("=> Extracting metric from checkpoint '{}'".format(checkpoint_path))
+     checkpoint = torch.load(checkpoint_path, map_location='cpu')
+     metric = None
+     if 'metric' in checkpoint:
+         metric = checkpoint['metric']
+     elif 'metrics' in checkpoint and 'metric_name' in checkpoint:
+         metrics = checkpoint['metrics']
+         print(metrics)
+         metric = metrics[checkpoint['metric_name']]
+     return metric
+
+
+ def main():
+     args = parser.parse_args()
+     # by default use the EMA weights (if present)
+     args.use_ema = not args.no_use_ema
+     # by default sort by checkpoint metric (if present) and avg top n checkpoints
+     args.sort = not args.no_sort
+
+     if args.safetensors and args.output == DEFAULT_OUTPUT:
+         # Default path changes if using safetensors
+         args.output = DEFAULT_SAFE_OUTPUT
+
+     output, output_ext = os.path.splitext(args.output)
+     if not output_ext:
+         output_ext = ('.safetensors' if args.safetensors else '.pth')
+     output = output + output_ext
+
+     if args.safetensors and not output_ext == ".safetensors":
+         print(
+             "Warning: saving weights as safetensors but output file extension is not "
+             f"set to '.safetensors': {args.output}"
+         )
+
+     if os.path.exists(output):
+         print("Error: Output filename ({}) already exists.".format(output))
+         exit(1)
+
+     # Build the glob pattern from the input folder and the filter wildcard
+     pattern = args.input
+     if not args.input.endswith(os.path.sep) and not args.filter.startswith(os.path.sep):
+         pattern += os.path.sep
+     pattern += args.filter
+     checkpoints = glob.glob(pattern, recursive=True)
+
+     if args.sort:
+         # Keep the n checkpoints with the highest metric
+         checkpoint_metrics = []
+         for c in checkpoints:
+             metric = checkpoint_metric(c)
+             if metric is not None:
+                 checkpoint_metrics.append((metric, c))
+         checkpoint_metrics = list(sorted(checkpoint_metrics))
+         checkpoint_metrics = checkpoint_metrics[-args.n:]
+         if checkpoint_metrics:
+             print("Selected checkpoints:")
+             [print(m, c) for m, c in checkpoint_metrics]
+         avg_checkpoints = [c for m, c in checkpoint_metrics]
+     else:
+         avg_checkpoints = checkpoints
+         if avg_checkpoints:
+             print("Selected checkpoints:")
+             [print(c) for c in checkpoints]
+
+     if not avg_checkpoints:
+         print('Error: No checkpoints found to average.')
+         exit(1)
+
+     # Accumulate sums in float64, tracking how many checkpoints contributed each key
+     avg_state_dict = {}
+     avg_counts = {}
+     for c in avg_checkpoints:
+         new_state_dict = load_state_dict(c, args.use_ema)
+         if not new_state_dict:
+             print(f"Error: Checkpoint ({c}) doesn't exist")
+             continue
+         for k, v in new_state_dict.items():
+             if k not in avg_state_dict:
+                 avg_state_dict[k] = v.clone().to(dtype=torch.float64)
+                 avg_counts[k] = 1
+             else:
+                 avg_state_dict[k] += v.to(dtype=torch.float64)
+                 avg_counts[k] += 1
+
+     for k, v in avg_state_dict.items():
+         v.div_(avg_counts[k])
+
+     # float32 overflow seems unlikely based on weights seen to date, but who knows
+     float32_info = torch.finfo(torch.float32)
+     final_state_dict = {}
+     for k, v in avg_state_dict.items():
+         v = v.clamp(float32_info.min, float32_info.max)
+         final_state_dict[k] = v.to(dtype=torch.float32)
+
+     if args.safetensors:
+         assert _has_safetensors, "`pip install safetensors` to use .safetensors"
+         safetensors.torch.save_file(final_state_dict, output)
+     else:
+         torch.save(final_state_dict, output)
+
+     with open(output, 'rb') as f:
+         sha_hash = hashlib.sha256(f.read()).hexdigest()
+     print(f"=> Saved state_dict to '{output}', SHA256: {sha_hash}")
+
+
+ if __name__ == '__main__':
+     main()
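The selection and averaging steps of the script above can be illustrated without PyTorch. The sketch below mirrors its bookkeeping (sort by metric, keep the top `n`, sum per key, divide by a per-key count) using plain lists of floats in place of tensors. The function names `select_top_n` and `average_state_dicts` are hypothetical and are not part of `timm`; the float64 accumulation and float32 clamp of the real script are not modelled.

```python
def select_top_n(metric_path_pairs, n):
    """Return the paths of the n highest-metric checkpoints, lowest first."""
    return [path for _, path in sorted(metric_path_pairs)[-n:]]


def average_state_dicts(state_dicts):
    """Average values key by key; a key missing from some dicts is averaged
    only over the dicts that contain it, as in the per-key counts above."""
    sums, counts = {}, {}
    for sd in state_dicts:
        for key, values in sd.items():
            if key not in sums:
                sums[key] = [float(v) for v in values]
                counts[key] = 1
            else:
                sums[key] = [s + float(v) for s, v in zip(sums[key], values)]
                counts[key] += 1
    return {k: [s / counts[k] for s in total] for k, total in sums.items()}


if __name__ == "__main__":
    pairs = [(80.1, "ckpt-a"), (82.3, "ckpt-b"), (81.0, "ckpt-c")]
    print(select_top_n(pairs, 2))
    print(average_state_dicts([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

Dividing by a per-key count rather than by the number of checkpoints keeps the result correct when a key is present in only some of the state dicts.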
benchmark.py ADDED
@@ -0,0 +1,696 @@
+ #!/usr/bin/env python3
+ """ Model Benchmark Script
+
+ An inference and train step benchmark script for timm models.
+
+ Hacked together by Ross Wightman (https://github.com/rwightman)
+ """
+ import argparse
+ import csv
+ import json
+ import logging
+ import time
+ from collections import OrderedDict
+ from contextlib import suppress
+ from functools import partial
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.parallel
+
+ from timm.data import resolve_data_config
+ from timm.layers import set_fast_norm
+ from timm.models import create_model, is_model, list_models
+ from timm.optim import create_optimizer_v2
+ from timm.utils import setup_default_logging, set_jit_fuser, decay_batch_step, check_batch_size_retry, ParseKwargs
+
+ has_apex = False
+ try:
+     from apex import amp
+     has_apex = True
+ except ImportError:
+     pass
+
+ has_native_amp = False
+ try:
+     if getattr(torch.cuda.amp, 'autocast') is not None:
+         has_native_amp = True
+ except AttributeError:
+     pass
+
+ try:
+     from deepspeed.profiling.flops_profiler import get_model_profile
+     has_deepspeed_profiling = True
+ except ImportError as e:
+     has_deepspeed_profiling = False
+
+ try:
+     from fvcore.nn import FlopCountAnalysis, flop_count_str, ActivationCountAnalysis
+     has_fvcore_profiling = True
+ except ImportError as e:
+     FlopCountAnalysis = None
+     has_fvcore_profiling = False
+
+ try:
+     from functorch.compile import memory_efficient_fusion
+     has_functorch = True
+ except ImportError as e:
+     has_functorch = False
+
+ has_compile = hasattr(torch, 'compile')
+
+ if torch.cuda.is_available():
+     torch.backends.cuda.matmul.allow_tf32 = True
+     torch.backends.cudnn.benchmark = True
+ _logger = logging.getLogger('validate')
+
+
+ parser = argparse.ArgumentParser(description='PyTorch Benchmark')
+
+ # benchmark specific args
+ parser.add_argument('--model-list', metavar='NAME', default='',
+                     help='txt file based list of model names to benchmark')
+ parser.add_argument('--bench', default='both', type=str,
+                     help="Benchmark mode. One of 'inference', 'train', 'both'. Defaults to 'both'")
+ parser.add_argument('--detail', action='store_true', default=False,
+                     help='Provide train fwd/bwd/opt breakdown detail if True. Defaults to False')
+ parser.add_argument('--no-retry', action='store_true', default=False,
+                     help='Do not decay batch size and retry on error.')
+ parser.add_argument('--results-file', default='', type=str,
80
+ help='Output file for benchmark results (summary)')
81
+ parser.add_argument('--results-format', default='csv', type=str,
82
+ help='Format for results file one of (csv, json) (default: csv).')
83
+ parser.add_argument('--num-warm-iter', default=10, type=int,
84
+ help='Number of warmup iterations (default: 10)')
85
+ parser.add_argument('--num-bench-iter', default=40, type=int,
86
+ help='Number of benchmark iterations (default: 40)')
87
+ parser.add_argument('--device', default='cuda', type=str,
88
+ help="device to run benchmark on")
89
+
90
+ # common inference / train args
91
+ parser.add_argument('--model', '-m', metavar='NAME', default='resnet50',
92
+ help='model architecture (default: resnet50)')
93
+ parser.add_argument('-b', '--batch-size', default=256, type=int,
94
+ metavar='N', help='mini-batch size (default: 256)')
95
+ parser.add_argument('--img-size', default=None, type=int,
96
+ metavar='N', help='Input image dimension, uses model default if empty')
97
+ parser.add_argument('--input-size', default=None, nargs=3, type=int,
98
+ metavar='N N N', help='Input all image dimensions (c h w, e.g. --input-size 3 224 224), uses model default if empty')
99
+ parser.add_argument('--use-train-size', action='store_true', default=False,
100
+ help='Run inference at train size, not test-input-size if it exists.')
101
+ parser.add_argument('--num-classes', type=int, default=None,
102
+ help='Number classes in dataset')
103
+ parser.add_argument('--gp', default=None, type=str, metavar='POOL',
104
+ help='Global pool type, one of (fast, avg, max, avgmax, avgmaxc). Model default if None.')
105
+ parser.add_argument('--channels-last', action='store_true', default=False,
106
+ help='Use channels_last memory layout')
107
+ parser.add_argument('--grad-checkpointing', action='store_true', default=False,
108
+ help='Enable gradient checkpointing through model blocks/stages')
109
+ parser.add_argument('--amp', action='store_true', default=False,
110
+ help='use PyTorch Native AMP for mixed precision training. Overrides --precision arg.')
111
+ parser.add_argument('--amp-dtype', default='float16', type=str,
112
+ help='lower precision AMP dtype (default: float16). Overrides --precision arg if args.amp True.')
113
+ parser.add_argument('--precision', default='float32', type=str,
114
+ help='Numeric precision. One of (amp, amp_bfloat16, float32, float16, bfloat16)')
115
+ parser.add_argument('--fuser', default='', type=str,
116
+ help="Select jit fuser. One of ('', 'te', 'old', 'nvfuser')")
117
+ parser.add_argument('--fast-norm', default=False, action='store_true',
118
+ help='enable experimental fast-norm')
119
+ parser.add_argument('--model-kwargs', nargs='*', default={}, action=ParseKwargs)
120
+
121
+ # codegen (model compilation) options
122
+ scripting_group = parser.add_mutually_exclusive_group()
123
+ scripting_group.add_argument('--torchscript', dest='torchscript', action='store_true',
124
+ help='convert model to torchscript for inference')
125
+ scripting_group.add_argument('--torchcompile', nargs='?', type=str, default=None, const='inductor',
126
+ help="Enable compilation w/ specified backend (default: inductor).")
127
+ scripting_group.add_argument('--aot-autograd', default=False, action='store_true',
128
+ help="Enable AOT Autograd optimization.")
129
+
130
+ # train optimizer parameters
131
+ parser.add_argument('--opt', default='sgd', type=str, metavar='OPTIMIZER',
132
+ help='Optimizer (default: "sgd")')
133
+ parser.add_argument('--opt-eps', default=None, type=float, metavar='EPSILON',
134
+ help='Optimizer Epsilon (default: None, use opt default)')
135
+ parser.add_argument('--opt-betas', default=None, type=float, nargs='+', metavar='BETA',
136
+ help='Optimizer Betas (default: None, use opt default)')
137
+ parser.add_argument('--momentum', type=float, default=0.9, metavar='M',
138
+ help='Optimizer momentum (default: 0.9)')
139
+ parser.add_argument('--weight-decay', type=float, default=0.0001,
140
+ help='weight decay (default: 0.0001)')
141
+ parser.add_argument('--clip-grad', type=float, default=None, metavar='NORM',
142
+ help='Clip gradient norm (default: None, no clipping)')
143
+ parser.add_argument('--clip-mode', type=str, default='norm',
144
+ help='Gradient clipping mode. One of ("norm", "value", "agc")')
145
+
146
+
147
+ # model regularization / loss params that impact model or loss fn
148
+ parser.add_argument('--smoothing', type=float, default=0.1,
149
+ help='Label smoothing (default: 0.1)')
150
+ parser.add_argument('--drop', type=float, default=0.0, metavar='PCT',
151
+ help='Dropout rate (default: 0.)')
152
+ parser.add_argument('--drop-path', type=float, default=None, metavar='PCT',
153
+ help='Drop path rate (default: None)')
154
+ parser.add_argument('--drop-block', type=float, default=None, metavar='PCT',
155
+ help='Drop block rate (default: None)')
156
+
157
+
158
+ def timestamp(sync=False):
159
+ return time.perf_counter()
160
+
161
+
162
+ def cuda_timestamp(sync=False, device=None):
163
+ if sync:
164
+ torch.cuda.synchronize(device=device)
165
+ return time.perf_counter()
166
+
167
+
168
+ def count_params(model: nn.Module):
169
+ return sum([m.numel() for m in model.parameters()])
170
+
171
+
172
+ def resolve_precision(precision: str):
173
+ assert precision in ('amp', 'amp_bfloat16', 'float16', 'bfloat16', 'float32')
174
+ amp_dtype = None # amp disabled
175
+ model_dtype = torch.float32
176
+ data_dtype = torch.float32
177
+ if precision == 'amp':
178
+ amp_dtype = torch.float16
179
+ elif precision == 'amp_bfloat16':
180
+ amp_dtype = torch.bfloat16
181
+ elif precision == 'float16':
182
+ model_dtype = torch.float16
183
+ data_dtype = torch.float16
184
+ elif precision == 'bfloat16':
185
+ model_dtype = torch.bfloat16
186
+ data_dtype = torch.bfloat16
187
+ return amp_dtype, model_dtype, data_dtype
188
+
189
+
190
+ def profile_deepspeed(model, input_size=(3, 224, 224), batch_size=1, detailed=False):
191
+ _, macs, _ = get_model_profile(
192
+ model=model,
193
+ input_shape=(batch_size,) + input_size, # input shape/resolution
194
+ print_profile=detailed, # prints the model graph with the measured profile attached to each module
195
+ detailed=detailed, # print the detailed profile
196
+ warm_up=10, # the number of warm-ups before measuring the time of each module
197
+ as_string=False, # print raw numbers (e.g. 1000) or as human-readable strings (e.g. 1k)
198
+ output_file=None, # path to the output file. If None, the profiler prints to stdout.
199
+ ignore_modules=None) # the list of modules to ignore in the profiling
200
+ return macs, 0 # no activation count in DS
201
+
202
+
203
+ def profile_fvcore(model, input_size=(3, 224, 224), batch_size=1, detailed=False, force_cpu=False):
204
+ if force_cpu:
205
+ model = model.to('cpu')
206
+ device, dtype = next(model.parameters()).device, next(model.parameters()).dtype
207
+ example_input = torch.ones((batch_size,) + input_size, device=device, dtype=dtype)
208
+ fca = FlopCountAnalysis(model, example_input)
209
+ aca = ActivationCountAnalysis(model, example_input)
210
+ if detailed:
211
+ fcs = flop_count_str(fca)
212
+ print(fcs)
213
+ return fca.total(), aca.total()
214
+
215
+
216
+ class BenchmarkRunner:
217
+ def __init__(
218
+ self,
219
+ model_name,
220
+ detail=False,
221
+ device='cuda',
222
+ torchscript=False,
223
+ torchcompile=None,
224
+ aot_autograd=False,
225
+ precision='float32',
226
+ fuser='',
227
+ num_warm_iter=10,
228
+ num_bench_iter=50,
229
+ use_train_size=False,
230
+ **kwargs
231
+ ):
232
+ self.model_name = model_name
233
+ self.detail = detail
234
+ self.device = device
235
+ self.amp_dtype, self.model_dtype, self.data_dtype = resolve_precision(precision)
236
+ self.channels_last = kwargs.pop('channels_last', False)
237
+ if self.amp_dtype is not None:
238
+ self.amp_autocast = partial(torch.cuda.amp.autocast, dtype=self.amp_dtype)
239
+ else:
240
+ self.amp_autocast = suppress
241
+
242
+ if fuser:
243
+ set_jit_fuser(fuser)
244
+ self.model = create_model(
245
+ model_name,
246
+ num_classes=kwargs.pop('num_classes', None),
247
+ in_chans=3,
248
+ global_pool=kwargs.pop('gp', 'fast'),
249
+ scriptable=torchscript,
250
+ drop_rate=kwargs.pop('drop', 0.),
251
+ drop_path_rate=kwargs.pop('drop_path', None),
252
+ drop_block_rate=kwargs.pop('drop_block', None),
253
+ **kwargs.pop('model_kwargs', {}),
254
+ )
255
+ self.model.to(
256
+ device=self.device,
257
+ dtype=self.model_dtype,
258
+ memory_format=torch.channels_last if self.channels_last else None)
259
+ self.num_classes = self.model.num_classes
260
+ self.param_count = count_params(self.model)
261
+ _logger.info('Model %s created, param count: %d' % (model_name, self.param_count))
262
+
263
+ data_config = resolve_data_config(kwargs, model=self.model, use_test_size=not use_train_size)
264
+ self.input_size = data_config['input_size']
265
+ self.batch_size = kwargs.pop('batch_size', 256)
266
+
267
+ self.compiled = False
268
+ if torchscript:
269
+ self.model = torch.jit.script(self.model)
270
+ self.compiled = True
271
+ elif torchcompile:
272
+ assert has_compile, 'A version of torch w/ torch.compile() is required, possibly a nightly.'
273
+ torch._dynamo.reset()
274
+ self.model = torch.compile(self.model, backend=torchcompile)
275
+ self.compiled = True
276
+ elif aot_autograd:
277
+ assert has_functorch, "functorch is needed for --aot-autograd"
278
+ self.model = memory_efficient_fusion(self.model)
279
+ self.compiled = True
280
+
281
+ self.example_inputs = None
282
+ self.num_warm_iter = num_warm_iter
283
+ self.num_bench_iter = num_bench_iter
284
+ self.log_freq = max(1, num_bench_iter // 5)  # avoid modulo by zero when num_bench_iter < 5
285
+ if 'cuda' in self.device:
286
+ self.time_fn = partial(cuda_timestamp, device=self.device)
287
+ else:
288
+ self.time_fn = timestamp
289
+
290
+ def _init_input(self):
291
+ self.example_inputs = torch.randn(
292
+ (self.batch_size,) + self.input_size, device=self.device, dtype=self.data_dtype)
293
+ if self.channels_last:
294
+ self.example_inputs = self.example_inputs.contiguous(memory_format=torch.channels_last)
295
+
296
+
297
+ class InferenceBenchmarkRunner(BenchmarkRunner):
298
+
299
+ def __init__(
300
+ self,
301
+ model_name,
302
+ device='cuda',
303
+ torchscript=False,
304
+ **kwargs
305
+ ):
306
+ super().__init__(model_name=model_name, device=device, torchscript=torchscript, **kwargs)
307
+ self.model.eval()
308
+
309
+ def run(self):
310
+ def _step():
311
+ t_step_start = self.time_fn()
312
+ with self.amp_autocast():
313
+ output = self.model(self.example_inputs)
314
+ t_step_end = self.time_fn(True)
315
+ return t_step_end - t_step_start
316
+
317
+ _logger.info(
318
+ f'Running inference benchmark on {self.model_name} for {self.num_bench_iter} steps w/ '
319
+ f'input size {self.input_size} and batch size {self.batch_size}.')
320
+
321
+ with torch.no_grad():
322
+ self._init_input()
323
+
324
+ for _ in range(self.num_warm_iter):
325
+ _step()
326
+
327
+ total_step = 0.
328
+ num_samples = 0
329
+ t_run_start = self.time_fn()
330
+ for i in range(self.num_bench_iter):
331
+ delta_fwd = _step()
332
+ total_step += delta_fwd
333
+ num_samples += self.batch_size
334
+ num_steps = i + 1
335
+ if num_steps % self.log_freq == 0:
336
+ _logger.info(
337
+ f"Infer [{num_steps}/{self.num_bench_iter}]."
338
+ f" {num_samples / total_step:0.2f} samples/sec."
339
+ f" {1000 * total_step / num_steps:0.3f} ms/step.")
340
+ t_run_end = self.time_fn(True)
341
+ t_run_elapsed = t_run_end - t_run_start
342
+
343
+ results = dict(
344
+ samples_per_sec=round(num_samples / t_run_elapsed, 2),
345
+ step_time=round(1000 * total_step / self.num_bench_iter, 3),
346
+ batch_size=self.batch_size,
347
+ img_size=self.input_size[-1],
348
+ param_count=round(self.param_count / 1e6, 2),
349
+ )
350
+
351
+ retries = 0 if self.compiled else 2 # skip profiling if model is scripted or compiled
352
+ while retries:
353
+ retries -= 1
354
+ try:
355
+ if has_deepspeed_profiling:
356
+ macs, _ = profile_deepspeed(self.model, self.input_size)
357
+ results['gmacs'] = round(macs / 1e9, 2)
358
+ elif has_fvcore_profiling:
359
+ macs, activations = profile_fvcore(self.model, self.input_size, force_cpu=not retries)
360
+ results['gmacs'] = round(macs / 1e9, 2)
361
+ results['macts'] = round(activations / 1e6, 2)
362
+ except RuntimeError as e:
363
+ pass
364
+
365
+ _logger.info(
366
+ f"Inference benchmark of {self.model_name} done. "
367
+ f"{results['samples_per_sec']:.2f} samples/sec, {results['step_time']:.2f} ms/step")
368
+
369
+ return results
370
+
371
+
372
+ class TrainBenchmarkRunner(BenchmarkRunner):
373
+
374
+ def __init__(
375
+ self,
376
+ model_name,
377
+ device='cuda',
378
+ torchscript=False,
379
+ **kwargs
380
+ ):
381
+ super().__init__(model_name=model_name, device=device, torchscript=torchscript, **kwargs)
382
+ self.model.train()
383
+
384
+ self.loss = nn.CrossEntropyLoss().to(self.device)
385
+ self.target_shape = tuple()
386
+
387
+ self.optimizer = create_optimizer_v2(
388
+ self.model,
389
+ opt=kwargs.pop('opt', 'sgd'),
390
+ lr=kwargs.pop('lr', 1e-4))
391
+
392
+ if kwargs.pop('grad_checkpointing', False):
393
+ self.model.set_grad_checkpointing()
394
+
395
+ def _gen_target(self, batch_size):
396
+ return torch.empty(
397
+ (batch_size,) + self.target_shape, device=self.device, dtype=torch.long).random_(self.num_classes)
398
+
399
+ def run(self):
400
+ def _step(detail=False):
401
+ self.optimizer.zero_grad() # can this be ignored?
402
+ t_start = self.time_fn()
403
+ t_fwd_end = t_start
404
+ t_bwd_end = t_start
405
+ with self.amp_autocast():
406
+ output = self.model(self.example_inputs)
407
+ if isinstance(output, tuple):
408
+ output = output[0]
409
+ if detail:
410
+ t_fwd_end = self.time_fn(True)
411
+ target = self._gen_target(output.shape[0])
412
+ self.loss(output, target).backward()
413
+ if detail:
414
+ t_bwd_end = self.time_fn(True)
415
+ self.optimizer.step()
416
+ t_end = self.time_fn(True)
417
+ if detail:
418
+ delta_fwd = t_fwd_end - t_start
419
+ delta_bwd = t_bwd_end - t_fwd_end
420
+ delta_opt = t_end - t_bwd_end
421
+ return delta_fwd, delta_bwd, delta_opt
422
+ else:
423
+ delta_step = t_end - t_start
424
+ return delta_step
425
+
426
+ _logger.info(
427
+ f'Running train benchmark on {self.model_name} for {self.num_bench_iter} steps w/ '
428
+ f'input size {self.input_size} and batch size {self.batch_size}.')
429
+
430
+ self._init_input()
431
+
432
+ for _ in range(self.num_warm_iter):
433
+ _step()
434
+
435
+ t_run_start = self.time_fn()
436
+ if self.detail:
437
+ total_fwd = 0.
438
+ total_bwd = 0.
439
+ total_opt = 0.
440
+ num_samples = 0
441
+ for i in range(self.num_bench_iter):
442
+ delta_fwd, delta_bwd, delta_opt = _step(True)
443
+ num_samples += self.batch_size
444
+ total_fwd += delta_fwd
445
+ total_bwd += delta_bwd
446
+ total_opt += delta_opt
447
+ num_steps = (i + 1)
448
+ if num_steps % self.log_freq == 0:
449
+ total_step = total_fwd + total_bwd + total_opt
450
+ _logger.info(
451
+ f"Train [{num_steps}/{self.num_bench_iter}]."
452
+ f" {num_samples / total_step:0.2f} samples/sec."
453
+ f" {1000 * total_fwd / num_steps:0.3f} ms/step fwd,"
454
+ f" {1000 * total_bwd / num_steps:0.3f} ms/step bwd,"
455
+ f" {1000 * total_opt / num_steps:0.3f} ms/step opt."
456
+ )
457
+ total_step = total_fwd + total_bwd + total_opt
458
+ t_run_elapsed = self.time_fn() - t_run_start
459
+ results = dict(
460
+ samples_per_sec=round(num_samples / t_run_elapsed, 2),
461
+ step_time=round(1000 * total_step / self.num_bench_iter, 3),
462
+ fwd_time=round(1000 * total_fwd / self.num_bench_iter, 3),
463
+ bwd_time=round(1000 * total_bwd / self.num_bench_iter, 3),
464
+ opt_time=round(1000 * total_opt / self.num_bench_iter, 3),
465
+ batch_size=self.batch_size,
466
+ img_size=self.input_size[-1],
467
+ param_count=round(self.param_count / 1e6, 2),
468
+ )
469
+ else:
470
+ total_step = 0.
471
+ num_samples = 0
472
+ for i in range(self.num_bench_iter):
473
+ delta_step = _step(False)
474
+ num_samples += self.batch_size
475
+ total_step += delta_step
476
+ num_steps = (i + 1)
477
+ if num_steps % self.log_freq == 0:
478
+ _logger.info(
479
+ f"Train [{num_steps}/{self.num_bench_iter}]."
480
+ f" {num_samples / total_step:0.2f} samples/sec."
481
+ f" {1000 * total_step / num_steps:0.3f} ms/step.")
482
+ t_run_elapsed = self.time_fn() - t_run_start
483
+ results = dict(
484
+ samples_per_sec=round(num_samples / t_run_elapsed, 2),
485
+ step_time=round(1000 * total_step / self.num_bench_iter, 3),
486
+ batch_size=self.batch_size,
487
+ img_size=self.input_size[-1],
488
+ param_count=round(self.param_count / 1e6, 2),
489
+ )
490
+
491
+ _logger.info(
492
+ f"Train benchmark of {self.model_name} done. "
493
+ f"{results['samples_per_sec']:.2f} samples/sec, {results['step_time']:.2f} ms/step")
494
+
495
+ return results
496
+
497
+
498
+ class ProfileRunner(BenchmarkRunner):
499
+
500
+ def __init__(self, model_name, device='cuda', profiler='', **kwargs):
501
+ super().__init__(model_name=model_name, device=device, **kwargs)
502
+ if not profiler:
503
+ if has_deepspeed_profiling:
504
+ profiler = 'deepspeed'
505
+ elif has_fvcore_profiling:
506
+ profiler = 'fvcore'
507
+ assert profiler, "One of deepspeed or fvcore needs to be installed for profiling to work."
508
+ self.profiler = profiler
509
+ self.model.eval()
510
+
511
+ def run(self):
512
+ _logger.info(
513
+ f'Running profiler on {self.model_name} w/ '
514
+ f'input size {self.input_size} and batch size {self.batch_size}.')
515
+
516
+ macs = 0
517
+ activations = 0
518
+ if self.profiler == 'deepspeed':
519
+ macs, _ = profile_deepspeed(self.model, self.input_size, batch_size=self.batch_size, detailed=True)
520
+ elif self.profiler == 'fvcore':
521
+ macs, activations = profile_fvcore(self.model, self.input_size, batch_size=self.batch_size, detailed=True)
522
+
523
+ results = dict(
524
+ gmacs=round(macs / 1e9, 2),
525
+ macts=round(activations / 1e6, 2),
526
+ batch_size=self.batch_size,
527
+ img_size=self.input_size[-1],
528
+ param_count=round(self.param_count / 1e6, 2),
529
+ )
530
+
531
+ _logger.info(
532
+ f"Profile of {self.model_name} done. "
533
+ f"{results['gmacs']:.2f} GMACs, {results['param_count']:.2f} M params.")
534
+
535
+ return results
536
+
537
+
538
+ def _try_run(
539
+ model_name,
540
+ bench_fn,
541
+ bench_kwargs,
542
+ initial_batch_size,
543
+ no_batch_size_retry=False
544
+ ):
545
+ batch_size = initial_batch_size
546
+ results = dict()
547
+ error_str = 'Unknown'
548
+ while batch_size:
549
+ try:
550
+ torch.cuda.empty_cache()
551
+ bench = bench_fn(model_name=model_name, batch_size=batch_size, **bench_kwargs)
552
+ results = bench.run()
553
+ return results
554
+ except RuntimeError as e:
555
+ error_str = str(e)
556
+ _logger.error(f'"{error_str}" while running benchmark.')
557
+ if not check_batch_size_retry(error_str):
558
+ _logger.error(f'Unrecoverable error encountered while benchmarking {model_name}, skipping.')
559
+ break
560
+ if no_batch_size_retry:
561
+ break
562
+ batch_size = decay_batch_step(batch_size)
563
+ _logger.warning(f'Reducing batch size to {batch_size} for retry.')
564
+ results['error'] = error_str
565
+ return results
566
+
567
+
568
+ def benchmark(args):
569
+ if args.amp:
570
+ _logger.warning("Overriding precision to 'amp' since --amp flag set.")
571
+ args.precision = 'amp' if args.amp_dtype == 'float16' else '_'.join(['amp', args.amp_dtype])
572
+ _logger.info(f'Benchmarking in {args.precision} precision. '
573
+ f'{"NHWC" if args.channels_last else "NCHW"} layout. '
574
+ f'torchscript {"enabled" if args.torchscript else "disabled"}')
575
+
576
+ bench_kwargs = vars(args).copy()
577
+ bench_kwargs.pop('amp')
578
+ model = bench_kwargs.pop('model')
579
+ batch_size = bench_kwargs.pop('batch_size')
580
+
581
+ bench_fns = (InferenceBenchmarkRunner,)
582
+ prefixes = ('infer',)
583
+ if args.bench == 'both':
584
+ bench_fns = (
585
+ InferenceBenchmarkRunner,
586
+ TrainBenchmarkRunner
587
+ )
588
+ prefixes = ('infer', 'train')
589
+ elif args.bench == 'train':
590
+ bench_fns = TrainBenchmarkRunner,
591
+ prefixes = 'train',
592
+ elif args.bench.startswith('profile'):
593
+ # specific profiler used if included in bench mode string, otherwise default to deepspeed, fallback to fvcore
594
+ if 'deepspeed' in args.bench:
595
+ assert has_deepspeed_profiling, "deepspeed must be installed to use deepspeed flop counter"
596
+ bench_kwargs['profiler'] = 'deepspeed'
597
+ elif 'fvcore' in args.bench:
598
+ assert has_fvcore_profiling, "fvcore must be installed to use fvcore flop counter"
599
+ bench_kwargs['profiler'] = 'fvcore'
600
+ bench_fns = ProfileRunner,
601
+ batch_size = 1
602
+
603
+ model_results = OrderedDict(model=model)
604
+ for prefix, bench_fn in zip(prefixes, bench_fns):
605
+ run_results = _try_run(
606
+ model,
607
+ bench_fn,
608
+ bench_kwargs=bench_kwargs,
609
+ initial_batch_size=batch_size,
610
+ no_batch_size_retry=args.no_retry,
611
+ )
612
+ if prefix and 'error' not in run_results:
613
+ run_results = {'_'.join([prefix, k]): v for k, v in run_results.items()}
614
+ model_results.update(run_results)
615
+ if 'error' in run_results:
616
+ break
617
+ if 'error' not in model_results:
618
+ param_count = model_results.pop('infer_param_count', model_results.pop('train_param_count', 0))
619
+ model_results.setdefault('param_count', param_count)
620
+ model_results.pop('train_param_count', 0)
621
+ return model_results
622
+
623
+
624
+ def main():
625
+ setup_default_logging()
626
+ args = parser.parse_args()
627
+ model_cfgs = []
628
+ model_names = []
629
+
630
+ if args.fast_norm:
631
+ set_fast_norm()
632
+
633
+ if args.model_list:
634
+ args.model = ''
635
+ with open(args.model_list) as f:
636
+ model_names = [line.rstrip() for line in f]
637
+ model_cfgs = [(n, None) for n in model_names]
638
+ elif args.model == 'all':
639
+ # benchmark all models in a list of names with pretrained checkpoints
640
+ args.pretrained = True
641
+ model_names = list_models(pretrained=True, exclude_filters=['*in21k'])
642
+ model_cfgs = [(n, None) for n in model_names]
643
+ elif not is_model(args.model):
644
+ # model name doesn't exist, try as wildcard filter
645
+ model_names = list_models(args.model)
646
+ model_cfgs = [(n, None) for n in model_names]
647
+
648
+ if len(model_cfgs):
649
+ _logger.info('Running bulk benchmark on these models: {}'.format(', '.join(model_names)))
650
+ results = []
651
+ try:
652
+ for m, _ in model_cfgs:
653
+ if not m:
654
+ continue
655
+ args.model = m
656
+ r = benchmark(args)
657
+ if r:
658
+ results.append(r)
659
+ time.sleep(10)
660
+ except KeyboardInterrupt as e:
661
+ pass
662
+ sort_key = 'infer_samples_per_sec'
663
+ if 'train' in args.bench:
664
+ sort_key = 'train_samples_per_sec'
665
+ elif 'profile' in args.bench:
666
+ sort_key = 'infer_gmacs'
667
+ results = filter(lambda x: sort_key in x, results)
668
+ results = sorted(results, key=lambda x: x[sort_key], reverse=True)
669
+ else:
670
+ results = benchmark(args)
671
+
672
+ if args.results_file:
673
+ write_results(args.results_file, results, format=args.results_format)
674
+
675
+ # output results in JSON to stdout w/ delimiter for runner script
676
+ print(f'--result\n{json.dumps(results, indent=4)}')
677
+
678
+
679
+ def write_results(results_file, results, format='csv'):
680
+ with open(results_file, mode='w') as cf:
681
+ if format == 'json':
682
+ json.dump(results, cf, indent=4)
683
+ else:
684
+ if not isinstance(results, (list, tuple)):
685
+ results = [results]
686
+ if not results:
687
+ return
688
+ dw = csv.DictWriter(cf, fieldnames=results[0].keys())
689
+ dw.writeheader()
690
+ for r in results:
691
+ dw.writerow(r)
692
+ cf.flush()
693
+
694
+
695
+ if __name__ == '__main__':
696
+ main()
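The batch-size retry loop in `_try_run` above can be sketched in isolation. This is a minimal, dependency-free illustration, not the timm implementation: `halve_batch_size` and `is_retryable` are hypothetical stand-ins for `decay_batch_step` and `check_batch_size_retry` (whose actual rules may differ), and `fake_bench` is an invented benchmark that pretends to run out of memory above batch size 64.

```python
from typing import Callable, Dict


def halve_batch_size(batch_size: int) -> int:
    """Hypothetical decay rule: halve the batch size (1 decays to 0, ending the loop)."""
    return batch_size // 2


def is_retryable(error_str: str) -> bool:
    """Hypothetical check: only out-of-memory style errors are worth retrying."""
    return 'out of memory' in error_str.lower()


def try_run(bench_fn: Callable[[int], Dict], initial_batch_size: int) -> Dict:
    """Run bench_fn, shrinking the batch size after each retryable RuntimeError."""
    batch_size = initial_batch_size
    error_str = 'Unknown'
    while batch_size:
        try:
            return bench_fn(batch_size)
        except RuntimeError as e:
            error_str = str(e)
            if not is_retryable(error_str):
                break  # unrecoverable, give up on this model
            batch_size = halve_batch_size(batch_size)
    # Every attempt failed: report the last error instead of results.
    return {'error': error_str}


def fake_bench(batch_size: int) -> Dict:
    """Invented stand-in benchmark that 'runs out of memory' above batch size 64."""
    if batch_size > 64:
        raise RuntimeError('CUDA out of memory')
    return {'batch_size': batch_size, 'samples_per_sec': 1000.0}


result = try_run(fake_bench, initial_batch_size=256)
print(result)
```

Starting from 256, the sketch fails at 256 and 128 and succeeds at 64. As in the script, a failure that never recovers is reported as a dict with a single `error` key, so callers can tell failed runs from successful ones by checking for that key.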
bulk_runner.py ADDED
@@ -0,0 +1,184 @@
1
+ #!/usr/bin/env python3
2
+ """ Bulk Model Script Runner
3
+
4
+ Run validation or benchmark script in separate process for each model
5
+
6
+ Benchmark all 'vit*' models:
7
+ python bulk_runner.py --model-list 'vit*' --results-file vit_bench.csv benchmark.py --amp -b 512
8
+
9
+ Validate all models:
10
+ python bulk_runner.py --model-list all --results-file val.csv --pretrained validate.py /imagenet/validation/ --amp -b 512 --retry
11
+
12
+ Hacked together by Ross Wightman (https://github.com/rwightman)
13
+ """
14
+ import argparse
15
+ import os
16
+ import sys
17
+ import csv
18
+ import json
19
+ import subprocess
20
+ import time
21
+ from typing import Callable, List, Tuple, Union
22
+
23
+
24
+ from timm.models import is_model, list_models
25
+
26
+
27
+ parser = argparse.ArgumentParser(description='Per-model process launcher')
28
+
29
+ # model and results args
30
+ parser.add_argument(
31
+ '--model-list', metavar='NAME', default='',
32
+ help="Model name, wildcard filter, 'all', or txt file with a list of model names to run")
33
+ parser.add_argument(
34
+ '--results-file', default='', type=str, metavar='FILENAME',
35
+ help='Output csv file for validation results (summary)')
36
+ parser.add_argument(
37
+ '--sort-key', default='', type=str, metavar='COL',
38
+ help='Specify sort key for results csv')
39
+ parser.add_argument(
40
+ "--pretrained", action='store_true',
41
+ help="only run models with pretrained weights")
42
+
43
+ parser.add_argument(
44
+ "--delay",
45
+ type=float,
46
+ default=0,
47
+ help="Interval, in seconds, to delay between model invocations.",
48
+ )
49
+ parser.add_argument(
50
+ "--start_method", type=str, default="spawn", choices=["spawn", "fork", "forkserver"],
51
+ help="Multiprocessing start method to use when creating workers.",
52
+ )
53
+ parser.add_argument(
54
+ "--no_python",
55
+ help="Skip prepending the script with 'python' - just execute it directly. Useful "
56
+ "when the script is not a Python script.",
57
+ )
58
+ parser.add_argument(
59
+ "-m",
60
+ "--module",
61
+ help="Change each process to interpret the launch script as a Python module, executing "
62
+ "with the same behavior as 'python -m'.",
63
+ )
64
+
65
+ # positional
66
+ parser.add_argument(
67
+ "script", type=str,
68
+ help="Full path to the program/script to be launched for each model config.",
69
+ )
70
+ parser.add_argument("script_args", nargs=argparse.REMAINDER)
71
+
72
+
73
+ def cmd_from_args(args) -> Tuple[Union[Callable, str], List[str]]:
74
+ # If ``args`` not passed, defaults to ``sys.argv[:1]``
75
+ with_python = not args.no_python
76
+ cmd: Union[Callable, str]
77
+ cmd_args = []
78
+ if with_python:
79
+ cmd = os.getenv("PYTHON_EXEC", sys.executable)
80
+ cmd_args.append("-u")
81
+ if args.module:
82
+ cmd_args.append("-m")
83
+ cmd_args.append(args.script)
84
+ else:
85
+ if args.module:
86
+ raise ValueError(
87
+ "Don't use both the '--no_python' flag"
88
+ " and the '--module' flag at the same time."
89
+ )
90
+ cmd = args.script
91
+ cmd_args.extend(args.script_args)
92
+
93
+ return cmd, cmd_args
94
+
95
+
96
+ def main():
97
+ args = parser.parse_args()
98
+ cmd, cmd_args = cmd_from_args(args)
99
+
100
+ model_cfgs = []
101
+ model_names = []
102
+ if args.model_list == 'all':
103
+ # NOTE should make this config, for validation / benchmark runs the focus is 1k models,
104
+ # so we filter out 21/22k and some other unusable heads. This will change in the future...
105
+ exclude_model_filters = ['*in21k', '*in22k', '*dino', '*_22k']
106
+ model_names = list_models(
107
+ pretrained=args.pretrained, # only include models w/ pretrained checkpoints if set
108
+ exclude_filters=exclude_model_filters
109
+ )
110
+ model_cfgs = [(n, None) for n in model_names]
111
+ elif not is_model(args.model_list):
112
+ # model name doesn't exist, try as wildcard filter
113
+ model_names = list_models(args.model_list)
114
+ model_cfgs = [(n, None) for n in model_names]
115
+
116
+ if not model_cfgs and os.path.exists(args.model_list):
117
+ with open(args.model_list) as f:
118
+ model_names = [line.rstrip() for line in f]
119
+ model_cfgs = [(n, None) for n in model_names]
120
+
121
+ if len(model_cfgs):
122
+ results_file = args.results_file or './results.csv'
123
+ results = []
124
+ errors = []
125
+ print('Running script on these models: {}'.format(', '.join(model_names)))
126
+ if not args.sort_key:
127
+ if 'benchmark' in args.script:
128
+ if any(['train' in a for a in args.script_args]):
129
+ sort_key = 'train_samples_per_sec'
130
+ else:
131
+ sort_key = 'infer_samples_per_sec'
132
+ else:
133
+ sort_key = 'top1'
134
+ else:
135
+ sort_key = args.sort_key
136
+ print(f'Script: {args.script}, Args: {args.script_args}, Sort key: {sort_key}')
137
+
138
+ try:
139
+ for m, _ in model_cfgs:
140
+ if not m:
141
+ continue
142
+ args_str = (cmd, *[str(e) for e in cmd_args], '--model', m)
143
+ try:
144
+ o = subprocess.check_output(args=args_str).decode('utf-8').split('--result')[-1]
145
+ r = json.loads(o)
146
+ results.append(r)
147
+ except Exception as e:
148
+ # FIXME batch_size retry loop is currently done in either validate.py or benchmark.py
149
+ # for further robustness (but more overhead), we may want to manage that by looping here...
150
+ errors.append(dict(model=m, error=str(e)))
151
+ if args.delay:
152
+ time.sleep(args.delay)
153
+ except KeyboardInterrupt as e:
154
+ pass
155
+
156
+ errors.extend(list(filter(lambda x: 'error' in x, results)))
157
+ if errors:
158
+ print(f'{len(errors)} models had errors during run.')
159
+ for e in errors:
160
+ print(f"\t {e['model']} ({e.get('error', 'Unknown')})")
161
+ results = list(filter(lambda x: 'error' not in x, results))
162
+
163
+ no_sortkey = list(filter(lambda x: sort_key not in x, results))
164
+ if no_sortkey:
165
+ print(f'{len(no_sortkey)} results missing sort key, skipping sort.')
166
+ else:
167
+ results = sorted(results, key=lambda x: x[sort_key], reverse=True)
168
+
169
+ if len(results):
170
+ print(f'{len(results)} models run successfully. Saving results to {results_file}.')
171
+ write_results(results_file, results)
172
+
173
+
174
+ def write_results(results_file, results):
175
+ with open(results_file, mode='w') as cf:
176
+ dw = csv.DictWriter(cf, fieldnames=results[0].keys())
177
+ dw.writeheader()
178
+ for r in results:
179
+ dw.writerow(r)
180
+ cf.flush()
181
+
182
+
183
+ if __name__ == '__main__':
184
+ main()
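The result handling in `main()` above can be sketched in isolation: take the text after the last `--result` marker, parse it as JSON, set aside entries carrying an `error` key, and sort only when every remaining entry has the sort key. This is a minimal illustration, not the script's actual code path; `parse_result` and `split_and_sort` are hypothetical helper names, and the sample output strings are made up.

```python
import json


def parse_result(output: str) -> dict:
    """Parse the JSON payload that follows the last '--result' marker."""
    payload = output.split('--result')[-1]
    return json.loads(payload)


def split_and_sort(results, sort_key):
    """Separate failed runs, then sort the rest by sort_key (descending).

    Sorting is skipped when any successful result lacks sort_key,
    mirroring the guard in the script.
    """
    errors = [r for r in results if 'error' in r]
    ok = [r for r in results if 'error' not in r]
    if all(sort_key in r for r in ok):
        ok = sorted(ok, key=lambda r: r[sort_key], reverse=True)
    return ok, errors


# Made-up child-process outputs, for illustration only.
outputs = [
    'log line\n--result\n{"model": "a", "top1": 80.1}',
    'log line\n--result\n{"model": "b", "top1": 82.5}',
    '--result\n{"model": "c", "error": "out of memory"}',
]
ok, errors = split_and_sort([parse_result(o) for o in outputs], 'top1')
print([r['model'] for r in ok])      # successful models, best top1 first
print([e['model'] for e in errors])  # models whose run reported an error
```

Keeping the parse step separate from the subprocess call makes it easy to test the sorting and error filtering without launching any child process.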
clean_checkpoint.py ADDED
@@ -0,0 +1,115 @@
1
+ #!/usr/bin/env python3
2
+ """ Checkpoint Cleaning Script
3
+
4
+ Takes training checkpoints with GPU tensors, optimizer state, extra dict keys, etc.
5
+ and outputs a CPU tensor checkpoint with only the `state_dict` along with SHA256
6
+ calculation for model zoo compatibility.
7
+
8
+ Hacked together by / Copyright 2020 Ross Wightman (https://github.com/rwightman)
9
+ """
10
+ import torch
11
+ import argparse
12
+ import os
13
+ import hashlib
14
+ import shutil
15
+ import tempfile
16
+ from timm.models import load_state_dict
17
+ try:
18
+ import safetensors.torch
19
+ _has_safetensors = True
20
+ except ImportError:
21
+ _has_safetensors = False
22
+
23
+ parser = argparse.ArgumentParser(description='PyTorch Checkpoint Cleaner')
24
+ parser.add_argument('--checkpoint', default='', type=str, metavar='PATH',
25
+ help='path to latest checkpoint (default: none)')
26
+ parser.add_argument('--output', default='', type=str, metavar='PATH',
27
+ help='output path')
28
+ parser.add_argument('--no-use-ema', dest='no_use_ema', action='store_true',
29
+ help='do not use the EMA version of weights, even if present')
30
+ parser.add_argument('--no-hash', dest='no_hash', action='store_true',
31
+ help='no hash in output filename')
32
+ parser.add_argument('--clean-aux-bn', dest='clean_aux_bn', action='store_true',
33
+ help='remove auxiliary batch norm layers (from SplitBN training) from checkpoint')
34
+ parser.add_argument('--safetensors', action='store_true',
35
+ help='Save weights using safetensors instead of the default torch way (pickle).')
36
+
37
+
38
+ def main():
39
+ args = parser.parse_args()
40
+
41
+ if os.path.exists(args.output):
42
+ print("Error: Output filename ({}) already exists.".format(args.output))
43
+ exit(1)
44
+
45
+ clean_checkpoint(
46
+ args.checkpoint,
47
+ args.output,
48
+ not args.no_use_ema,
49
+ args.no_hash,
50
+ args.clean_aux_bn,
51
+ safe_serialization=args.safetensors,
52
+ )
53
+
54
+
55
+ def clean_checkpoint(
56
+ checkpoint,
57
+ output,
58
+ use_ema=True,
59
+ no_hash=False,
60
+ clean_aux_bn=False,
61
+ safe_serialization: bool=False,
62
+ ):
63
+ # Load an existing checkpoint to CPU, strip everything but the state_dict and re-save
64
+ if checkpoint and os.path.isfile(checkpoint):
65
+ print("=> Loading checkpoint '{}'".format(checkpoint))
66
+ state_dict = load_state_dict(checkpoint, use_ema=use_ema)
67
+ new_state_dict = {}
68
+ for k, v in state_dict.items():
69
+ if clean_aux_bn and 'aux_bn' in k:
70
+ # If all aux_bn keys are removed, the SplitBN layers will end up as normal and
71
+ # load with the unmodified model using BatchNorm2d.
72
+ continue
73
+ name = k[7:] if k.startswith('module.') else k
74
+ new_state_dict[name] = v
75
+ print("=> Loaded state_dict from '{}'".format(checkpoint))
76
+
77
+ ext = ''
78
+ if output:
79
+ checkpoint_root, checkpoint_base = os.path.split(output)
80
+ checkpoint_base, ext = os.path.splitext(checkpoint_base)
81
+ else:
82
+ checkpoint_root = ''
83
+ checkpoint_base = os.path.split(checkpoint)[1]
84
+ checkpoint_base = os.path.splitext(checkpoint_base)[0]
85
+
86
+ temp_filename = '__' + checkpoint_base
87
+ if safe_serialization:
88
+ assert _has_safetensors, "`pip install safetensors` to use .safetensors"
89
+ safetensors.torch.save_file(new_state_dict, temp_filename)
90
+ else:
91
+ torch.save(new_state_dict, temp_filename)
92
+
93
+ with open(temp_filename, 'rb') as f:
94
+ sha_hash = hashlib.sha256(f.read()).hexdigest()
95
+
96
+ if ext:
97
+ final_ext = ext
98
+ else:
99
+ final_ext = ('.safetensors' if safe_serialization else '.pth')
100
+
101
+ if no_hash:
102
+ final_filename = checkpoint_base + final_ext
103
+ else:
104
+ final_filename = '-'.join([checkpoint_base, sha_hash[:8]]) + final_ext
105
+
106
+ shutil.move(temp_filename, os.path.join(checkpoint_root, final_filename))
107
+ print("=> Saved state_dict to '{}', SHA256: {}".format(final_filename, sha_hash))
108
+ return final_filename
109
+ else:
110
+ print("Error: Checkpoint ({}) doesn't exist".format(checkpoint))
111
+ return ''
112
+
113
+
114
+ if __name__ == '__main__':
115
+ main()
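Two steps of `clean_checkpoint` above are easy to try without PyTorch: removing the `module.` key prefix and building the hash-suffixed output name. The sketch below uses only the standard library; `strip_module_prefix` and `hashed_filename` are hypothetical names, and plain integers and a byte string stand in for tensors and the saved file contents.

```python
import hashlib


def strip_module_prefix(state_dict: dict) -> dict:
    """Drop a leading 'module.' from every key, leaving other keys untouched."""
    prefix = 'module.'
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


def hashed_filename(base: str, ext: str, data: bytes, no_hash: bool = False) -> str:
    """Append the first 8 hex digits of SHA256(data) to the base name."""
    if no_hash:
        return base + ext
    digest = hashlib.sha256(data).hexdigest()
    return '-'.join([base, digest[:8]]) + ext


# Stand-in values: integers instead of tensors, bytes instead of a saved file.
cleaned = strip_module_prefix({'module.conv.weight': 1, 'fc.bias': 2})
name = hashed_filename('model', '.pth', b'example file contents')
print(cleaned)
print(name)
```

Because the hash is computed over the saved bytes, the same weights serialized in a different format (pickle versus safetensors) will get a different suffix.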
convert/convert_from_mxnet.py ADDED
@@ -0,0 +1,107 @@
1
+ import argparse
2
+ import hashlib
3
+ import os
4
+
5
+ import mxnet as mx
6
+ import gluoncv
7
+ import torch
8
+ from timm import create_model
9
+
10
+ parser = argparse.ArgumentParser(description='Convert from MXNet')
11
+ parser.add_argument('--model', default='all', type=str, metavar='MODEL',
12
+ help='Name of model to convert (default: "all")')
13
+
14
+
15
+ def convert(mxnet_name, torch_name):
16
+ # download and load the pre-trained model
17
+ net = gluoncv.model_zoo.get_model(mxnet_name, pretrained=True)
18
+
19
+ # create corresponding torch model
20
+ torch_net = create_model(torch_name)
21
+
22
+ mxp = [(k, v) for k, v in net.collect_params().items() if 'running' not in k]
23
+ torchp = list(torch_net.named_parameters())
24
+ torch_params = {}
25
+
26
+ # convert parameters
27
+ # NOTE: we are relying on the fact that the order of parameters
28
+ # is usually exactly the same between these models, thus no key name mapping
29
+ # is necessary. Asserts will trip if this is not the case.
30
+ for (tn, tv), (mn, mv) in zip(torchp, mxp):
31
+ m_split = mn.split('_')
32
+ t_split = tn.split('.')
33
+ print(t_split, m_split)
34
+ print(tv.shape, mv.shape)
35
+
36
+ # ensure the ordering of BN params matches, since their shapes alone do not distinguish them
37
+ if m_split[-1] == 'gamma':
38
+ assert t_split[-1] == 'weight'
39
+ if m_split[-1] == 'beta':
40
+ assert t_split[-1] == 'bias'
41
+
42
+ # ensure shapes match
43
+ assert all(t == m for t, m in zip(tv.shape, mv.shape))
44
+
45
+ torch_tensor = torch.from_numpy(mv.data().asnumpy())
46
+ torch_params[tn] = torch_tensor
47
+
48
+ # convert buffers (batch norm running stats)
49
+ mxb = [(k, v) for k, v in net.collect_params().items() if any(x in k for x in ['running_mean', 'running_var'])]
50
+ torchb = [(k, v) for k, v in torch_net.named_buffers() if 'num_batches' not in k]
51
+ for (tn, tv), (mn, mv) in zip(torchb, mxb):
52
+ print(tn, mn)
53
+ print(tv.shape, mv.shape)
54
+
55
+ # ensure the ordering of BN buffers matches, since their shapes alone do not distinguish them
56
+ if 'running_var' in tn:
57
+ assert 'running_var' in mn
58
+ if 'running_mean' in tn:
59
+ assert 'running_mean' in mn
60
+
61
+ torch_tensor = torch.from_numpy(mv.data().asnumpy())
62
+ torch_params[tn] = torch_tensor
63
+
64
+ torch_net.load_state_dict(torch_params)
65
+ torch_filename = './%s.pth' % torch_name
66
+ torch.save(torch_net.state_dict(), torch_filename)
67
+ with open(torch_filename, 'rb') as f:
68
+ sha_hash = hashlib.sha256(f.read()).hexdigest()
69
+ final_filename = os.path.splitext(torch_filename)[0] + '-' + sha_hash[:8] + '.pth'
70
+ os.rename(torch_filename, final_filename)
71
+ print("=> Saved converted model to '{}', SHA256: {}".format(final_filename, sha_hash))
72
+
73
+
74
+ def map_mx_to_torch_model(mx_name):
75
+ torch_name = mx_name.lower()
76
+ if torch_name.startswith('se_'):
77
+ torch_name = torch_name.replace('se_', 'se')
78
+ elif torch_name.startswith('senet_'):
79
+ torch_name = torch_name.replace('senet_', 'senet')
80
+ elif torch_name.startswith('inceptionv3'):
81
+ torch_name = torch_name.replace('inceptionv3', 'inception_v3')
82
+ torch_name = 'gluon_' + torch_name
83
+ return torch_name
84
+
85
+
86
+ ALL = ['resnet18_v1b', 'resnet34_v1b', 'resnet50_v1b', 'resnet101_v1b', 'resnet152_v1b',
87
+ 'resnet50_v1c', 'resnet101_v1c', 'resnet152_v1c', 'resnet50_v1d', 'resnet101_v1d', 'resnet152_v1d',
88
+ #'resnet50_v1e', 'resnet101_v1e', 'resnet152_v1e',
89
+ 'resnet50_v1s', 'resnet101_v1s', 'resnet152_v1s', 'resnext50_32x4d', 'resnext101_32x4d', 'resnext101_64x4d',
90
+ 'se_resnext50_32x4d', 'se_resnext101_32x4d', 'se_resnext101_64x4d', 'senet_154', 'inceptionv3']
91
+
92
+
93
+ def main():
94
+ args = parser.parse_args()
95
+
96
+ if not args.model or args.model == 'all':
97
+ for mx_model in ALL:
98
+ torch_model = map_mx_to_torch_model(mx_model)
99
+ convert(mx_model, torch_model)
100
+ else:
101
+ mx_model = args.model
102
+ torch_model = map_mx_to_torch_model(mx_model)
103
+ convert(mx_model, torch_model)
104
+
105
+
106
+ if __name__ == '__main__':
107
+ main()
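The conversion above pairs parameters purely by position, so its asserts are the only safeguard against a silent mismatch. A minimal sketch of that check on plain `(name, shape)` tuples follows. `check_pairing` is a hypothetical helper and the parameter names are invented for illustration. It is also slightly stricter than the script: it compares parameter counts and full shape tuples, whereas `zip` in the original stops at the shorter sequence.

```python
def check_pairing(torch_params, mx_params):
    """Pair parameters by position and verify BN naming and shapes agree.

    Each argument is a list of (name, shape) tuples. Raises AssertionError
    on the first mismatch.
    """
    suffix_map = {'gamma': 'weight', 'beta': 'bias'}
    assert len(torch_params) == len(mx_params), 'parameter counts differ'
    pairs = []
    for (tn, ts), (mn, ms) in zip(torch_params, mx_params):
        m_last = mn.split('_')[-1]   # MXNet-style names end in _gamma / _beta
        t_last = tn.split('.')[-1]   # PyTorch-style names end in .weight / .bias
        if m_last in suffix_map:
            assert t_last == suffix_map[m_last], f'{tn} vs {mn}'
        assert tuple(ts) == tuple(ms), f'shape mismatch: {tn} {ts} vs {mn} {ms}'
        pairs.append((tn, mn))
    return pairs


# Invented names and shapes, for illustration only.
torch_params = [('bn1.weight', (64,)), ('bn1.bias', (64,))]
mx_params = [('batchnorm0_gamma', (64,)), ('batchnorm0_beta', (64,))]
pairs = check_pairing(torch_params, mx_params)
print(pairs)
```

Running the check on names and shapes first is cheap, and it fails before any tensor data is copied.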
convert/convert_nest_flax.py ADDED
@@ -0,0 +1,109 @@
1
+ """
2
+ Convert weights from https://github.com/google-research/nested-transformer
3
+ NOTE: You'll need https://github.com/google/CommonLoopUtils, not included in requirements.txt
4
+ """
5
+
6
+ import sys
7
+
8
+ import numpy as np
9
+ import torch
10
+
11
+ from clu import checkpoint
12
+
13
+
14
+ arch_depths = {
15
+ 'nest_base': [2, 2, 20],
16
+ 'nest_small': [2, 2, 20],
17
+ 'nest_tiny': [2, 2, 8],
18
+ }
19
+
20
+
21
+ def convert_nest(checkpoint_path, arch):
22
+ """
23
+ Expects path to checkpoint which is a dir containing 4 files like in each of these folders
24
+ - https://console.cloud.google.com/storage/browser/gresearch/nest-checkpoints
25
+ `arch` is needed to look up the per-level block depths in `arch_depths`
26
+ Returns a state dict that can be used with `torch.nn.Module.load_state_dict`
27
+ Hint: Follow timm.models.nest.Nest.__init__ and
28
+ https://github.com/google-research/nested-transformer/blob/main/models/nest_net.py
29
+ """
30
+ assert arch in ['nest_base', 'nest_small', 'nest_tiny'], "Your `arch` is not supported"
31
+
32
+ flax_dict = checkpoint.load_state_dict(checkpoint_path)['optimizer']['target']
33
+ state_dict = {}
34
+
35
+ # Patch embedding
36
+ state_dict['patch_embed.proj.weight'] = torch.tensor(
37
+ flax_dict['PatchEmbedding_0']['Conv_0']['kernel']).permute(3, 2, 0, 1)
38
+ state_dict['patch_embed.proj.bias'] = torch.tensor(flax_dict['PatchEmbedding_0']['Conv_0']['bias'])
39
+
40
+ # Positional embeddings
41
+ posemb_keys = [k for k in flax_dict.keys() if k.startswith('PositionEmbedding')]
42
+ for i, k in enumerate(posemb_keys):
43
+ state_dict[f'levels.{i}.pos_embed'] = torch.tensor(flax_dict[k]['pos_embedding'])
44
+
45
+ # Transformer encoders
46
+ depths = arch_depths[arch]
47
+ for level in range(len(depths)):
48
+ for layer in range(depths[level]):
49
+ global_layer_ix = sum(depths[:level]) + layer
50
+ # Norms
51
+ for i in range(2):
52
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.norm{i+1}.weight'] = torch.tensor(
53
+ flax_dict[f'EncoderNDBlock_{global_layer_ix}'][f'LayerNorm_{i}']['scale'])
54
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.norm{i+1}.bias'] = torch.tensor(
55
+ flax_dict[f'EncoderNDBlock_{global_layer_ix}'][f'LayerNorm_{i}']['bias'])
56
+ # Attention qkv
57
+ w_q = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_0']['kernel']
58
+ w_kv = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_1']['kernel']
59
+ # Pay attention to dims here (maybe get pen and paper)
60
+ w_kv = np.concatenate(np.split(w_kv, 2, -1), 1)
61
+ w_qkv = np.concatenate([w_q, w_kv], 1)
62
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.qkv.weight'] = torch.tensor(w_qkv).flatten(1).permute(1,0)
63
+ b_q = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_0']['bias']
64
+ b_kv = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['DenseGeneral_1']['bias']
65
+ # Pay attention to dims here (maybe get pen and paper)
66
+ b_kv = np.concatenate(np.split(b_kv, 2, -1), 0)
67
+ b_qkv = np.concatenate([b_q, b_kv], 0)
68
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.qkv.bias'] = torch.tensor(b_qkv).reshape(-1)
69
+ # Attention proj
70
+ w_proj = flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['proj_kernel']
71
+ w_proj = torch.tensor(w_proj).permute(2, 1, 0).flatten(1)
72
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.proj.weight'] = w_proj
73
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.attn.proj.bias'] = torch.tensor(
74
+ flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MultiHeadAttention_0']['bias'])
75
+ # MLP
76
+ for i in range(2):
77
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.mlp.fc{i+1}.weight'] = torch.tensor(
78
+ flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MlpBlock_0'][f'Dense_{i}']['kernel']).permute(1, 0)
79
+ state_dict[f'levels.{level}.transformer_encoder.{layer}.mlp.fc{i+1}.bias'] = torch.tensor(
80
+ flax_dict[f'EncoderNDBlock_{global_layer_ix}']['MlpBlock_0'][f'Dense_{i}']['bias'])
81
+
82
+ # Block aggregations (ConvPool)
83
+ for level in range(1, len(depths)):
84
+ # Convs
85
+ state_dict[f'levels.{level}.pool.conv.weight'] = torch.tensor(
86
+ flax_dict[f'ConvPool_{level-1}']['Conv_0']['kernel']).permute(3, 2, 0, 1)
87
+ state_dict[f'levels.{level}.pool.conv.bias'] = torch.tensor(
88
+ flax_dict[f'ConvPool_{level-1}']['Conv_0']['bias'])
89
+ # Norms
90
+ state_dict[f'levels.{level}.pool.norm.weight'] = torch.tensor(
91
+ flax_dict[f'ConvPool_{level-1}']['LayerNorm_0']['scale'])
92
+ state_dict[f'levels.{level}.pool.norm.bias'] = torch.tensor(
93
+ flax_dict[f'ConvPool_{level-1}']['LayerNorm_0']['bias'])
94
+
95
+ # Final norm
96
+ state_dict[f'norm.weight'] = torch.tensor(flax_dict['LayerNorm_0']['scale'])
97
+ state_dict[f'norm.bias'] = torch.tensor(flax_dict['LayerNorm_0']['bias'])
98
+
99
+ # Classifier
100
+ state_dict['head.weight'] = torch.tensor(flax_dict['Dense_0']['kernel']).permute(1, 0)
101
+ state_dict['head.bias'] = torch.tensor(flax_dict['Dense_0']['bias'])
102
+
103
+ return state_dict
104
+
105
+
106
+ if __name__ == '__main__':
107
+ variant = sys.argv[1] # base, small, or tiny
108
+ state_dict = convert_nest(f'./nest-{variant[0]}_imagenet', f'nest_{variant}')
109
+ torch.save(state_dict, f'./jx_nest_{variant}.pth')
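The converter above maps each `(level, layer)` pair to a flat `EncoderNDBlock_{n}` index with `sum(depths[:level]) + layer`. The short sketch below enumerates that mapping on its own, which helps when checking key names by hand. `encoder_block_indices` is a hypothetical helper, and the depths are the `nest_tiny` entry from `arch_depths`.

```python
def encoder_block_indices(depths):
    """Yield (level, layer, global_index) for every encoder block."""
    for level, depth in enumerate(depths):
        for layer in range(depth):
            # Blocks are numbered consecutively across levels in the Flax checkpoint.
            yield level, layer, sum(depths[:level]) + layer


indices = list(encoder_block_indices([2, 2, 8]))  # the 'nest_tiny' depths
for level, layer, global_ix in indices:
    print(f'levels.{level}.transformer_encoder.{layer} <- EncoderNDBlock_{global_ix}')
```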
demo.py ADDED
@@ -0,0 +1,120 @@
1
+
2
+ import os, cv2, time, math
3
+ print("=> Loading libraries...")
4
+ start = time.time()
5
+
6
+ import requests, torch
7
+ import gradio as gr
8
+ from torchvision import transforms
9
+ from datasets import load_dataset
10
+ from timm.data import create_transform
11
+ from timm.models import create_model, load_checkpoint
12
+ from pytorch_grad_cam import GradCAM
13
+ from pytorch_grad_cam.utils.image import show_cam_on_image
14
+
15
+
16
+ print(f"=> Libraries loaded in {time.time()- start:.2f} sec(s).")
17
+ print("=> Loading model...")
18
+ start = time.time()
19
+
20
+ size = "b"
21
+ img_size = 224
22
+ crop_pct = 0.9
23
+ IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
24
+ IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
25
+
26
+ model = create_model(f"tpmlp_{size}").cuda()
27
+ load_checkpoint(model, f"../tpmlp_{size}.pth.tar", True)
28
+ model.eval()
29
+
30
+ response = requests.get("https://git.io/JJkYN")
31
+ labels = response.text.split("\n")
32
+
33
+ augs = create_transform(
34
+ input_size=(3, 224, 224),
35
+ is_training=False,
36
+ use_prefetcher=False,
37
+ crop_pct=0.9,
38
+ )
39
+
40
+
41
+ scale_size = math.floor(img_size / crop_pct)
42
+ resize = transforms.Compose([
43
+ transforms.Resize(scale_size),
44
+ transforms.CenterCrop(img_size),
45
+ transforms.ToTensor()
46
+ ])
47
+ normalize = transforms.Normalize(mean=torch.tensor(IMAGENET_DEFAULT_MEAN), std=torch.tensor(IMAGENET_DEFAULT_STD))
48
+
49
+ def transform(img):
50
+ img = resize(img.convert("RGB"))
51
+ tensor = normalize(img)
52
+ return img, tensor
53
+
54
+ def predict(inp):
55
+ img, inp = transform(inp)
56
+ inp = inp.unsqueeze(0)
57
+ with GradCAM(model=model, target_layers=[model.layers[3]], use_cuda=True) as cam:
58
+ grayscale_cam, probs = cam(input_tensor=inp, aug_smooth=False, eigen_smooth=False, return_probs=True)
59
+
60
+ # Here grayscale_cam has only one image in the batch
61
+ grayscale_cam = grayscale_cam[0, :]
62
+ probs = probs[0, :]
63
+
64
+ cam_image = show_cam_on_image(img.permute(1, 2, 0).detach().cpu().numpy(), grayscale_cam, use_rgb=True, image_weight=0.5, colormap=cv2.COLORMAP_TWILIGHT_SHIFTED)
65
+ confidences = {labels[i]: float(probs[i]) for i in range(1000)}
66
+ return confidences, cam_image
67
+
68
+ print(f"=> Model (tpmlp_{size}) loaded in {time.time()- start:.2f} sec(s).")
69
+
70
+ if not os.path.isdir("../example-imgs"):
71
+ os.mkdir("../example-imgs")
72
+
73
+ print("=> Loading examples.")
74
+ indices = [
75
+ 0, # Coucal
76
+ 2, # Volcano
77
+ 7, # Sombrero
78
+ 9, # Balance beam
79
+ 10, # Sulphur-crested cockatoo
80
+ 11, # Shower cap
81
+ 12, # Petri dish INCORRECTLY CLASSIFIED as lens
82
+ 14, # Angora rabbit
83
+ ]
84
+ ds = load_dataset("imagenet-1k", split="validation", streaming=True)
85
+ examples = []; idx = 0
86
+ start = time.time()
87
+ for data in ds:
88
+ if idx in indices:
89
+ data['image'].save(f"../example-imgs/{idx}.png")
90
+ idx += 1
91
+ if idx > max(indices):
92
+ break
93
+ del ds
94
+ print(f"=> Examples loaded in {time.time()- start:.2f} sec(s).")
95
+
96
+ # demo = gr.Interface(
97
+ # fn=predict,
98
+ # inputs=gr.inputs.Image(type="pil"),
99
+ # outputs=[gr.outputs.Label(num_top_classes=4), gr.outputs.Image(type="numpy")],
100
+ # examples=[f"../example-imgs/{idx}.png" for idx in indices],
101
+ # )
102
+
103
+
104
+ with gr.Blocks(theme=gr.themes.Monochrome(font=[gr.themes.GoogleFont("DM Sans"), "sans-serif"])) as demo:
105
+ gr.HTML("""
106
+ <h1 align="center">Interactive Demo</h1>
107
+ <h2 align="center">CS-Mixer: A Cross-Scale Vision MLP Model with Spatial–Channel Mixing</h2>
108
+ <br><br>
109
+ """)
110
+ with gr.Row():
111
+ input_image = gr.Image(type="pil", min_width=300, label="Input Image")
112
+ softmax = gr.Label(num_top_classes=4, min_width=200, label="Model Predictions")
113
+ grad_cam = gr.Image(type="numpy", min_width=300, label="Grad-CAM")
114
+ with gr.Row():
115
+ gr.Button("Predict").click(fn=predict, inputs=input_image, outputs=[softmax, grad_cam])
116
+ gr.ClearButton(input_image)
117
+ with gr.Row():
118
+ gr.Examples([f"../example-imgs/{idx}.png" for idx in indices], inputs=input_image, outputs=[softmax, grad_cam], fn=predict, run_on_click=True)
119
+
120
+ demo.launch(share=True, allowed_paths=["../example-imgs"])
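Two small pieces of `demo.py` can be checked without a GPU or the dataset: the resize target derived from `crop_pct`, and the loop that is meant to save only the listed indices from a streamed dataset and then stop. The sketch below is an illustration under that reading of the script; `scale_size` and `pick_examples` are hypothetical helper names, and a plain iterator stands in for the streaming dataset.

```python
import math


def scale_size(img_size: int, crop_pct: float) -> int:
    """Resize target such that a center crop of img_size covers crop_pct of it."""
    return math.floor(img_size / crop_pct)


def pick_examples(stream, wanted):
    """Return {index: item} for the wanted indices, reading no further than needed."""
    wanted = set(wanted)
    last = max(wanted)
    picked = {}
    for idx, item in enumerate(stream):
        if idx in wanted:          # membership test, not equality with the list
            picked[idx] = item
        if idx >= last:            # stop only after the last wanted index is handled
            break
    return picked


print(scale_size(224, 0.9))
stream = iter(range(100))          # stand-in for a streamed validation split
picked = pick_examples(stream, [0, 2, 7, 14])
print(sorted(picked))
```

Breaking only after the largest wanted index has been processed ensures the final example is included while still avoiding a full pass over the stream.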
distributed_train.sh ADDED
@@ -0,0 +1,5 @@
1
+ #!/bin/bash
2
+ NUM_PROC=$1
3
+ shift
4
+ torchrun --nproc_per_node=$NUM_PROC train.py "$@"
5
+
docs/archived_changes.md ADDED
@@ -0,0 +1,406 @@
1
+ # Archived Changes
2
+
3
+ ### Nov 22, 2021
4
+ * A number of updated weights and new model defs
5
+ * `eca_halonext26ts` - 79.5 @ 256
6
+ * `resnet50_gn` (new) - 80.1 @ 224, 81.3 @ 288
7
+ * `resnet50` - 80.7 @ 224, 80.9 @ 288 (trained at 176, not replacing current a1 weights as default since these don't scale as well to higher res, [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1h2_176-001a1197.pth))
8
+ * `resnext50_32x4d` - 81.1 @ 224, 82.0 @ 288
9
+ * `sebotnet33ts_256` (new) - 81.2 @ 224
10
+ * `lamhalobotnet50ts_256` - 81.5 @ 256
11
+ * `halonet50ts` - 81.7 @ 256
12
+ * `halo2botnet50ts_256` - 82.0 @ 256
13
+ * `resnet101` - 82.0 @ 224, 82.8 @ 288
14
+ * `resnetv2_101` (new) - 82.1 @ 224, 83.0 @ 288
15
+ * `resnet152` - 82.8 @ 224, 83.5 @ 288
16
+ * `regnetz_d8` (new) - 83.5 @ 256, 84.0 @ 320
17
+ * `regnetz_e8` (new) - 84.5 @ 256, 85.0 @ 320
18
+ * `vit_base_patch8_224` (85.8 top-1) & `in21k` variant weights added thanks [Martins Bruveris](https://github.com/martinsbruveris)
19
+ * Groundwork in for FX feature extraction thanks to [Alexander Soare](https://github.com/alexander-soare)
20
+ * models updated for tracing compatibility (almost full support with some distilled transformer exceptions)
21
+
22
+ ### Oct 19, 2021
23
+ * ResNet strikes back (https://arxiv.org/abs/2110.00476) weights added, plus any extra training components used. Model weights and some more details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-rsb-weights)
24
+ * BCE loss and Repeated Augmentation support for RSB paper
25
+ * 4 series of ResNet based attention model experiments being added (implemented across byobnet.py/byoanet.py). These include all sorts of attention, from channel attn like SE, ECA to 2D QKV self-attention layers such as Halo, Bottleneck, Lambda. Details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
26
+ * Working implementations of the following 2D self-attention modules (likely to be differences from paper or eventual official impl):
27
+ * Halo (https://arxiv.org/abs/2103.12731)
28
+ * Bottleneck Transformer (https://arxiv.org/abs/2101.11605)
29
+ * LambdaNetworks (https://arxiv.org/abs/2102.08602)
30
+ * A RegNetZ series of models with some attention experiments (being added to). These do not follow the paper (https://arxiv.org/abs/2103.06877) in any way other than block architecture, details of official models are not available. See more here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
31
+ * ConvMixer (https://openreview.net/forum?id=TVHS5Y4dNvM), CrossVit (https://arxiv.org/abs/2103.14899), and BeiT (https://arxiv.org/abs/2106.08254) architectures + weights added
32
+ * freeze/unfreeze helpers by [Alexander Soare](https://github.com/alexander-soare)
33
+
34
+ ### Aug 18, 2021
35
+ * Optimizer bonanza!
36
+ * Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ `timm bits` [branch](https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits))
37
+ * Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
38
+ * Some cleanup on all optimizers and factory. No more `.data`, a bit more consistency, unit tests for all!
39
+ * SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
40
+ * EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit diff and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
41
+ * Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.
42
+
43
+ ### July 12, 2021
44
+ * Add XCiT models from [official facebook impl](https://github.com/facebookresearch/xcit). Contributed by [Alexander Soare](https://github.com/alexander-soare)
45
+
46
+ ### July 5-9, 2021
47
+ * Add `efficientnetv2_rw_t` weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
48
+ * top-1 82.34 @ 288x288 and 82.54 @ 320x320
49
+ * Add [SAM pretrained](https://arxiv.org/abs/2106.01548) in1k weight for ViT B/16 (`vit_base_patch16_sam_224`) and B/32 (`vit_base_patch32_sam_224`) models.
50
+ * Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official [Flax impl](https://github.com/google-research/nested-transformer). Contributed by [Alexander Soare](https://github.com/alexander-soare).
51
+ * `jx_nest_base` - 83.534, `jx_nest_small` - 83.120, `jx_nest_tiny` - 81.426
52
+
53
+ ### June 23, 2021
54
+ * Reproduce gMLP model training, `gmlp_s16_224` trained to 79.6 top-1, matching [paper](https://arxiv.org/abs/2105.08050). Hparams for this and other recent MLP training [here](https://gist.github.com/rwightman/d6c264a9001f9167e06c209f630b2cc6)
55
+
56
+ ### June 20, 2021
57
+ * Release Vision Transformer 'AugReg' weights from [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers](https://arxiv.org/abs/2106.10270)
58
+ * .npz weight loading support added, can load any of the 50K+ weights from the [AugReg series](https://console.cloud.google.com/storage/browser/vit_models/augreg)
59
+ * See [example notebook](https://colab.research.google.com/github/google-research/vision_transformer/blob/master/vit_jax_augreg.ipynb) from [official impl](https://github.com/google-research/vision_transformer/) for navigating the augreg weights
60
+ * Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
61
+ * Highlights: `vit_large_patch16_384` (87.1 top-1), `vit_large_r50_s32_384` (86.2 top-1), `vit_base_patch16_384` (86.0 top-1)
62
+ * `vit_deit_*` renamed to just `deit_*`
63
+ * Remove my old small model, replace with DeiT compatible small w/ AugReg weights
64
+ * Add 1st training of my `gmixer_24_224` MLP w/ GLU, 78.1 top-1 w/ 25M params.
65
+ * Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
66
+ * Add `eca_nfnet_l2` weights from my 'lightweight' series. 84.7 top-1 at 384x384.
67
+ * Add distilled BiT 50x1 student and 152x2 Teacher weights from [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237)
68
+ * NFNets and ResNetV2-BiT models work w/ PyTorch XLA now
69
+ * weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
70
+ * eps values adjusted, will be slight differences but should be quite close
71
+ * Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models
72
+ * Cleanup a few classifier / flatten details for models w/ conv classifiers or early global pool
73
+ * Please report any regressions, this PR touched quite a few models.
74
+
75
+ ### June 8, 2021
76
+ * Add first ResMLP weights, trained in PyTorch XLA on TPU-VM w/ my XLA branch. 24 block variant, 79.2 top-1.
77
+ * Add ResNet51-Q model w/ pretrained weights at 82.36 top-1.
78
+ * NFNet inspired block layout with quad layer stem and no maxpool
79
+ * Same param count (35.7M) and throughput as ResNetRS-50 but +1.5 top-1 @ 224x224 and +2.5 top-1 at 288x288
80
+
81
+ ### May 25, 2021
82
+ * Add LeViT, Visformer, Convit (PR by Aman Arora), Twins (PR by paper authors) transformer models
83
+ * Cleanup input_size/img_size override handling and testing for all vision transformer models
84
+ * Add `efficientnetv2_rw_m` model and weights (started training before official code). 84.8 top-1, 53M params.
85
+
86
+ ### May 14, 2021
87
+ * Add EfficientNet-V2 official model defs w/ ported weights from official [Tensorflow/Keras](https://github.com/google/automl/tree/master/efficientnetv2) impl.
88
+ * 1k trained variants: `tf_efficientnetv2_s/m/l`
89
+ * 21k trained variants: `tf_efficientnetv2_s/m/l_in21k`
90
+ * 21k pretrained -> 1k fine-tuned: `tf_efficientnetv2_s/m/l_in21ft1k`
91
+ * v2 models w/ v1 scaling: `tf_efficientnetv2_b0` through `b3`
92
+ * Rename my prev V2 guess `efficientnet_v2s` -> `efficientnetv2_rw_s`
93
+ * Some blank `efficientnetv2_*` models in-place for future native PyTorch training
94
+
95
+ ### May 5, 2021
96
+ * Add MLP-Mixer models and port pretrained weights from [Google JAX impl](https://github.com/google-research/vision_transformer/tree/linen)
97
+ * Add CaiT models and pretrained weights from [FB](https://github.com/facebookresearch/deit)
98
+ * Add ResNet-RS models and weights from [TF](https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs). Thanks [Aman Arora](https://github.com/amaarora)
99
+ * Add CoaT models and weights. Thanks [Mohammed Rizin](https://github.com/morizin)
100
+ * Add new ImageNet-21k weights & finetuned weights for TResNet, MobileNet-V3, ViT models. Thanks [mrT](https://github.com/mrT23)
101
+ * Add GhostNet models and weights. Thanks [Kai Han](https://github.com/iamhankai)
102
+ * Update ByoaNet attention models
103
+ * Improve SA module inits
104
+ * Hack together experimental stand-alone Swin based attn module and `swinnet`
105
+ * Consistent '26t' model defs for experiments.
106
+ * Add improved Efficientnet-V2S (prelim model def) weights. 83.8 top-1.
107
+ * WandB logging support
108
+
109
+ ### April 13, 2021
110
+ * Add Swin Transformer models and weights from https://github.com/microsoft/Swin-Transformer
111
+
112
+ ### April 12, 2021
113
+ * Add ECA-NFNet-L1 (slimmed down F1 w/ SiLU, 41M params) trained with this code. 84% top-1 @ 320x320. Trained at 256x256.
114
+ * Add EfficientNet-V2S model (unverified model definition) weights. 83.3 top-1 @ 288x288. Only trained single res 224. Working on progressive training.
115
+ * Add ByoaNet model definition (Bring-your-own-attention) w/ SelfAttention block and corresponding SA/SA-like modules and model defs
116
+ * Lambda Networks - https://arxiv.org/abs/2102.08602
117
+ * Bottleneck Transformers - https://arxiv.org/abs/2101.11605
118
+ * Halo Nets - https://arxiv.org/abs/2103.12731
119
+ * Adabelief optimizer contributed by Juntang Zhuang
120
+
121
+ ### April 1, 2021
122
+ * Add snazzy `benchmark.py` script for bulk `timm` model benchmarking of train and/or inference
123
+ * Add Pooling-based Vision Transformer (PiT) models (from https://github.com/naver-ai/pit)
124
+ * Merged distilled variant into main for torchscript compatibility
125
+ * Some `timm` cleanup/style tweaks and weights have hub download support
126
+ * Cleanup Vision Transformer (ViT) models
127
+ * Merge distilled (DeiT) model into main so that torchscript can work
128
+ * Support updated weight init (defaults to old still) that closer matches original JAX impl (possibly better training from scratch)
129
+ * Separate hybrid model defs into different file and add several new model defs to fiddle with, support patch_size != 1 for hybrids
130
+ * Fix fine-tuning num_class changes (PiT and ViT) and pos_embed resizing (Vit) with distilled variants
131
+ * nn.Sequential for block stack (does not break downstream compat)
132
+ * TnT (Transformer-in-Transformer) models contributed by author (from https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/TNT)
133
+ * Add RegNetY-160 weights from DeiT teacher model
134
+ * Add new NFNet-L0 w/ SE attn (rename `nfnet_l0b`->`nfnet_l0`) weights 82.75 top-1 @ 288x288
135
+ * Some fixes/improvements for TFDS dataset wrapper
136
+
137
+ ### March 7, 2021
138
+ * First 0.4.x PyPi release w/ NFNets (& related), ByoB (GPU-Efficient, RepVGG, etc).
139
+ * Change feature extraction for pre-activation nets (NFNets, ResNetV2) to return features before activation.
140
+
141
+ ### Feb 18, 2021
142
+ * Add pretrained weights and model variants for NFNet-F* models from [DeepMind Haiku impl](https://github.com/deepmind/deepmind-research/tree/master/nfnets).
143
+ * Models are prefixed with `dm_`. They require SAME padding conv, skipinit enabled, and activation gains applied in act fn.
144
+ * These models are big, expect to run out of GPU memory. With the GELU activation + other options, they are roughly 1/2 the inference speed of my SiLU PyTorch optimized `s` variants.
145
+ * Original model results are based on pre-processing that is not the same as all other models so you'll see different results in the results csv (once updated).
146
+ * Matching the original pre-processing as closely as possible I get these results:
147
+ * `dm_nfnet_f6` - 86.352
148
+ * `dm_nfnet_f5` - 86.100
149
+ * `dm_nfnet_f4` - 85.834
150
+ * `dm_nfnet_f3` - 85.676
151
+ * `dm_nfnet_f2` - 85.178
152
+ * `dm_nfnet_f1` - 84.696
153
+ * `dm_nfnet_f0` - 83.464
154
+
155
+ ### Feb 16, 2021
156
+ * Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated w/ PyTorch gradient clipping via mode arg that defaults to prev 'norm' mode. For backward arg compat, clip-grad arg must be specified to enable when using train.py.
157
+ * AGC w/ default clipping factor `--clip-grad .01 --clip-mode agc`
158
+ * PyTorch global norm of 1.0 (old behaviour, always norm), `--clip-grad 1.0`
159
+ * PyTorch value clipping of 10, `--clip-grad 10. --clip-mode value`
160
+ * AGC performance is definitely sensitive to the clipping factor. More experimentation needed to determine good values for smaller batch sizes and optimizers besides those in paper. So far I've found .001-.005 is necessary for stable RMSProp training w/ NFNet/NF-ResNet.
161
+
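The AGC rule referenced in the Feb 16, 2021 entry can be sketched in plain Python. This is a minimal illustration of the unit-wise rule from the linked paper, not the `timm` implementation: the function names, the row-per-unit layout, and the `eps` floor of `1e-3` are assumptions made for the example. Each row's gradient is rescaled only when its norm exceeds `clip_factor` times the (floored) norm of the matching weight row.

```python
import math


def unitwise_norm(rows):
    """L2 norm of each row; each row stands in for one output unit."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


def adaptive_clip_grad(weights, grads, clip_factor=0.01, eps=1e-3):
    """Return clipped copies of `grads` (hypothetical helper, not the timm API).

    A row is rescaled when ||g|| > clip_factor * max(||w||, eps).
    """
    clipped = []
    for w_norm, g_norm, g_row in zip(unitwise_norm(weights), unitwise_norm(grads), grads):
        max_norm = clip_factor * max(w_norm, eps)
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [0.0, 0.0], [10.0, 0.0]]
grads = [[0.3, 0.4], [1.0, 0.0], [0.01, 0.0]]
clipped = adaptive_clip_grad(weights, grads, clip_factor=0.01)
clipped_norms = unitwise_norm(clipped)
```

The second row shows why the `eps` floor matters: a zero-initialised weight row would otherwise force its gradient to zero. The third row is left untouched because its gradient is already small relative to the weight norm.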
162
+ ### Feb 12, 2021
163
+ * Update Normalization-Free nets to include new NFNet-F (https://arxiv.org/abs/2102.06171) model defs
164
+
165
+ ### Feb 10, 2021
166
+ * More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
167
+ * GPU-Efficient-Networks (https://github.com/idstcv/GPU-Efficient-Networks), impl in `byobnet.py`
168
+ * RepVGG (https://github.com/DingXiaoH/RepVGG), impl in `byobnet.py`
169
+ * classic VGG (from torchvision, impl in `vgg`)
170
+ * Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
171
+ * Default AMP mode changed to native PyTorch AMP instead of APEX. Issues not being fixed with APEX. Native works with `--channels-last` and `--torchscript` model training, APEX does not.
172
+ * Fix a few bugs introduced since last pypi release
173
+
174
+ ### Feb 8, 2021
175
+ * Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @320, test @ 352.
176
+ * `ecaresnet26t` - 79.88 top-1 @ 320x320, 79.08 @ 256x256
177
+ * `ecaresnet50t` - 82.35 top-1 @ 320x320, 81.52 @ 256x256
178
+ * `ecaresnet269d` - 84.93 top-1 @ 352x352, 84.87 @ 320x320
179
+ * Remove separate tiered (`t`) vs tiered_narrow (`tn`) ResNet model defs, all `tn` changed to `t` and `t` models removed (`seresnext26t_32x4d` only model w/ weights that was removed).
180
+ * Support model default_cfgs with separate train vs test resolution `test_input_size` and remove extra `_320` suffix ResNet model defs that were just for test.
181
+
182
+ ### Jan 30, 2021
183
+ * Add initial "Normalization Free" NF-RegNet-B* and NF-ResNet model definitions based on [paper](https://arxiv.org/abs/2101.08692)
184
+
185
+ ### Jan 25, 2021
186
+ * Add ResNetV2 Big Transfer (BiT) models w/ ImageNet-1k and 21k weights from https://github.com/google-research/big_transfer
187
+ * Add official R50+ViT-B/16 hybrid models + weights from https://github.com/google-research/vision_transformer
188
+ * ImageNet-21k ViT weights are added w/ model defs and representation layer (pre logits) support
189
+ * NOTE: ImageNet-21k classifier heads were zero'd in original weights, they are only useful for transfer learning
190
+ * Add model defs and weights for DeiT Vision Transformer models from https://github.com/facebookresearch/deit
191
+ * Refactor dataset classes into ImageDataset/IterableImageDataset + dataset specific parser classes
192
+ * Add Tensorflow-Datasets (TFDS) wrapper to allow use of TFDS image classification sets with train script
193
+ * Ex: `train.py /data/tfds --dataset tfds/oxford_iiit_pet --val-split test --model resnet50 -b 256 --amp --num-classes 37 --opt adamw --lr 3e-4 --weight-decay .001 --pretrained -j 2`
194
+ * Add improved .tar dataset parser that reads images from .tar, folder of .tar files, or .tar within .tar
195
+ * Run validation on full ImageNet-21k directly from tar w/ BiT model: `validate.py /data/fall11_whole.tar --model resnetv2_50x1_bitm_in21k --amp`
196
+ * Models in this update should be stable w/ possible exception of ViT/BiT, possibility of some regressions with train/val scripts and dataset handling
197
+
198
+ ### Jan 3, 2021
199
+ * Add SE-ResNet-152D weights
200
+ * 256x256 val, 0.94 crop top-1 - 83.75
201
+ * 320x320 val, 1.0 crop - 84.36
202
+ * Update results files
203
+
204
+ ### Dec 18, 2020
205
+ * Add ResNet-101D, ResNet-152D, and ResNet-200D weights trained @ 256x256
206
+ * 256x256 val, 0.94 crop (top-1) - 101D (82.33), 152D (83.08), 200D (83.25)
207
+ * 288x288 val, 1.0 crop - 101D (82.64), 152D (83.48), 200D (83.76)
208
+ * 320x320 val, 1.0 crop - 101D (83.00), 152D (83.66), 200D (84.01)
209
+
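The "val size, crop" pairs quoted in the weight results above follow the usual eval convention: resize the shorter side to `img_size / crop_pct`, then center-crop to `img_size`. The arithmetic can be sketched as below, assuming floor rounding of the resize target; the helper name is hypothetical.

```python
import math


def eval_resize_and_crop(img_size, crop_pct):
    """Return (resize_target, crop_size) for a center-crop eval pipeline.

    Assumes the resize target is floor(img_size / crop_pct).
    """
    resize_target = int(math.floor(img_size / crop_pct))
    return resize_target, img_size


settings = [(256, 0.94), (288, 1.0), (320, 1.0)]
results = [eval_resize_and_crop(size, pct) for size, pct in settings]
```

With a crop of 1.0 the image is resized straight to the test resolution and nothing is cropped away, which is why the 288 and 320 results are reported at "1.0 crop".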
210
+ ### Dec 7, 2020
211
+ * Simplify EMA module (ModelEmaV2), compatible with fully torchscripted models
212
+ * Misc fixes for SiLU ONNX export, default_cfg missing from Feature extraction models, Linear layer w/ AMP + torchscript
213
+ * PyPi release @ 0.3.2 (needed by EfficientDet)
214
+
215
+
216
+ ### Oct 30, 2020
217
+ * Test with PyTorch 1.7 and fix a small top-n metric view vs reshape issue.
218
+ * Convert newly added 224x224 Vision Transformer weights from official JAX repo. 81.8 top-1 for B/16, 83.1 L/16.
219
+ * Support PyTorch 1.7 optimized, native SiLU (aka Swish) activation. Add mapping to 'silu' name, custom swish will eventually be deprecated.
220
+ * Fix regression for loading pretrained classifier via direct model entrypoint functions. Didn't impact create_model() factory usage.
221
+ * PyPi release @ 0.3.0 version!
222
+
223
+ ### Oct 26, 2020
224
+ * Update Vision Transformer models to be compatible with official code release at https://github.com/google-research/vision_transformer
225
+ * Add Vision Transformer weights (ImageNet-21k pretrain) for 384x384 base and large models converted from official jax impl
226
+ * ViT-B/16 - 84.2
227
+ * ViT-B/32 - 81.7
228
+ * ViT-L/16 - 85.2
229
+ * ViT-L/32 - 81.5
230
+
231
+ ### Oct 21, 2020
232
+ * Weights added for Vision Transformer (ViT) models. 77.86 top-1 for 'small' and 79.35 for 'base'. Thanks to [Christof](https://www.kaggle.com/christofhenkel) for training the base model w/ lots of GPUs.
233
+
234
+ ### Oct 13, 2020
235
+ * Initial impl of Vision Transformer models. Both patch and hybrid (CNN backbone) variants. Currently trying to train...
236
+ * Adafactor and AdaHessian (FP32 only, no AMP) optimizers
237
+ * EdgeTPU-M (`efficientnet_em`) model trained in PyTorch, 79.3 top-1
238
+ * Pip release, doc updates pending a few more changes...
239
+
240
+ ### Sept 18, 2020
241
+ * New ResNet 'D' weights. 72.7 (top-1) ResNet-18-D, 77.1 ResNet-34-D, 80.5 ResNet-50-D
242
+ * Added a few untrained defs for other ResNet models (66D, 101D, 152D, 200/200D)
243
+
244
+ ### Sept 3, 2020
245
+ * New weights
246
+ * Wide-ResNet50 - 81.5 top-1 (vs 78.5 torchvision)
247
+ * SEResNeXt50-32x4d - 81.3 top-1 (vs 79.1 cadene)
248
+ * Support for native Torch AMP and channels_last memory format added to train/validate scripts (`--channels-last`, `--native-amp` vs `--apex-amp`)
249
+ * Models tested with channels_last on latest NGC 20.08 container. AdaptiveAvgPool in attn layers changed to mean((2,3)) to work around bug with NHWC kernel.
250
+
251
+ ### Aug 12, 2020
252
+ * New/updated weights from training experiments
253
+ * EfficientNet-B3 - 82.1 top-1 (vs 81.6 for official with AA and 81.9 for AdvProp)
254
+ * RegNetY-3.2GF - 82.0 top-1 (78.9 from official ver)
255
+ * CSPResNet50 - 79.6 top-1 (76.6 from official ver)
256
+ * Add CutMix integrated w/ Mixup. See [pull request](https://github.com/rwightman/pytorch-image-models/pull/218) for some usage examples
257
+ * Some fixes for using pretrained weights with `in_chans` != 3 on several models.
258
+
259
+ ### Aug 5, 2020
260
+ Universal feature extraction, new models, new weights, new test sets.
261
+ * All models support the `features_only=True` argument for `create_model` call to return a network that extracts feature maps from the deepest layer at each stride.
262
+ * New models
263
+ * CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
264
+ * ReXNet
265
+ * (Modified Aligned) Xception41/65/71 (a proper port of TF models)
266
+ * New trained weights
267
+ * SEResNet50 - 80.3 top-1
268
+ * CSPDarkNet53 - 80.1 top-1
269
+ * CSPResNeXt50 - 80.0 top-1
270
+ * DPN68b - 79.2 top-1
271
+ * EfficientNet-Lite0 (non-TF ver) - 75.5 (submitted by [@hal-314](https://github.com/hal-314))
272
+ * Add 'real' labels for ImageNet and ImageNet-Renditions test set, see [`results/README.md`](results/README.md)
273
+ * Test set ranking/top-n diff script by [@KushajveerSingh](https://github.com/KushajveerSingh)
274
+ * Train script and loader/transform tweaks to punch through more aug arguments
275
+ * README and documentation overhaul. See initial (WIP) documentation at https://rwightman.github.io/pytorch-image-models/
276
+ * adamp and sgdp optimizers added by [@hellbell](https://github.com/hellbell)
277
+
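The `features_only=True` behaviour described in the Aug 5, 2020 entry (feature maps taken from the deepest layer at each stride) can be illustrated without any framework. The sketch below only shows the selection rule; the layer names and strides are hypothetical and do not correspond to a specific `timm` model.

```python
def deepest_layer_per_stride(layers):
    """Pick the last layer seen at each cumulative stride.

    `layers` is a list of (name, cumulative_stride) pairs in forward order.
    """
    last_at_stride = {}
    for name, stride in layers:
        # Later layers overwrite earlier ones at the same stride.
        last_at_stride[stride] = name
    return [last_at_stride[stride] for stride in sorted(last_at_stride)]


# Hypothetical ResNet-like layout, for illustration only.
layers = [
    ("stem.conv", 2),
    ("stem.act", 2),
    ("stem.pool", 4),
    ("layer1", 4),
    ("layer2", 8),
    ("layer3", 16),
    ("layer4", 32),
]
feature_layers = deepest_layer_per_stride(layers)
```

For this layout the selected taps are one per stride (2, 4, 8, 16, 32), which is the shape of output a detection or segmentation head typically expects from a backbone.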
278
+ ### June 11, 2020
279
+ Bunch of changes:
280
+ * DenseNet models updated with memory efficient addition from torchvision (fixed a bug), blur pooling and deep stem additions
281
+ * VoVNet V1 and V2 models added, 39 V2 variant (ese_vovnet_39b) trained to 79.3 top-1
282
+ * Activation factory added along with new activations:
283
+ * select act at model creation time for more flexibility in using activations compatible with scripting or tracing (ONNX export)
284
+ * hard_mish (experimental) added with memory-efficient grad, along with ME hard_swish
285
+ * context mgr for setting exportable/scriptable/no_jit states
286
+ * Norm + Activation combo layers added with initial trial support in DenseNet and VoVNet along with impl of EvoNorm and InplaceAbn wrapper that fit the interface
287
+ * Torchscript works for all but two of the model types as long as using PyTorch 1.5+, tests added for this
288
+ * Some import cleanup and classifier reset changes, all models will have classifier reset to nn.Identity on reset_classifier(0) call
289
+ * Prep for 0.1.28 pip release
290
+
291
+ ### May 12, 2020
292
+ * Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955)
293
+
294
+ ### May 3, 2020
295
+ * Pruned EfficientNet B1, B2, and B3 (https://arxiv.org/abs/2002.08258) contributed by [Yonathan Aflalo](https://github.com/yoniaflalo)
296
+
297
+ ### May 1, 2020
298
+ * Merged a number of excellent contributions in the ResNet model family over the past month
299
+ * BlurPool2D and resnetblur models initiated by [Chris Ha](https://github.com/VRandme), I trained resnetblur50 to 79.3.
300
+ * TResNet models and SpaceToDepth, AntiAliasDownsampleLayer layers by [mrT23](https://github.com/mrT23)
301
+ * ecaresnet (50d, 101d, light) models and two pruned variants using pruning as per (https://arxiv.org/abs/2002.08258) by [Yonathan Aflalo](https://github.com/yoniaflalo)
302
+ * 200 pretrained models in total now with updated results csv in results folder
303
+
304
+ ### April 5, 2020
305
+ * Add some newly trained MobileNet-V2 models trained with latest h-params, rand augment. They compare quite favourably to EfficientNet-Lite
306
+ * 3.5M param MobileNet-V2 100 @ 73%
307
+ * 4.5M param MobileNet-V2 110d @ 75%
308
+ * 6.1M param MobileNet-V2 140 @ 76.5%
309
+ * 5.8M param MobileNet-V2 120d @ 77.3%
310
+
311
+ ### March 18, 2020
312
+ * Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
313
+ * Add RandAugment trained ResNeXt-50 32x4d weights with 79.8 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
314
+
326
+ ### Feb 29, 2020
327
+ * New MobileNet-V3 Large weights trained from scratch with this code to 75.77% top-1
328
+ * IMPORTANT CHANGE - default weight init changed for all MobilenetV3 / EfficientNet / related models
329
+ * overall results are similar to slightly better when training from scratch on the few smaller models tried
330
+ * performance early in training seems consistently improved but less difference by end
331
+ * set `fix_group_fanout=False` in `_init_weight_goog` fn if you need to reproduce past behaviour
332
+ * Experimental LR noise feature added; it applies a random perturbation to the LR each epoch within a specified range of training
333
+
334
+ ### Feb 18, 2020
335
+ * Big refactor of model layers and addition of several attention mechanisms. Several additions motivated by 'Compounding the Performance Improvements...' (https://arxiv.org/abs/2001.06268):
336
+ * Move layer/module impl into `layers` subfolder/module of `models` and organize in a more granular fashion
337
+ * ResNet downsample paths now properly support dilation (output stride != 32) for avg_pool ('D' variant) and 3x3 (SENets) networks
338
+ * Add Selective Kernel Nets on top of ResNet base, pretrained weights
339
+ * skresnet18 - 73% top-1
340
+ * skresnet34 - 76.9% top-1
341
+ * skresnext50_32x4d (equiv to SKNet50) - 80.2% top-1
342
+ * ECA and CECA (circular padding) attention layer contributed by [Chris Ha](https://github.com/VRandme)
343
+ * CBAM attention experiment (not the best results so far, may remove)
344
+ * Attention factory to allow dynamically selecting one of SE, ECA, CBAM in the `.se` position for all ResNets
345
+ * Add DropBlock and DropPath (formerly DropConnect for EfficientNet/MobileNetv3) support to all ResNet variants
346
+ * Full dataset results updated that incl NoisyStudent weights and 2 of the 3 SK weights
347
+
348
+ ### Feb 12, 2020
349
+ * Add EfficientNet-L2 and B0-B7 NoisyStudent weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)
350
+
351
+ ### Feb 6, 2020
352
+ * Add RandAugment trained EfficientNet-ES (EdgeTPU-Small) weights with 78.1 top-1. Trained by [Andrew Lavin](https://github.com/andravin) (see Training section for hparams)
353
+
354
+ ### Feb 1/2, 2020
355
+ * Port new EfficientNet-B8 (RandAugment) weights, these are different than the B8 AdvProp, different input normalization.
356
+ * Update results csv files on all models for ImageNet validation and three other test sets
357
+ * Push PyPi package update
358
+
359
+ ### Jan 31, 2020
360
+ * Update ResNet50 weights with a new 79.038 result from further JSD / AugMix experiments. Full command line for reproduction in training section below.
361
+
362
+ ### Jan 11/12, 2020
363
+ * Master may be a bit unstable wrt training; these changes have been tested, but not in all combinations
364
+ * Implementations of AugMix added to existing RA and AA. Including numerous supporting pieces like JSD loss (Jensen-Shannon divergence + CE), and AugMixDataset
365
+ * SplitBatchNorm adaptation layer added for implementing Auxiliary BN as per AdvProp paper
366
+ * ResNet-50 AugMix trained model w/ 79% top-1 added
367
+ * `seresnext26tn_32x4d` - 77.99 top-1, 93.75 top-5 added to tiered experiment, higher img/s than 't' and 'd'
368
+
369
+ ### Jan 3, 2020
370
+ * Add RandAugment trained EfficientNet-B0 weight with 77.7 top-1. Trained by [Michael Klachko](https://github.com/michaelklachko) with this code and recent hparams (see Training section)
371
+ * Add `avg_checkpoints.py` script for post training weight averaging and update all scripts with header docstrings and shebangs.
372
+
373
+ ### Dec 30, 2019
374
+ * Merge [Dushyant Mehta's](https://github.com/mehtadushy) PR for SelecSLS (Selective Short and Long Range Skip Connections) networks. Good GPU memory consumption and throughput. Original: https://github.com/mehtadushy/SelecSLS-Pytorch
375
+
376
+ ### Dec 28, 2019
377
+ * Add new model weights and training hparams (see Training Hparams section)
378
+ * `efficientnet_b3` - 81.5 top-1, 95.7 top-5 at default res/crop, 81.9, 95.8 at 320x320 1.0 crop-pct
379
+ * trained with RandAugment, ended up with an interesting but less than perfect result (see training section)
380
+ * `seresnext26d_32x4d`- 77.6 top-1, 93.6 top-5
381
+ * deep stem (32, 32, 64), avgpool downsample
382
+ * stem/downsample from bag-of-tricks paper
383
+ * `seresnext26t_32x4d`- 78.0 top-1, 93.7 top-5
384
+ * deep tiered stem (24, 48, 64), avgpool downsample (a modified 'D' variant)
385
+ * stem sizing mods from Jeremy Howard and fastai devs discussing ResNet architecture experiments
386
+
387
+ ### Dec 23, 2019
388
+ * Add RandAugment trained MixNet-XL weights with 80.48 top-1.
389
+ * `--dist-bn` argument added to train.py, will distribute BN stats between nodes after each train epoch, before eval
390
+
391
+ ### Dec 4, 2019
392
+ * Added weights from the first training from scratch of an EfficientNet (B2) with my new RandAugment implementation. Much better than my previous B2 and very close to the official AdvProp ones (80.4 top-1, 95.08 top-5).
393
+
394
+ ### Nov 29, 2019
395
+ * Brought EfficientNet and MobileNetV3 up to date with my https://github.com/rwightman/gen-efficientnet-pytorch code. Torchscript and ONNX export compat excluded.
396
+ * AdvProp weights added
397
+ * Official TF MobileNetv3 weights added
398
+ * EfficientNet and MobileNetV3 hook based 'feature extraction' classes added. Will serve as basis for using models as backbones in obj detection/segmentation tasks. Lots more to be done here...
399
+ * HRNet classification models and weights added from https://github.com/HRNet/HRNet-Image-Classification
400
+ * Consistency in global pooling, `reset_classifier`, and `forward_features` across models
401
+ * `forward_features` always returns unpooled feature maps now
402
+ * Reasonable chance I broke something... let me know
403
+
404
+ ### Nov 22, 2019
405
+ * Add ImageNet training RandAugment implementation alongside AutoAugment. PyTorch Transform compatible format, using PIL. Currently training two EfficientNet models from scratch with promising results... will update.
406
+ * `drop-connect` cmd line arg finally added to `train.py`, no need to hack model fns. Works for efficientnet/mobilenetv3 based models, ignored otherwise.
docs/changes.md ADDED
@@ -0,0 +1,314 @@
1
+ # Recent Changes
2
+ ### Jan 5, 2023
3
+ * ConvNeXt-V2 models and weights added to existing `convnext.py`
4
+ * Paper: [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](http://arxiv.org/abs/2301.00808)
5
+ * Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)
6
+
7
+ ### Dec 23, 2022 🎄☃
8
+ * Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
9
+ * NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
10
+ * Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
11
+ * More model pretrained tag and adjustments, some model names changed (working on deprecation translations, consider main branch DEV branch right now, use 0.6.x for stable use)
12
+ * More ImageNet-12k (subset of 22k) pretrain models popping up:
13
+ * `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
14
+ * `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
15
+ * `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
16
+ * `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
17
+
18
+ ### Dec 8, 2022
19
+ * Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
20
+ * original source: https://github.com/baaivision/EVA
21
+
22
+ | model | top1 | param_count | gmac | macts | hub |
23
+ |:------------------------------------------|-----:|------------:|------:|------:|:----------------------------------------|
24
+ | eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
25
+ | eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
26
+ | eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
27
+ | eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | [link](https://huggingface.co/BAAI/EVA) |
28
+
29
+ ### Dec 6, 2022
30
+ * Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
31
+ * original source: https://github.com/baaivision/EVA
32
+ * paper: https://arxiv.org/abs/2211.07636
33
+
34
+ | model | top1 | param_count | gmac | macts | hub |
35
+ |:-----------------------------------------|-------:|--------------:|-------:|--------:|:----------------------------------------|
36
+ | eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
37
+ | eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
38
+ | eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | [link](https://huggingface.co/BAAI/EVA) |
39
+ | eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | [link](https://huggingface.co/BAAI/EVA) |
40
+
41
+ ### Dec 5, 2022
42
+
43
+ * Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
44
+ * vision_transformer, maxvit, convnext are the first three model impl w/ support
45
+ * model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
46
+ * bugs are likely, but I need feedback so please try it out
47
+ * if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)
48
+ * Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
49
+ * Inference script allows more control over output, select k for top-class index + prob json, csv or parquet output
50
+ * Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
51
+
52
+ | model | top1 | param_count | gmac | macts | hub |
53
+ |:-------------------------------------------------|-------:|--------------:|-------:|--------:|:-------------------------------------------------------------------------------------|
54
+ | vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k) |
55
+ | vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k) |
56
+ | vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k) |
57
+ | vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k) |
58
+ | vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k) |
59
+ | vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k) |
60
+ | vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k) |
61
+ | vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k) |
62
+ | vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k) |
63
+ | vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k) |
64
+ | vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k) |
65
+ | vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k) |
66
+ | vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k) |
67
+ | vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k) |
68
+ | vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k) |
69
+ | vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k) |
70
+ | vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k) |
71
+ | vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k) |
72
+ | vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k) |
73
+ | vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k) |
74
+ | vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k) |
75
+ | vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k) |
76
+ | vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k) |
77
+ | vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k) |
78
+
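The `model_arch.pretrained_tag` naming used in the table above can be split on the first dot to recover the architecture and the weight tag. This is a minimal sketch of the naming convention only; the helper name is hypothetical and is not part of `timm`.

```python
def split_model_name(name):
    """Split 'model_arch.pretrained_tag' into (arch, tag).

    The tag is an empty string when the name carries no pretrained tag.
    """
    arch, _, tag = name.partition(".")
    return arch, tag


tagged = split_model_name("vit_base_patch16_clip_224.laion2b_ft_in12k_in1k")
untagged = split_model_name("resnet50")
```

Splitting on the first dot only keeps the helper safe if a tag itself ever contains further dots.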
79
+ * Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
80
+ * There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing differences
81
+
82
+ | model | top1 | param_count | gmac | macts | hub |
83
+ |:-----------------------------------|-------:|--------------:|-------:|--------:|:-----------------------------------------------------------------------|
84
+ | maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k) |
85
+ | maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k) |
86
+ | maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k) |
87
+ | maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k) |
88
+ | maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k) |
89
+ | maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k) |
90
+ | maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k) |
91
+ | maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k) |
92
+ | maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k) |
93
+ | maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k) |
94
+ | maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k) |
95
+ | maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k) |
96
+ | maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k) |
97
+ | maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k) |
98
+ | maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k) |
99
+ | maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k) |
100
+ | maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k) |
101
+ | maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k) |
102
+
103
+ ### Oct 15, 2022
104
+ * Train and validation script enhancements
105
+ * Non-GPU (ie CPU) device support
106
+ * SLURM compatibility for train script
107
+ * HF datasets support (via ReaderHfds)
108
+ * TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
109
+ * in_chans !=3 support for scripts / loader
110
+ * Adan optimizer
111
+ * Can enable per-step LR scheduling via args
112
+ * Dataset 'parsers' renamed to 'readers', more descriptive of purpose
113
+ * AMP args changed, APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
114
+ * main branch switched to 0.7.x version, 0.6x forked for stable release of weight only adds
115
+ * master -> main branch rename
116
+
117
+ ### Oct 10, 2022
118
+ * More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
119
+ * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
120
+ * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
121
+ * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
122
+ * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
123
+ * `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
124
+ * NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.
125
+
126
+ ### Sept 23, 2022
127
+ * LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
128
+ * vit_base_patch32_224_clip_laion2b
129
+ * vit_large_patch14_224_clip_laion2b
130
+ * vit_huge_patch14_224_clip_laion2b
131
+ * vit_giant_patch14_224_clip_laion2b
132
+
133
+ ### Sept 7, 2022
134
+ * Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
135
+ * Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
136
+ * Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
137
+ * `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
138
+ * `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
139
+ * `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)
140
+
141
+ ### Aug 29, 2022
142
+ * MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
143
+ * `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)
144
+
145
+ ### Aug 26, 2022
146
+ * CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
147
+ * both found in [`maxxvit.py`](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py) model def, contains numerous experiments outside scope of original papers
148
+ * an unfinished Tensorflow version from MaxVit authors can be found at https://github.com/google-research/maxvit
149
+ * Initial CoAtNet and MaxVit timm pretrained weights (working on more):
150
+ * `coatnet_nano_rw_224` - 81.7 @ 224 (T)
151
+ * `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
152
+ * `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
153
+ * `coatnet_bn_0_rw_224` - 82.4 (T)
154
+ * `maxvit_nano_rw_256` - 82.9 @ 256 (T)
155
+ * `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
156
+ * `coatnet_1_rw_224` - 83.6 @ 224 (G)
157
+ * (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
158
+ * GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
159
+ * MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
160
+ * EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
161
+ * PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
162
+ * 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)
163
+
164
+
165
+ ### Aug 15, 2022
166
+ * ConvNeXt atto weights added
167
+ * `convnext_atto` - 75.7 @ 224, 77.0 @ 288
168
+ * `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288
169
+
170
+ ### Aug 5, 2022
171
+ * More custom ConvNeXt smaller model defs with weights
172
+ * `convnext_femto` - 77.5 @ 224, 78.7 @ 288
173
+ * `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
174
+ * `convnext_pico` - 79.5 @ 224, 80.4 @ 288
175
+ * `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
176
+ * `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
177
+ * Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)
178
+
179
+ ### July 28, 2022
180
+ * Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks [Hugo Touvron](https://github.com/TouvronHugo)!
181
+
182
+ ### July 27, 2022
183
+ * All runtime benchmark and validation result csv files are up-to-date!
184
+ * A few more weights & model defs added:
185
+ * `darknetaa53` - 79.8 @ 256, 80.5 @ 288
186
+ * `convnext_nano` - 80.8 @ 224, 81.5 @ 288
187
+ * `cs3sedarknet_l` - 81.2 @ 256, 81.8 @ 288
188
+ * `cs3darknet_x` - 81.8 @ 256, 82.2 @ 288
189
+ * `cs3sedarknet_x` - 82.2 @ 256, 82.7 @ 288
190
+ * `cs3edgenet_x` - 82.2 @ 256, 82.7 @ 288
191
+ * `cs3se_edgenet_x` - 82.8 @ 256, 83.5 @ 320
192
+ * `cs3*` weights above all trained on TPU w/ `bits_and_tpu` branch. Thanks to TRC program!
193
+ * Add output_stride=8 and 16 support to ConvNeXt (dilation)
194
+ * deit3 models not being able to resize pos_emb fixed
195
+ * Version 0.6.7 PyPi release (w/ above bug fixes and new weights since 0.6.5)
196
+
197
+ ### July 8, 2022
198
+ More models, more fixes
199
+ * Official research models (w/ weights) added:
200
+ * EdgeNeXt from (https://github.com/mmaaz60/EdgeNeXt)
201
+ * MobileViT-V2 from (https://github.com/apple/ml-cvnets)
202
+ * DeiT III (Revenge of the ViT) from (https://github.com/facebookresearch/deit)
203
+ * My own models:
204
+ * Small `ResNet` defs added by request with 1 block repeats for both basic and bottleneck (resnet10 and resnet14)
205
+ * `CspNet` refactored with dataclass config, simplified CrossStage3 (`cs3`) option. These are closer to YOLO-v5+ backbone defs.
206
+ * More relative position vit fiddling. Two `srelpos` (shared relative position) models trained, and a medium w/ class token.
207
+ * Add an alternate downsample mode to EdgeNeXt and train a `small` model. Better than original small, but not their new USI trained weights.
208
+ * My own model weight results (all ImageNet-1k training)
209
+ * `resnet10t` - 66.5 @ 176, 68.3 @ 224
210
+ * `resnet14t` - 71.3 @ 176, 72.3 @ 224
211
+ * `resnetaa50` - 80.6 @ 224, 81.6 @ 288
212
+ * `darknet53` - 80.0 @ 256, 80.5 @ 288
213
+ * `cs3darknet_m` - 77.0 @ 256, 77.6 @ 288
214
+ * `cs3darknet_focus_m` - 76.7 @ 256, 77.3 @ 288
215
+ * `cs3darknet_l` - 80.4 @ 256, 80.9 @ 288
216
+ * `cs3darknet_focus_l` - 80.3 @ 256, 80.9 @ 288
217
+ * `vit_srelpos_small_patch16_224` - 81.1 @ 224, 82.1 @ 320
218
+ * `vit_srelpos_medium_patch16_224` - 82.3 @ 224, 83.1 @ 320
219
+ * `vit_relpos_small_patch16_cls_224` - 82.6 @ 224, 83.6 @ 320
220
+ * `edgenext_small_rw` - 79.6 @ 224, 80.4 @ 320
221
+ * `cs3`, `darknet`, and `vit_*relpos` weights above all trained on TPU thanks to TRC program! Rest trained on overheating GPUs.
222
+ * Hugging Face Hub support fixes verified, demo notebook TBA
223
+ * Pretrained weights / configs can be loaded externally (ie from local disk) w/ support for head adaptation.
224
+ * Add support to change image extensions scanned by `timm` datasets/parsers. See (https://github.com/rwightman/pytorch-image-models/pull/1274#issuecomment-1178303103)
225
+ * Default ConvNeXt LayerNorm impl to use `F.layer_norm(x.permute(0, 2, 3, 1), ...).permute(0, 3, 1, 2)` via `LayerNorm2d` in all cases.
226
+ * a bit slower than previous custom impl on some hardware (ie Ampere w/ CL), but overall fewer regressions across wider HW / PyTorch version ranges.
227
+ * previous impl exists as `LayerNormExp2d` in `models/layers/norm.py`
228
+ * Numerous bug fixes
229
+ * Currently testing for imminent PyPi 0.6.x release
230
+ * LeViT pretraining of larger models still a WIP, they don't train well / easily without distillation. Time to add distill support (finally)?
231
+ * ImageNet-22k weight training + finetune ongoing, work on multi-weight support (slowly) chugging along (there are a LOT of weights, sigh) ...
232
+
233
+ ### May 13, 2022
234
+ * Official Swin-V2 models and weights added from (https://github.com/microsoft/Swin-Transformer). Cleaned up to support torchscript.
235
+ * Some refactoring for existing `timm` Swin-V2-CR impl, will likely do a bit more to bring parts closer to official and decide whether to merge some aspects.
236
+ * More Vision Transformer relative position / residual post-norm experiments (all trained on TPU thanks to TRC program)
237
+ * `vit_relpos_small_patch16_224` - 81.5 @ 224, 82.5 @ 320 -- rel pos, layer scale, no class token, avg pool
238
+ * `vit_relpos_medium_patch16_rpn_224` - 82.3 @ 224, 83.1 @ 320 -- rel pos + res-post-norm, no class token, avg pool
239
+ * `vit_relpos_medium_patch16_224` - 82.5 @ 224, 83.3 @ 320 -- rel pos, layer scale, no class token, avg pool
240
+ * `vit_relpos_base_patch16_gapcls_224` - 82.8 @ 224, 83.9 @ 320 -- rel pos, layer scale, class token, avg pool (by mistake)
241
+ * Bring 512 dim, 8-head 'medium' ViT model variant back to life (after using in a pre DeiT 'small' model for first ViT impl back in 2020)
242
+ * Add ViT relative position support for switching btw existing impl and some additions in official Swin-V2 impl for future trials
243
+ * Sequencer2D impl (https://arxiv.org/abs/2205.01972), added via PR from author (https://github.com/okojoalg)
244
+
245
+ ### May 2, 2022
246
+ * Vision Transformer experiments adding Relative Position (Swin-V2 log-coord) (`vision_transformer_relpos.py`) and Residual Post-Norm branches (from Swin-V2) (`vision_transformer*.py`)
247
+ * `vit_relpos_base_patch32_plus_rpn_256` - 79.5 @ 256, 80.6 @ 320 -- rel pos + extended width + res-post-norm, no class token, avg pool
248
+ * `vit_relpos_base_patch16_224` - 82.5 @ 224, 83.6 @ 320 -- rel pos, layer scale, no class token, avg pool
249
+ * `vit_base_patch16_rpn_224` - 82.3 @ 224 -- rel pos + res-post-norm, no class token, avg pool
250
+ * Vision Transformer refactor to remove representation layer that was only used in initial vit and rarely used since with newer pretrain (ie `How to Train Your ViT`)
251
+ * `vit_*` models support removal of class token, use of global average pool, use of fc_norm (ala beit, mae).
252
+
253
+ ### April 22, 2022
254
+ * `timm` models are now officially supported in [fast.ai](https://www.fast.ai/)! Just in time for the new Practical Deep Learning course. `timmdocs` documentation link updated to [timm.fast.ai](http://timm.fast.ai/).
255
+ * Two more model weights added in the TPU trained [series](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights). Some In22k pretrain still in progress.
256
+ * `seresnext101d_32x8d` - 83.69 @ 224, 84.35 @ 288
257
+ * `seresnextaa101d_32x8d` (anti-aliased w/ AvgPool2d) - 83.85 @ 224, 84.57 @ 288
258
+
259
+ ### March 23, 2022
260
+ * Add `ParallelBlock` and `LayerScale` option to base vit models to support model configs in [Three things everyone should know about ViT](https://arxiv.org/abs/2203.09795)
261
+ * `convnext_tiny_hnf` (head norm first) weights trained with (close to) A2 recipe, 82.2% top-1, could do better with more epochs.
262
+
263
+ ### March 21, 2022
264
+ * Merge `norm_norm_norm`. **IMPORTANT** this update for a coming 0.6.x release will likely de-stabilize the master branch for a while. Branch [`0.5.x`](https://github.com/rwightman/pytorch-image-models/tree/0.5.x) or a previous 0.5.x release can be used if stability is required.
265
+ * Significant weights update (all TPU trained) as described in this [release](https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights)
266
+ * `regnety_040` - 82.3 @ 224, 82.96 @ 288
267
+ * `regnety_064` - 83.0 @ 224, 83.65 @ 288
268
+ * `regnety_080` - 83.17 @ 224, 83.86 @ 288
269
+ * `regnetv_040` - 82.44 @ 224, 83.18 @ 288 (timm pre-act)
270
+ * `regnetv_064` - 83.1 @ 224, 83.71 @ 288 (timm pre-act)
271
+ * `regnetz_040` - 83.67 @ 256, 84.25 @ 320
272
+ * `regnetz_040h` - 83.77 @ 256, 84.5 @ 320 (w/ extra fc in head)
273
+ * `resnetv2_50d_gn` - 80.8 @ 224, 81.96 @ 288 (pre-act GroupNorm)
274
+ * `resnetv2_50d_evos` - 80.77 @ 224, 82.04 @ 288 (pre-act EvoNormS)
275
+ * `regnetz_c16_evos` - 81.9 @ 256, 82.64 @ 320 (EvoNormS)
276
+ * `regnetz_d8_evos` - 83.42 @ 256, 84.04 @ 320 (EvoNormS)
277
+ * `xception41p` - 82 @ 299 (timm pre-act)
278
+ * `xception65` - 83.17 @ 299
279
+ * `xception65p` - 83.14 @ 299 (timm pre-act)
280
+ * `resnext101_64x4d` - 82.46 @ 224, 83.16 @ 288
281
+ * `seresnext101_32x8d` - 83.57 @ 224, 84.270 @ 288
282
+ * `resnetrs200` - 83.85 @ 256, 84.44 @ 320
283
+ * HuggingFace hub support fixed w/ initial groundwork for allowing alternative 'config sources' for pretrained model definitions and weights (generic local file / remote url support soon)
284
+ * SwinTransformer-V2 implementation added. Submitted by [Christoph Reich](https://github.com/ChristophReich1996). Training experiments and model changes by myself are ongoing so expect compat breaks.
285
+ * Swin-S3 (AutoFormerV2) models / weights added from https://github.com/microsoft/Cream/tree/main/AutoFormerV2
286
+ * MobileViT models w/ weights adapted from https://github.com/apple/ml-cvnets
287
+ * PoolFormer models w/ weights adapted from https://github.com/sail-sg/poolformer
288
+ * VOLO models w/ weights adapted from https://github.com/sail-sg/volo
289
+ * Significant work experimenting with non-BatchNorm norm layers such as EvoNorm, FilterResponseNorm, GroupNorm, etc
290
+ * Enhance support for alternate norm + act ('NormAct') layers added to a number of models, esp EfficientNet/MobileNetV3, RegNet, and aligned Xception
291
+ * Grouped conv support added to EfficientNet family
292
+ * Add 'group matching' API to all models to allow grouping model parameters for application of 'layer-wise' LR decay, lr scale added to LR scheduler
293
+ * Gradient checkpointing support added to many models
294
+ * `forward_head(x, pre_logits=False)` fn added to all models to allow separate calls of `forward_features` + `forward_head`
295
+ * All vision transformer and vision MLP models updated to return non-pooled / non-token selected features from `forward_features`, for consistency with CNN models; token selection or pooling is now applied in `forward_head`
296
+
297
+ ### Feb 2, 2022
298
+ * [Chris Hughes](https://github.com/Chris-hughes10) posted an exhaustive run through of `timm` on his blog yesterday. Well worth a read. [Getting Started with PyTorch Image Models (timm): A Practitioner’s Guide](https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055)
299
+ * I'm currently prepping to merge the `norm_norm_norm` branch back to master (ver 0.6.x) in next week or so.
300
+ * The changes are more extensive than usual and may destabilize and break some model API use (aiming for full backwards compat). So, beware `pip install git+https://github.com/rwightman/pytorch-image-models` installs!
301
+ * `0.5.x` releases and a `0.5.x` branch will remain stable with a cherry pick or two until dust clears. Recommend sticking to pypi install for a bit if you want stable.
302
+
303
+ ### Jan 14, 2022
304
+ * Version 0.5.4 w/ release to be pushed to pypi. It's been a while since last pypi update and riskier changes will be merged to main branch soon....
305
+ * Add ConvNeXT models w/ weights from official impl (https://github.com/facebookresearch/ConvNeXt), a few perf tweaks, compatible with timm features
306
+ * Tried training a few small (~1.8-3M param) / mobile optimized models, a few are good so far, more on the way...
307
+ * `mnasnet_small` - 65.6 top-1
308
+ * `mobilenetv2_050` - 65.9
309
+ * `lcnet_100/075/050` - 72.1 / 68.8 / 63.1
310
+ * `semnasnet_075` - 73
311
+ * `fbnetv3_b/d/g` - 79.1 / 79.7 / 82.0
312
+ * TinyNet models added by [rsomani95](https://github.com/rsomani95)
313
+ * LCNet added via MobileNetV3 architecture
314
+
docs/feature_extraction.md ADDED
@@ -0,0 +1,174 @@
1
+ # Feature Extraction
2
+
3
+ All of the models in `timm` have consistent mechanisms for obtaining various types of features from the model for tasks besides classification.
4
+
5
+ ## Penultimate Layer Features (Pre-Classifier Features)
6
+
7
+ The features from the penultimate model layer can be obtained in several ways without requiring model surgery (although feel free to do surgery). One must first decide if they want pooled or un-pooled features.
8
+
9
+ ### Unpooled
10
+
11
+ There are three ways to obtain unpooled features.
12
+
13
+ Without modifying the network, one can call `model.forward_features(input)` on any model instead of the usual `model(input)`. This bypasses the classifier head and global pooling.
14
+
15
+ If one wants to explicitly modify the network to return unpooled features, they can either create the model without a classifier and pooling, or remove it later. Both paths remove the parameters associated with the classifier from the network.
16
+
17
+ #### forward_features()
18
+ ```python hl_lines="3 6"
19
+ import torch
20
+ import timm
21
+ m = timm.create_model('xception41', pretrained=True)
22
+ o = m(torch.randn(2, 3, 299, 299))
23
+ print(f'Original shape: {o.shape}')
24
+ o = m.forward_features(torch.randn(2, 3, 299, 299))
25
+ print(f'Unpooled shape: {o.shape}')
26
+ ```
27
+ Output:
28
+ ```text
29
+ Original shape: torch.Size([2, 1000])
30
+ Unpooled shape: torch.Size([2, 2048, 10, 10])
31
+ ```
32
+
33
+ #### Create with no classifier and pooling
34
+ ```python hl_lines="3"
35
+ import torch
36
+ import timm
37
+ m = timm.create_model('resnet50', pretrained=True, num_classes=0, global_pool='')
38
+ o = m(torch.randn(2, 3, 224, 224))
39
+ print(f'Unpooled shape: {o.shape}')
40
+ ```
41
+ Output:
42
+ ```text
43
+ Unpooled shape: torch.Size([2, 2048, 7, 7])
44
+ ```
45
+
46
+ #### Remove it later
47
+ ```python hl_lines="3 6"
48
+ import torch
49
+ import timm
50
+ m = timm.create_model('densenet121', pretrained=True)
51
+ o = m(torch.randn(2, 3, 224, 224))
52
+ print(f'Original shape: {o.shape}')
53
+ m.reset_classifier(0, '')
54
+ o = m(torch.randn(2, 3, 224, 224))
55
+ print(f'Unpooled shape: {o.shape}')
56
+ ```
57
+ Output:
58
+ ```text
59
+ Original shape: torch.Size([2, 1000])
60
+ Unpooled shape: torch.Size([2, 1024, 7, 7])
61
+ ```
62
+
63
+ ### Pooled
64
+
65
+ To modify the network to return pooled features, one can use `forward_features()` and pool/flatten the result themselves, or modify the network like above but keep pooling intact.
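
The first option, pooling the output of `forward_features()` by hand, might look like the following minimal sketch. It assumes an average-pooled head, which is common but not universal across `timm` models. It also uses a random tensor as a stand-in for the unpooled features so that it runs without downloading weights; the shape `[2, 2048, 7, 7]` is taken from the `resnet50` example above.

```python
import torch
import torch.nn.functional as F

# Stand-in for `m.forward_features(x)` on a resnet50-like model (assumption:
# in real use this tensor would come from the model, not from torch.randn).
unpooled = torch.randn(2, 2048, 7, 7)

# Global average pool over the spatial dims, then flatten to [batch, channels].
pooled = F.adaptive_avg_pool2d(unpooled, 1).flatten(1)
print(f'Pooled shape: {pooled.shape}')
```

Swapping `adaptive_avg_pool2d` for `adaptive_max_pool2d` would give max-pooled features instead, which is one reason to pool outside the model.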
66
+
67
+ #### Create with no classifier
68
+ ```python hl_lines="3"
69
+ import torch
70
+ import timm
71
+ m = timm.create_model('resnet50', pretrained=True, num_classes=0)
72
+ o = m(torch.randn(2, 3, 224, 224))
73
+ print(f'Pooled shape: {o.shape}')
74
+ ```
75
+ Output:
76
+ ```text
77
+ Pooled shape: torch.Size([2, 2048])
78
+ ```
79
+
80
+ #### Remove it later
81
+ ```python hl_lines="3 6"
82
+ import torch
83
+ import timm
84
+ m = timm.create_model('ese_vovnet19b_dw', pretrained=True)
85
+ o = m(torch.randn(2, 3, 224, 224))
86
+ print(f'Original shape: {o.shape}')
87
+ m.reset_classifier(0)
88
+ o = m(torch.randn(2, 3, 224, 224))
89
+ print(f'Pooled shape: {o.shape}')
90
+ ```
91
+ Output:
92
+ ```text
93
+ Original shape: torch.Size([2, 1000])
94
+ Pooled shape: torch.Size([2, 1024])
95
+ ```
96
+
97
+
98
+ ## Multi-scale Feature Maps (Feature Pyramid)
99
+
100
+ Object detection, segmentation, keypoint, and a variety of dense pixel tasks require access to feature maps from the backbone network at multiple scales. This is often done by modifying the original classification network. Since each network varies quite a bit in structure, it's not uncommon to see only a few backbones supported in any given obj detection or segmentation library.
101
+
102
+ `timm` allows a consistent interface for creating any of the included models as feature backbones that output feature maps for selected levels.
103
+
104
+ A feature backbone can be created by adding the argument `features_only=True` to any `create_model` call. By default 5 strides will be output from most models (not all have that many), with the first starting at 2 (some start at 1 or 4).
105
+
106
+ ### Create a feature map extraction model
107
+ ```python hl_lines="3"
108
+ import torch
109
+ import timm
110
+ m = timm.create_model('resnest26d', features_only=True, pretrained=True)
111
+ o = m(torch.randn(2, 3, 224, 224))
112
+ for x in o:
113
+     print(x.shape)
114
+ ```
115
+ Output:
116
+ ```text
117
+ torch.Size([2, 64, 112, 112])
118
+ torch.Size([2, 256, 56, 56])
119
+ torch.Size([2, 512, 28, 28])
120
+ torch.Size([2, 1024, 14, 14])
121
+ torch.Size([2, 2048, 7, 7])
122
+ ```
123
+
124
+ ### Query the feature information
125
+
126
+ After a feature backbone has been created, it can be queried to provide channel or resolution reduction information to the downstream heads without requiring static config or hardcoded constants. The `.feature_info` attribute is a class encapsulating the information about the feature extraction points.
127
+
128
+ ```python hl_lines="3 4"
129
+ import torch
130
+ import timm
131
+ m = timm.create_model('regnety_032', features_only=True, pretrained=True)
132
+ print(f'Feature channels: {m.feature_info.channels()}')
133
+ o = m(torch.randn(2, 3, 224, 224))
134
+ for x in o:
135
+     print(x.shape)
136
+ ```
137
+ Output:
138
+ ```text
139
+ Feature channels: [32, 72, 216, 576, 1512]
140
+ torch.Size([2, 32, 112, 112])
141
+ torch.Size([2, 72, 56, 56])
142
+ torch.Size([2, 216, 28, 28])
143
+ torch.Size([2, 576, 14, 14])
144
+ torch.Size([2, 1512, 7, 7])
145
+ ```
146
+
147
+ ### Select specific feature levels or limit the stride
148
+
149
+ There are two additional creation arguments impacting the output features.
150
+
151
+ * `out_indices` selects which indices to output
152
+ * `output_stride` limits the feature output stride of the network (also works in classification mode BTW)
153
+
154
+ `out_indices` is supported by all models, but not all models have the same index-to-feature-stride mapping. Look at the code or check `feature_info` to compare. The out indices generally correspond to the `C(i+1)th` feature level (a `2^(i+1)` reduction). For most models, index 0 is the stride 2 features, and index 4 is stride 32.
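
The index-to-stride rule of thumb above can be written out as a short sketch. The helper name `nominal_reduction` is hypothetical and is not part of `timm`; it only encodes the `2^(i+1)` convention. Models that start at stride 1 or 4 will not follow it, so `feature_info.reduction()` remains the authoritative source.

```python
def nominal_reduction(index: int) -> int:
    """Nominal reduction for feature index i under the 2^(i+1) convention."""
    return 2 ** (index + 1)


# Index i maps to feature level C(i+1).
for i in range(5):
    print(f'index {i} -> C{i + 1}, stride {nominal_reduction(i)}')
```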
155
+
156
+ `output_stride` is achieved by converting layers to use dilated convolutions. Doing so is not always straightforward; some networks only support `output_stride=32`.
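
As a rough mental model (an assumption for illustration, not `timm`'s implementation), dilation caps each level's reduction at `output_stride`: levels that would normally downsample further keep the spatial size of the last permitted stride. The hypothetical helper below applies that rule to the nominal strides of a five-level backbone.

```python
def capped_reductions(output_stride, out_indices, nominal=(2, 4, 8, 16, 32)):
    """Reduction per selected level when strides beyond output_stride become dilation."""
    return [min(nominal[i], output_stride) for i in out_indices]


reductions = capped_reductions(output_stride=8, out_indices=(2, 4))
# Spatial size of each feature map for a 320x320 input.
sizes = [320 // r for r in reductions]
print(reductions, sizes)
```

For a 320x320 input with `output_stride=8` and `out_indices=(2, 4)`, both levels end up with a reduction of 8 and 40x40 feature maps, which agrees with the `ecaresnet101d` example that follows.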
157
+
158
+ ```python hl_lines="3 4 5"
159
+ import torch
160
+ import timm
161
+ m = timm.create_model('ecaresnet101d', features_only=True, output_stride=8, out_indices=(2, 4), pretrained=True)
162
+ print(f'Feature channels: {m.feature_info.channels()}')
163
+ print(f'Feature reduction: {m.feature_info.reduction()}')
164
+ o = m(torch.randn(2, 3, 320, 320))
165
+ for x in o:
166
+     print(x.shape)
167
+ ```
168
+ Output:
169
+ ```text
170
+ Feature channels: [512, 2048]
171
+ Feature reduction: [8, 8]
172
+ torch.Size([2, 512, 40, 40])
173
+ torch.Size([2, 2048, 40, 40])
174
+ ```
docs/index.md ADDED
@@ -0,0 +1,80 @@
1
+ # Getting Started
2
+
3
+ ## Welcome
4
+
5
+ Welcome to the `timm` documentation, a lean set of docs that covers the basics of `timm`.
6
+
7
+ For a more comprehensive set of docs (currently under development), please visit [timmdocs](http://timm.fast.ai) by [Aman Arora](https://github.com/amaarora).
8
+
9
+ ## Install
10
+
11
+ The library can be installed with pip:
12
+
13
+ ```
14
+ pip install timm
15
+ ```
16
+
17
+ I update the PyPi (pip) packages when I'm confident there are no significant model regressions from previous releases. If you want to pip install the bleeding edge from GitHub, use:
18
+ ```
19
+ pip install git+https://github.com/rwightman/pytorch-image-models.git
20
+ ```
21
+
22
+ !!! info "Conda Environment"
23
+     All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically Python 3.7, 3.8, 3.9, and 3.10.
24
+
25
+ Little to no care has been taken to be Python 2.x friendly, and it will not be supported. If you run into any challenges running on Windows, or another OS, I'm definitely open to looking into those issues so long as it's in a reproducible (read Conda) environment.
26
+
27
+ PyTorch versions 1.9, 1.10, 1.11 have been tested with the latest versions of this code.
28
+
29
+ I've tried to keep the dependencies minimal; the setup is as per the PyTorch default install instructions for Conda:
30
+ ```
31
+ conda create -n torch-env
32
+ conda activate torch-env
33
+ conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
34
+ conda install pyyaml
35
+ ```
36
+
37
+ ## Load a Pretrained Model
38
+
39
+ Pretrained models can be loaded using `timm.create_model`
40
+
41
+ ```python
42
+ import timm
43
+
44
+ m = timm.create_model('mobilenetv3_large_100', pretrained=True)
45
+ m.eval()
46
+ ```
47
+
48
+ ## List Models with Pretrained Weights
49
+ ```python
50
+ import timm
51
+ from pprint import pprint
52
+ model_names = timm.list_models(pretrained=True)
53
+ pprint(model_names)
54
+ >>> ['adv_inception_v3',
55
+ 'cspdarknet53',
56
+ 'cspresnext50',
57
+ 'densenet121',
58
+ 'densenet161',
59
+ 'densenet169',
60
+ 'densenet201',
61
+ 'densenetblur121d',
62
+ 'dla34',
63
+ 'dla46_c',
64
+ ...
65
+ ]
66
+ ```
67
+
68
+ ## List Model Architectures by Wildcard
69
+ ```python
70
+ import timm
71
+ from pprint import pprint
72
+ model_names = timm.list_models('*resne*t*')
73
+ pprint(model_names)
74
+ >>> ['cspresnet50',
75
+ 'cspresnet50d',
76
+ 'cspresnet50w',
77
+ 'cspresnext50',
78
+ ...
79
+ ]
80
+ ```
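
The wildcard syntax is shell-style globbing. As a standalone illustration that does not need `timm` installed, the sketch below applies the same pattern to a hand-written list of names with Python's `fnmatch` module. Whether `timm.list_models` uses `fnmatch` internally is an assumption here, and the name list is a small sample taken from the outputs above.

```python
from fnmatch import fnmatch

# A few names copied from the example outputs above, not a full model list.
names = ['cspresnet50', 'cspresnext50', 'densenet121', 'dla34']

# '*' matches any run of characters, so the pattern covers resnet and resnext.
matches = [n for n in names if fnmatch(n, '*resne*t*')]
print(matches)
```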
docs/javascripts/tables.js ADDED
@@ -0,0 +1,6 @@
1
+ app.location$.subscribe(function() {
2
+ var tables = document.querySelectorAll("article table")
3
+ tables.forEach(function(table) {
4
+ new Tablesort(table)
5
+ })
6
+ })
docs/models.md ADDED
@@ -0,0 +1,171 @@
1
+ # Model Summaries
2
+
3
+ The model architectures included come from a wide variety of sources. Sources, including papers, original impl ("reference code") that I rewrote / adapted, and PyTorch impl that I leveraged directly ("code") are listed below.
4
+
5
+ Most included models have pretrained weights. The weights are either:
6
+
7
+ 1. from their original sources
8
+ 2. ported by myself from their original impl in a different framework (e.g. Tensorflow models)
9
+ 3. trained from scratch using the included training script
10
+
11
+ The validation results for the pretrained weights are [here](results.md)
12
+
13
+ A more exciting view (with pretty pictures) of the models within `timm` can be found at [paperswithcode](https://paperswithcode.com/lib/timm).
14
+
15
+ ## Big Transfer ResNetV2 (BiT) [[resnetv2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnetv2.py)]
16
+ * Paper: `Big Transfer (BiT): General Visual Representation Learning` - https://arxiv.org/abs/1912.11370
17
+ * Reference code: https://github.com/google-research/big_transfer
18
+
19
+ ## Cross-Stage Partial Networks [[cspnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/cspnet.py)]
20
+ * Paper: `CSPNet: A New Backbone that can Enhance Learning Capability of CNN` - https://arxiv.org/abs/1911.11929
21
+ * Reference impl: https://github.com/WongKinYiu/CrossStagePartialNetworks
22
+
23
+ ## DenseNet [[densenet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/densenet.py)]
24
+ * Paper: `Densely Connected Convolutional Networks` - https://arxiv.org/abs/1608.06993
25
+ * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
26
+
27
+ ## DLA [[dla.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dla.py)]
28
+ * Paper: https://arxiv.org/abs/1707.06484
29
+ * Code: https://github.com/ucbdrive/dla
30
+
31
+ ## Dual-Path Networks [[dpn.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/dpn.py)]
32
+ * Paper: `Dual Path Networks` - https://arxiv.org/abs/1707.01629
33
+ * My PyTorch code: https://github.com/rwightman/pytorch-dpn-pretrained
34
+ * Reference code: https://github.com/cypw/DPNs
35
+
36
+ ## GPU-Efficient Networks [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
37
+ * Paper: `Neural Architecture Design for GPU-Efficient Networks` - https://arxiv.org/abs/2006.14090
38
+ * Reference code: https://github.com/idstcv/GPU-Efficient-Networks
39
+
40
+ ## HRNet [[hrnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/hrnet.py)]
41
+ * Paper: `Deep High-Resolution Representation Learning for Visual Recognition` - https://arxiv.org/abs/1908.07919
42
+ * Code: https://github.com/HRNet/HRNet-Image-Classification
43
+
44
+ ## Inception-V3 [[inception_v3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v3.py)]
45
+ * Paper: `Rethinking the Inception Architecture for Computer Vision` - https://arxiv.org/abs/1512.00567
46
+ * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
47
+
48
+ ## Inception-V4 [[inception_v4.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_v4.py)]
49
+ * Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
50
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
51
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets
52
+
53
+ ## Inception-ResNet-V2 [[inception_resnet_v2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/inception_resnet_v2.py)]
54
+ * Paper: `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` - https://arxiv.org/abs/1602.07261
55
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
56
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets
57
+
58
+ ## NASNet-A [[nasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/nasnet.py)]
59
+ * Paper: `Learning Transferable Architectures for Scalable Image Recognition` - https://arxiv.org/abs/1707.07012
60
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
61
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
62
+
63
+ ## PNasNet-5 [[pnasnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/pnasnet.py)]
64
+ * Paper: `Progressive Neural Architecture Search` - https://arxiv.org/abs/1712.00559
65
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
66
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
67
+
68
+ ## EfficientNet [[efficientnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/efficientnet.py)]
69
+
70
+ * Papers:
71
+     * EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
72
+     * EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
73
+     * EfficientNet (B0-B7) - https://arxiv.org/abs/1905.11946
74
+     * EfficientNet-EdgeTPU (S, M, L) - https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html
75
+     * MixNet - https://arxiv.org/abs/1907.09595
76
+     * MNASNet B1, A1 (Squeeze-Excite), and Small - https://arxiv.org/abs/1807.11626
77
+     * MobileNet-V2 - https://arxiv.org/abs/1801.04381
78
+     * FBNet-C - https://arxiv.org/abs/1812.03443
79
+     * Single-Path NAS - https://arxiv.org/abs/1904.02877
80
+ * My PyTorch code: https://github.com/rwightman/gen-efficientnet-pytorch
81
+ * Reference code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
82
+
83
+ ## MobileNet-V3 [[mobilenetv3.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/mobilenetv3.py)]
84
+ * Paper: `Searching for MobileNetV3` - https://arxiv.org/abs/1905.02244
85
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
86
+
87
+ ## RegNet [[regnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/regnet.py)]
88
+ * Paper: `Designing Network Design Spaces` - https://arxiv.org/abs/2003.13678
89
+ * Reference code: https://github.com/facebookresearch/pycls/blob/master/pycls/models/regnet.py
90
+
91
+ ## RepVGG [[byobnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/byobnet.py)]
92
+ * Paper: `Making VGG-style ConvNets Great Again` - https://arxiv.org/abs/2101.03697
93
+ * Reference code: https://github.com/DingXiaoH/RepVGG
94
+
95
+ ## ResNet, ResNeXt [[resnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnet.py)]
96
+
97
+ * ResNet (V1B)
98
+     * Paper: `Deep Residual Learning for Image Recognition` - https://arxiv.org/abs/1512.03385
99
+     * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
100
+ * ResNeXt
101
+     * Paper: `Aggregated Residual Transformations for Deep Neural Networks` - https://arxiv.org/abs/1611.05431
102
+     * Code: https://github.com/pytorch/vision/tree/master/torchvision/models
103
+ * 'Bag of Tricks' / Gluon C, D, E, S ResNet variants
104
+     * Paper: `Bag of Tricks for Image Classification with CNNs` - https://arxiv.org/abs/1812.01187
105
+     * Code: https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/resnetv1b.py
106
+ * Instagram pretrained / ImageNet tuned ResNeXt101
107
+     * Paper: `Exploring the Limits of Weakly Supervised Pretraining` - https://arxiv.org/abs/1805.00932
108
+     * Weights: https://pytorch.org/hub/facebookresearch_WSL-Images_resnext (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
109
+ * Semi-supervised (SSL) / Semi-weakly Supervised (SWSL) ResNet and ResNeXts
110
+     * Paper: `Billion-scale semi-supervised learning for image classification` - https://arxiv.org/abs/1905.00546
111
+     * Weights: https://github.com/facebookresearch/semi-supervised-ImageNet1K-models (NOTE: CC BY-NC 4.0 License, NOT commercial friendly)
112
+ * Squeeze-and-Excitation Networks
113
+     * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
114
+     * Code: Added to the ResNet base; this is the current version going forward, and the old `senet.py` is being deprecated
115
+ * ECAResNet (ECA-Net)
116
+     * Paper: `ECA-Net: Efficient Channel Attention for Deep CNN` - https://arxiv.org/abs/1910.03151v4
117
+     * Code: Added to the ResNet base; ECA module contributed by @VRandme, reference https://github.com/BangguWu/ECANet
118
+
119
+ ## Res2Net [[res2net.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/res2net.py)]
120
+ * Paper: `Res2Net: A New Multi-scale Backbone Architecture` - https://arxiv.org/abs/1904.01169
121
+ * Code: https://github.com/gasvn/Res2Net
122
+
123
+ ## ResNeSt [[resnest.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnest.py)]
124
+ * Paper: `ResNeSt: Split-Attention Networks` - https://arxiv.org/abs/2004.08955
125
+ * Code: https://github.com/zhanghang1989/ResNeSt
126
+
127
+ ## ReXNet [[rexnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/rexnet.py)]
128
+ * Paper: `ReXNet: Diminishing Representational Bottleneck on CNN` - https://arxiv.org/abs/2007.00992
129
+ * Code: https://github.com/clovaai/rexnet
130
+
131
+ ## Selective-Kernel Networks [[sknet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/sknet.py)]
132
+ * Paper: `Selective-Kernel Networks` - https://arxiv.org/abs/1903.06586
133
+ * Code: https://github.com/implus/SKNet, https://github.com/clovaai/assembled-cnn
134
+
135
+ ## SelecSLS [[selecsls.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/selecsls.py)]
136
+ * Paper: `XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera` - https://arxiv.org/abs/1907.00837
137
+ * Code: https://github.com/mehtadushy/SelecSLS-Pytorch
138
+
139
+ ## Squeeze-and-Excitation Networks [[senet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/senet.py)]
140
+ NOTE: I am deprecating this version of the networks; the new ones are part of `resnet.py`
141
+
142
+ * Paper: `Squeeze-and-Excitation Networks` - https://arxiv.org/abs/1709.01507
143
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
144
+
145
+ ## TResNet [[tresnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tresnet.py)]
146
+ * Paper: `TResNet: High Performance GPU-Dedicated Architecture` - https://arxiv.org/abs/2003.13630
147
+ * Code: https://github.com/mrT23/TResNet
148
+
149
+ ## VGG [[vgg.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vgg.py)]
150
+ * Paper: `Very Deep Convolutional Networks For Large-Scale Image Recognition` - https://arxiv.org/pdf/1409.1556.pdf
151
+ * Reference code: https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py
152
+
153
+ ## Vision Transformer [[vision_transformer.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)]
154
+ * Paper: `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale` - https://arxiv.org/abs/2010.11929
155
+ * Reference code and pretrained weights: https://github.com/google-research/vision_transformer
156
+
157
+ ## VovNet V2 and V1 [[vovnet.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vovnet.py)]
158
+ * Paper: `CenterMask : Real-Time Anchor-Free Instance Segmentation` - https://arxiv.org/abs/1911.06667
159
+ * Reference code: https://github.com/youngwanLEE/vovnet-detectron2
160
+
161
+ ## Xception [[xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/xception.py)]
162
+ * Paper: `Xception: Deep Learning with Depthwise Separable Convolutions` - https://arxiv.org/abs/1610.02357
163
+ * Code: https://github.com/Cadene/pretrained-models.pytorch
164
+
165
+ ## Xception (Modified Aligned, Gluon) [[gluon_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/gluon_xception.py)]
166
+ * Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
167
+ * Reference code: https://github.com/dmlc/gluon-cv/tree/master/gluoncv/model_zoo, https://github.com/jfzhang95/pytorch-deeplab-xception/
168
+
169
+ ## Xception (Modified Aligned, TF) [[aligned_xception.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/aligned_xception.py)]
170
+ * Paper: `Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation` - https://arxiv.org/abs/1802.02611
171
+ * Reference code: https://github.com/tensorflow/models/tree/master/research/deeplab
docs/models/.pages ADDED
@@ -0,0 +1 @@
1
+ title: Model Pages
docs/models/.templates/code_snippets.md ADDED
@@ -0,0 +1,62 @@
1
+ ## How do I use this model on an image?
2
+ To load a pretrained model:
3
+
4
+ ```python
5
+ import timm
6
+ model = timm.create_model('{{ model_name }}', pretrained=True)
7
+ model.eval()
8
+ ```
9
+
10
+ To load and preprocess the image:
11
+ ```python
12
+ import urllib.request
13
+ from PIL import Image
14
+ from timm.data import resolve_data_config
15
+ from timm.data.transforms_factory import create_transform
16
+
17
+ config = resolve_data_config({}, model=model)
18
+ transform = create_transform(**config)
19
+
20
+ url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
21
+ urllib.request.urlretrieve(url, filename)
22
+ img = Image.open(filename).convert('RGB')
23
+ tensor = transform(img).unsqueeze(0) # transform and add batch dimension
24
+ ```
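As a rough illustration of what the resolved config feeds into the transform: evaluation pipelines of this kind typically resize the shorter side to about `img_size / crop_pct` and then center-crop to `img_size`. The sketch below assumes that floor-based rule (it reflects how timm's eval transform is commonly described, but treat it as an assumption and check your installed version). `eval_resize_size` is a hypothetical helper, not a timm function.

```python
import math


def eval_resize_size(img_size: int, crop_pct: float) -> int:
    """Assumed rule: resize the shorter side to floor(img_size / crop_pct)."""
    return int(math.floor(img_size / crop_pct))


# (image size, crop pct) pairs of the kind listed in the model metadata
for img_size, crop_pct in [(224, 0.875), (299, 0.875)]:
    resize_to = eval_resize_size(img_size, crop_pct)
    print(f"resize shorter side to {resize_to}, then center-crop to {img_size}")
```

In practice `create_transform(**config)` handles this for you; the sketch only shows why a crop pct below 1.0 means the image is first resized to something larger than the final input size.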
25
+
26
+ To get the model predictions:
27
+ ```python
28
+ import torch
29
+ with torch.no_grad():
30
+     out = model(tensor)
31
+ probabilities = torch.nn.functional.softmax(out[0], dim=0)
32
+ print(probabilities.shape)
33
+ # prints: torch.Size([1000])
34
+ ```
35
+
36
+ To get the top-5 predictions class names:
37
+ ```python
38
+ # Get imagenet class mappings
39
+ url, filename = ("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt", "imagenet_classes.txt")
40
+ urllib.request.urlretrieve(url, filename)
41
+ with open("imagenet_classes.txt", "r") as f:
42
+     categories = [s.strip() for s in f.readlines()]
43
+
44
+ # Print top categories per image
45
+ top5_prob, top5_catid = torch.topk(probabilities, 5)
46
+ for i in range(top5_prob.size(0)):
47
+     print(categories[top5_catid[i]], top5_prob[i].item())
48
+ # prints one class name and probability per line, e.g. (values vary by model):
49
+ # Samoyed 0.6425196528434753
50
+ ```
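To make the last two steps less opaque, here is a dependency-free sketch of what softmax and top-k selection compute. It is a plain-Python stand-in for `torch.nn.functional.softmax` and `torch.topk`, not their implementation; the logits and class labels are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


logits = [2.0, 0.5, -1.0, 3.0]               # hypothetical scores for 4 classes
categories = ["cat", "dog", "fish", "bird"]  # hypothetical labels
probs = softmax(logits)
top_prob, top_idx = topk(probs, 2)
for p, i in zip(top_prob, top_idx):
    print(categories[i], p)
```

The real model output has 1000 entries instead of 4, but the indexing pattern (`categories[index]`, probability) is the same as in the snippet above.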
51
+
52
+ Replace the model name with the variant you want to use, e.g. `{{ model_name }}`. You can find the IDs in the model summaries at the top of this page.
53
+
54
+ To extract image features with this model, follow the [timm feature extraction examples](https://rwightman.github.io/pytorch-image-models/feature_extraction/) and change the model name to the one you want to use.
55
+
56
+ ## How do I finetune this model?
57
+ You can finetune any of the pre-trained models just by changing the classifier (the last layer).
58
+ ```python
59
+ model = timm.create_model('{{ model_name }}', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)
60
+ ```
61
+ To finetune on your own dataset, you have to write a training loop or adapt [timm's training
62
+ script](https://github.com/rwightman/pytorch-image-models/blob/master/train.py) to use your dataset.
docs/models/.templates/generate_readmes.py ADDED
@@ -0,0 +1,64 @@
+ """
+ Run this script to generate the model-index files in `models` from the templates in `.templates/models`.
+ """
+
+ import argparse
+ from pathlib import Path
+
+ from jinja2 import Environment, FileSystemLoader
+
+ import modelindex
+
+
+ def generate_readmes(templates_path: Path, dest_path: Path):
+     """Add the code snippet template to the readmes"""
+     readme_templates_path = templates_path / "models"
+     code_template_path = templates_path / "code_snippets.md"
+
+     env = Environment(
+         loader=FileSystemLoader([readme_templates_path, readme_templates_path.parent]),
+     )
+
+     for readme in readme_templates_path.iterdir():
+         if readme.suffix == ".md":
+             template = env.get_template(readme.name)
+
+             # get the first model_name for this model family
+             mi = modelindex.load(str(readme))
+             model_name = mi.models[0].name
+
+             full_content = template.render(model_name=model_name)
+
+             # generate full_readme
+             with open(dest_path / readme.name, "w") as f:
+                 f.write(full_content)
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Model index generation config")
+     parser.add_argument(
+         "-t",
+         "--templates",
+         default=Path(__file__).parent / ".templates",
+         type=str,
+         help="Location of the markdown templates",
+     )
+     parser.add_argument(
+         "-d",
+         "--dest",
+         default=Path(__file__).parent / "models",
+         type=str,
+         help="Destination folder that contains the generated model-index files.",
+     )
+     args = parser.parse_args()
+     templates_path = Path(args.templates)
+     dest_readmes_path = Path(args.dest)
+
+     generate_readmes(
+         templates_path,
+         dest_readmes_path,
+     )
+
+
+ if __name__ == "__main__":
+     main()
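The script above relies on two template features: `{% include 'code_snippets.md' %}` to splice the shared snippet into each model page, and `{{ model_name }}` to fill in the first model of the family. The sketch below mimics just those two steps with the standard library so the flow can be followed without Jinja2 installed. It is a simplified stand-in, not Jinja2 itself; the `render` helper and the sample strings are hypothetical.

```python
import re


def render(template: str, includes: dict, **context) -> str:
    """Expand {% include 'name' %} tags, then {{ variable }} placeholders."""
    # Step 1: splice in included files (single pass, no nested includes).
    expanded = re.sub(
        r"\{%\s*include\s+'([^']+)'\s*%\}",
        lambda m: includes[m.group(1)],
        template,
    )
    # Step 2: substitute variables from the rendering context.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        expanded,
    )


# Hypothetical, shortened stand-ins for the real template files.
includes = {
    "code_snippets.md": "model = timm.create_model('{{ model_name }}', pretrained=True)"
}
readme = "# Adversarial Inception v3\n\n{% include 'code_snippets.md' %}\n"

full_content = render(readme, includes, model_name="adv_inception_v3")
print(full_content)
```

The include is expanded before variable substitution, which is why a placeholder inside the shared snippet still receives the per-family model name.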
docs/models/.templates/models/adversarial-inception-v3.md ADDED
@@ -0,0 +1,98 @@
1
+ # Adversarial Inception v3
2
+
3
+ **Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements, including [Label Smoothing](https://paperswithcode.com/method/label-smoothing), factorized 7 x 7 convolutions, and an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) that propagates label information lower down the network (along with batch normalization for layers in the side head). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
4
+
5
+ This particular model was trained for the study of adversarial examples (adversarial training).
6
+
7
+ The weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).
8
+
9
+ {% include 'code_snippets.md' %}
10
+
11
+ ## How do I train this model?
12
+
13
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
14
+
15
+ ## Citation
16
+
17
+ ```BibTeX
18
+ @article{DBLP:journals/corr/abs-1804-00097,
19
+ author = {Alexey Kurakin and
20
+ Ian J. Goodfellow and
21
+ Samy Bengio and
22
+ Yinpeng Dong and
23
+ Fangzhou Liao and
24
+ Ming Liang and
25
+ Tianyu Pang and
26
+ Jun Zhu and
27
+ Xiaolin Hu and
28
+ Cihang Xie and
29
+ Jianyu Wang and
30
+ Zhishuai Zhang and
31
+ Zhou Ren and
32
+ Alan L. Yuille and
33
+ Sangxia Huang and
34
+ Yao Zhao and
35
+ Yuzhe Zhao and
36
+ Zhonglin Han and
37
+ Junjiajia Long and
38
+ Yerkebulan Berdibekov and
39
+ Takuya Akiba and
40
+ Seiya Tokui and
41
+ Motoki Abe},
42
+ title = {Adversarial Attacks and Defences Competition},
43
+ journal = {CoRR},
44
+ volume = {abs/1804.00097},
45
+ year = {2018},
46
+ url = {http://arxiv.org/abs/1804.00097},
47
+ archivePrefix = {arXiv},
48
+ eprint = {1804.00097},
49
+ timestamp = {Thu, 31 Oct 2019 16:31:22 +0100},
50
+ biburl = {https://dblp.org/rec/journals/corr/abs-1804-00097.bib},
51
+ bibsource = {dblp computer science bibliography, https://dblp.org}
52
+ }
53
+ ```
54
+
55
+ <!--
56
+ Type: model-index
57
+ Collections:
58
+ - Name: Adversarial Inception v3
59
+ Paper:
60
+ Title: Adversarial Attacks and Defences Competition
61
+ URL: https://paperswithcode.com/paper/adversarial-attacks-and-defences-competition
62
+ Models:
63
+ - Name: adv_inception_v3
64
+ In Collection: Adversarial Inception v3
65
+ Metadata:
66
+ FLOPs: 7352418880
67
+ Parameters: 23830000
68
+ File Size: 95549439
69
+ Architecture:
70
+ - 1x1 Convolution
71
+ - Auxiliary Classifier
72
+ - Average Pooling
73
+ - Average Pooling
74
+ - Batch Normalization
75
+ - Convolution
76
+ - Dense Connections
77
+ - Dropout
78
+ - Inception-v3 Module
79
+ - Max Pooling
80
+ - ReLU
81
+ - Softmax
82
+ Tasks:
83
+ - Image Classification
84
+ Training Data:
85
+ - ImageNet
86
+ ID: adv_inception_v3
87
+ Crop Pct: '0.875'
88
+ Image Size: '299'
89
+ Interpolation: bicubic
90
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L456
91
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/adv_inception_v3-9e27bd63.pth
92
+ Results:
93
+ - Task: Image Classification
94
+ Dataset: ImageNet
95
+ Metrics:
96
+ Top 1 Accuracy: 77.58%
97
+ Top 5 Accuracy: 93.74%
98
+ -->
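The model-index metadata for each family sits inside an HTML comment at the end of the template, and `generate_readmes.py` reads the first model name from it via `modelindex.load`. As a hedged illustration of that lookup, the sketch below pulls model names out of such a comment with a regular expression. It is a simplified stand-in, not how the `modelindex` library works, and `extract_model_names` and the shortened sample are hypothetical.

```python
import re


def extract_model_names(markdown: str) -> list:
    """Return the `Name:` entries listed under `Models:` in the HTML comment."""
    comment = re.search(r"<!--(.*?)-->", markdown, flags=re.DOTALL)
    if comment is None:
        return []
    body = comment.group(1)
    # Only look after "Models:" so that collection names are skipped.
    _, _, models_part = body.partition("Models:")
    return re.findall(r"^\s*-\s*Name:\s*(\S+)", models_part, flags=re.MULTILINE)


# Shortened, hypothetical excerpt of a template file.
sample = """# Adversarial Inception v3
<!--
Type: model-index
Collections:
- Name: Adversarial Inception v3
Models:
- Name: adv_inception_v3
  In Collection: Adversarial Inception v3
-->
"""

names = extract_model_names(sample)
print(names[0] if names else "no models found")
```

For real use, a YAML parser or the `modelindex` package is the safer choice, since a regex ignores nesting and quoting.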
docs/models/.templates/models/advprop.md ADDED
@@ -0,0 +1,457 @@
1
+ # AdvProp (EfficientNet)
2
+
3
+ **AdvProp** is an adversarial training scheme that treats adversarial examples as additional training examples to prevent overfitting. Key to the method is a separate auxiliary batch norm for adversarial examples, since they have a different underlying distribution from normal examples.
4
+
5
+ The weights from this model were ported from [Tensorflow/TPU](https://github.com/tensorflow/tpu).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{xie2020adversarial,
17
+ title={Adversarial Examples Improve Image Recognition},
18
+ author={Cihang Xie and Mingxing Tan and Boqing Gong and Jiang Wang and Alan Yuille and Quoc V. Le},
19
+ year={2020},
20
+ eprint={1911.09665},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: AdvProp
30
+ Paper:
31
+ Title: Adversarial Examples Improve Image Recognition
32
+ URL: https://paperswithcode.com/paper/adversarial-examples-improve-image
33
+ Models:
34
+ - Name: tf_efficientnet_b0_ap
35
+ In Collection: AdvProp
36
+ Metadata:
37
+ FLOPs: 488688572
38
+ Parameters: 5290000
39
+ File Size: 21385973
40
+ Architecture:
41
+ - 1x1 Convolution
42
+ - Average Pooling
43
+ - Batch Normalization
44
+ - Convolution
45
+ - Dense Connections
46
+ - Dropout
47
+ - Inverted Residual Block
48
+ - Squeeze-and-Excitation Block
49
+ - Swish
50
+ Tasks:
51
+ - Image Classification
52
+ Training Techniques:
53
+ - AdvProp
54
+ - AutoAugment
55
+ - Label Smoothing
56
+ - RMSProp
57
+ - Stochastic Depth
58
+ - Weight Decay
59
+ Training Data:
60
+ - ImageNet
61
+ ID: tf_efficientnet_b0_ap
62
+ LR: 0.256
63
+ Epochs: 350
64
+ Crop Pct: '0.875'
65
+ Momentum: 0.9
66
+ Batch Size: 2048
67
+ Image Size: '224'
68
+ Weight Decay: 1.0e-05
69
+ Interpolation: bicubic
70
+ RMSProp Decay: 0.9
71
+ Label Smoothing: 0.1
72
+ BatchNorm Momentum: 0.99
73
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1334
74
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b0_ap-f262efe1.pth
75
+ Results:
76
+ - Task: Image Classification
77
+ Dataset: ImageNet
78
+ Metrics:
79
+ Top 1 Accuracy: 77.1%
80
+ Top 5 Accuracy: 93.26%
81
+ - Name: tf_efficientnet_b1_ap
82
+ In Collection: AdvProp
83
+ Metadata:
84
+ FLOPs: 883633200
85
+ Parameters: 7790000
86
+ File Size: 31515350
87
+ Architecture:
88
+ - 1x1 Convolution
89
+ - Average Pooling
90
+ - Batch Normalization
91
+ - Convolution
92
+ - Dense Connections
93
+ - Dropout
94
+ - Inverted Residual Block
95
+ - Squeeze-and-Excitation Block
96
+ - Swish
97
+ Tasks:
98
+ - Image Classification
99
+ Training Techniques:
100
+ - AdvProp
101
+ - AutoAugment
102
+ - Label Smoothing
103
+ - RMSProp
104
+ - Stochastic Depth
105
+ - Weight Decay
106
+ Training Data:
107
+ - ImageNet
108
+ ID: tf_efficientnet_b1_ap
109
+ LR: 0.256
110
+ Epochs: 350
111
+ Crop Pct: '0.882'
112
+ Momentum: 0.9
113
+ Batch Size: 2048
114
+ Image Size: '240'
115
+ Weight Decay: 1.0e-05
116
+ Interpolation: bicubic
117
+ RMSProp Decay: 0.9
118
+ Label Smoothing: 0.1
119
+ BatchNorm Momentum: 0.99
120
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1344
121
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b1_ap-44ef0a3d.pth
122
+ Results:
123
+ - Task: Image Classification
124
+ Dataset: ImageNet
125
+ Metrics:
126
+ Top 1 Accuracy: 79.28%
127
+ Top 5 Accuracy: 94.3%
128
+ - Name: tf_efficientnet_b2_ap
129
+ In Collection: AdvProp
130
+ Metadata:
131
+ FLOPs: 1234321170
132
+ Parameters: 9110000
133
+ File Size: 36800745
134
+ Architecture:
135
+ - 1x1 Convolution
136
+ - Average Pooling
137
+ - Batch Normalization
138
+ - Convolution
139
+ - Dense Connections
140
+ - Dropout
141
+ - Inverted Residual Block
142
+ - Squeeze-and-Excitation Block
143
+ - Swish
144
+ Tasks:
145
+ - Image Classification
146
+ Training Techniques:
147
+ - AdvProp
148
+ - AutoAugment
149
+ - Label Smoothing
150
+ - RMSProp
151
+ - Stochastic Depth
152
+ - Weight Decay
153
+ Training Data:
154
+ - ImageNet
155
+ ID: tf_efficientnet_b2_ap
156
+ LR: 0.256
157
+ Epochs: 350
158
+ Crop Pct: '0.89'
159
+ Momentum: 0.9
160
+ Batch Size: 2048
161
+ Image Size: '260'
162
+ Weight Decay: 1.0e-05
163
+ Interpolation: bicubic
164
+ RMSProp Decay: 0.9
165
+ Label Smoothing: 0.1
166
+ BatchNorm Momentum: 0.99
167
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1354
168
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b2_ap-2f8e7636.pth
169
+ Results:
170
+ - Task: Image Classification
171
+ Dataset: ImageNet
172
+ Metrics:
173
+ Top 1 Accuracy: 80.3%
174
+ Top 5 Accuracy: 95.03%
175
+ - Name: tf_efficientnet_b3_ap
176
+ In Collection: AdvProp
177
+ Metadata:
178
+ FLOPs: 2275247568
179
+ Parameters: 12230000
180
+ File Size: 49384538
181
+ Architecture:
182
+ - 1x1 Convolution
183
+ - Average Pooling
184
+ - Batch Normalization
185
+ - Convolution
186
+ - Dense Connections
187
+ - Dropout
188
+ - Inverted Residual Block
189
+ - Squeeze-and-Excitation Block
190
+ - Swish
191
+ Tasks:
192
+ - Image Classification
193
+ Training Techniques:
194
+ - AdvProp
195
+ - AutoAugment
196
+ - Label Smoothing
197
+ - RMSProp
198
+ - Stochastic Depth
199
+ - Weight Decay
200
+ Training Data:
201
+ - ImageNet
202
+ ID: tf_efficientnet_b3_ap
203
+ LR: 0.256
204
+ Epochs: 350
205
+ Crop Pct: '0.904'
206
+ Momentum: 0.9
207
+ Batch Size: 2048
208
+ Image Size: '300'
209
+ Weight Decay: 1.0e-05
210
+ Interpolation: bicubic
211
+ RMSProp Decay: 0.9
212
+ Label Smoothing: 0.1
213
+ BatchNorm Momentum: 0.99
214
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1364
215
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b3_ap-aad25bdd.pth
216
+ Results:
217
+ - Task: Image Classification
218
+ Dataset: ImageNet
219
+ Metrics:
220
+ Top 1 Accuracy: 81.82%
221
+ Top 5 Accuracy: 95.62%
222
+ - Name: tf_efficientnet_b4_ap
223
+ In Collection: AdvProp
224
+ Metadata:
225
+ FLOPs: 5749638672
226
+ Parameters: 19340000
227
+ File Size: 77993585
228
+ Architecture:
229
+ - 1x1 Convolution
230
+ - Average Pooling
231
+ - Batch Normalization
232
+ - Convolution
233
+ - Dense Connections
234
+ - Dropout
235
+ - Inverted Residual Block
236
+ - Squeeze-and-Excitation Block
237
+ - Swish
238
+ Tasks:
239
+ - Image Classification
240
+ Training Techniques:
241
+ - AdvProp
242
+ - AutoAugment
243
+ - Label Smoothing
244
+ - RMSProp
245
+ - Stochastic Depth
246
+ - Weight Decay
247
+ Training Data:
248
+ - ImageNet
249
+ ID: tf_efficientnet_b4_ap
250
+ LR: 0.256
251
+ Epochs: 350
252
+ Crop Pct: '0.922'
253
+ Momentum: 0.9
254
+ Batch Size: 2048
255
+ Image Size: '380'
256
+ Weight Decay: 1.0e-05
257
+ Interpolation: bicubic
258
+ RMSProp Decay: 0.9
259
+ Label Smoothing: 0.1
260
+ BatchNorm Momentum: 0.99
261
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1374
262
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b4_ap-dedb23e6.pth
263
+ Results:
264
+ - Task: Image Classification
265
+ Dataset: ImageNet
266
+ Metrics:
267
+ Top 1 Accuracy: 83.26%
268
+ Top 5 Accuracy: 96.39%
269
+ - Name: tf_efficientnet_b5_ap
270
+ In Collection: AdvProp
271
+ Metadata:
272
+ FLOPs: 13176501888
273
+ Parameters: 30390000
274
+ File Size: 122403150
275
+ Architecture:
276
+ - 1x1 Convolution
277
+ - Average Pooling
278
+ - Batch Normalization
279
+ - Convolution
280
+ - Dense Connections
281
+ - Dropout
282
+ - Inverted Residual Block
283
+ - Squeeze-and-Excitation Block
284
+ - Swish
285
+ Tasks:
286
+ - Image Classification
287
+ Training Techniques:
288
+ - AdvProp
289
+ - AutoAugment
290
+ - Label Smoothing
291
+ - RMSProp
292
+ - Stochastic Depth
293
+ - Weight Decay
294
+ Training Data:
295
+ - ImageNet
296
+ ID: tf_efficientnet_b5_ap
297
+ LR: 0.256
298
+ Epochs: 350
299
+ Crop Pct: '0.934'
300
+ Momentum: 0.9
301
+ Batch Size: 2048
302
+ Image Size: '456'
303
+ Weight Decay: 1.0e-05
304
+ Interpolation: bicubic
305
+ RMSProp Decay: 0.9
306
+ Label Smoothing: 0.1
307
+ BatchNorm Momentum: 0.99
308
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1384
309
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b5_ap-9e82fae8.pth
310
+ Results:
311
+ - Task: Image Classification
312
+ Dataset: ImageNet
313
+ Metrics:
314
+ Top 1 Accuracy: 84.25%
315
+ Top 5 Accuracy: 96.97%
316
+ - Name: tf_efficientnet_b6_ap
317
+ In Collection: AdvProp
318
+ Metadata:
319
+ FLOPs: 24180518488
320
+ Parameters: 43040000
321
+ File Size: 173237466
322
+ Architecture:
323
+ - 1x1 Convolution
324
+ - Average Pooling
325
+ - Batch Normalization
326
+ - Convolution
327
+ - Dense Connections
328
+ - Dropout
329
+ - Inverted Residual Block
330
+ - Squeeze-and-Excitation Block
331
+ - Swish
332
+ Tasks:
333
+ - Image Classification
334
+ Training Techniques:
335
+ - AdvProp
336
+ - AutoAugment
337
+ - Label Smoothing
338
+ - RMSProp
339
+ - Stochastic Depth
340
+ - Weight Decay
341
+ Training Data:
342
+ - ImageNet
343
+ ID: tf_efficientnet_b6_ap
344
+ LR: 0.256
345
+ Epochs: 350
346
+ Crop Pct: '0.942'
347
+ Momentum: 0.9
348
+ Batch Size: 2048
349
+ Image Size: '528'
350
+ Weight Decay: 1.0e-05
351
+ Interpolation: bicubic
352
+ RMSProp Decay: 0.9
353
+ Label Smoothing: 0.1
354
+ BatchNorm Momentum: 0.99
355
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1394
356
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b6_ap-4ffb161f.pth
357
+ Results:
358
+ - Task: Image Classification
359
+ Dataset: ImageNet
360
+ Metrics:
361
+ Top 1 Accuracy: 84.79%
362
+ Top 5 Accuracy: 97.14%
363
+ - Name: tf_efficientnet_b7_ap
364
+ In Collection: AdvProp
365
+ Metadata:
366
+ FLOPs: 48205304880
367
+ Parameters: 66349999
368
+ File Size: 266850607
369
+ Architecture:
370
+ - 1x1 Convolution
371
+ - Average Pooling
372
+ - Batch Normalization
373
+ - Convolution
374
+ - Dense Connections
375
+ - Dropout
376
+ - Inverted Residual Block
377
+ - Squeeze-and-Excitation Block
378
+ - Swish
379
+ Tasks:
380
+ - Image Classification
381
+ Training Techniques:
382
+ - AdvProp
383
+ - AutoAugment
384
+ - Label Smoothing
385
+ - RMSProp
386
+ - Stochastic Depth
387
+ - Weight Decay
388
+ Training Data:
389
+ - ImageNet
390
+ ID: tf_efficientnet_b7_ap
391
+ LR: 0.256
392
+ Epochs: 350
393
+ Crop Pct: '0.949'
394
+ Momentum: 0.9
395
+ Batch Size: 2048
396
+ Image Size: '600'
397
+ Weight Decay: 1.0e-05
398
+ Interpolation: bicubic
399
+ RMSProp Decay: 0.9
400
+ Label Smoothing: 0.1
401
+ BatchNorm Momentum: 0.99
402
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1405
403
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b7_ap-ddb28fec.pth
404
+ Results:
405
+ - Task: Image Classification
406
+ Dataset: ImageNet
407
+ Metrics:
408
+ Top 1 Accuracy: 85.12%
409
+ Top 5 Accuracy: 97.25%
410
+ - Name: tf_efficientnet_b8_ap
411
+ In Collection: AdvProp
412
+ Metadata:
413
+ FLOPs: 80962956270
414
+ Parameters: 87410000
415
+ File Size: 351412563
416
+ Architecture:
417
+ - 1x1 Convolution
418
+ - Average Pooling
419
+ - Batch Normalization
420
+ - Convolution
421
+ - Dense Connections
422
+ - Dropout
423
+ - Inverted Residual Block
424
+ - Squeeze-and-Excitation Block
425
+ - Swish
426
+ Tasks:
427
+ - Image Classification
428
+ Training Techniques:
429
+ - AdvProp
430
+ - AutoAugment
431
+ - Label Smoothing
432
+ - RMSProp
433
+ - Stochastic Depth
434
+ - Weight Decay
435
+ Training Data:
436
+ - ImageNet
437
+ ID: tf_efficientnet_b8_ap
438
+ LR: 0.128
439
+ Epochs: 350
440
+ Crop Pct: '0.954'
441
+ Momentum: 0.9
442
+ Batch Size: 2048
443
+ Image Size: '672'
444
+ Weight Decay: 1.0e-05
445
+ Interpolation: bicubic
446
+ RMSProp Decay: 0.9
447
+ Label Smoothing: 0.1
448
+ BatchNorm Momentum: 0.99
449
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L1416
450
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b8_ap-00e169fa.pth
451
+ Results:
452
+ - Task: Image Classification
453
+ Dataset: ImageNet
454
+ Metrics:
455
+ Top 1 Accuracy: 85.37%
456
+ Top 5 Accuracy: 97.3%
457
+ -->
docs/models/.templates/models/big-transfer.md ADDED
@@ -0,0 +1,295 @@
1
+ # Big Transfer (BiT)
2
+
3
+ **Big Transfer (BiT)** is a pretraining recipe that pre-trains on a large supervised source dataset and then fine-tunes the weights on the target task. Models are trained on the JFT-300M dataset. The models in this collection are fine-tuned on ImageNet.
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{kolesnikov2020big,
15
+ title={Big Transfer (BiT): General Visual Representation Learning},
16
+ author={Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Joan Puigcerver and Jessica Yung and Sylvain Gelly and Neil Houlsby},
17
+ year={2020},
18
+ eprint={1912.11370},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: Big Transfer
28
+ Paper:
29
+ Title: 'Big Transfer (BiT): General Visual Representation Learning'
30
+ URL: https://paperswithcode.com/paper/large-scale-learning-of-general-visual
31
+ Models:
32
+ - Name: resnetv2_101x1_bitm
33
+ In Collection: Big Transfer
34
+ Metadata:
35
+ FLOPs: 5330896
36
+ Parameters: 44540000
37
+ File Size: 178256468
38
+ Architecture:
39
+ - 1x1 Convolution
40
+ - Bottleneck Residual Block
41
+ - Convolution
42
+ - Global Average Pooling
43
+ - Group Normalization
44
+ - Max Pooling
45
+ - ReLU
46
+ - Residual Block
47
+ - Residual Connection
48
+ - Softmax
49
+ - Weight Standardization
50
+ Tasks:
51
+ - Image Classification
52
+ Training Techniques:
53
+ - Mixup
54
+ - SGD with Momentum
55
+ - Weight Decay
56
+ Training Data:
57
+ - ImageNet
58
+ - JFT-300M
59
+ Training Resources: Cloud TPUv3-512
60
+ ID: resnetv2_101x1_bitm
61
+ LR: 0.03
62
+ Epochs: 90
63
+ Layers: 101
64
+ Crop Pct: '1.0'
65
+ Momentum: 0.9
66
+ Batch Size: 4096
67
+ Image Size: '480'
68
+ Weight Decay: 0.0001
69
+ Interpolation: bilinear
70
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L444
71
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R101x1-ILSVRC2012.npz
72
+ Results:
73
+ - Task: Image Classification
74
+ Dataset: ImageNet
75
+ Metrics:
76
+ Top 1 Accuracy: 82.21%
77
+ Top 5 Accuracy: 96.47%
78
+ - Name: resnetv2_101x3_bitm
79
+ In Collection: Big Transfer
80
+ Metadata:
81
+ FLOPs: 15988688
82
+ Parameters: 387930000
83
+ File Size: 1551830100
84
+ Architecture:
85
+ - 1x1 Convolution
86
+ - Bottleneck Residual Block
87
+ - Convolution
88
+ - Global Average Pooling
89
+ - Group Normalization
90
+ - Max Pooling
91
+ - ReLU
92
+ - Residual Block
93
+ - Residual Connection
94
+ - Softmax
95
+ - Weight Standardization
96
+ Tasks:
97
+ - Image Classification
98
+ Training Techniques:
99
+ - Mixup
100
+ - SGD with Momentum
101
+ - Weight Decay
102
+ Training Data:
103
+ - ImageNet
104
+ - JFT-300M
105
+ Training Resources: Cloud TPUv3-512
106
+ ID: resnetv2_101x3_bitm
107
+ LR: 0.03
108
+ Epochs: 90
109
+ Layers: 101
110
+ Crop Pct: '1.0'
111
+ Momentum: 0.9
112
+ Batch Size: 4096
113
+ Image Size: '480'
114
+ Weight Decay: 0.0001
115
+ Interpolation: bilinear
116
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L451
117
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R101x3-ILSVRC2012.npz
118
+ Results:
119
+ - Task: Image Classification
120
+ Dataset: ImageNet
121
+ Metrics:
122
+ Top 1 Accuracy: 84.38%
123
+ Top 5 Accuracy: 97.37%
124
+ - Name: resnetv2_152x2_bitm
125
+ In Collection: Big Transfer
126
+ Metadata:
127
+ FLOPs: 10659792
128
+ Parameters: 236340000
129
+ File Size: 945476668
130
+ Architecture:
131
+ - 1x1 Convolution
132
+ - Bottleneck Residual Block
133
+ - Convolution
134
+ - Global Average Pooling
135
+ - Group Normalization
136
+ - Max Pooling
137
+ - ReLU
138
+ - Residual Block
139
+ - Residual Connection
140
+ - Softmax
141
+ - Weight Standardization
142
+ Tasks:
143
+ - Image Classification
144
+ Training Techniques:
145
+ - Mixup
146
+ - SGD with Momentum
147
+ - Weight Decay
148
+ Training Data:
149
+ - ImageNet
150
+ - JFT-300M
151
+ ID: resnetv2_152x2_bitm
152
+ Crop Pct: '1.0'
153
+ Image Size: '480'
154
+ Interpolation: bilinear
155
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L458
156
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R152x2-ILSVRC2012.npz
157
+ Results:
158
+ - Task: Image Classification
159
+ Dataset: ImageNet
160
+ Metrics:
161
+ Top 1 Accuracy: 84.4%
162
+ Top 5 Accuracy: 97.43%
163
+ - Name: resnetv2_152x4_bitm
164
+ In Collection: Big Transfer
165
+ Metadata:
166
+ FLOPs: 21317584
167
+ Parameters: 936530000
168
+ File Size: 3746270104
169
+ Architecture:
170
+ - 1x1 Convolution
171
+ - Bottleneck Residual Block
172
+ - Convolution
173
+ - Global Average Pooling
174
+ - Group Normalization
175
+ - Max Pooling
176
+ - ReLU
177
+ - Residual Block
178
+ - Residual Connection
179
+ - Softmax
180
+ - Weight Standardization
181
+ Tasks:
182
+ - Image Classification
183
+ Training Techniques:
184
+ - Mixup
185
+ - SGD with Momentum
186
+ - Weight Decay
187
+ Training Data:
188
+ - ImageNet
189
+ - JFT-300M
190
+ Training Resources: Cloud TPUv3-512
191
+ ID: resnetv2_152x4_bitm
192
+ Crop Pct: '1.0'
193
+ Image Size: '480'
194
+ Interpolation: bilinear
195
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L465
196
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R152x4-ILSVRC2012.npz
197
+ Results:
198
+ - Task: Image Classification
199
+ Dataset: ImageNet
200
+ Metrics:
201
+ Top 1 Accuracy: 84.95%
202
+ Top 5 Accuracy: 97.45%
203
+ - Name: resnetv2_50x1_bitm
204
+ In Collection: Big Transfer
205
+ Metadata:
206
+ FLOPs: 5330896
207
+ Parameters: 25550000
208
+ File Size: 102242668
209
+ Architecture:
210
+ - 1x1 Convolution
211
+ - Bottleneck Residual Block
212
+ - Convolution
213
+ - Global Average Pooling
214
+ - Group Normalization
215
+ - Max Pooling
216
+ - ReLU
217
+ - Residual Block
218
+ - Residual Connection
219
+ - Softmax
220
+ - Weight Standardization
221
+ Tasks:
222
+ - Image Classification
223
+ Training Techniques:
224
+ - Mixup
225
+ - SGD with Momentum
226
+ - Weight Decay
227
+ Training Data:
228
+ - ImageNet
229
+ - JFT-300M
230
+ Training Resources: Cloud TPUv3-512
231
+ ID: resnetv2_50x1_bitm
232
+ LR: 0.03
233
+ Epochs: 90
234
+ Layers: 50
235
+ Crop Pct: '1.0'
236
+ Momentum: 0.9
237
+ Batch Size: 4096
238
+ Image Size: '480'
239
+ Weight Decay: 0.0001
240
+ Interpolation: bilinear
241
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L430
242
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R50x1-ILSVRC2012.npz
243
+ Results:
244
+ - Task: Image Classification
245
+ Dataset: ImageNet
246
+ Metrics:
247
+ Top 1 Accuracy: 80.19%
248
+ Top 5 Accuracy: 95.63%
249
+ - Name: resnetv2_50x3_bitm
250
+ In Collection: Big Transfer
251
+ Metadata:
252
+ FLOPs: 15988688
253
+ Parameters: 217320000
254
+ File Size: 869321580
255
+ Architecture:
256
+ - 1x1 Convolution
257
+ - Bottleneck Residual Block
258
+ - Convolution
259
+ - Global Average Pooling
260
+ - Group Normalization
261
+ - Max Pooling
262
+ - ReLU
263
+ - Residual Block
264
+ - Residual Connection
265
+ - Softmax
266
+ - Weight Standardization
267
+ Tasks:
268
+ - Image Classification
269
+ Training Techniques:
270
+ - Mixup
271
+ - SGD with Momentum
272
+ - Weight Decay
273
+ Training Data:
274
+ - ImageNet
275
+ - JFT-300M
276
+ Training Resources: Cloud TPUv3-512
277
+ ID: resnetv2_50x3_bitm
278
+ LR: 0.03
279
+ Epochs: 90
280
+ Layers: 50
281
+ Crop Pct: '1.0'
282
+ Momentum: 0.9
283
+ Batch Size: 4096
284
+ Image Size: '480'
285
+ Weight Decay: 0.0001
286
+ Interpolation: bilinear
287
+ Code: https://github.com/rwightman/pytorch-image-models/blob/b9843f954b0457af2db4f9dea41a8538f51f5d78/timm/models/resnetv2.py#L437
288
+ Weights: https://storage.googleapis.com/bit_models/BiT-M-R50x3-ILSVRC2012.npz
289
+ Results:
290
+ - Task: Image Classification
291
+ Dataset: ImageNet
292
+ Metrics:
293
+ Top 1 Accuracy: 83.75%
294
+ Top 5 Accuracy: 97.12%
295
+ -->
docs/models/.templates/models/csp-darknet.md ADDED
@@ -0,0 +1,81 @@
1
+ # CSP-DarkNet
2
+
3
+ **CSPDarknet53** is a convolutional neural network and backbone for object detection that uses [DarkNet-53](https://paperswithcode.com/method/darknet-53). It employs a CSPNet strategy to partition the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
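The split-and-merge idea described above can be sketched on a toy feature map. This is a minimal illustration only, not the timm implementation: channels are modelled as plain floats, `csp_stage` and `split_ratio` are hypothetical names, and the transition convolutions that a real CSP stage places around the merge are left out.

```python
from typing import Callable, List, Sequence

# Assumption for illustration: one float stands in for one feature-map channel.
Channel = float


def csp_stage(
    channels: Sequence[Channel],
    block: Callable[[List[Channel]], List[Channel]],
    split_ratio: float = 0.5,
) -> List[Channel]:
    """Split the channels in two, transform one part, then merge both parts."""
    split_at = int(len(channels) * split_ratio)
    bypass = list(channels[:split_at])             # part 1 skips the stage's blocks
    processed = block(list(channels[split_at:]))   # part 2 goes through the blocks
    return bypass + processed                      # merge by channel concatenation


def double(part: List[Channel]) -> List[Channel]:
    """Hypothetical stand-in for the stage's residual or dense blocks."""
    return [2 * c for c in part]


merged = csp_stage([1.0, 2.0, 3.0, 4.0], double)
print(merged)  # [1.0, 2.0, 6.0, 8.0]
```

Only the second half of the channels passes through `double`; the first half reaches the output untouched, which is the shortcut path that the description credits with the extra gradient flow.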
4
+
5
+ This CNN is used as the backbone for [YOLOv4](https://paperswithcode.com/method/yolov4).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{bochkovskiy2020yolov4,
17
+ title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
18
+ author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
19
+ year={2020},
20
+ eprint={2004.10934},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: CSP DarkNet
30
+ Paper:
31
+ Title: 'YOLOv4: Optimal Speed and Accuracy of Object Detection'
32
+ URL: https://paperswithcode.com/paper/yolov4-optimal-speed-and-accuracy-of-object
33
+ Models:
34
+ - Name: cspdarknet53
35
+ In Collection: CSP DarkNet
36
+ Metadata:
37
+ FLOPs: 8545018880
38
+ Parameters: 27640000
39
+ File Size: 110775135
40
+ Architecture:
41
+ - 1x1 Convolution
42
+ - Batch Normalization
43
+ - Convolution
44
+ - Global Average Pooling
45
+ - Mish
46
+ - Residual Connection
47
+ - Softmax
48
+ Tasks:
49
+ - Image Classification
50
+ Training Techniques:
51
+ - CutMix
52
+ - Label Smoothing
53
+ - Mosaic
54
+ - Polynomial Learning Rate Decay
55
+ - SGD with Momentum
56
+ - Self-Adversarial Training
57
+ - Weight Decay
58
+ Training Data:
59
+ - ImageNet
60
+ Training Resources: 1x NVIDIA RTX 2070 GPU
61
+ ID: cspdarknet53
62
+ LR: 0.1
63
+ Layers: 53
64
+ Crop Pct: '0.887'
65
+ Momentum: 0.9
66
+ Batch Size: 128
67
+ Image Size: '256'
68
+ Warmup Steps: 1000
69
+ Weight Decay: 0.0005
70
+ Interpolation: bilinear
71
+ Training Steps: 8000000
72
+ FPS (GPU RTX 2070): 66
73
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L441
74
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspdarknet53_ra_256-d05c7c21.pth
75
+ Results:
76
+ - Task: Image Classification
77
+ Dataset: ImageNet
78
+ Metrics:
79
+ Top 1 Accuracy: 80.05%
80
+ Top 5 Accuracy: 95.09%
81
+ -->
docs/models/.templates/models/csp-resnet.md ADDED
@@ -0,0 +1,76 @@
1
+ # CSP-ResNet
2
+
3
+ **CSPResNet** is a convolutional neural network where we apply the Cross Stage Partial Network (CSPNet) approach to [ResNet](https://paperswithcode.com/method/resnet). The CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{wang2019cspnet,
15
+ title={CSPNet: A New Backbone that can Enhance Learning Capability of CNN},
16
+ author={Chien-Yao Wang and Hong-Yuan Mark Liao and I-Hau Yeh and Yueh-Hua Wu and Ping-Yang Chen and Jun-Wei Hsieh},
17
+ year={2019},
18
+ eprint={1911.11929},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: CSP ResNet
28
+ Paper:
29
+ Title: 'CSPNet: A New Backbone that can Enhance Learning Capability of CNN'
30
+ URL: https://paperswithcode.com/paper/cspnet-a-new-backbone-that-can-enhance
31
+ Models:
32
+ - Name: cspresnet50
33
+ In Collection: CSP ResNet
34
+ Metadata:
35
+ FLOPs: 5924992000
36
+ Parameters: 21620000
37
+ File Size: 86679303
38
+ Architecture:
39
+ - 1x1 Convolution
40
+ - Batch Normalization
41
+ - Bottleneck Residual Block
42
+ - Convolution
43
+ - Global Average Pooling
44
+ - Max Pooling
45
+ - ReLU
46
+ - Residual Block
47
+ - Residual Connection
48
+ - Softmax
49
+ Tasks:
50
+ - Image Classification
51
+ Training Techniques:
52
+ - Label Smoothing
53
+ - Polynomial Learning Rate Decay
54
+ - SGD with Momentum
55
+ - Weight Decay
56
+ Training Data:
57
+ - ImageNet
58
+ ID: cspresnet50
59
+ LR: 0.1
60
+ Layers: 50
61
+ Crop Pct: '0.887'
62
+ Momentum: 0.9
63
+ Batch Size: 128
64
+ Image Size: '256'
65
+ Weight Decay: 0.005
66
+ Interpolation: bilinear
67
+ Training Steps: 8000000
68
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L415
69
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspresnet50_ra-d3e8d487.pth
70
+ Results:
71
+ - Task: Image Classification
72
+ Dataset: ImageNet
73
+ Metrics:
74
+ Top 1 Accuracy: 79.57%
75
+ Top 5 Accuracy: 94.71%
76
+ -->
docs/models/.templates/models/csp-resnext.md ADDED
@@ -0,0 +1,77 @@
1
+ # CSP-ResNeXt
2
+
3
+ **CSPResNeXt** is a convolutional neural network where we apply the Cross Stage Partial Network (CSPNet) approach to [ResNeXt](https://paperswithcode.com/method/resnext). The CSPNet partitions the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy. The use of a split and merge strategy allows for more gradient flow through the network.
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{wang2019cspnet,
15
+ title={CSPNet: A New Backbone that can Enhance Learning Capability of CNN},
16
+ author={Chien-Yao Wang and Hong-Yuan Mark Liao and I-Hau Yeh and Yueh-Hua Wu and Ping-Yang Chen and Jun-Wei Hsieh},
17
+ year={2019},
18
+ eprint={1911.11929},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: CSP ResNeXt
28
+ Paper:
29
+ Title: 'CSPNet: A New Backbone that can Enhance Learning Capability of CNN'
30
+ URL: https://paperswithcode.com/paper/cspnet-a-new-backbone-that-can-enhance
31
+ Models:
32
+ - Name: cspresnext50
33
+ In Collection: CSP ResNeXt
34
+ Metadata:
35
+ FLOPs: 3962945536
36
+ Parameters: 20570000
37
+ File Size: 82562887
38
+ Architecture:
39
+ - 1x1 Convolution
40
+ - Batch Normalization
41
+ - Convolution
42
+ - Global Average Pooling
43
+ - Grouped Convolution
44
+ - Max Pooling
45
+ - ReLU
46
+ - ResNeXt Block
47
+ - Residual Connection
48
+ - Softmax
49
+ Tasks:
50
+ - Image Classification
51
+ Training Techniques:
52
+ - Label Smoothing
53
+ - Polynomial Learning Rate Decay
54
+ - SGD with Momentum
55
+ - Weight Decay
56
+ Training Data:
57
+ - ImageNet
58
+ Training Resources: 1x GPU
59
+ ID: cspresnext50
60
+ LR: 0.1
61
+ Layers: 50
62
+ Crop Pct: '0.875'
63
+ Momentum: 0.9
64
+ Batch Size: 128
65
+ Image Size: '224'
66
+ Weight Decay: 0.005
67
+ Interpolation: bilinear
68
+ Training Steps: 8000000
69
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/cspnet.py#L430
70
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/cspresnext50_ra_224-648b4713.pth
71
+ Results:
72
+ - Task: Image Classification
73
+ Dataset: ImageNet
74
+ Metrics:
75
+ Top 1 Accuracy: 80.05%
76
+ Top 5 Accuracy: 94.94%
77
+ -->
docs/models/.templates/models/densenet.md ADDED
@@ -0,0 +1,305 @@
1
+ # DenseNet
2
+
3
+ **DenseNet** is a type of convolutional neural network that utilises dense connections between layers, through [Dense Blocks](http://www.paperswithcode.com/method/dense-block), where we connect *all layers* (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.
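Because every layer receives the feature maps of all preceding layers, the input width grows linearly inside a dense block. The sketch below works through that arithmetic. The function names are hypothetical, and the example configuration (initial width 64, growth rate 32, blocks of 6, 12, 24 and 16 layers, transitions that halve the width) is the commonly cited DenseNet-121 setup, which is an assumption not stated in this document.

```python
from typing import List, Sequence


def dense_block_input_channels(in_channels: int, growth_rate: int, num_layers: int) -> List[int]:
    """Input width seen by each layer of one dense block.

    Layer i is fed the block input plus the growth_rate new channels
    produced by each of the i layers before it.
    """
    return [in_channels + i * growth_rate for i in range(num_layers)]


def densenet_feature_channels(
    init_channels: int,
    growth_rate: int,
    block_config: Sequence[int],
    compression: float = 0.5,
) -> int:
    """Width of the final feature map after all dense blocks and transitions."""
    channels = init_channels
    for index, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate        # every layer appends growth_rate channels
        if index != len(block_config) - 1:
            channels = int(channels * compression)  # transition layer shrinks the width
    return channels


print(dense_block_input_channels(64, 32, 6))              # [64, 96, 128, 160, 192, 224]
print(densenet_feature_channels(64, 32, (6, 12, 24, 16)))  # 1024
```

Under these assumptions the last dense block ends with 1024 channels, which is the width fed to the classifier head.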
4
+
5
+ The **DenseNet Blur** variant in this collection by Ross Wightman employs [Blur Pooling](http://www.paperswithcode.com/method/blur-pooling).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @article{DBLP:journals/corr/HuangLW16a,
17
+ author = {Gao Huang and
18
+ Zhuang Liu and
19
+ Kilian Q. Weinberger},
20
+ title = {Densely Connected Convolutional Networks},
21
+ journal = {CoRR},
22
+ volume = {abs/1608.06993},
23
+ year = {2016},
24
+ url = {http://arxiv.org/abs/1608.06993},
25
+ archivePrefix = {arXiv},
26
+ eprint = {1608.06993},
27
+ timestamp = {Mon, 10 Sep 2018 15:49:32 +0200},
28
+ biburl = {https://dblp.org/rec/journals/corr/HuangLW16a.bib},
29
+ bibsource = {dblp computer science bibliography, https://dblp.org}
30
+ }
31
+ ```
32
+
33
+ ```BibTeX
34
+ @misc{rw2019timm,
35
+ author = {Ross Wightman},
36
+ title = {PyTorch Image Models},
37
+ year = {2019},
38
+ publisher = {GitHub},
39
+ journal = {GitHub repository},
40
+ doi = {10.5281/zenodo.4414861},
41
+ howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
42
+ }
43
+ ```
44
+
45
+ <!--
46
+ Type: model-index
47
+ Collections:
48
+ - Name: DenseNet
49
+ Paper:
50
+ Title: Densely Connected Convolutional Networks
51
+ URL: https://paperswithcode.com/paper/densely-connected-convolutional-networks
52
+ Models:
53
+ - Name: densenet121
54
+ In Collection: DenseNet
55
+ Metadata:
56
+ FLOPs: 3641843200
57
+ Parameters: 7980000
58
+ File Size: 32376726
59
+ Architecture:
60
+ - 1x1 Convolution
61
+ - Average Pooling
62
+ - Batch Normalization
63
+ - Convolution
64
+ - Dense Block
65
+ - Dense Connections
66
+ - Dropout
67
+ - Max Pooling
68
+ - ReLU
69
+ - Softmax
70
+ Tasks:
71
+ - Image Classification
72
+ Training Techniques:
73
+ - Kaiming Initialization
74
+ - Nesterov Accelerated Gradient
75
+ - Weight Decay
76
+ Training Data:
77
+ - ImageNet
78
+ ID: densenet121
79
+ LR: 0.1
80
+ Epochs: 90
81
+ Layers: 121
82
+ Dropout: 0.2
83
+ Crop Pct: '0.875'
84
+ Momentum: 0.9
85
+ Batch Size: 256
86
+ Image Size: '224'
87
+ Weight Decay: 0.0001
88
+ Interpolation: bicubic
89
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L295
90
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/densenet121_ra-50efcf5c.pth
91
+ Results:
92
+ - Task: Image Classification
93
+ Dataset: ImageNet
94
+ Metrics:
95
+ Top 1 Accuracy: 75.56%
96
+ Top 5 Accuracy: 92.65%
97
+ - Name: densenet161
98
+ In Collection: DenseNet
99
+ Metadata:
100
+ FLOPs: 9931959264
101
+ Parameters: 28680000
102
+ File Size: 115730790
103
+ Architecture:
104
+ - 1x1 Convolution
105
+ - Average Pooling
106
+ - Batch Normalization
107
+ - Convolution
108
+ - Dense Block
109
+ - Dense Connections
110
+ - Dropout
111
+ - Max Pooling
112
+ - ReLU
113
+ - Softmax
114
+ Tasks:
115
+ - Image Classification
116
+ Training Techniques:
117
+ - Kaiming Initialization
118
+ - Nesterov Accelerated Gradient
119
+ - Weight Decay
120
+ Training Data:
121
+ - ImageNet
122
+ ID: densenet161
123
+ LR: 0.1
124
+ Epochs: 90
125
+ Layers: 161
126
+ Dropout: 0.2
127
+ Crop Pct: '0.875'
128
+ Momentum: 0.9
129
+ Batch Size: 256
130
+ Image Size: '224'
131
+ Weight Decay: 0.0001
132
+ Interpolation: bicubic
133
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L347
134
+ Weights: https://download.pytorch.org/models/densenet161-8d451a50.pth
135
+ Results:
136
+ - Task: Image Classification
137
+ Dataset: ImageNet
138
+ Metrics:
139
+ Top 1 Accuracy: 77.36%
140
+ Top 5 Accuracy: 93.63%
141
+ - Name: densenet169
142
+ In Collection: DenseNet
143
+ Metadata:
144
+ FLOPs: 4316945792
145
+ Parameters: 14150000
146
+ File Size: 57365526
147
+ Architecture:
148
+ - 1x1 Convolution
149
+ - Average Pooling
150
+ - Batch Normalization
151
+ - Convolution
152
+ - Dense Block
153
+ - Dense Connections
154
+ - Dropout
155
+ - Max Pooling
156
+ - ReLU
157
+ - Softmax
158
+ Tasks:
159
+ - Image Classification
160
+ Training Techniques:
161
+ - Kaiming Initialization
162
+ - Nesterov Accelerated Gradient
163
+ - Weight Decay
164
+ Training Data:
165
+ - ImageNet
166
+ ID: densenet169
167
+ LR: 0.1
168
+ Epochs: 90
169
+ Layers: 169
170
+ Dropout: 0.2
171
+ Crop Pct: '0.875'
172
+ Momentum: 0.9
173
+ Batch Size: 256
174
+ Image Size: '224'
175
+ Weight Decay: 0.0001
176
+ Interpolation: bicubic
177
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L327
178
+ Weights: https://download.pytorch.org/models/densenet169-b2777c0a.pth
179
+ Results:
180
+ - Task: Image Classification
181
+ Dataset: ImageNet
182
+ Metrics:
183
+ Top 1 Accuracy: 75.9%
184
+ Top 5 Accuracy: 93.02%
185
+ - Name: densenet201
186
+ In Collection: DenseNet
187
+ Metadata:
188
+ FLOPs: 5514321024
189
+ Parameters: 20010000
190
+ File Size: 81131730
191
+ Architecture:
192
+ - 1x1 Convolution
193
+ - Average Pooling
194
+ - Batch Normalization
195
+ - Convolution
196
+ - Dense Block
197
+ - Dense Connections
198
+ - Dropout
199
+ - Max Pooling
200
+ - ReLU
201
+ - Softmax
202
+ Tasks:
203
+ - Image Classification
204
+ Training Techniques:
205
+ - Kaiming Initialization
206
+ - Nesterov Accelerated Gradient
207
+ - Weight Decay
208
+ Training Data:
209
+ - ImageNet
210
+ ID: densenet201
211
+ LR: 0.1
212
+ Epochs: 90
213
+ Layers: 201
214
+ Dropout: 0.2
215
+ Crop Pct: '0.875'
216
+ Momentum: 0.9
217
+ Batch Size: 256
218
+ Image Size: '224'
219
+ Weight Decay: 0.0001
220
+ Interpolation: bicubic
221
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L337
222
+ Weights: https://download.pytorch.org/models/densenet201-c1103571.pth
223
+ Results:
224
+ - Task: Image Classification
225
+ Dataset: ImageNet
226
+ Metrics:
227
+ Top 1 Accuracy: 77.29%
228
+ Top 5 Accuracy: 93.48%
229
+ - Name: densenetblur121d
230
+ In Collection: DenseNet
231
+ Metadata:
232
+ FLOPs: 3947812864
233
+ Parameters: 8000000
234
+ File Size: 32456500
235
+ Architecture:
236
+ - 1x1 Convolution
237
+ - Batch Normalization
238
+ - Blur Pooling
239
+ - Convolution
240
+ - Dense Block
241
+ - Dense Connections
242
+ - Dropout
243
+ - Max Pooling
244
+ - ReLU
245
+ - Softmax
246
+ Tasks:
247
+ - Image Classification
248
+ Training Data:
249
+ - ImageNet
250
+ ID: densenetblur121d
251
+ Crop Pct: '0.875'
252
+ Image Size: '224'
253
+ Interpolation: bicubic
254
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L305
255
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/densenetblur121d_ra-100dcfbc.pth
256
+ Results:
257
+ - Task: Image Classification
258
+ Dataset: ImageNet
259
+ Metrics:
260
+ Top 1 Accuracy: 76.59%
261
+ Top 5 Accuracy: 93.2%
262
+ - Name: tv_densenet121
263
+ In Collection: DenseNet
264
+ Metadata:
265
+ FLOPs: 3641843200
266
+ Parameters: 7980000
267
+ File Size: 32342954
268
+ Architecture:
269
+ - 1x1 Convolution
270
+ - Average Pooling
271
+ - Batch Normalization
272
+ - Convolution
273
+ - Dense Block
274
+ - Dense Connections
275
+ - Dropout
276
+ - Max Pooling
277
+ - ReLU
278
+ - Softmax
279
+ Tasks:
280
+ - Image Classification
281
+ Training Techniques:
282
+ - SGD with Momentum
283
+ - Weight Decay
284
+ Training Data:
285
+ - ImageNet
286
+ ID: tv_densenet121
287
+ LR: 0.1
288
+ Epochs: 90
289
+ Crop Pct: '0.875'
290
+ LR Gamma: 0.1
291
+ Momentum: 0.9
292
+ Batch Size: 32
293
+ Image Size: '224'
294
+ LR Step Size: 30
295
+ Weight Decay: 0.0001
296
+ Interpolation: bicubic
297
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/densenet.py#L379
298
+ Weights: https://download.pytorch.org/models/densenet121-a639ec97.pth
299
+ Results:
300
+ - Task: Image Classification
301
+ Dataset: ImageNet
302
+ Metrics:
303
+ Top 1 Accuracy: 74.74%
304
+ Top 5 Accuracy: 92.15%
305
+ -->
docs/models/.templates/models/dla.md ADDED
@@ -0,0 +1,545 @@
1
+ # Deep Layer Aggregation
2
+
3
+ Extending “shallow” skip connections, **Deep Layer Aggregation (DLA)** incorporates more depth and sharing. The authors introduce two structures for it: iterative deep aggregation (IDA) and hierarchical deep aggregation (HDA). These structures are expressed through an architectural framework, independent of the choice of backbone, for compatibility with current and future networks.
4
+
5
+ IDA focuses on fusing resolutions and scales, while HDA focuses on merging features from all modules and channels. IDA follows the base hierarchy to refine resolution and aggregate scale stage-by-stage. HDA assembles its own hierarchy of tree-structured connections that cross and merge stages to aggregate different levels of representation.
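The stage-by-stage order of IDA can be sketched as a left fold over the stage outputs, starting from the shallowest stage. This is an illustrative sketch rather than the timm code: `iterative_deep_aggregation` and `describe` are hypothetical names, and the aggregation node is replaced by a function that only records the order in which stages are combined. The tree-structured HDA is not covered here.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def iterative_deep_aggregation(stages: Sequence[T], node: Callable[[T, T], T]) -> T:
    """Fold stage outputs from shallowest to deepest through an aggregation node."""
    if not stages:
        raise ValueError("need at least one stage output")
    # reduce applies node(node(node(x1, x2), x3), ...), one stage at a time.
    return reduce(node, stages)


def describe(aggregated: str, stage: str) -> str:
    """Hypothetical aggregation node that records what it combined."""
    return f"N({aggregated}, {stage})"


order = iterative_deep_aggregation(["x1", "x2", "x3", "x4"], describe)
print(order)  # N(N(N(x1, x2), x3), x4)
```

In a real network the node would be a learned operation, for example a convolution over the combined inputs, so the shallowest features are refined once per remaining stage.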
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{yu2019deep,
17
+ title={Deep Layer Aggregation},
18
+ author={Fisher Yu and Dequan Wang and Evan Shelhamer and Trevor Darrell},
19
+ year={2019},
20
+ eprint={1707.06484},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: DLA
30
+ Paper:
31
+ Title: Deep Layer Aggregation
32
+ URL: https://paperswithcode.com/paper/deep-layer-aggregation
33
+ Models:
34
+ - Name: dla102
35
+ In Collection: DLA
36
+ Metadata:
37
+ FLOPs: 7192952808
38
+ Parameters: 33270000
39
+ File Size: 135290579
40
+ Architecture:
41
+ - 1x1 Convolution
42
+ - Batch Normalization
43
+ - Convolution
44
+ - DLA Bottleneck Residual Block
45
+ - DLA Residual Block
46
+ - Global Average Pooling
47
+ - Max Pooling
48
+ - ReLU
49
+ - Residual Block
50
+ - Residual Connection
51
+ - Softmax
52
+ Tasks:
53
+ - Image Classification
54
+ Training Techniques:
55
+ - SGD with Momentum
56
+ - Weight Decay
57
+ Training Data:
58
+ - ImageNet
59
+ Training Resources: 8x GPUs
60
+ ID: dla102
61
+ LR: 0.1
62
+ Epochs: 120
63
+ Layers: 102
64
+ Crop Pct: '0.875'
65
+ Momentum: 0.9
66
+ Batch Size: 256
67
+ Image Size: '224'
68
+ Weight Decay: 0.0001
69
+ Interpolation: bilinear
70
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L410
71
+ Weights: http://dl.yf.io/dla/models/imagenet/dla102-d94d9790.pth
72
+ Results:
73
+ - Task: Image Classification
74
+ Dataset: ImageNet
75
+ Metrics:
76
+ Top 1 Accuracy: 78.03%
77
+ Top 5 Accuracy: 93.95%
78
+ - Name: dla102x
79
+ In Collection: DLA
80
+ Metadata:
81
+ FLOPs: 5886821352
82
+ Parameters: 26310000
83
+ File Size: 107552695
84
+ Architecture:
85
+ - 1x1 Convolution
86
+ - Batch Normalization
87
+ - Convolution
88
+ - DLA Bottleneck Residual Block
89
+ - DLA Residual Block
90
+ - Global Average Pooling
91
+ - Max Pooling
92
+ - ReLU
93
+ - Residual Block
94
+ - Residual Connection
95
+ - Softmax
96
+ Tasks:
97
+ - Image Classification
98
+ Training Techniques:
99
+ - SGD with Momentum
100
+ - Weight Decay
101
+ Training Data:
102
+ - ImageNet
103
+ Training Resources: 8x GPUs
104
+ ID: dla102x
105
+ LR: 0.1
106
+ Epochs: 120
107
+ Layers: 102
108
+ Crop Pct: '0.875'
109
+ Momentum: 0.9
110
+ Batch Size: 256
111
+ Image Size: '224'
112
+ Weight Decay: 0.0001
113
+ Interpolation: bilinear
114
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L418
115
+ Weights: http://dl.yf.io/dla/models/imagenet/dla102x-ad62be81.pth
116
+ Results:
117
+ - Task: Image Classification
118
+ Dataset: ImageNet
119
+ Metrics:
120
+ Top 1 Accuracy: 78.51%
121
+ Top 5 Accuracy: 94.23%
122
+ - Name: dla102x2
123
+ In Collection: DLA
124
+ Metadata:
125
+ FLOPs: 9343847400
126
+ Parameters: 41280000
127
+ File Size: 167645295
128
+ Architecture:
129
+ - 1x1 Convolution
130
+ - Batch Normalization
131
+ - Convolution
132
+ - DLA Bottleneck Residual Block
133
+ - DLA Residual Block
134
+ - Global Average Pooling
135
+ - Max Pooling
136
+ - ReLU
137
+ - Residual Block
138
+ - Residual Connection
139
+ - Softmax
140
+ Tasks:
141
+ - Image Classification
142
+ Training Techniques:
143
+ - SGD with Momentum
144
+ - Weight Decay
145
+ Training Data:
146
+ - ImageNet
147
+ Training Resources: 8x GPUs
148
+ ID: dla102x2
149
+ LR: 0.1
150
+ Epochs: 120
151
+ Layers: 102
152
+ Crop Pct: '0.875'
153
+ Momentum: 0.9
154
+ Batch Size: 256
155
+ Image Size: '224'
156
+ Weight Decay: 0.0001
157
+ Interpolation: bilinear
158
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L426
159
+ Weights: http://dl.yf.io/dla/models/imagenet/dla102x2-262837b6.pth
160
+ Results:
161
+ - Task: Image Classification
162
+ Dataset: ImageNet
163
+ Metrics:
164
+ Top 1 Accuracy: 79.44%
165
+ Top 5 Accuracy: 94.65%
166
+ - Name: dla169
167
+ In Collection: DLA
168
+ Metadata:
169
+ FLOPs: 11598004200
170
+ Parameters: 53390000
171
+ File Size: 216547113
172
+ Architecture:
173
+ - 1x1 Convolution
174
+ - Batch Normalization
175
+ - Convolution
176
+ - DLA Bottleneck Residual Block
177
+ - DLA Residual Block
178
+ - Global Average Pooling
179
+ - Max Pooling
180
+ - ReLU
181
+ - Residual Block
182
+ - Residual Connection
183
+ - Softmax
184
+ Tasks:
185
+ - Image Classification
186
+ Training Techniques:
187
+ - SGD with Momentum
188
+ - Weight Decay
189
+ Training Data:
190
+ - ImageNet
191
+ Training Resources: 8x GPUs
192
+ ID: dla169
193
+ LR: 0.1
194
+ Epochs: 120
195
+ Layers: 169
196
+ Crop Pct: '0.875'
197
+ Momentum: 0.9
198
+ Batch Size: 256
199
+ Image Size: '224'
200
+ Weight Decay: 0.0001
201
+ Interpolation: bilinear
202
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L434
203
+ Weights: http://dl.yf.io/dla/models/imagenet/dla169-0914e092.pth
204
+ Results:
205
+ - Task: Image Classification
206
+ Dataset: ImageNet
207
+ Metrics:
208
+ Top 1 Accuracy: 78.69%
209
+ Top 5 Accuracy: 94.33%
210
+ - Name: dla34
211
+ In Collection: DLA
212
+ Metadata:
213
+ FLOPs: 3070105576
214
+ Parameters: 15740000
215
+ File Size: 63228658
216
+ Architecture:
217
+ - 1x1 Convolution
218
+ - Batch Normalization
219
+ - Convolution
220
+ - DLA Bottleneck Residual Block
221
+ - DLA Residual Block
222
+ - Global Average Pooling
223
+ - Max Pooling
224
+ - ReLU
225
+ - Residual Block
226
+ - Residual Connection
227
+ - Softmax
228
+ Tasks:
229
+ - Image Classification
230
+ Training Techniques:
231
+ - SGD with Momentum
232
+ - Weight Decay
233
+ Training Data:
234
+ - ImageNet
235
+ ID: dla34
236
+ LR: 0.1
237
+ Epochs: 120
238
+ Layers: 32
239
+ Crop Pct: '0.875'
240
+ Momentum: 0.9
241
+ Batch Size: 256
242
+ Image Size: '224'
243
+ Weight Decay: 0.0001
244
+ Interpolation: bilinear
245
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L362
246
+ Weights: http://dl.yf.io/dla/models/imagenet/dla34-ba72cf86.pth
247
+ Results:
248
+ - Task: Image Classification
249
+ Dataset: ImageNet
250
+ Metrics:
251
+ Top 1 Accuracy: 74.62%
252
+ Top 5 Accuracy: 92.06%
253
+ - Name: dla46_c
254
+ In Collection: DLA
255
+ Metadata:
256
+ FLOPs: 583277288
257
+ Parameters: 1300000
258
+ File Size: 5307963
259
+ Architecture:
260
+ - 1x1 Convolution
261
+ - Batch Normalization
262
+ - Convolution
263
+ - DLA Bottleneck Residual Block
264
+ - DLA Residual Block
265
+ - Global Average Pooling
266
+ - Max Pooling
267
+ - ReLU
268
+ - Residual Block
269
+ - Residual Connection
270
+ - Softmax
271
+ Tasks:
272
+ - Image Classification
273
+ Training Techniques:
274
+ - SGD with Momentum
275
+ - Weight Decay
276
+ Training Data:
277
+ - ImageNet
278
+ ID: dla46_c
279
+ LR: 0.1
280
+ Epochs: 120
281
+ Layers: 46
282
+ Crop Pct: '0.875'
283
+ Momentum: 0.9
284
+ Batch Size: 256
285
+ Image Size: '224'
286
+ Weight Decay: 0.0001
287
+ Interpolation: bilinear
288
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L369
289
+ Weights: http://dl.yf.io/dla/models/imagenet/dla46_c-2bfd52c3.pth
290
+ Results:
291
+ - Task: Image Classification
292
+ Dataset: ImageNet
293
+ Metrics:
294
+ Top 1 Accuracy: 64.87%
295
+ Top 5 Accuracy: 86.29%
296
+ - Name: dla46x_c
297
+ In Collection: DLA
298
+ Metadata:
299
+ FLOPs: 544052200
300
+ Parameters: 1070000
301
+ File Size: 4387641
302
+ Architecture:
303
+ - 1x1 Convolution
304
+ - Batch Normalization
305
+ - Convolution
306
+ - DLA Bottleneck Residual Block
307
+ - DLA Residual Block
308
+ - Global Average Pooling
309
+ - Max Pooling
310
+ - ReLU
311
+ - Residual Block
312
+ - Residual Connection
313
+ - Softmax
314
+ Tasks:
315
+ - Image Classification
316
+ Training Techniques:
317
+ - SGD with Momentum
318
+ - Weight Decay
319
+ Training Data:
320
+ - ImageNet
321
+ ID: dla46x_c
322
+ LR: 0.1
323
+ Epochs: 120
324
+ Layers: 46
325
+ Crop Pct: '0.875'
326
+ Momentum: 0.9
327
+ Batch Size: 256
328
+ Image Size: '224'
329
+ Weight Decay: 0.0001
330
+ Interpolation: bilinear
331
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L378
332
+ Weights: http://dl.yf.io/dla/models/imagenet/dla46x_c-d761bae7.pth
333
+ Results:
334
+ - Task: Image Classification
335
+ Dataset: ImageNet
336
+ Metrics:
337
+ Top 1 Accuracy: 65.98%
338
+ Top 5 Accuracy: 86.99%
339
+ - Name: dla60
340
+ In Collection: DLA
341
+ Metadata:
342
+ FLOPs: 4256251880
343
+ Parameters: 22040000
344
+ File Size: 89560235
345
+ Architecture:
346
+ - 1x1 Convolution
347
+ - Batch Normalization
348
+ - Convolution
349
+ - DLA Bottleneck Residual Block
350
+ - DLA Residual Block
351
+ - Global Average Pooling
352
+ - Max Pooling
353
+ - ReLU
354
+ - Residual Block
355
+ - Residual Connection
356
+ - Softmax
357
+ Tasks:
358
+ - Image Classification
359
+ Training Techniques:
360
+ - SGD with Momentum
361
+ - Weight Decay
362
+ Training Data:
363
+ - ImageNet
364
+ ID: dla60
365
+ LR: 0.1
366
+ Epochs: 120
367
+ Layers: 60
368
+ Dropout: 0.2
369
+ Crop Pct: '0.875'
370
+ Momentum: 0.9
371
+ Batch Size: 256
372
+ Image Size: '224'
373
+ Weight Decay: 0.0001
374
+ Interpolation: bilinear
375
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L394
376
+ Weights: http://dl.yf.io/dla/models/imagenet/dla60-24839fc4.pth
377
+ Results:
378
+ - Task: Image Classification
379
+ Dataset: ImageNet
380
+ Metrics:
381
+ Top 1 Accuracy: 77.04%
382
+ Top 5 Accuracy: 93.32%
383
+ - Name: dla60_res2net
384
+ In Collection: DLA
385
+ Metadata:
386
+ FLOPs: 4147578504
387
+ Parameters: 20850000
388
+ File Size: 84886593
389
+ Architecture:
390
+ - 1x1 Convolution
391
+ - Batch Normalization
392
+ - Convolution
393
+ - DLA Bottleneck Residual Block
394
+ - DLA Residual Block
395
+ - Global Average Pooling
396
+ - Max Pooling
397
+ - ReLU
398
+ - Residual Block
399
+ - Residual Connection
400
+ - Softmax
401
+ Tasks:
402
+ - Image Classification
403
+ Training Techniques:
404
+ - SGD with Momentum
405
+ - Weight Decay
406
+ Training Data:
407
+ - ImageNet
408
+ ID: dla60_res2net
409
+ Layers: 60
410
+ Crop Pct: '0.875'
411
+ Image Size: '224'
412
+ Interpolation: bilinear
413
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L346
414
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-res2net/res2net_dla60_4s-d88db7f9.pth
415
+ Results:
416
+ - Task: Image Classification
417
+ Dataset: ImageNet
418
+ Metrics:
419
+ Top 1 Accuracy: 78.46%
420
+ Top 5 Accuracy: 94.21%
421
+ - Name: dla60_res2next
422
+ In Collection: DLA
423
+ Metadata:
424
+ FLOPs: 3485335272
425
+ Parameters: 17030000
426
+ File Size: 69639245
427
+ Architecture:
428
+ - 1x1 Convolution
429
+ - Batch Normalization
430
+ - Convolution
431
+ - DLA Bottleneck Residual Block
432
+ - DLA Residual Block
433
+ - Global Average Pooling
434
+ - Max Pooling
435
+ - ReLU
436
+ - Residual Block
437
+ - Residual Connection
438
+ - Softmax
439
+ Tasks:
440
+ - Image Classification
441
+ Training Techniques:
442
+ - SGD with Momentum
443
+ - Weight Decay
444
+ Training Data:
445
+ - ImageNet
446
+ ID: dla60_res2next
447
+ Layers: 60
448
+ Crop Pct: '0.875'
449
+ Image Size: '224'
450
+ Interpolation: bilinear
451
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L354
452
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-res2net/res2next_dla60_4s-d327927b.pth
453
+ Results:
454
+ - Task: Image Classification
455
+ Dataset: ImageNet
456
+ Metrics:
457
+ Top 1 Accuracy: 78.44%
458
+ Top 5 Accuracy: 94.16%
459
+ - Name: dla60x
460
+ In Collection: DLA
461
+ Metadata:
462
+ FLOPs: 3544204264
463
+ Parameters: 17350000
464
+ File Size: 70883139
465
+ Architecture:
466
+ - 1x1 Convolution
467
+ - Batch Normalization
468
+ - Convolution
469
+ - DLA Bottleneck Residual Block
470
+ - DLA Residual Block
471
+ - Global Average Pooling
472
+ - Max Pooling
473
+ - ReLU
474
+ - Residual Block
475
+ - Residual Connection
476
+ - Softmax
477
+ Tasks:
478
+ - Image Classification
479
+ Training Techniques:
480
+ - SGD with Momentum
481
+ - Weight Decay
482
+ Training Data:
483
+ - ImageNet
484
+ ID: dla60x
485
+ LR: 0.1
486
+ Epochs: 120
487
+ Layers: 60
488
+ Crop Pct: '0.875'
489
+ Momentum: 0.9
490
+ Batch Size: 256
491
+ Image Size: '224'
492
+ Weight Decay: 0.0001
493
+ Interpolation: bilinear
494
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L402
495
+ Weights: http://dl.yf.io/dla/models/imagenet/dla60x-d15cacda.pth
496
+ Results:
497
+ - Task: Image Classification
498
+ Dataset: ImageNet
499
+ Metrics:
500
+ Top 1 Accuracy: 78.25%
501
+ Top 5 Accuracy: 94.02%
502
+ - Name: dla60x_c
503
+ In Collection: DLA
504
+ Metadata:
505
+ FLOPs: 593325032
506
+ Parameters: 1320000
507
+ File Size: 5454396
508
+ Architecture:
509
+ - 1x1 Convolution
510
+ - Batch Normalization
511
+ - Convolution
512
+ - DLA Bottleneck Residual Block
513
+ - DLA Residual Block
514
+ - Global Average Pooling
515
+ - Max Pooling
516
+ - ReLU
517
+ - Residual Block
518
+ - Residual Connection
519
+ - Softmax
520
+ Tasks:
521
+ - Image Classification
522
+ Training Techniques:
523
+ - SGD with Momentum
524
+ - Weight Decay
525
+ Training Data:
526
+ - ImageNet
527
+ ID: dla60x_c
528
+ LR: 0.1
529
+ Epochs: 120
530
+ Layers: 60
531
+ Crop Pct: '0.875'
532
+ Momentum: 0.9
533
+ Batch Size: 256
534
+ Image Size: '224'
535
+ Weight Decay: 0.0001
536
+ Interpolation: bilinear
537
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dla.py#L386
538
+ Weights: http://dl.yf.io/dla/models/imagenet/dla60x_c-b870c45c.pth
539
+ Results:
540
+ - Task: Image Classification
541
+ Dataset: ImageNet
542
+ Metrics:
543
+ Top 1 Accuracy: 67.91%
544
+ Top 5 Accuracy: 88.42%
545
+ -->
docs/models/.templates/models/dpn.md ADDED
@@ -0,0 +1,256 @@
1
+ # Dual Path Network (DPN)
2
+
3
+ A **Dual Path Network (DPN)** is a convolutional neural network that introduces a new topology of internal connection paths. The intuition is that [ResNets](https://paperswithcode.com/method/resnet) enable feature re-use while DenseNets enable the exploration of new features, and both are important for learning good representations. To get the benefits of both path topologies, Dual Path Networks share common features while keeping the flexibility to explore new features through a dual-path architecture.
4
+
5
+ The principal building block is a [DPN Block](https://paperswithcode.com/method/dpn-block).
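
The two paths described above can be illustrated with a toy sketch. This is not the timm implementation: `dual_path_step` and `toy_block` are hypothetical names, and plain Python lists stand in for feature channels. The only point is that the residual path is updated by addition (fixed width, feature re-use) while the dense path grows by concatenation (new features).

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dual_path_step(
    residual: Vector,
    dense: Vector,
    block: Callable[[Vector], Tuple[Vector, Vector]],
) -> Tuple[Vector, Vector]:
    """Apply one toy dual-path update.

    `block` receives both paths concatenated and returns
    (residual_update, new_dense_features).
    """
    res_update, new_dense = block(residual + dense)  # list concatenation
    if len(res_update) != len(residual):
        raise ValueError("residual update must match the residual path width")
    # Residual path: element-wise addition, so the width never changes.
    next_residual = [r + u for r, u in zip(residual, res_update)]
    # Dense path: concatenation, so the width grows with every block.
    next_dense = dense + new_dense
    return next_residual, next_dense


def toy_block(features: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the convolution stack inside a block."""
    mean = sum(features) / len(features)
    return [mean, mean], [mean]  # 2 residual channels, 1 new dense channel


residual, dense = [1.0, 3.0], [2.0]
for _ in range(2):
    residual, dense = dual_path_step(residual, dense, toy_block)

print(len(residual), len(dense))  # residual width is unchanged, dense width grew
```

In a real DPN block the update would come from convolutions and the channel split would be set by the architecture configuration; the lists here only make the add-versus-concatenate distinction visible.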
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{chen2017dual,
17
+ title={Dual Path Networks},
18
+ author={Yunpeng Chen and Jianan Li and Huaxin Xiao and Xiaojie Jin and Shuicheng Yan and Jiashi Feng},
19
+ year={2017},
20
+ eprint={1707.01629},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: DPN
30
+ Paper:
31
+ Title: Dual Path Networks
32
+ URL: https://paperswithcode.com/paper/dual-path-networks
33
+ Models:
34
+ - Name: dpn107
35
+ In Collection: DPN
36
+ Metadata:
37
+ FLOPs: 23524280296
38
+ Parameters: 86920000
39
+ File Size: 348612331
40
+ Architecture:
41
+ - Batch Normalization
42
+ - Convolution
43
+ - DPN Block
44
+ - Dense Connections
45
+ - Global Average Pooling
46
+ - Max Pooling
47
+ - Softmax
48
+ Tasks:
49
+ - Image Classification
50
+ Training Techniques:
51
+ - SGD with Momentum
52
+ - Weight Decay
53
+ Training Data:
54
+ - ImageNet
55
+ Training Resources: 40x K80 GPUs
56
+ ID: dpn107
57
+ LR: 0.316
58
+ Layers: 107
59
+ Crop Pct: '0.875'
60
+ Batch Size: 1280
61
+ Image Size: '224'
62
+ Interpolation: bicubic
63
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L310
64
+ Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn107_extra-1ac7121e2.pth
65
+ Results:
66
+ - Task: Image Classification
67
+ Dataset: ImageNet
68
+ Metrics:
69
+ Top 1 Accuracy: 80.16%
70
+ Top 5 Accuracy: 94.91%
71
+ - Name: dpn131
72
+ In Collection: DPN
73
+ Metadata:
74
+ FLOPs: 20586274792
75
+ Parameters: 79250000
76
+ File Size: 318016207
77
+ Architecture:
78
+ - Batch Normalization
79
+ - Convolution
80
+ - DPN Block
81
+ - Dense Connections
82
+ - Global Average Pooling
83
+ - Max Pooling
84
+ - Softmax
85
+ Tasks:
86
+ - Image Classification
87
+ Training Techniques:
88
+ - SGD with Momentum
89
+ - Weight Decay
90
+ Training Data:
91
+ - ImageNet
92
+ Training Resources: 40x K80 GPUs
93
+ ID: dpn131
94
+ LR: 0.316
95
+ Layers: 131
96
+ Crop Pct: '0.875'
97
+ Batch Size: 960
98
+ Image Size: '224'
99
+ Interpolation: bicubic
100
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L302
101
+ Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn131-71dfe43e0.pth
102
+ Results:
103
+ - Task: Image Classification
104
+ Dataset: ImageNet
105
+ Metrics:
106
+ Top 1 Accuracy: 79.83%
107
+ Top 5 Accuracy: 94.71%
108
+ - Name: dpn68
109
+ In Collection: DPN
110
+ Metadata:
111
+ FLOPs: 2990567880
112
+ Parameters: 12610000
113
+ File Size: 50761994
114
+ Architecture:
115
+ - Batch Normalization
116
+ - Convolution
117
+ - DPN Block
118
+ - Dense Connections
119
+ - Global Average Pooling
120
+ - Max Pooling
121
+ - Softmax
122
+ Tasks:
123
+ - Image Classification
124
+ Training Techniques:
125
+ - SGD with Momentum
126
+ - Weight Decay
127
+ Training Data:
128
+ - ImageNet
129
+ Training Resources: 40x K80 GPUs
130
+ ID: dpn68
131
+ LR: 0.316
132
+ Layers: 68
133
+ Crop Pct: '0.875'
134
+ Batch Size: 1280
135
+ Image Size: '224'
136
+ Interpolation: bicubic
137
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L270
138
+ Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn68-66bebafa7.pth
139
+ Results:
140
+ - Task: Image Classification
141
+ Dataset: ImageNet
142
+ Metrics:
143
+ Top 1 Accuracy: 76.31%
144
+ Top 5 Accuracy: 92.97%
145
+ - Name: dpn68b
146
+ In Collection: DPN
147
+ Metadata:
148
+ FLOPs: 2990567880
149
+ Parameters: 12610000
150
+ File Size: 50781025
151
+ Architecture:
152
+ - Batch Normalization
153
+ - Convolution
154
+ - DPN Block
155
+ - Dense Connections
156
+ - Global Average Pooling
157
+ - Max Pooling
158
+ - Softmax
159
+ Tasks:
160
+ - Image Classification
161
+ Training Techniques:
162
+ - SGD with Momentum
163
+ - Weight Decay
164
+ Training Data:
165
+ - ImageNet
166
+ Training Resources: 40x K80 GPUs
167
+ ID: dpn68b
168
+ LR: 0.316
169
+ Layers: 68
170
+ Crop Pct: '0.875'
171
+ Batch Size: 1280
172
+ Image Size: '224'
173
+ Interpolation: bicubic
174
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L278
175
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/dpn68b_ra-a31ca160.pth
176
+ Results:
177
+ - Task: Image Classification
178
+ Dataset: ImageNet
179
+ Metrics:
180
+ Top 1 Accuracy: 79.21%
181
+ Top 5 Accuracy: 94.42%
182
+ - Name: dpn92
183
+ In Collection: DPN
184
+ Metadata:
185
+ FLOPs: 8357659624
186
+ Parameters: 37670000
187
+ File Size: 151248422
188
+ Architecture:
189
+ - Batch Normalization
190
+ - Convolution
191
+ - DPN Block
192
+ - Dense Connections
193
+ - Global Average Pooling
194
+ - Max Pooling
195
+ - Softmax
196
+ Tasks:
197
+ - Image Classification
198
+ Training Techniques:
199
+ - SGD with Momentum
200
+ - Weight Decay
201
+ Training Data:
202
+ - ImageNet
203
+ Training Resources: 40x K80 GPUs
204
+ ID: dpn92
205
+ LR: 0.316
206
+ Layers: 92
207
+ Crop Pct: '0.875'
208
+ Batch Size: 1280
209
+ Image Size: '224'
210
+ Interpolation: bicubic
211
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L286
212
+ Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn92_extra-b040e4a9b.pth
213
+ Results:
214
+ - Task: Image Classification
215
+ Dataset: ImageNet
216
+ Metrics:
217
+ Top 1 Accuracy: 79.99%
218
+ Top 5 Accuracy: 94.84%
219
+ - Name: dpn98
220
+ In Collection: DPN
221
+ Metadata:
222
+ FLOPs: 15003675112
223
+ Parameters: 61570000
224
+ File Size: 247021307
225
+ Architecture:
226
+ - Batch Normalization
227
+ - Convolution
228
+ - DPN Block
229
+ - Dense Connections
230
+ - Global Average Pooling
231
+ - Max Pooling
232
+ - Softmax
233
+ Tasks:
234
+ - Image Classification
235
+ Training Techniques:
236
+ - SGD with Momentum
237
+ - Weight Decay
238
+ Training Data:
239
+ - ImageNet
240
+ Training Resources: 40x K80 GPUs
241
+ ID: dpn98
242
+ LR: 0.4
243
+ Layers: 98
244
+ Crop Pct: '0.875'
245
+ Batch Size: 1280
246
+ Image Size: '224'
247
+ Interpolation: bicubic
248
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/dpn.py#L294
249
+ Weights: https://github.com/rwightman/pytorch-dpn-pretrained/releases/download/v0.1/dpn98-5b90dec4d.pth
250
+ Results:
251
+ - Task: Image Classification
252
+ Dataset: ImageNet
253
+ Metrics:
254
+ Top 1 Accuracy: 79.65%
255
+ Top 5 Accuracy: 94.61%
256
+ -->
docs/models/.templates/models/ecaresnet.md ADDED
@@ -0,0 +1,236 @@
1
+ # ECA-ResNet
2
+
3
+ An **ECA ResNet** is a variant on a [ResNet](https://paperswithcode.com/method/resnet) that utilises an [Efficient Channel Attention module](https://paperswithcode.com/method/efficient-channel-attention). Efficient Channel Attention is an architectural unit based on [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) that reduces model complexity without dimensionality reduction.
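
As a rough illustration of channel attention without dimensionality reduction, the sketch below pools each channel to a single value, runs a small 1D convolution across neighbouring channels, and rescales each channel by the resulting sigmoid weight. It is a pure-Python toy, not the timm module: the function names are hypothetical, and the adaptive kernel-size rule (with `gamma=2`, `b=1`) is taken from the ECA-Net paper rather than from this page.

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_weights(channel_means: List[float], kernel: List[float]) -> List[float]:
    """1D convolution across channels (zero padded), followed by a sigmoid."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + channel_means + [0.0] * pad
    weights = []
    for i in range(len(channel_means)):
        s = sum(w * padded[i + j] for j, w in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights


def eca(feature_map: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """feature_map[c] holds the flattened spatial values of channel c."""
    means = [sum(ch) / len(ch) for ch in feature_map]  # global average pooling
    weights = eca_weights(means, kernel)               # one weight per channel
    return [[w * v for v in ch] for w, ch in zip(weights, feature_map)]


for c in (64, 256, 512):
    print(c, eca_kernel_size(c))
```

The number of channels is the same before and after the attention step, and the only learned parameters would be the few kernel weights, which is where the complexity saving relative to a squeeze-and-excitation block comes from.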
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{wang2020ecanet,
15
+ title={ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks},
16
+ author={Qilong Wang and Banggu Wu and Pengfei Zhu and Peihua Li and Wangmeng Zuo and Qinghua Hu},
17
+ year={2020},
18
+ eprint={1910.03151},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: ECAResNet
28
+ Paper:
29
+ Title: 'ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'
30
+ URL: https://paperswithcode.com/paper/eca-net-efficient-channel-attention-for-deep
31
+ Models:
32
+ - Name: ecaresnet101d
33
+ In Collection: ECAResNet
34
+ Metadata:
35
+ FLOPs: 10377193728
36
+ Parameters: 44570000
37
+ File Size: 178815067
38
+ Architecture:
39
+ - 1x1 Convolution
40
+ - Batch Normalization
41
+ - Bottleneck Residual Block
42
+ - Convolution
43
+ - Efficient Channel Attention
44
+ - Global Average Pooling
45
+ - Max Pooling
46
+ - ReLU
47
+ - Residual Block
48
+ - Residual Connection
49
+ - Softmax
50
+ - Squeeze-and-Excitation Block
51
+ Tasks:
52
+ - Image Classification
53
+ Training Techniques:
54
+ - SGD with Momentum
55
+ - Weight Decay
56
+ Training Data:
57
+ - ImageNet
58
+ Training Resources: 4x RTX 2080Ti GPUs
59
+ ID: ecaresnet101d
60
+ LR: 0.1
61
+ Epochs: 100
62
+ Layers: 101
63
+ Crop Pct: '0.875'
64
+ Batch Size: 256
65
+ Image Size: '224'
66
+ Weight Decay: 0.0001
67
+ Interpolation: bicubic
68
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1087
69
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNet101D_281c5844.pth
70
+ Results:
71
+ - Task: Image Classification
72
+ Dataset: ImageNet
73
+ Metrics:
74
+ Top 1 Accuracy: 82.18%
75
+ Top 5 Accuracy: 96.06%
76
+ - Name: ecaresnet101d_pruned
77
+ In Collection: ECAResNet
78
+ Metadata:
79
+ FLOPs: 4463972081
80
+ Parameters: 24880000
81
+ File Size: 99852736
82
+ Architecture:
83
+ - 1x1 Convolution
84
+ - Batch Normalization
85
+ - Bottleneck Residual Block
86
+ - Convolution
87
+ - Efficient Channel Attention
88
+ - Global Average Pooling
89
+ - Max Pooling
90
+ - ReLU
91
+ - Residual Block
92
+ - Residual Connection
93
+ - Softmax
94
+ - Squeeze-and-Excitation Block
95
+ Tasks:
96
+ - Image Classification
97
+ Training Techniques:
98
+ - SGD with Momentum
99
+ - Weight Decay
100
+ Training Data:
101
+ - ImageNet
102
+ ID: ecaresnet101d_pruned
103
+ Layers: 101
104
+ Crop Pct: '0.875'
105
+ Image Size: '224'
106
+ Interpolation: bicubic
107
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1097
108
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45610/outputs/ECAResNet101D_P_75a3370e.pth
109
+ Results:
110
+ - Task: Image Classification
111
+ Dataset: ImageNet
112
+ Metrics:
113
+ Top 1 Accuracy: 80.82%
114
+ Top 5 Accuracy: 95.64%
115
+ - Name: ecaresnet50d
116
+ In Collection: ECAResNet
117
+ Metadata:
118
+ FLOPs: 5591090432
119
+ Parameters: 25580000
120
+ File Size: 102579290
121
+ Architecture:
122
+ - 1x1 Convolution
123
+ - Batch Normalization
124
+ - Bottleneck Residual Block
125
+ - Convolution
126
+ - Efficient Channel Attention
127
+ - Global Average Pooling
128
+ - Max Pooling
129
+ - ReLU
130
+ - Residual Block
131
+ - Residual Connection
132
+ - Softmax
133
+ - Squeeze-and-Excitation Block
134
+ Tasks:
135
+ - Image Classification
136
+ Training Techniques:
137
+ - SGD with Momentum
138
+ - Weight Decay
139
+ Training Data:
140
+ - ImageNet
141
+ Training Resources: 4x RTX 2080Ti GPUs
142
+ ID: ecaresnet50d
143
+ LR: 0.1
144
+ Epochs: 100
145
+ Layers: 50
146
+ Crop Pct: '0.875'
147
+ Batch Size: 256
148
+ Image Size: '224'
149
+ Weight Decay: 0.0001
150
+ Interpolation: bicubic
151
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1045
152
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNet50D_833caf58.pth
153
+ Results:
154
+ - Task: Image Classification
155
+ Dataset: ImageNet
156
+ Metrics:
157
+ Top 1 Accuracy: 80.61%
158
+ Top 5 Accuracy: 95.31%
159
+ - Name: ecaresnet50d_pruned
160
+ In Collection: ECAResNet
161
+ Metadata:
162
+ FLOPs: 3250730657
163
+ Parameters: 19940000
164
+ File Size: 79990436
165
+ Architecture:
166
+ - 1x1 Convolution
167
+ - Batch Normalization
168
+ - Bottleneck Residual Block
169
+ - Convolution
170
+ - Efficient Channel Attention
171
+ - Global Average Pooling
172
+ - Max Pooling
173
+ - ReLU
174
+ - Residual Block
175
+ - Residual Connection
176
+ - Softmax
177
+ - Squeeze-and-Excitation Block
178
+ Tasks:
179
+ - Image Classification
180
+ Training Techniques:
181
+ - SGD with Momentum
182
+ - Weight Decay
183
+ Training Data:
184
+ - ImageNet
185
+ ID: ecaresnet50d_pruned
186
+ Layers: 50
187
+ Crop Pct: '0.875'
188
+ Image Size: '224'
189
+ Interpolation: bicubic
190
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1055
191
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45899/outputs/ECAResNet50D_P_9c67f710.pth
192
+ Results:
193
+ - Task: Image Classification
194
+ Dataset: ImageNet
195
+ Metrics:
196
+ Top 1 Accuracy: 79.71%
197
+ Top 5 Accuracy: 94.88%
198
+ - Name: ecaresnetlight
199
+ In Collection: ECAResNet
200
+ Metadata:
201
+ FLOPs: 5276118784
202
+ Parameters: 30160000
203
+ File Size: 120956612
204
+ Architecture:
205
+ - 1x1 Convolution
206
+ - Batch Normalization
207
+ - Bottleneck Residual Block
208
+ - Convolution
209
+ - Efficient Channel Attention
210
+ - Global Average Pooling
211
+ - Max Pooling
212
+ - ReLU
213
+ - Residual Block
214
+ - Residual Connection
215
+ - Softmax
216
+ - Squeeze-and-Excitation Block
217
+ Tasks:
218
+ - Image Classification
219
+ Training Techniques:
220
+ - SGD with Momentum
221
+ - Weight Decay
222
+ Training Data:
223
+ - ImageNet
224
+ ID: ecaresnetlight
225
+ Crop Pct: '0.875'
226
+ Image Size: '224'
227
+ Interpolation: bicubic
228
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/resnet.py#L1077
229
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45402/outputs/ECAResNetLight_4f34b35b.pth
230
+ Results:
231
+ - Task: Image Classification
232
+ Dataset: ImageNet
233
+ Metrics:
234
+ Top 1 Accuracy: 80.46%
235
+ Top 5 Accuracy: 95.25%
236
+ -->
docs/models/.templates/models/efficientnet-pruned.md ADDED
@@ -0,0 +1,145 @@
1
+ # EfficientNet (Knapsack Pruned)
2
+
3
+ **EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scale network width, depth, and resolution in a principled way.
4
+
5
+ The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
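
The scaling rule can be made concrete with a short worked example. The sketch below assumes the coefficient values reported in the EfficientNet paper for the B0 grid search ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$) and the paper's approximation that FLOPs grow with depth times width squared times resolution squared; neither is stated on this page, and `compound_scale` is a hypothetical helper name.

```python
# Coefficients from the EfficientNet paper's grid search (assumed here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> dict:
    """Return depth/width/resolution multipliers for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs scale roughly with depth * width^2 * resolution^2.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return {"depth": depth, "width": width, "resolution": resolution, "flops": flops}


for phi in range(4):
    s = compound_scale(phi)
    print(
        f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
        f"resolution x{s['resolution']:.2f}, FLOPs x{s['flops']:.2f}"
    )
```

Because $\alpha \cdot \beta^2 \cdot \gamma^2$ is close to 2 for these values, each unit increase in $\phi$ roughly doubles the compute budget, which matches the $2^N$ statement above. The published model variants round these multipliers to practical layer counts and image sizes, so the metadata below will not match the raw multipliers exactly.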
6
+
7
+ The base EfficientNet-B0 network is based on the inverted bottleneck residual blocks of [MobileNetV2](https://paperswithcode.com/method/mobilenetv2), in addition to [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block).
8
+
9
+ This collection consists of pruned EfficientNet models.
10
+
11
+ {% include 'code_snippets.md' %}
12
+
13
+ ## How do I train this model?
14
+
15
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
16
+
17
+ ## Citation
18
+
19
+ ```BibTeX
20
+ @misc{tan2020efficientnet,
21
+ title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
22
+ author={Mingxing Tan and Quoc V. Le},
23
+ year={2020},
24
+ eprint={1905.11946},
25
+ archivePrefix={arXiv},
26
+ primaryClass={cs.LG}
27
+ }
28
+ ```
29
+
30
+ ```BibTeX
31
+ @misc{aflalo2020knapsack,
32
+ title={Knapsack Pruning with Inner Distillation},
33
+ author={Yonathan Aflalo and Asaf Noy and Ming Lin and Itamar Friedman and Lihi Zelnik},
34
+ year={2020},
35
+ eprint={2002.08258},
36
+ archivePrefix={arXiv},
37
+ primaryClass={cs.LG}
38
+ }
39
+ ```
40
+
41
+ <!--
42
+ Type: model-index
43
+ Collections:
44
+ - Name: EfficientNet Pruned
45
+ Paper:
46
+ Title: Knapsack Pruning with Inner Distillation
47
+ URL: https://paperswithcode.com/paper/knapsack-pruning-with-inner-distillation
48
+ Models:
49
+ - Name: efficientnet_b1_pruned
50
+ In Collection: EfficientNet Pruned
51
+ Metadata:
52
+ FLOPs: 489653114
53
+ Parameters: 6330000
54
+ File Size: 25595162
55
+ Architecture:
56
+ - 1x1 Convolution
57
+ - Average Pooling
58
+ - Batch Normalization
59
+ - Convolution
60
+ - Dense Connections
61
+ - Dropout
62
+ - Inverted Residual Block
63
+ - Squeeze-and-Excitation Block
64
+ - Swish
65
+ Tasks:
66
+ - Image Classification
67
+ Training Data:
68
+ - ImageNet
69
+ ID: efficientnet_b1_pruned
70
+ Crop Pct: '0.882'
71
+ Image Size: '240'
72
+ Interpolation: bicubic
73
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1208
74
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb1_pruned_9ebb3fe6.pth
75
+ Results:
76
+ - Task: Image Classification
77
+ Dataset: ImageNet
78
+ Metrics:
79
+ Top 1 Accuracy: 78.25%
80
+ Top 5 Accuracy: 93.84%
81
+ - Name: efficientnet_b2_pruned
82
+ In Collection: EfficientNet Pruned
83
+ Metadata:
84
+ FLOPs: 878133915
85
+ Parameters: 8310000
86
+ File Size: 33555005
87
+ Architecture:
88
+ - 1x1 Convolution
89
+ - Average Pooling
90
+ - Batch Normalization
91
+ - Convolution
92
+ - Dense Connections
93
+ - Dropout
94
+ - Inverted Residual Block
95
+ - Squeeze-and-Excitation Block
96
+ - Swish
97
+ Tasks:
98
+ - Image Classification
99
+ Training Data:
100
+ - ImageNet
101
+ ID: efficientnet_b2_pruned
102
+ Crop Pct: '0.89'
103
+ Image Size: '260'
104
+ Interpolation: bicubic
105
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1219
106
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb2_pruned_203f55bc.pth
107
+ Results:
108
+ - Task: Image Classification
109
+ Dataset: ImageNet
110
+ Metrics:
111
+ Top 1 Accuracy: 79.91%
112
+ Top 5 Accuracy: 94.86%
113
+ - Name: efficientnet_b3_pruned
114
+ In Collection: EfficientNet Pruned
115
+ Metadata:
116
+ FLOPs: 1239590641
117
+ Parameters: 9860000
118
+ File Size: 39770812
119
+ Architecture:
120
+ - 1x1 Convolution
121
+ - Average Pooling
122
+ - Batch Normalization
123
+ - Convolution
124
+ - Dense Connections
125
+ - Dropout
126
+ - Inverted Residual Block
127
+ - Squeeze-and-Excitation Block
128
+ - Swish
129
+ Tasks:
130
+ - Image Classification
131
+ Training Data:
132
+ - ImageNet
133
+ ID: efficientnet_b3_pruned
134
+ Crop Pct: '0.904'
135
+ Image Size: '300'
136
+ Interpolation: bicubic
137
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1230
138
+ Weights: https://imvl-automl-sh.oss-cn-shanghai.aliyuncs.com/darts/hyperml/hyperml/job_45403/outputs/effnetb3_pruned_5abcc29f.pth
139
+ Results:
140
+ - Task: Image Classification
141
+ Dataset: ImageNet
142
+ Metrics:
143
+ Top 1 Accuracy: 80.86%
144
+ Top 5 Accuracy: 95.24%
145
+ -->
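As a quick sanity check on the `Parameters` and `File Size` fields above, the sketch below divides one by the other. It assumes (this is not stated in the metadata) that the checkpoints store weights as 32-bit floats, so the ratio should land a little above 4 bytes per parameter once buffers and serialization overhead are included. The helper name `bytes_per_parameter` is hypothetical.

```python
def bytes_per_parameter(file_size_bytes: int, num_parameters: int) -> float:
    """Ratio of checkpoint size to parameter count (hypothetical helper)."""
    return file_size_bytes / num_parameters


# (File Size, Parameters) pairs copied from the model-index entries above.
ENTRIES = {
    "efficientnet_b2_pruned": (33555005, 8310000),
    "efficientnet_b3_pruned": (39770812, 9860000),
}

for name, (size, params) in ENTRIES.items():
    print(f"{name}: {bytes_per_parameter(size, params):.3f} bytes/parameter")
```

A ratio far from 4 would suggest that either the parameter count or the file size in an entry belongs to a different checkpoint.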
docs/models/.templates/models/efficientnet.md ADDED
@@ -0,0 +1,325 @@
1
+ # EfficientNet
2
+
3
+ **EfficientNet** is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a *compound coefficient*. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. For example, if we want to use $2^N$ times more computational resources, then we can simply increase the network depth by $\alpha ^ N$, width by $\beta ^ N$, and image size by $\gamma ^ N$, where $\alpha, \beta, \gamma$ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient $\phi$ to uniformly scale network width, depth, and resolution in a principled way.
4
+
5
+ The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns on the bigger image.
6
+
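The scaling rule described above can be sketched in a few lines of Python. The coefficient values (1.2, 1.1, 1.15) are the ones reported in the EfficientNet paper's grid search, not something stated in this file, and the function names are hypothetical. The FLOPs estimate assumes cost grows linearly with depth and quadratically with width and resolution.

```python
# Coefficients reported in the EfficientNet paper (assumption: not given in this file).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width and resolution together by the compound coefficient phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution {r:.0f}px, FLOPs x{flops_multiplier(phi):.2f}")
```

With these coefficients each unit increase in `phi` multiplies the estimated FLOPs by roughly 1.92, which is close to the factor of 2 the text describes.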
7
+ The base EfficientNet-B0 network is built from the inverted bottleneck residual blocks of [MobileNetV2](https://paperswithcode.com/method/mobilenetv2), combined with [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block).
8
+
9
+ {% include 'code_snippets.md' %}
10
+
11
+ ## How do I train this model?
12
+
13
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
14
+
15
+ ## Citation
16
+
17
+ ```BibTeX
18
+ @misc{tan2020efficientnet,
19
+ title={EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks},
20
+ author={Mingxing Tan and Quoc V. Le},
21
+ year={2020},
22
+ eprint={1905.11946},
23
+ archivePrefix={arXiv},
24
+ primaryClass={cs.LG}
25
+ }
26
+ ```
27
+
28
+ <!--
29
+ Type: model-index
30
+ Collections:
31
+ - Name: EfficientNet
32
+ Paper:
33
+ Title: 'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'
34
+ URL: https://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for
35
+ Models:
36
+ - Name: efficientnet_b0
37
+ In Collection: EfficientNet
38
+ Metadata:
39
+ FLOPs: 511241564
40
+ Parameters: 5290000
41
+ File Size: 21376743
42
+ Architecture:
43
+ - 1x1 Convolution
44
+ - Average Pooling
45
+ - Batch Normalization
46
+ - Convolution
47
+ - Dense Connections
48
+ - Dropout
49
+ - Inverted Residual Block
50
+ - Squeeze-and-Excitation Block
51
+ - Swish
52
+ Tasks:
53
+ - Image Classification
54
+ Training Data:
55
+ - ImageNet
56
+ ID: efficientnet_b0
57
+ Layers: 18
58
+ Crop Pct: '0.875'
59
+ Image Size: '224'
60
+ Interpolation: bicubic
61
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1002
62
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b0_ra-3dd342df.pth
63
+ Results:
64
+ - Task: Image Classification
65
+ Dataset: ImageNet
66
+ Metrics:
67
+ Top 1 Accuracy: 77.71%
68
+ Top 5 Accuracy: 93.52%
69
+ - Name: efficientnet_b1
70
+ In Collection: EfficientNet
71
+ Metadata:
72
+ FLOPs: 909691920
73
+ Parameters: 7790000
74
+ File Size: 31502706
75
+ Architecture:
76
+ - 1x1 Convolution
77
+ - Average Pooling
78
+ - Batch Normalization
79
+ - Convolution
80
+ - Dense Connections
81
+ - Dropout
82
+ - Inverted Residual Block
83
+ - Squeeze-and-Excitation Block
84
+ - Swish
85
+ Tasks:
86
+ - Image Classification
87
+ Training Data:
88
+ - ImageNet
89
+ ID: efficientnet_b1
90
+ Crop Pct: '0.875'
91
+ Image Size: '240'
92
+ Interpolation: bicubic
93
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1011
94
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b1-533bc792.pth
95
+ Results:
96
+ - Task: Image Classification
97
+ Dataset: ImageNet
98
+ Metrics:
99
+ Top 1 Accuracy: 78.71%
100
+ Top 5 Accuracy: 94.15%
101
+ - Name: efficientnet_b2
102
+ In Collection: EfficientNet
103
+ Metadata:
104
+ FLOPs: 1265324514
105
+ Parameters: 9110000
106
+ File Size: 36788104
107
+ Architecture:
108
+ - 1x1 Convolution
109
+ - Average Pooling
110
+ - Batch Normalization
111
+ - Convolution
112
+ - Dense Connections
113
+ - Dropout
114
+ - Inverted Residual Block
115
+ - Squeeze-and-Excitation Block
116
+ - Swish
117
+ Tasks:
118
+ - Image Classification
119
+ Training Data:
120
+ - ImageNet
121
+ ID: efficientnet_b2
122
+ Crop Pct: '0.875'
123
+ Image Size: '260'
124
+ Interpolation: bicubic
125
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1020
126
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b2_ra-bcdf34b7.pth
127
+ Results:
128
+ - Task: Image Classification
129
+ Dataset: ImageNet
130
+ Metrics:
131
+ Top 1 Accuracy: 80.38%
132
+ Top 5 Accuracy: 95.08%
133
+ - Name: efficientnet_b2a
134
+ In Collection: EfficientNet
135
+ Metadata:
136
+ FLOPs: 1452041554
137
+ Parameters: 9110000
138
+ File Size: 36788104
139
+ Architecture:
140
+ - 1x1 Convolution
141
+ - Average Pooling
142
+ - Batch Normalization
143
+ - Convolution
144
+ - Dense Connections
145
+ - Dropout
146
+ - Inverted Residual Block
147
+ - Squeeze-and-Excitation Block
148
+ - Swish
149
+ Tasks:
150
+ - Image Classification
151
+ Training Data:
152
+ - ImageNet
153
+ ID: efficientnet_b2a
154
+ Crop Pct: '1.0'
155
+ Image Size: '288'
156
+ Interpolation: bicubic
157
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1029
158
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b2_ra-bcdf34b7.pth
159
+ Results:
160
+ - Task: Image Classification
161
+ Dataset: ImageNet
162
+ Metrics:
163
+ Top 1 Accuracy: 80.61%
164
+ Top 5 Accuracy: 95.32%
165
+ - Name: efficientnet_b3
166
+ In Collection: EfficientNet
167
+ Metadata:
168
+ FLOPs: 2327905920
169
+ Parameters: 12230000
170
+ File Size: 49369973
171
+ Architecture:
172
+ - 1x1 Convolution
173
+ - Average Pooling
174
+ - Batch Normalization
175
+ - Convolution
176
+ - Dense Connections
177
+ - Dropout
178
+ - Inverted Residual Block
179
+ - Squeeze-and-Excitation Block
180
+ - Swish
181
+ Tasks:
182
+ - Image Classification
183
+ Training Data:
184
+ - ImageNet
185
+ ID: efficientnet_b3
186
+ Crop Pct: '0.904'
187
+ Image Size: '300'
188
+ Interpolation: bicubic
189
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1038
190
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth
191
+ Results:
192
+ - Task: Image Classification
193
+ Dataset: ImageNet
194
+ Metrics:
195
+ Top 1 Accuracy: 82.08%
196
+ Top 5 Accuracy: 96.03%
197
+ - Name: efficientnet_b3a
198
+ In Collection: EfficientNet
199
+ Metadata:
200
+ FLOPs: 2600628304
201
+ Parameters: 12230000
202
+ File Size: 49369973
203
+ Architecture:
204
+ - 1x1 Convolution
205
+ - Average Pooling
206
+ - Batch Normalization
207
+ - Convolution
208
+ - Dense Connections
209
+ - Dropout
210
+ - Inverted Residual Block
211
+ - Squeeze-and-Excitation Block
212
+ - Swish
213
+ Tasks:
214
+ - Image Classification
215
+ Training Data:
216
+ - ImageNet
217
+ ID: efficientnet_b3a
218
+ Crop Pct: '1.0'
219
+ Image Size: '320'
220
+ Interpolation: bicubic
221
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1047
222
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth
223
+ Results:
224
+ - Task: Image Classification
225
+ Dataset: ImageNet
226
+ Metrics:
227
+ Top 1 Accuracy: 82.25%
228
+ Top 5 Accuracy: 96.11%
229
+ - Name: efficientnet_em
230
+ In Collection: EfficientNet
231
+ Metadata:
232
+ FLOPs: 3935516480
233
+ Parameters: 6900000
234
+ File Size: 27927309
235
+ Architecture:
236
+ - 1x1 Convolution
237
+ - Average Pooling
238
+ - Batch Normalization
239
+ - Convolution
240
+ - Dense Connections
241
+ - Dropout
242
+ - Inverted Residual Block
243
+ - Squeeze-and-Excitation Block
244
+ - Swish
245
+ Tasks:
246
+ - Image Classification
247
+ Training Data:
248
+ - ImageNet
249
+ ID: efficientnet_em
250
+ Crop Pct: '0.882'
251
+ Image Size: '240'
252
+ Interpolation: bicubic
253
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1118
254
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_em_ra2-66250f76.pth
255
+ Results:
256
+ - Task: Image Classification
257
+ Dataset: ImageNet
258
+ Metrics:
259
+ Top 1 Accuracy: 79.26%
260
+ Top 5 Accuracy: 94.79%
261
+ - Name: efficientnet_es
262
+ In Collection: EfficientNet
263
+ Metadata:
264
+ FLOPs: 2317181824
265
+ Parameters: 5440000
266
+ File Size: 22003339
267
+ Architecture:
268
+ - 1x1 Convolution
269
+ - Average Pooling
270
+ - Batch Normalization
271
+ - Convolution
272
+ - Dense Connections
273
+ - Dropout
274
+ - Inverted Residual Block
275
+ - Squeeze-and-Excitation Block
276
+ - Swish
277
+ Tasks:
278
+ - Image Classification
279
+ Training Data:
280
+ - ImageNet
281
+ ID: efficientnet_es
282
+ Crop Pct: '0.875'
283
+ Image Size: '224'
284
+ Interpolation: bicubic
285
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1110
286
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_es_ra-f111e99c.pth
287
+ Results:
288
+ - Task: Image Classification
289
+ Dataset: ImageNet
290
+ Metrics:
291
+ Top 1 Accuracy: 78.09%
292
+ Top 5 Accuracy: 93.93%
293
+ - Name: efficientnet_lite0
294
+ In Collection: EfficientNet
295
+ Metadata:
296
+ FLOPs: 510605024
297
+ Parameters: 4650000
298
+ File Size: 18820005
299
+ Architecture:
300
+ - 1x1 Convolution
301
+ - Average Pooling
302
+ - Batch Normalization
303
+ - Convolution
304
+ - Dense Connections
305
+ - Dropout
306
+ - Inverted Residual Block
307
+ - Squeeze-and-Excitation Block
308
+ - Swish
309
+ Tasks:
310
+ - Image Classification
311
+ Training Data:
312
+ - ImageNet
313
+ ID: efficientnet_lite0
314
+ Crop Pct: '0.875'
315
+ Image Size: '224'
316
+ Interpolation: bicubic
317
+ Code: https://github.com/rwightman/pytorch-image-models/blob/a7f95818e44b281137503bcf4b3e3e94d8ffa52f/timm/models/efficientnet.py#L1163
318
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_lite0_ra-37913777.pth
319
+ Results:
320
+ - Task: Image Classification
321
+ Dataset: ImageNet
322
+ Metrics:
323
+ Top 1 Accuracy: 75.5%
324
+ Top 5 Accuracy: 92.51%
325
+ -->
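The `Crop Pct` and `Image Size` fields above describe evaluation preprocessing. A minimal sketch, assuming the common convention of resizing the image so that a center crop of `Image Size` pixels covers `Crop Pct` of the resized side (the helper name `eval_resize_size` is hypothetical and the exact rounding used by any given library may differ):

```python
import math


def eval_resize_size(image_size: int, crop_pct: float) -> int:
    """Side length to resize to before center-cropping to image_size (assumed convention)."""
    return int(math.floor(image_size / crop_pct))


# (Image Size, Crop Pct) pairs copied from the model-index entries above.
for name, (size, pct) in {
    "efficientnet_b0": (224, 0.875),
    "efficientnet_b3": (300, 0.904),
    "efficientnet_b2a": (288, 1.0),
}.items():
    print(f"{name}: resize to {eval_resize_size(size, pct)}, center crop {size}")
```

Under this convention a `Crop Pct` of `1.0` means no border is discarded: the image is resized directly to the evaluation resolution.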
docs/models/.templates/models/ensemble-adversarial.md ADDED
@@ -0,0 +1,98 @@
1
+ # Ensemble Adversarial Inception ResNet v2
2
+
3
+ **Inception-ResNet-v2** is a convolutional neural architecture that builds on the Inception family of architectures but incorporates [residual connections](https://paperswithcode.com/method/residual-connection) (replacing the filter concatenation stage of the Inception architecture).
4
+
5
+ This particular model was trained to study adversarial examples (adversarial training).
6
+
7
+ The weights from this model were ported from [Tensorflow/Models](https://github.com/tensorflow/models).
8
+
9
+ {% include 'code_snippets.md' %}
10
+
11
+ ## How do I train this model?
12
+
13
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
14
+
15
+ ## Citation
16
+
17
+ ```BibTeX
18
+ @article{DBLP:journals/corr/abs-1804-00097,
19
+ author = {Alexey Kurakin and
20
+ Ian J. Goodfellow and
21
+ Samy Bengio and
22
+ Yinpeng Dong and
23
+ Fangzhou Liao and
24
+ Ming Liang and
25
+ Tianyu Pang and
26
+ Jun Zhu and
27
+ Xiaolin Hu and
28
+ Cihang Xie and
29
+ Jianyu Wang and
30
+ Zhishuai Zhang and
31
+ Zhou Ren and
32
+ Alan L. Yuille and
33
+ Sangxia Huang and
34
+ Yao Zhao and
35
+ Yuzhe Zhao and
36
+ Zhonglin Han and
37
+ Junjiajia Long and
38
+ Yerkebulan Berdibekov and
39
+ Takuya Akiba and
40
+ Seiya Tokui and
41
+ Motoki Abe},
42
+ title = {Adversarial Attacks and Defences Competition},
43
+ journal = {CoRR},
44
+ volume = {abs/1804.00097},
45
+ year = {2018},
46
+ url = {http://arxiv.org/abs/1804.00097},
47
+ archivePrefix = {arXiv},
48
+ eprint = {1804.00097},
49
+ timestamp = {Thu, 31 Oct 2019 16:31:22 +0100},
50
+ biburl = {https://dblp.org/rec/journals/corr/abs-1804-00097.bib},
51
+ bibsource = {dblp computer science bibliography, https://dblp.org}
52
+ }
53
+ ```
54
+
55
+ <!--
56
+ Type: model-index
57
+ Collections:
58
+ - Name: Ensemble Adversarial
59
+ Paper:
60
+ Title: Adversarial Attacks and Defences Competition
61
+ URL: https://paperswithcode.com/paper/adversarial-attacks-and-defences-competition
62
+ Models:
63
+ - Name: ens_adv_inception_resnet_v2
64
+ In Collection: Ensemble Adversarial
65
+ Metadata:
66
+ FLOPs: 16959133120
67
+ Parameters: 55850000
68
+ File Size: 223774238
69
+ Architecture:
70
+ - 1x1 Convolution
71
+ - Auxiliary Classifier
72
+ - Average Pooling
73
+ - Average Pooling
74
+ - Batch Normalization
75
+ - Convolution
76
+ - Dense Connections
77
+ - Dropout
78
+ - Inception-v3 Module
79
+ - Max Pooling
80
+ - ReLU
81
+ - Softmax
82
+ Tasks:
83
+ - Image Classification
84
+ Training Data:
85
+ - ImageNet
86
+ ID: ens_adv_inception_resnet_v2
87
+ Crop Pct: '0.897'
88
+ Image Size: '299'
89
+ Interpolation: bicubic
90
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_resnet_v2.py#L351
91
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ens_adv_inception_resnet_v2-2592a550.pth
92
+ Results:
93
+ - Task: Image Classification
94
+ Dataset: ImageNet
95
+ Metrics:
96
+ Top 1 Accuracy: 1.0%
97
+ Top 5 Accuracy: 17.32%
98
+ -->
docs/models/.templates/models/ese-vovnet.md ADDED
@@ -0,0 +1,92 @@
1
+ # ESE-VoVNet
2
+
3
+ **VoVNet** is a convolutional neural network that seeks to make [DenseNet](https://paperswithcode.com/method/densenet) more efficient by concatenating all features only once, in the last feature map. This keeps the input size of the intermediate layers constant and allows the number of new output channels to be enlarged.
4
+
5
+ Read about [one-shot aggregation here](https://paperswithcode.com/method/one-shot-aggregation).
6
+
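The difference between dense aggregation and one-shot aggregation can be illustrated by counting input channels per layer. This is a simplified sketch under stated assumptions: every layer emits the same number of channels, and the function names and the example widths are hypothetical, not taken from the VoVNet configuration.

```python
def dense_input_channels(c_in, growth, num_layers):
    """DenseNet-style: each layer sees the input plus all earlier outputs."""
    return [c_in + i * growth for i in range(num_layers)]


def osa_input_channels(c_in, width, num_layers):
    """One-shot aggregation: each layer only sees the previous layer's output."""
    return [c_in] + [width] * (num_layers - 1)


def osa_concat_channels(c_in, width, num_layers):
    """Channels after the single concatenation at the end of the module."""
    return c_in + num_layers * width


print(dense_input_channels(64, 32, 4))  # input width grows with every layer
print(osa_input_channels(64, 32, 4))    # input width stays constant after the first layer
print(osa_concat_channels(64, 32, 4))   # everything is concatenated once at the end
```

Because the per-layer input width no longer grows, each layer can afford a larger output width for the same cost, which is the "enlarged output channels" effect mentioned above.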
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{lee2019energy,
17
+ title={An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection},
18
+ author={Youngwan Lee and Joong-won Hwang and Sangrok Lee and Yuseok Bae and Jongyoul Park},
19
+ year={2019},
20
+ eprint={1904.09730},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: ESE VovNet
30
+ Paper:
31
+ Title: 'CenterMask : Real-Time Anchor-Free Instance Segmentation'
32
+ URL: https://paperswithcode.com/paper/centermask-real-time-anchor-free-instance-1
33
+ Models:
34
+ - Name: ese_vovnet19b_dw
35
+ In Collection: ESE VovNet
36
+ Metadata:
37
+ FLOPs: 1711959904
38
+ Parameters: 6540000
39
+ File Size: 26243175
40
+ Architecture:
41
+ - Batch Normalization
42
+ - Convolution
43
+ - Max Pooling
44
+ - One-Shot Aggregation
45
+ - ReLU
46
+ Tasks:
47
+ - Image Classification
48
+ Training Data:
49
+ - ImageNet
50
+ ID: ese_vovnet19b_dw
51
+ Layers: 19
52
+ Crop Pct: '0.875'
53
+ Image Size: '224'
54
+ Interpolation: bicubic
55
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/vovnet.py#L361
56
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ese_vovnet19b_dw-a8741004.pth
57
+ Results:
58
+ - Task: Image Classification
59
+ Dataset: ImageNet
60
+ Metrics:
61
+ Top 1 Accuracy: 76.82%
62
+ Top 5 Accuracy: 93.28%
63
+ - Name: ese_vovnet39b
64
+ In Collection: ESE VovNet
65
+ Metadata:
66
+ FLOPs: 9089259008
67
+ Parameters: 24570000
68
+ File Size: 98397138
69
+ Architecture:
70
+ - Batch Normalization
71
+ - Convolution
72
+ - Max Pooling
73
+ - One-Shot Aggregation
74
+ - ReLU
75
+ Tasks:
76
+ - Image Classification
77
+ Training Data:
78
+ - ImageNet
79
+ ID: ese_vovnet39b
80
+ Layers: 39
81
+ Crop Pct: '0.875'
82
+ Image Size: '224'
83
+ Interpolation: bicubic
84
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/vovnet.py#L371
85
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/ese_vovnet39b-f912fe73.pth
86
+ Results:
87
+ - Task: Image Classification
88
+ Dataset: ImageNet
89
+ Metrics:
90
+ Top 1 Accuracy: 79.31%
91
+ Top 5 Accuracy: 94.72%
92
+ -->
docs/models/.templates/models/fbnet.md ADDED
@@ -0,0 +1,76 @@
1
+ # FBNet
2
+
3
+ **FBNet** is a family of convolutional neural architectures discovered through [DNAS](https://paperswithcode.com/method/dnas) neural architecture search. It utilises a basic type of image model block inspired by [MobileNetV2](https://paperswithcode.com/method/mobilenetv2) that utilises depthwise convolutions and an inverted residual structure (see components).
4
+
5
+ The principal building block is the [FBNet Block](https://paperswithcode.com/method/fbnet-block).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{wu2019fbnet,
17
+ title={FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search},
18
+ author={Bichen Wu and Xiaoliang Dai and Peizhao Zhang and Yanghan Wang and Fei Sun and Yiming Wu and Yuandong Tian and Peter Vajda and Yangqing Jia and Kurt Keutzer},
19
+ year={2019},
20
+ eprint={1812.03443},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: FBNet
30
+ Paper:
31
+ Title: 'FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural
32
+ Architecture Search'
33
+ URL: https://paperswithcode.com/paper/fbnet-hardware-aware-efficient-convnet-design
34
+ Models:
35
+ - Name: fbnetc_100
36
+ In Collection: FBNet
37
+ Metadata:
38
+ FLOPs: 508940064
39
+ Parameters: 5570000
40
+ File Size: 22525094
41
+ Architecture:
42
+ - 1x1 Convolution
43
+ - Convolution
44
+ - Dense Connections
45
+ - Dropout
46
+ - FBNet Block
47
+ - Global Average Pooling
48
+ - Softmax
49
+ Tasks:
50
+ - Image Classification
51
+ Training Techniques:
52
+ - SGD with Momentum
53
+ - Weight Decay
54
+ Training Data:
55
+ - ImageNet
56
+ Training Resources: 8x GPUs
57
+ ID: fbnetc_100
58
+ LR: 0.1
59
+ Epochs: 360
60
+ Layers: 22
61
+ Dropout: 0.2
62
+ Crop Pct: '0.875'
63
+ Momentum: 0.9
64
+ Batch Size: 256
65
+ Image Size: '224'
66
+ Weight Decay: 0.0005
67
+ Interpolation: bilinear
68
+ Code: https://github.com/rwightman/pytorch-image-models/blob/9a25fdf3ad0414b4d66da443fe60ae0aa14edc84/timm/models/efficientnet.py#L985
69
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/fbnetc_100-c345b898.pth
70
+ Results:
71
+ - Task: Image Classification
72
+ Dataset: ImageNet
73
+ Metrics:
74
+ Top 1 Accuracy: 75.12%
75
+ Top 5 Accuracy: 92.37%
76
+ -->
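The training metadata above lists SGD with momentum and weight decay (`LR: 0.1`, `Momentum: 0.9`, `Weight Decay: 0.0005`). A minimal scalar sketch of one update step follows. It assumes the formulation in which weight decay is added to the gradient before the momentum buffer is updated; the original training code may differ, and `sgd_momentum_step` is a hypothetical name.

```python
def sgd_momentum_step(weight, grad, velocity,
                      lr=0.1, momentum=0.9, weight_decay=0.0005):
    """One SGD-with-momentum update on a single scalar weight (illustrative only)."""
    grad = grad + weight_decay * weight       # L2 weight decay folded into the gradient
    velocity = momentum * velocity + grad     # accumulate the momentum buffer
    weight = weight - lr * velocity           # take the step
    return weight, velocity


w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
    print(f"step {step}: weight={w:.5f}, velocity={v:.5f}")
```

With a constant gradient the velocity keeps growing from step to step, which is how momentum accelerates movement along a consistent direction.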
docs/models/.templates/models/gloun-inception-v3.md ADDED
@@ -0,0 +1,78 @@
1
+ # (Gluon) Inception v3
2
+
3
+ **Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements including using [Label Smoothing](https://paperswithcode.com/method/label-smoothing), Factorized 7 x 7 convolutions, and the use of an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
4
+
5
+ The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @article{DBLP:journals/corr/SzegedyVISW15,
17
+ author = {Christian Szegedy and
18
+ Vincent Vanhoucke and
19
+ Sergey Ioffe and
20
+ Jonathon Shlens and
21
+ Zbigniew Wojna},
22
+ title = {Rethinking the Inception Architecture for Computer Vision},
23
+ journal = {CoRR},
24
+ volume = {abs/1512.00567},
25
+ year = {2015},
26
+ url = {http://arxiv.org/abs/1512.00567},
27
+ archivePrefix = {arXiv},
28
+ eprint = {1512.00567},
29
+ timestamp = {Mon, 13 Aug 2018 16:49:07 +0200},
30
+ biburl = {https://dblp.org/rec/journals/corr/SzegedyVISW15.bib},
31
+ bibsource = {dblp computer science bibliography, https://dblp.org}
32
+ }
33
+ ```
34
+
35
+ <!--
36
+ Type: model-index
37
+ Collections:
38
+ - Name: Gloun Inception v3
39
+ Paper:
40
+ Title: Rethinking the Inception Architecture for Computer Vision
41
+ URL: https://paperswithcode.com/paper/rethinking-the-inception-architecture-for
42
+ Models:
43
+ - Name: gluon_inception_v3
44
+ In Collection: Gloun Inception v3
45
+ Metadata:
46
+ FLOPs: 7352418880
47
+ Parameters: 23830000
48
+ File Size: 95567055
49
+ Architecture:
50
+ - 1x1 Convolution
51
+ - Auxiliary Classifier
52
+ - Average Pooling
53
+ - Average Pooling
54
+ - Batch Normalization
55
+ - Convolution
56
+ - Dense Connections
57
+ - Dropout
58
+ - Inception-v3 Module
59
+ - Max Pooling
60
+ - ReLU
61
+ - Softmax
62
+ Tasks:
63
+ - Image Classification
64
+ Training Data:
65
+ - ImageNet
66
+ ID: gluon_inception_v3
67
+ Crop Pct: '0.875'
68
+ Image Size: '299'
69
+ Interpolation: bicubic
70
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L464
71
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/gluon_inception_v3-9f746940.pth
72
+ Results:
73
+ - Task: Image Classification
74
+ Dataset: ImageNet
75
+ Metrics:
76
+ Top 1 Accuracy: 78.8%
77
+ Top 5 Accuracy: 94.38%
78
+ -->
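The `Top 1 Accuracy` and `Top 5 Accuracy` metrics in these entries follow the usual definition: a prediction counts as correct if the true label is among the k highest-scoring classes. A small pure-Python sketch (the function name and the toy scores are illustrative, not taken from any evaluation run):

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: three samples, three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(f"top-1: {top_k_accuracy(scores, labels, 1):.2f}%")
print(f"top-2: {top_k_accuracy(scores, labels, 2):.2f}%")
```

Top-k accuracy can never be lower than top-1 on the same predictions, which is a useful consistency check when reading the tables.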
docs/models/.templates/models/gloun-resnet.md ADDED
@@ -0,0 +1,504 @@
1
+ # (Gluon) ResNet
2
+
3
+ **Residual Networks**, or **ResNets**, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. They stack [residual blocks](https://paperswithcode.com/method/residual-block) on top of each other to form a network: e.g. a ResNet-50 has fifty layers using these blocks.
4
+
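The idea of fitting a residual mapping can be shown with a toy sketch: the block computes `F(x)` and adds the input back, so the stacked layers only need to learn the difference from the identity. Plain Python lists stand in for tensors here, and `residual_block` is a hypothetical helper, not the library's implementation.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of the stacked layers F."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, 2.0, 3.0]

# If the layers learn F(x) = 0, the block reduces to the identity mapping.
print(residual_block(x, lambda v: [0.0] * len(v)))

# A non-zero residual only has to describe the change relative to the input.
print(residual_block(x, lambda v: [0.5 * t for t in v]))
```

This is why very deep stacks remain trainable: a block that has nothing useful to add can fall back to passing its input through unchanged.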
5
+ The weights from this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @article{DBLP:journals/corr/HeZRS15,
17
+ author = {Kaiming He and
18
+ Xiangyu Zhang and
19
+ Shaoqing Ren and
20
+ Jian Sun},
21
+ title = {Deep Residual Learning for Image Recognition},
22
+ journal = {CoRR},
23
+ volume = {abs/1512.03385},
24
+ year = {2015},
25
+ url = {http://arxiv.org/abs/1512.03385},
26
+ archivePrefix = {arXiv},
27
+ eprint = {1512.03385},
28
+ timestamp = {Wed, 17 Apr 2019 17:23:45 +0200},
29
+ biburl = {https://dblp.org/rec/journals/corr/HeZRS15.bib},
30
+ bibsource = {dblp computer science bibliography, https://dblp.org}
31
+ }
32
+ ```
33
+
34
+ <!--
35
+ Type: model-index
36
+ Collections:
37
+ - Name: Gloun ResNet
38
+ Paper:
39
+ Title: Deep Residual Learning for Image Recognition
40
+ URL: https://paperswithcode.com/paper/deep-residual-learning-for-image-recognition
41
+ Models:
42
+ - Name: gluon_resnet101_v1b
43
+ In Collection: Gloun ResNet
44
+ Metadata:
45
+ FLOPs: 10068547584
46
+ Parameters: 44550000
47
+ File Size: 178723172
48
+ Architecture:
49
+ - 1x1 Convolution
50
+ - Batch Normalization
51
+ - Bottleneck Residual Block
52
+ - Convolution
53
+ - Global Average Pooling
54
+ - Max Pooling
55
+ - ReLU
56
+ - Residual Block
57
+ - Residual Connection
58
+ - Softmax
59
+ Tasks:
60
+ - Image Classification
61
+ Training Data:
62
+ - ImageNet
63
+ ID: gluon_resnet101_v1b
64
+ Crop Pct: '0.875'
65
+ Image Size: '224'
66
+ Interpolation: bicubic
67
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L89
68
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1b-3b017079.pth
69
+ Results:
70
+ - Task: Image Classification
71
+ Dataset: ImageNet
72
+ Metrics:
73
+ Top 1 Accuracy: 79.3%
74
+ Top 5 Accuracy: 94.53%
75
+ - Name: gluon_resnet101_v1c
76
+ In Collection: Gloun ResNet
77
+ Metadata:
78
+ FLOPs: 10376567296
79
+ Parameters: 44570000
80
+ File Size: 178802575
81
+ Architecture:
82
+ - 1x1 Convolution
83
+ - Batch Normalization
84
+ - Bottleneck Residual Block
85
+ - Convolution
86
+ - Global Average Pooling
87
+ - Max Pooling
88
+ - ReLU
89
+ - Residual Block
90
+ - Residual Connection
91
+ - Softmax
92
+ Tasks:
93
+ - Image Classification
94
+ Training Data:
95
+ - ImageNet
96
+ ID: gluon_resnet101_v1c
97
+ Crop Pct: '0.875'
98
+ Image Size: '224'
99
+ Interpolation: bicubic
100
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L113
101
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1c-1f26822a.pth
102
+ Results:
103
+ - Task: Image Classification
104
+ Dataset: ImageNet
105
+ Metrics:
106
+ Top 1 Accuracy: 79.53%
107
+ Top 5 Accuracy: 94.59%
108
+ - Name: gluon_resnet101_v1d
109
+ In Collection: Gloun ResNet
110
+ Metadata:
111
+ FLOPs: 10377018880
112
+ Parameters: 44570000
113
+ File Size: 178802755
114
+ Architecture:
115
+ - 1x1 Convolution
116
+ - Batch Normalization
117
+ - Bottleneck Residual Block
118
+ - Convolution
119
+ - Global Average Pooling
120
+ - Max Pooling
121
+ - ReLU
122
+ - Residual Block
123
+ - Residual Connection
124
+ - Softmax
125
+ Tasks:
126
+ - Image Classification
127
+ Training Data:
128
+ - ImageNet
129
+ ID: gluon_resnet101_v1d
130
+ Crop Pct: '0.875'
131
+ Image Size: '224'
132
+ Interpolation: bicubic
133
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L138
134
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1d-0f9c8644.pth
135
+ Results:
136
+ - Task: Image Classification
137
+ Dataset: ImageNet
138
+ Metrics:
139
+ Top 1 Accuracy: 80.4%
140
+ Top 5 Accuracy: 95.02%
141
+ - Name: gluon_resnet101_v1s
142
+ In Collection: Gloun ResNet
143
+ Metadata:
144
+ FLOPs: 11805511680
145
+ Parameters: 44670000
146
+ File Size: 179221777
147
+ Architecture:
148
+ - 1x1 Convolution
149
+ - Batch Normalization
150
+ - Bottleneck Residual Block
151
+ - Convolution
152
+ - Global Average Pooling
153
+ - Max Pooling
154
+ - ReLU
155
+ - Residual Block
156
+ - Residual Connection
157
+ - Softmax
158
+ Tasks:
159
+ - Image Classification
160
+ Training Data:
161
+ - ImageNet
162
+ ID: gluon_resnet101_v1s
163
+ Crop Pct: '0.875'
164
+ Image Size: '224'
165
+ Interpolation: bicubic
166
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L166
167
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet101_v1s-60fe0cc1.pth
168
+ Results:
169
+ - Task: Image Classification
170
+ Dataset: ImageNet
171
+ Metrics:
172
+ Top 1 Accuracy: 80.29%
173
+ Top 5 Accuracy: 95.16%
174
+ - Name: gluon_resnet152_v1b
175
+ In Collection: Gloun ResNet
176
+ Metadata:
177
+ FLOPs: 14857660416
178
+ Parameters: 60190000
179
+ File Size: 241534001
180
+ Architecture:
181
+ - 1x1 Convolution
182
+ - Batch Normalization
183
+ - Bottleneck Residual Block
184
+ - Convolution
185
+ - Global Average Pooling
186
+ - Max Pooling
187
+ - ReLU
188
+ - Residual Block
189
+ - Residual Connection
190
+ - Softmax
191
+ Tasks:
192
+ - Image Classification
193
+ Training Data:
194
+ - ImageNet
195
+ ID: gluon_resnet152_v1b
196
+ Crop Pct: '0.875'
197
+ Image Size: '224'
198
+ Interpolation: bicubic
199
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L97
200
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1b-c1edb0dd.pth
201
+ Results:
202
+ - Task: Image Classification
203
+ Dataset: ImageNet
204
+ Metrics:
205
+ Top 1 Accuracy: 79.69%
206
+ Top 5 Accuracy: 94.73%
207
+ - Name: gluon_resnet152_v1c
208
+ In Collection: Gloun ResNet
209
+ Metadata:
210
+ FLOPs: 15165680128
211
+ Parameters: 60210000
212
+ File Size: 241613404
213
+ Architecture:
214
+ - 1x1 Convolution
215
+ - Batch Normalization
216
+ - Bottleneck Residual Block
217
+ - Convolution
218
+ - Global Average Pooling
219
+ - Max Pooling
220
+ - ReLU
221
+ - Residual Block
222
+ - Residual Connection
223
+ - Softmax
224
+ Tasks:
225
+ - Image Classification
226
+ Training Data:
227
+ - ImageNet
228
+ ID: gluon_resnet152_v1c
229
+ Crop Pct: '0.875'
230
+ Image Size: '224'
231
+ Interpolation: bicubic
232
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L121
233
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1c-a3bb0b98.pth
234
+ Results:
235
+ - Task: Image Classification
236
+ Dataset: ImageNet
237
+ Metrics:
238
+ Top 1 Accuracy: 79.91%
239
+ Top 5 Accuracy: 94.85%
240
+ - Name: gluon_resnet152_v1d
241
+ In Collection: Gloun ResNet
242
+ Metadata:
243
+ FLOPs: 15166131712
244
+ Parameters: 60210000
245
+ File Size: 241613584
246
+ Architecture:
247
+ - 1x1 Convolution
248
+ - Batch Normalization
249
+ - Bottleneck Residual Block
250
+ - Convolution
251
+ - Global Average Pooling
252
+ - Max Pooling
253
+ - ReLU
254
+ - Residual Block
255
+ - Residual Connection
256
+ - Softmax
257
+ Tasks:
258
+ - Image Classification
259
+ Training Data:
260
+ - ImageNet
261
+ ID: gluon_resnet152_v1d
262
+ Crop Pct: '0.875'
263
+ Image Size: '224'
264
+ Interpolation: bicubic
265
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L147
266
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1d-bd354e12.pth
267
+ Results:
268
+ - Task: Image Classification
269
+ Dataset: ImageNet
270
+ Metrics:
271
+ Top 1 Accuracy: 80.48%
272
+ Top 5 Accuracy: 95.2%
273
+ - Name: gluon_resnet152_v1s
274
+ In Collection: Gloun ResNet
275
+ Metadata:
276
+ FLOPs: 16594624512
277
+ Parameters: 60320000
278
+ File Size: 242032606
279
+ Architecture:
280
+ - 1x1 Convolution
281
+ - Batch Normalization
282
+ - Bottleneck Residual Block
283
+ - Convolution
284
+ - Global Average Pooling
285
+ - Max Pooling
286
+ - ReLU
287
+ - Residual Block
288
+ - Residual Connection
289
+ - Softmax
290
+ Tasks:
291
+ - Image Classification
292
+ Training Data:
293
+ - ImageNet
294
+ ID: gluon_resnet152_v1s
295
+ Crop Pct: '0.875'
296
+ Image Size: '224'
297
+ Interpolation: bicubic
298
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L175
299
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet152_v1s-dcc41b81.pth
300
+ Results:
301
+ - Task: Image Classification
302
+ Dataset: ImageNet
303
+ Metrics:
304
+ Top 1 Accuracy: 81.02%
305
+ Top 5 Accuracy: 95.42%
306
+ - Name: gluon_resnet18_v1b
307
+ In Collection: Gloun ResNet
308
+ Metadata:
309
+ FLOPs: 2337073152
310
+ Parameters: 11690000
311
+ File Size: 46816736
312
+ Architecture:
313
+ - 1x1 Convolution
314
+ - Batch Normalization
315
+ - Bottleneck Residual Block
316
+ - Convolution
317
+ - Global Average Pooling
318
+ - Max Pooling
319
+ - ReLU
320
+ - Residual Block
321
+ - Residual Connection
322
+ - Softmax
323
+ Tasks:
324
+ - Image Classification
325
+ Training Data:
326
+ - ImageNet
327
+ ID: gluon_resnet18_v1b
328
+ Crop Pct: '0.875'
329
+ Image Size: '224'
330
+ Interpolation: bicubic
331
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L65
332
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet18_v1b-0757602b.pth
333
+ Results:
334
+ - Task: Image Classification
335
+ Dataset: ImageNet
336
+ Metrics:
337
+ Top 1 Accuracy: 70.84%
338
+ Top 5 Accuracy: 89.76%
339
+ - Name: gluon_resnet34_v1b
340
+ In Collection: Gloun ResNet
341
+ Metadata:
342
+ FLOPs: 4718469120
343
+ Parameters: 21800000
344
+ File Size: 87295112
345
+ Architecture:
346
+ - 1x1 Convolution
347
+ - Batch Normalization
348
+ - Bottleneck Residual Block
349
+ - Convolution
350
+ - Global Average Pooling
351
+ - Max Pooling
352
+ - ReLU
353
+ - Residual Block
354
+ - Residual Connection
355
+ - Softmax
356
+ Tasks:
357
+ - Image Classification
358
+ Training Data:
359
+ - ImageNet
360
+ ID: gluon_resnet34_v1b
361
+ Crop Pct: '0.875'
362
+ Image Size: '224'
363
+ Interpolation: bicubic
364
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L73
365
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet34_v1b-c6d82d59.pth
366
+ Results:
367
+ - Task: Image Classification
368
+ Dataset: ImageNet
369
+ Metrics:
370
+ Top 1 Accuracy: 74.59%
371
+ Top 5 Accuracy: 92.0%
372
+ - Name: gluon_resnet50_v1b
373
+ In Collection: Gloun ResNet
374
+ Metadata:
375
+ FLOPs: 5282531328
376
+ Parameters: 25560000
377
+ File Size: 102493763
378
+ Architecture:
379
+ - 1x1 Convolution
380
+ - Batch Normalization
381
+ - Bottleneck Residual Block
382
+ - Convolution
383
+ - Global Average Pooling
384
+ - Max Pooling
385
+ - ReLU
386
+ - Residual Block
387
+ - Residual Connection
388
+ - Softmax
389
+ Tasks:
390
+ - Image Classification
391
+ Training Data:
392
+ - ImageNet
393
+ ID: gluon_resnet50_v1b
394
+ Crop Pct: '0.875'
395
+ Image Size: '224'
396
+ Interpolation: bicubic
397
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L81
398
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1b-0ebe02e2.pth
399
+ Results:
400
+ - Task: Image Classification
401
+ Dataset: ImageNet
402
+ Metrics:
403
+ Top 1 Accuracy: 77.58%
404
+ Top 5 Accuracy: 93.72%
405
+ - Name: gluon_resnet50_v1c
406
+ In Collection: Gloun ResNet
407
+ Metadata:
408
+ FLOPs: 5590551040
409
+ Parameters: 25580000
410
+ File Size: 102573166
411
+ Architecture:
412
+ - 1x1 Convolution
413
+ - Batch Normalization
414
+ - Bottleneck Residual Block
415
+ - Convolution
416
+ - Global Average Pooling
417
+ - Max Pooling
418
+ - ReLU
419
+ - Residual Block
420
+ - Residual Connection
421
+ - Softmax
422
+ Tasks:
423
+ - Image Classification
424
+ Training Data:
425
+ - ImageNet
426
+ ID: gluon_resnet50_v1c
427
+ Crop Pct: '0.875'
428
+ Image Size: '224'
429
+ Interpolation: bicubic
430
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L105
431
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1c-48092f55.pth
432
+ Results:
433
+ - Task: Image Classification
434
+ Dataset: ImageNet
435
+ Metrics:
436
+ Top 1 Accuracy: 78.01%
437
+ Top 5 Accuracy: 93.99%
438
+ - Name: gluon_resnet50_v1d
439
+ In Collection: Gloun ResNet
440
+ Metadata:
441
+ FLOPs: 5591002624
442
+ Parameters: 25580000
443
+ File Size: 102573346
444
+ Architecture:
445
+ - 1x1 Convolution
446
+ - Batch Normalization
447
+ - Bottleneck Residual Block
448
+ - Convolution
449
+ - Global Average Pooling
450
+ - Max Pooling
451
+ - ReLU
452
+ - Residual Block
453
+ - Residual Connection
454
+ - Softmax
455
+ Tasks:
456
+ - Image Classification
457
+ Training Data:
458
+ - ImageNet
459
+ ID: gluon_resnet50_v1d
460
+ Crop Pct: '0.875'
461
+ Image Size: '224'
462
+ Interpolation: bicubic
463
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L129
464
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1d-818a1b1b.pth
465
+ Results:
466
+ - Task: Image Classification
467
+ Dataset: ImageNet
468
+ Metrics:
469
+ Top 1 Accuracy: 79.06%
470
+ Top 5 Accuracy: 94.46%
471
+ - Name: gluon_resnet50_v1s
472
+ In Collection: Gloun ResNet
473
+ Metadata:
474
+ FLOPs: 7019495424
475
+ Parameters: 25680000
476
+ File Size: 102992368
477
+ Architecture:
478
+ - 1x1 Convolution
479
+ - Batch Normalization
480
+ - Bottleneck Residual Block
481
+ - Convolution
482
+ - Global Average Pooling
483
+ - Max Pooling
484
+ - ReLU
485
+ - Residual Block
486
+ - Residual Connection
487
+ - Softmax
488
+ Tasks:
489
+ - Image Classification
490
+ Training Data:
491
+ - ImageNet
492
+ ID: gluon_resnet50_v1s
493
+ Crop Pct: '0.875'
494
+ Image Size: '224'
495
+ Interpolation: bicubic
496
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L156
497
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnet50_v1s-1762acc0.pth
498
+ Results:
499
+ - Task: Image Classification
500
+ Dataset: ImageNet
501
+ Metrics:
502
+ Top 1 Accuracy: 78.7%
503
+ Top 5 Accuracy: 94.25%
504
+ -->
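The `Crop Pct` and `Image Size` fields above together determine the evaluation preprocessing. As a minimal sketch, assuming the common convention that the image is resized to `floor(image_size / crop_pct)` and then center-cropped to `image_size` (the helper name `eval_resize_size` is hypothetical and not taken from the diff):

```python
import math


def eval_resize_size(image_size: int, crop_pct: float) -> int:
    """Resize target used before a center crop of `image_size` pixels.

    Assumption: the resize target is floor(image_size / crop_pct).
    """
    if not 0.0 < crop_pct <= 1.0:
        raise ValueError("crop_pct must be in (0, 1]")
    return int(math.floor(image_size / crop_pct))


# Values listed for the Gluon ResNet entries: Image Size 224, Crop Pct 0.875.
resize_to = eval_resize_size(224, 0.875)
print(f"resize to {resize_to}, then center-crop to 224")
```

Under this assumption, a crop percentage of 0.875 means the final 224-pixel crop covers 87.5% of the resized image's side length.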
docs/models/.templates/models/gloun-resnext.md ADDED
@@ -0,0 +1,142 @@
1
+ # (Gluon) ResNeXt
2
+
3
+ A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
4
+
5
+ The weights for this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @article{DBLP:journals/corr/XieGDTH16,
17
+ author = {Saining Xie and
18
+ Ross B. Girshick and
19
+ Piotr Doll{\'{a}}r and
20
+ Zhuowen Tu and
21
+ Kaiming He},
22
+ title = {Aggregated Residual Transformations for Deep Neural Networks},
23
+ journal = {CoRR},
24
+ volume = {abs/1611.05431},
25
+ year = {2016},
26
+ url = {http://arxiv.org/abs/1611.05431},
27
+ archivePrefix = {arXiv},
28
+ eprint = {1611.05431},
29
+ timestamp = {Mon, 13 Aug 2018 16:45:58 +0200},
30
+ biburl = {https://dblp.org/rec/journals/corr/XieGDTH16.bib},
31
+ bibsource = {dblp computer science bibliography, https://dblp.org}
32
+ }
33
+ ```
34
+
35
+ <!--
36
+ Type: model-index
37
+ Collections:
38
+ - Name: Gloun ResNeXt
39
+ Paper:
40
+ Title: Aggregated Residual Transformations for Deep Neural Networks
41
+ URL: https://paperswithcode.com/paper/aggregated-residual-transformations-for-deep
42
+ Models:
43
+ - Name: gluon_resnext101_32x4d
44
+ In Collection: Gloun ResNeXt
45
+ Metadata:
46
+ FLOPs: 10298145792
47
+ Parameters: 44180000
48
+ File Size: 177367414
49
+ Architecture:
50
+ - 1x1 Convolution
51
+ - Batch Normalization
52
+ - Convolution
53
+ - Global Average Pooling
54
+ - Grouped Convolution
55
+ - Max Pooling
56
+ - ReLU
57
+ - ResNeXt Block
58
+ - Residual Connection
59
+ - Softmax
60
+ Tasks:
61
+ - Image Classification
62
+ Training Data:
63
+ - ImageNet
64
+ ID: gluon_resnext101_32x4d
65
+ Crop Pct: '0.875'
66
+ Image Size: '224'
67
+ Interpolation: bicubic
68
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L193
69
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext101_32x4d-b253c8c4.pth
70
+ Results:
71
+ - Task: Image Classification
72
+ Dataset: ImageNet
73
+ Metrics:
74
+ Top 1 Accuracy: 80.33%
75
+ Top 5 Accuracy: 94.91%
76
+ - Name: gluon_resnext101_64x4d
77
+ In Collection: Gloun ResNeXt
78
+ Metadata:
79
+ FLOPs: 19954172928
80
+ Parameters: 83460000
81
+ File Size: 334737852
82
+ Architecture:
83
+ - 1x1 Convolution
84
+ - Batch Normalization
85
+ - Convolution
86
+ - Global Average Pooling
87
+ - Grouped Convolution
88
+ - Max Pooling
89
+ - ReLU
90
+ - ResNeXt Block
91
+ - Residual Connection
92
+ - Softmax
93
+ Tasks:
94
+ - Image Classification
95
+ Training Data:
96
+ - ImageNet
97
+ ID: gluon_resnext101_64x4d
98
+ Crop Pct: '0.875'
99
+ Image Size: '224'
100
+ Interpolation: bicubic
101
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L201
102
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext101_64x4d-f9a8e184.pth
103
+ Results:
104
+ - Task: Image Classification
105
+ Dataset: ImageNet
106
+ Metrics:
107
+ Top 1 Accuracy: 80.63%
108
+ Top 5 Accuracy: 95.0%
109
+ - Name: gluon_resnext50_32x4d
110
+ In Collection: Gloun ResNeXt
111
+ Metadata:
112
+ FLOPs: 5472648192
113
+ Parameters: 25030000
114
+ File Size: 100441719
115
+ Architecture:
116
+ - 1x1 Convolution
117
+ - Batch Normalization
118
+ - Convolution
119
+ - Global Average Pooling
120
+ - Grouped Convolution
121
+ - Max Pooling
122
+ - ReLU
123
+ - ResNeXt Block
124
+ - Residual Connection
125
+ - Softmax
126
+ Tasks:
127
+ - Image Classification
128
+ Training Data:
129
+ - ImageNet
130
+ ID: gluon_resnext50_32x4d
131
+ Crop Pct: '0.875'
132
+ Image Size: '224'
133
+ Interpolation: bicubic
134
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L185
135
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_resnext50_32x4d-e6a097c1.pth
136
+ Results:
137
+ - Task: Image Classification
138
+ Dataset: ImageNet
139
+ Metrics:
140
+ Top 1 Accuracy: 79.35%
141
+ Top 5 Accuracy: 94.42%
142
+ -->
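The ResNeXt description above introduces cardinality, which the architecture list realizes as a grouped convolution. A minimal sketch of why grouping is cheap, counting only convolution weights (no biases); the function name and the 128-channel example are illustrative assumptions, not values taken from the diff:

```python
def conv_weight_count(in_channels: int, out_channels: int,
                      kernel_size: int, groups: int = 1) -> int:
    """Number of weights in a 2D convolution with square kernels and no bias.

    With `groups` > 1, each output channel only sees in_channels / groups
    input channels, which is how cardinality is implemented in practice.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


# "32x4d" read as cardinality 32 with 4 channels per group -> 128 channels.
cardinality, group_width = 32, 4
channels = cardinality * group_width

dense = conv_weight_count(channels, channels, kernel_size=3)
grouped = conv_weight_count(channels, channels, kernel_size=3, groups=cardinality)
print(dense, grouped, dense // grouped)
```

For the same channel width, the grouped 3x3 convolution uses a factor of `cardinality` fewer weights than the dense one, which is what lets ResNeXt widen its blocks at a similar cost.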
docs/models/.templates/models/gloun-senet.md ADDED
@@ -0,0 +1,63 @@
1
+ # (Gluon) SENet
2
+
3
+ A **SENet** is a convolutional neural network architecture that employs [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) to enable the network to perform dynamic channel-wise feature recalibration.
4
+
5
+ The weights for this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{hu2019squeezeandexcitation,
17
+ title={Squeeze-and-Excitation Networks},
18
+ author={Jie Hu and Li Shen and Samuel Albanie and Gang Sun and Enhua Wu},
19
+ year={2019},
20
+ eprint={1709.01507},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: Gloun SENet
30
+ Paper:
31
+ Title: Squeeze-and-Excitation Networks
32
+ URL: https://paperswithcode.com/paper/squeeze-and-excitation-networks
33
+ Models:
34
+ - Name: gluon_senet154
35
+ In Collection: Gloun SENet
36
+ Metadata:
37
+ FLOPs: 26681705136
38
+ Parameters: 115090000
39
+ File Size: 461546622
40
+ Architecture:
41
+ - Convolution
42
+ - Dense Connections
43
+ - Global Average Pooling
44
+ - Max Pooling
45
+ - Softmax
46
+ - Squeeze-and-Excitation Block
47
+ Tasks:
48
+ - Image Classification
49
+ Training Data:
50
+ - ImageNet
51
+ ID: gluon_senet154
52
+ Crop Pct: '0.875'
53
+ Image Size: '224'
54
+ Interpolation: bicubic
55
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L239
56
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_senet154-70a1a3c0.pth
57
+ Results:
58
+ - Task: Image Classification
59
+ Dataset: ImageNet
60
+ Metrics:
61
+ Top 1 Accuracy: 81.23%
62
+ Top 5 Accuracy: 95.35%
63
+ -->
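The "dynamic channel-wise feature recalibration" in the SENet description can be illustrated with a small dependency-free sketch. It assumes the usual squeeze (global average pool), excitation (two fully connected layers with ReLU and sigmoid), and scale steps. Biases are omitted, and the function name and toy weights are hypothetical:

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Apply a bias-free squeeze-and-excitation step.

    feature_map: C channels, each a list of rows of floats.
    w_reduce:    R x C weights of the reducing fully connected layer.
    w_expand:    C x R weights of the expanding fully connected layer.
    """
    # Squeeze: one scalar per channel via global average pooling.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[value * gate for value in row] for row in channel]
            for channel, gate in zip(feature_map, gates)]


# Toy input: two 2x2 channels with constant values 2.0 and 1.0.
features = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
recalibrated = squeeze_excite(features, w_reduce=[[1.0, -1.0]],
                              w_expand=[[1.0], [-1.0]])
print(recalibrated)
```

The output keeps the input shape. Only the per-channel scale changes, and it depends on the input itself, which is the sense in which the recalibration is dynamic.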
docs/models/.templates/models/gloun-seresnext.md ADDED
@@ -0,0 +1,136 @@
1
+ # (Gluon) SE-ResNeXt
2
+
3
+ **SE-ResNeXt** is a variant of [ResNeXt](https://www.paperswithcode.com/method/resnext) that employs [squeeze-and-excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block) to enable the network to perform dynamic channel-wise feature recalibration.
4
+
5
+ The weights for this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{hu2019squeezeandexcitation,
17
+ title={Squeeze-and-Excitation Networks},
18
+ author={Jie Hu and Li Shen and Samuel Albanie and Gang Sun and Enhua Wu},
19
+ year={2019},
20
+ eprint={1709.01507},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: Gloun SEResNeXt
30
+ Paper:
31
+ Title: Squeeze-and-Excitation Networks
32
+ URL: https://paperswithcode.com/paper/squeeze-and-excitation-networks
33
+ Models:
34
+ - Name: gluon_seresnext101_32x4d
35
+ In Collection: Gloun SEResNeXt
36
+ Metadata:
37
+ FLOPs: 10302923504
38
+ Parameters: 48960000
39
+ File Size: 196505510
40
+ Architecture:
41
+ - 1x1 Convolution
42
+ - Batch Normalization
43
+ - Convolution
44
+ - Global Average Pooling
45
+ - Grouped Convolution
46
+ - Max Pooling
47
+ - ReLU
48
+ - ResNeXt Block
49
+ - Residual Connection
50
+ - Softmax
51
+ - Squeeze-and-Excitation Block
52
+ Tasks:
53
+ - Image Classification
54
+ Training Data:
55
+ - ImageNet
56
+ ID: gluon_seresnext101_32x4d
57
+ Crop Pct: '0.875'
58
+ Image Size: '224'
59
+ Interpolation: bicubic
60
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L219
61
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext101_32x4d-cf52900d.pth
62
+ Results:
63
+ - Task: Image Classification
64
+ Dataset: ImageNet
65
+ Metrics:
66
+ Top 1 Accuracy: 80.87%
67
+ Top 5 Accuracy: 95.29%
68
+ - Name: gluon_seresnext101_64x4d
69
+ In Collection: Gloun SEResNeXt
70
+ Metadata:
71
+ FLOPs: 19958950640
72
+ Parameters: 88230000
73
+ File Size: 353875948
74
+ Architecture:
75
+ - 1x1 Convolution
76
+ - Batch Normalization
77
+ - Convolution
78
+ - Global Average Pooling
79
+ - Grouped Convolution
80
+ - Max Pooling
81
+ - ReLU
82
+ - ResNeXt Block
83
+ - Residual Connection
84
+ - Softmax
85
+ - Squeeze-and-Excitation Block
86
+ Tasks:
87
+ - Image Classification
88
+ Training Data:
89
+ - ImageNet
90
+ ID: gluon_seresnext101_64x4d
91
+ Crop Pct: '0.875'
92
+ Image Size: '224'
93
+ Interpolation: bicubic
94
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L229
95
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext101_64x4d-f9926f93.pth
96
+ Results:
97
+ - Task: Image Classification
98
+ Dataset: ImageNet
99
+ Metrics:
100
+ Top 1 Accuracy: 80.88%
101
+ Top 5 Accuracy: 95.31%
102
+ - Name: gluon_seresnext50_32x4d
103
+ In Collection: Gloun SEResNeXt
104
+ Metadata:
105
+ FLOPs: 5475179184
106
+ Parameters: 27560000
107
+ File Size: 110578827
108
+ Architecture:
109
+ - 1x1 Convolution
110
+ - Batch Normalization
111
+ - Convolution
112
+ - Global Average Pooling
113
+ - Grouped Convolution
114
+ - Max Pooling
115
+ - ReLU
116
+ - ResNeXt Block
117
+ - Residual Connection
118
+ - Softmax
119
+ - Squeeze-and-Excitation Block
120
+ Tasks:
121
+ - Image Classification
122
+ Training Data:
123
+ - ImageNet
124
+ ID: gluon_seresnext50_32x4d
125
+ Crop Pct: '0.875'
126
+ Image Size: '224'
127
+ Interpolation: bicubic
128
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_resnet.py#L209
129
+ Weights: https://github.com/rwightman/pytorch-pretrained-gluonresnet/releases/download/v0.1/gluon_seresnext50_32x4d-90cf2d6e.pth
130
+ Results:
131
+ - Task: Image Classification
132
+ Dataset: ImageNet
133
+ Metrics:
134
+ Top 1 Accuracy: 79.92%
135
+ Top 5 Accuracy: 94.82%
136
+ -->
docs/models/.templates/models/gloun-xception.md ADDED
@@ -0,0 +1,66 @@
1
+ # (Gluon) Xception
2
+
3
+ **Xception** is a convolutional neural network architecture that relies solely on [depthwise separable convolution](https://paperswithcode.com/method/depthwise-separable-convolution) layers.
4
+
5
+ The weights for this model were ported from [Gluon](https://cv.gluon.ai/model_zoo/classification.html).
6
+
7
+ {% include 'code_snippets.md' %}
8
+
9
+ ## How do I train this model?
10
+
11
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
12
+
13
+ ## Citation
14
+
15
+ ```BibTeX
16
+ @misc{chollet2017xception,
17
+ title={Xception: Deep Learning with Depthwise Separable Convolutions},
18
+ author={François Chollet},
19
+ year={2017},
20
+ eprint={1610.02357},
21
+ archivePrefix={arXiv},
22
+ primaryClass={cs.CV}
23
+ }
24
+ ```
25
+
26
+ <!--
27
+ Type: model-index
28
+ Collections:
29
+ - Name: Gloun Xception
30
+ Paper:
31
+ Title: 'Xception: Deep Learning with Depthwise Separable Convolutions'
32
+ URL: https://paperswithcode.com/paper/xception-deep-learning-with-depthwise
33
+ Models:
34
+ - Name: gluon_xception65
35
+ In Collection: Gloun Xception
36
+ Metadata:
37
+ FLOPs: 17594889728
38
+ Parameters: 39920000
39
+ File Size: 160551306
40
+ Architecture:
41
+ - 1x1 Convolution
42
+ - Convolution
43
+ - Dense Connections
44
+ - Depthwise Separable Convolution
45
+ - Global Average Pooling
46
+ - Max Pooling
47
+ - ReLU
48
+ - Residual Connection
49
+ - Softmax
50
+ Tasks:
51
+ - Image Classification
52
+ Training Data:
53
+ - ImageNet
54
+ ID: gluon_xception65
55
+ Crop Pct: '0.903'
56
+ Image Size: '299'
57
+ Interpolation: bicubic
58
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/gluon_xception.py#L241
59
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/gluon_xception-7015a15c.pth
60
+ Results:
61
+ - Task: Image Classification
62
+ Dataset: ImageNet
63
+ Metrics:
64
+ Top 1 Accuracy: 79.7%
65
+ Top 5 Accuracy: 94.87%
66
+ -->
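The Xception description rests on depthwise separable convolutions. A minimal sketch of the weight savings, assuming a depthwise k x k convolution followed by a 1x1 pointwise convolution with no biases; the function names and the 728-channel example are illustrative assumptions:

```python
def standard_conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a dense k x k convolution (no bias)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 mixing across channels
    return depthwise + pointwise


channels = 728  # illustrative layer width
standard = standard_conv_weights(channels, channels, 3)
separable = separable_conv_weights(channels, channels, 3)
print(standard, separable, round(standard / separable, 2))
```

The separable form splits spatial filtering from channel mixing, so for a 3x3 kernel its weight count approaches one ninth of the dense convolution's as the channel count grows.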
docs/models/.templates/models/hrnet.md ADDED
@@ -0,0 +1,358 @@
1
+ # HRNet
2
+
3
+ **HRNet**, or **High-Resolution Net**, is a general-purpose convolutional neural network for tasks such as semantic segmentation, object detection, and image classification. It maintains high-resolution representations throughout the network. It starts from a high-resolution convolution stream, gradually adds high-to-low resolution convolution streams one at a time, and connects the multi-resolution streams in parallel. The resulting network consists of several stages ($4$ in the paper), and the $n$th stage contains $n$ streams corresponding to $n$ resolutions. The authors perform repeated multi-resolution fusion by exchanging information across the parallel streams.
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) to train a new model from scratch.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{sun2019highresolution,
15
+ title={High-Resolution Representations for Labeling Pixels and Regions},
16
+ author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang},
17
+ year={2019},
18
+ eprint={1904.04514},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: HRNet
28
+ Paper:
29
+ Title: Deep High-Resolution Representation Learning for Visual Recognition
30
+ URL: https://paperswithcode.com/paper/190807919
31
+ Models:
32
+ - Name: hrnet_w18
33
+ In Collection: HRNet
34
+ Metadata:
35
+ FLOPs: 5547205500
36
+ Parameters: 21300000
37
+ File Size: 85718883
38
+ Architecture:
39
+ - Batch Normalization
40
+ - Convolution
41
+ - ReLU
42
+ - Residual Connection
43
+ Tasks:
44
+ - Image Classification
45
+ Training Techniques:
46
+ - Nesterov Accelerated Gradient
47
+ - Weight Decay
48
+ Training Data:
49
+ - ImageNet
50
+ Training Resources: 4x NVIDIA V100 GPUs
51
+ ID: hrnet_w18
52
+ Epochs: 100
53
+ Layers: 18
54
+ Crop Pct: '0.875'
55
+ Momentum: 0.9
56
+ Batch Size: 256
57
+ Image Size: '224'
58
+ Weight Decay: 0.001
59
+ Interpolation: bilinear
60
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L800
61
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w18-8cb57bb9.pth
62
+ Results:
63
+ - Task: Image Classification
64
+ Dataset: ImageNet
65
+ Metrics:
66
+ Top 1 Accuracy: 76.76%
67
+ Top 5 Accuracy: 93.44%
68
+ - Name: hrnet_w18_small
69
+ In Collection: HRNet
70
+ Metadata:
71
+ FLOPs: 2071651488
72
+ Parameters: 13190000
73
+ File Size: 52934302
74
+ Architecture:
75
+ - Batch Normalization
76
+ - Convolution
77
+ - ReLU
78
+ - Residual Connection
79
+ Tasks:
80
+ - Image Classification
81
+ Training Techniques:
82
+ - Nesterov Accelerated Gradient
83
+ - Weight Decay
84
+ Training Data:
85
+ - ImageNet
86
+ Training Resources: 4x NVIDIA V100 GPUs
87
+ ID: hrnet_w18_small
88
+ Epochs: 100
89
+ Layers: 18
90
+ Crop Pct: '0.875'
91
+ Momentum: 0.9
92
+ Batch Size: 256
93
+ Image Size: '224'
94
+ Weight Decay: 0.001
95
+ Interpolation: bilinear
96
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L790
97
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnet_w18_small_v1-f460c6bc.pth
98
+ Results:
99
+ - Task: Image Classification
100
+ Dataset: ImageNet
101
+ Metrics:
102
+ Top 1 Accuracy: 72.34%
103
+ Top 5 Accuracy: 90.68%
104
+ - Name: hrnet_w18_small_v2
105
+ In Collection: HRNet
106
+ Metadata:
107
+ FLOPs: 3360023160
108
+ Parameters: 15600000
109
+ File Size: 62682879
110
+ Architecture:
111
+ - Batch Normalization
112
+ - Convolution
113
+ - ReLU
114
+ - Residual Connection
115
+ Tasks:
116
+ - Image Classification
117
+ Training Techniques:
118
+ - Nesterov Accelerated Gradient
119
+ - Weight Decay
120
+ Training Data:
121
+ - ImageNet
122
+ Training Resources: 4x NVIDIA V100 GPUs
123
+ ID: hrnet_w18_small_v2
124
+ Epochs: 100
125
+ Layers: 18
126
+ Crop Pct: '0.875'
127
+ Momentum: 0.9
128
+ Batch Size: 256
129
+ Image Size: '224'
130
+ Weight Decay: 0.001
131
+ Interpolation: bilinear
132
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L795
133
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnet_w18_small_v2-4c50a8cb.pth
134
+ Results:
135
+ - Task: Image Classification
136
+ Dataset: ImageNet
137
+ Metrics:
138
+ Top 1 Accuracy: 75.11%
139
+ Top 5 Accuracy: 92.41%
140
+ - Name: hrnet_w30
141
+ In Collection: HRNet
142
+ Metadata:
143
+ FLOPs: 10474119492
144
+ Parameters: 37710000
145
+ File Size: 151452218
146
+ Architecture:
147
+ - Batch Normalization
148
+ - Convolution
149
+ - ReLU
150
+ - Residual Connection
151
+ Tasks:
152
+ - Image Classification
153
+ Training Techniques:
154
+ - Nesterov Accelerated Gradient
155
+ - Weight Decay
156
+ Training Data:
157
+ - ImageNet
158
+ Training Resources: 4x NVIDIA V100 GPUs
159
+ ID: hrnet_w30
160
+ Epochs: 100
161
+ Layers: 30
162
+ Crop Pct: '0.875'
163
+ Momentum: 0.9
164
+ Batch Size: 256
165
+ Image Size: '224'
166
+ Weight Decay: 0.001
167
+ Interpolation: bilinear
168
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L805
169
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w30-8d7f8dab.pth
170
+ Results:
171
+ - Task: Image Classification
172
+ Dataset: ImageNet
173
+ Metrics:
174
+ Top 1 Accuracy: 78.21%
175
+ Top 5 Accuracy: 94.22%
176
+ - Name: hrnet_w32
177
+ In Collection: HRNet
178
+ Metadata:
179
+ FLOPs: 11524528320
180
+ Parameters: 41230000
181
+ File Size: 165547812
182
+ Architecture:
183
+ - Batch Normalization
184
+ - Convolution
185
+ - ReLU
186
+ - Residual Connection
187
+ Tasks:
188
+ - Image Classification
189
+ Training Techniques:
190
+ - Nesterov Accelerated Gradient
191
+ - Weight Decay
192
+ Training Data:
193
+ - ImageNet
194
+ Training Resources: 4x NVIDIA V100 GPUs
195
+ Training Time: 60 hours
196
+ ID: hrnet_w32
197
+ Epochs: 100
198
+ Layers: 32
199
+ Crop Pct: '0.875'
200
+ Momentum: 0.9
201
+ Batch Size: 256
202
+ Image Size: '224'
203
+ Weight Decay: 0.001
204
+ Interpolation: bilinear
205
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L810
206
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w32-90d8c5fb.pth
207
+ Results:
208
+ - Task: Image Classification
209
+ Dataset: ImageNet
210
+ Metrics:
211
+ Top 1 Accuracy: 78.45%
212
+ Top 5 Accuracy: 94.19%
213
+ - Name: hrnet_w40
214
+ In Collection: HRNet
215
+ Metadata:
216
+ FLOPs: 16381182192
217
+ Parameters: 57560000
218
+ File Size: 230899236
219
+ Architecture:
220
+ - Batch Normalization
221
+ - Convolution
222
+ - ReLU
223
+ - Residual Connection
224
+ Tasks:
225
+ - Image Classification
226
+ Training Techniques:
227
+ - Nesterov Accelerated Gradient
228
+ - Weight Decay
229
+ Training Data:
230
+ - ImageNet
231
+ Training Resources: 4x NVIDIA V100 GPUs
232
+ ID: hrnet_w40
233
+ Epochs: 100
234
+ Layers: 40
235
+ Crop Pct: '0.875'
236
+ Momentum: 0.9
237
+ Batch Size: 256
238
+ Image Size: '224'
239
+ Weight Decay: 0.001
240
+ Interpolation: bilinear
241
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L815
242
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w40-7cd397a4.pth
243
+ Results:
244
+ - Task: Image Classification
245
+ Dataset: ImageNet
246
+ Metrics:
247
+ Top 1 Accuracy: 78.93%
248
+ Top 5 Accuracy: 94.48%
249
+ - Name: hrnet_w44
250
+ In Collection: HRNet
251
+ Metadata:
252
+ FLOPs: 19202520264
253
+ Parameters: 67060000
254
+ File Size: 268957432
255
+ Architecture:
256
+ - Batch Normalization
257
+ - Convolution
258
+ - ReLU
259
+ - Residual Connection
260
+ Tasks:
261
+ - Image Classification
262
+ Training Techniques:
263
+ - Nesterov Accelerated Gradient
264
+ - Weight Decay
265
+ Training Data:
266
+ - ImageNet
267
+ Training Resources: 4x NVIDIA V100 GPUs
268
+ ID: hrnet_w44
269
+ Epochs: 100
270
+ Layers: 44
271
+ Crop Pct: '0.875'
272
+ Momentum: 0.9
273
+ Batch Size: 256
274
+ Image Size: '224'
275
+ Weight Decay: 0.001
276
+ Interpolation: bilinear
277
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L820
278
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w44-c9ac8c18.pth
279
+ Results:
280
+ - Task: Image Classification
281
+ Dataset: ImageNet
282
+ Metrics:
283
+ Top 1 Accuracy: 78.89%
284
+ Top 5 Accuracy: 94.37%
285
+ - Name: hrnet_w48
286
+ In Collection: HRNet
287
+ Metadata:
288
+ FLOPs: 22285865760
289
+ Parameters: 77470000
290
+ File Size: 310603710
291
+ Architecture:
292
+ - Batch Normalization
293
+ - Convolution
294
+ - ReLU
295
+ - Residual Connection
296
+ Tasks:
297
+ - Image Classification
298
+ Training Techniques:
299
+ - Nesterov Accelerated Gradient
300
+ - Weight Decay
301
+ Training Data:
302
+ - ImageNet
303
+ Training Resources: 4x NVIDIA V100 GPUs
304
+ Training Time: 80 hours
305
+ ID: hrnet_w48
306
+ Epochs: 100
307
+ Layers: 48
308
+ Crop Pct: '0.875'
309
+ Momentum: 0.9
310
+ Batch Size: 256
311
+ Image Size: '224'
312
+ Weight Decay: 0.001
313
+ Interpolation: bilinear
314
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L825
315
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w48-abd2e6ab.pth
316
+ Results:
317
+ - Task: Image Classification
318
+ Dataset: ImageNet
319
+ Metrics:
320
+ Top 1 Accuracy: 79.32%
321
+ Top 5 Accuracy: 94.51%
322
+ - Name: hrnet_w64
323
+ In Collection: HRNet
324
+ Metadata:
325
+ FLOPs: 37239321984
326
+ Parameters: 128060000
327
+ File Size: 513071818
328
+ Architecture:
329
+ - Batch Normalization
330
+ - Convolution
331
+ - ReLU
332
+ - Residual Connection
333
+ Tasks:
334
+ - Image Classification
335
+ Training Techniques:
336
+ - Nesterov Accelerated Gradient
337
+ - Weight Decay
338
+ Training Data:
339
+ - ImageNet
340
+ Training Resources: 4x NVIDIA V100 GPUs
341
+ ID: hrnet_w64
342
+ Epochs: 100
343
+ Layers: 64
344
+ Crop Pct: '0.875'
345
+ Momentum: 0.9
346
+ Batch Size: 256
347
+ Image Size: '224'
348
+ Weight Decay: 0.001
349
+ Interpolation: bilinear
350
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/hrnet.py#L830
351
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-hrnet/hrnetv2_w64-b47cc881.pth
352
+ Results:
353
+ - Task: Image Classification
354
+ Dataset: ImageNet
355
+ Metrics:
356
+ Top 1 Accuracy: 79.46%
357
+ Top 5 Accuracy: 94.65%
358
+ -->
docs/models/.templates/models/ig-resnext.md ADDED
@@ -0,0 +1,209 @@
1
+ # Instagram ResNeXt WSL
2
+
3
+ A **ResNeXt** repeats a [building block](https://paperswithcode.com/method/resnext-block) that aggregates a set of transformations with the same topology. Compared to a [ResNet](https://paperswithcode.com/method/resnet), it exposes a new dimension, *cardinality* (the size of the set of transformations) $C$, as an essential factor in addition to the dimensions of depth and width.
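To make the role of cardinality concrete, the following minimal sketch counts the weights of the 3x3 convolution inside a bottleneck block when it is split into groups. It assumes the usual reading of a name like `32x8d` (cardinality 32, 8 channels per group) and a torchvision-style rule for the inner width; the helper names `conv_weight_count` and `bottleneck_width` are hypothetical and not part of timm.

```python
def conv_weight_count(in_channels: int, out_channels: int, kernel_size: int, groups: int = 1) -> int:
    """Number of weights (bias excluded) in a 2D convolution split into `groups` groups."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


def bottleneck_width(planes: int, cardinality: int, width_per_group: int) -> int:
    """Inner width of a ResNeXt bottleneck (assumed torchvision-style convention)."""
    return int(planes * (width_per_group / 64.0)) * cardinality


# Assumed reading of "32x8d": cardinality C = 32, 8 channels per group.
cardinality, width_per_group = 32, 8
width = bottleneck_width(64, cardinality, width_per_group)  # first stage, planes = 64

grouped = conv_weight_count(width, width, 3, groups=cardinality)
dense = conv_weight_count(width, width, 3, groups=1)
print(f"width={width} grouped={grouped} dense={dense} ratio={dense // grouped}")
```

At a fixed inner width, the grouped convolution needs $C$ times fewer weights than a dense one, which is why cardinality can be raised without the cost growing as it would for a plain widening.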
4
+
5
+ This model was trained on billions of Instagram images, using thousands of distinct hashtags as labels, and exhibits excellent transfer learning performance.
6
+
7
+ Please note the CC-BY-NC 4.0 license on these weights: they are for non-commercial use only.
8
+
9
+ {% include 'code_snippets.md' %}
10
+
11
+ ## How do I train this model?
12
+
13
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
14
+
15
+ ## Citation
16
+
17
+ ```BibTeX
18
+ @misc{mahajan2018exploring,
19
+ title={Exploring the Limits of Weakly Supervised Pretraining},
20
+ author={Dhruv Mahajan and Ross Girshick and Vignesh Ramanathan and Kaiming He and Manohar Paluri and Yixuan Li and Ashwin Bharambe and Laurens van der Maaten},
21
+ year={2018},
22
+ eprint={1805.00932},
23
+ archivePrefix={arXiv},
24
+ primaryClass={cs.CV}
25
+ }
26
+ ```
27
+
28
+ <!--
29
+ Type: model-index
30
+ Collections:
31
+ - Name: IG ResNeXt
32
+ Paper:
33
+ Title: Exploring the Limits of Weakly Supervised Pretraining
34
+ URL: https://paperswithcode.com/paper/exploring-the-limits-of-weakly-supervised
35
+ Models:
36
+ - Name: ig_resnext101_32x16d
37
+ In Collection: IG ResNeXt
38
+ Metadata:
39
+ FLOPs: 46623691776
40
+ Parameters: 194030000
41
+ File Size: 777518664
42
+ Architecture:
43
+ - 1x1 Convolution
44
+ - Batch Normalization
45
+ - Convolution
46
+ - Global Average Pooling
47
+ - Grouped Convolution
48
+ - Max Pooling
49
+ - ReLU
50
+ - ResNeXt Block
51
+ - Residual Connection
52
+ - Softmax
53
+ Tasks:
54
+ - Image Classification
55
+ Training Techniques:
56
+ - Nesterov Accelerated Gradient
57
+ - Weight Decay
58
+ Training Data:
59
+ - IG-3.5B-17k
60
+ - ImageNet
61
+ Training Resources: 336x GPUs
62
+ ID: ig_resnext101_32x16d
63
+ Epochs: 100
64
+ Layers: 101
65
+ Crop Pct: '0.875'
66
+ Momentum: 0.9
67
+ Batch Size: 8064
68
+ Image Size: '224'
69
+ Weight Decay: 0.001
70
+ Interpolation: bilinear
71
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L874
72
+ Weights: https://download.pytorch.org/models/ig_resnext101_32x16-c6f796b0.pth
73
+ Results:
74
+ - Task: Image Classification
75
+ Dataset: ImageNet
76
+ Metrics:
77
+ Top 1 Accuracy: 84.16%
78
+ Top 5 Accuracy: 97.19%
79
+ - Name: ig_resnext101_32x32d
80
+ In Collection: IG ResNeXt
81
+ Metadata:
82
+ FLOPs: 112225170432
83
+ Parameters: 468530000
84
+ File Size: 1876573776
85
+ Architecture:
86
+ - 1x1 Convolution
87
+ - Batch Normalization
88
+ - Convolution
89
+ - Global Average Pooling
90
+ - Grouped Convolution
91
+ - Max Pooling
92
+ - ReLU
93
+ - ResNeXt Block
94
+ - Residual Connection
95
+ - Softmax
96
+ Tasks:
97
+ - Image Classification
98
+ Training Techniques:
99
+ - Nesterov Accelerated Gradient
100
+ - Weight Decay
101
+ Training Data:
102
+ - IG-3.5B-17k
103
+ - ImageNet
104
+ Training Resources: 336x GPUs
105
+ ID: ig_resnext101_32x32d
106
+ Epochs: 100
107
+ Layers: 101
108
+ Crop Pct: '0.875'
109
+ Momentum: 0.9
110
+ Batch Size: 8064
111
+ Image Size: '224'
112
+ Weight Decay: 0.001
113
+ Interpolation: bilinear
114
+ Minibatch Size: 8064
115
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L885
116
+ Weights: https://download.pytorch.org/models/ig_resnext101_32x32-e4b90b00.pth
117
+ Results:
118
+ - Task: Image Classification
119
+ Dataset: ImageNet
120
+ Metrics:
121
+ Top 1 Accuracy: 85.09%
122
+ Top 5 Accuracy: 97.44%
123
+ - Name: ig_resnext101_32x48d
124
+ In Collection: IG ResNeXt
125
+ Metadata:
126
+ FLOPs: 197446554624
127
+ Parameters: 828410000
128
+ File Size: 3317136976
129
+ Architecture:
130
+ - 1x1 Convolution
131
+ - Batch Normalization
132
+ - Convolution
133
+ - Global Average Pooling
134
+ - Grouped Convolution
135
+ - Max Pooling
136
+ - ReLU
137
+ - ResNeXt Block
138
+ - Residual Connection
139
+ - Softmax
140
+ Tasks:
141
+ - Image Classification
142
+ Training Techniques:
143
+ - Nesterov Accelerated Gradient
144
+ - Weight Decay
145
+ Training Data:
146
+ - IG-3.5B-17k
147
+ - ImageNet
148
+ Training Resources: 336x GPUs
149
+ ID: ig_resnext101_32x48d
150
+ Epochs: 100
151
+ Layers: 101
152
+ Crop Pct: '0.875'
153
+ Momentum: 0.9
154
+ Batch Size: 8064
155
+ Image Size: '224'
156
+ Weight Decay: 0.001
157
+ Interpolation: bilinear
158
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L896
159
+ Weights: https://download.pytorch.org/models/ig_resnext101_32x48-3e41cc8a.pth
160
+ Results:
161
+ - Task: Image Classification
162
+ Dataset: ImageNet
163
+ Metrics:
164
+ Top 1 Accuracy: 85.42%
165
+ Top 5 Accuracy: 97.58%
166
+ - Name: ig_resnext101_32x8d
167
+ In Collection: IG ResNeXt
168
+ Metadata:
169
+ FLOPs: 21180417024
170
+ Parameters: 88790000
171
+ File Size: 356056638
172
+ Architecture:
173
+ - 1x1 Convolution
174
+ - Batch Normalization
175
+ - Convolution
176
+ - Global Average Pooling
177
+ - Grouped Convolution
178
+ - Max Pooling
179
+ - ReLU
180
+ - ResNeXt Block
181
+ - Residual Connection
182
+ - Softmax
183
+ Tasks:
184
+ - Image Classification
185
+ Training Techniques:
186
+ - Nesterov Accelerated Gradient
187
+ - Weight Decay
188
+ Training Data:
189
+ - IG-3.5B-17k
190
+ - ImageNet
191
+ Training Resources: 336x GPUs
192
+ ID: ig_resnext101_32x8d
193
+ Epochs: 100
194
+ Layers: 101
195
+ Crop Pct: '0.875'
196
+ Momentum: 0.9
197
+ Batch Size: 8064
198
+ Image Size: '224'
199
+ Weight Decay: 0.001
200
+ Interpolation: bilinear
201
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/resnet.py#L863
202
+ Weights: https://download.pytorch.org/models/ig_resnext101_32x8-c38310e5.pth
203
+ Results:
204
+ - Task: Image Classification
205
+ Dataset: ImageNet
206
+ Metrics:
207
+ Top 1 Accuracy: 82.7%
208
+ Top 5 Accuracy: 96.64%
209
+ -->
docs/models/.templates/models/inception-resnet-v2.md ADDED
@@ -0,0 +1,72 @@
1
+ # Inception ResNet v2
2
+
3
+ **Inception-ResNet-v2** is a convolutional neural architecture that builds on the Inception family of architectures but incorporates [residual connections](https://paperswithcode.com/method/residual-connection) (replacing the filter concatenation stage of the Inception architecture).
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @misc{szegedy2016inceptionv4,
15
+ title={Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning},
16
+ author={Christian Szegedy and Sergey Ioffe and Vincent Vanhoucke and Alex Alemi},
17
+ year={2016},
18
+ eprint={1602.07261},
19
+ archivePrefix={arXiv},
20
+ primaryClass={cs.CV}
21
+ }
22
+ ```
23
+
24
+ <!--
25
+ Type: model-index
26
+ Collections:
27
+ - Name: Inception ResNet v2
28
+ Paper:
29
+ Title: Inception-v4, Inception-ResNet and the Impact of Residual Connections on
30
+ Learning
31
+ URL: https://paperswithcode.com/paper/inception-v4-inception-resnet-and-the-impact
32
+ Models:
33
+ - Name: inception_resnet_v2
34
+ In Collection: Inception ResNet v2
35
+ Metadata:
36
+ FLOPs: 16959133120
37
+ Parameters: 55850000
38
+ File Size: 223774238
39
+ Architecture:
40
+ - Average Pooling
41
+ - Dropout
42
+ - Inception-ResNet-v2 Reduction-B
43
+ - Inception-ResNet-v2-A
44
+ - Inception-ResNet-v2-B
45
+ - Inception-ResNet-v2-C
46
+ - Reduction-A
47
+ - Softmax
48
+ Tasks:
49
+ - Image Classification
50
+ Training Techniques:
51
+ - Label Smoothing
52
+ - RMSProp
53
+ - Weight Decay
54
+ Training Data:
55
+ - ImageNet
56
+ Training Resources: 20x NVIDIA Kepler GPUs
57
+ ID: inception_resnet_v2
58
+ LR: 0.045
59
+ Dropout: 0.2
60
+ Crop Pct: '0.897'
61
+ Momentum: 0.9
62
+ Image Size: '299'
63
+ Interpolation: bicubic
64
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_resnet_v2.py#L343
65
+ Weights: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/inception_resnet_v2-940b1cd6.pth
66
+ Results:
67
+ - Task: Image Classification
68
+ Dataset: ImageNet
69
+ Metrics:
70
+ Top 1 Accuracy: 0.95%
71
+ Top 5 Accuracy: 17.29%
72
+ -->
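The `Crop Pct` and `Image Size` fields in the metadata above are commonly read as an evaluation recipe: resize the shorter image side to `Image Size / Crop Pct`, then take a center crop of `Image Size`. The sketch below works through that arithmetic under this assumption; `eval_resize_size` is a hypothetical helper, and the rounding rule (floor) is an assumption rather than something stated in this file.

```python
import math


def eval_resize_size(image_size: int, crop_pct: float) -> int:
    """Shorter-side resize target before a center crop of `image_size` (assumed convention)."""
    if not 0.0 < crop_pct <= 1.0:
        raise ValueError("crop_pct must be in (0, 1]")
    # Assumption: the target is floored to an integer number of pixels.
    return int(math.floor(image_size / crop_pct))


# (Image Size, Crop Pct) pairs as they appear in the model-index metadata.
for image_size, crop_pct in [(299, 0.897), (299, 0.875), (224, 0.875)]:
    resize_to = eval_resize_size(image_size, crop_pct)
    print(f"image_size={image_size} crop_pct={crop_pct} -> resize to {resize_to}, center crop {image_size}")
```

Under this reading, a larger `Crop Pct` keeps more of the resized image inside the crop.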
docs/models/.templates/models/inception-v3.md ADDED
@@ -0,0 +1,85 @@
1
+ # Inception v3
2
+
3
+ **Inception v3** is a convolutional neural network architecture from the Inception family that makes several improvements, including [Label Smoothing](https://paperswithcode.com/method/label-smoothing), factorized 7 x 7 convolutions, and an [auxiliary classifier](https://paperswithcode.com/method/auxiliary-classifier) that propagates label information lower down the network (with batch normalization applied to the layers in the side head). The key building block is an [Inception Module](https://paperswithcode.com/method/inception-v3-module).
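As an illustration of label smoothing, the sketch below builds a smoothed target distribution and compares the resulting cross-entropy with the one-hot case. It is a plain-Python sketch, not timm's implementation; the function names, the four-class toy logits, and the smoothing value `epsilon = 0.1` are assumptions chosen for the example.

```python
import math


def smooth_labels(target: int, num_classes: int, epsilon: float = 0.1) -> list:
    """Mix the one-hot target with a uniform distribution: (1 - eps) * one_hot + eps / K."""
    dist = [epsilon / num_classes] * num_classes
    dist[target] += 1.0 - epsilon
    return dist


def cross_entropy(logits: list, target_dist: list) -> float:
    """Cross-entropy between softmax(logits) and a target distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(q * (x - log_z) for x, q in zip(logits, target_dist))


logits = [2.0, 0.5, -1.0, 0.0]  # toy scores that favour class 0
hard = smooth_labels(0, 4, epsilon=0.0)  # plain one-hot target
soft = smooth_labels(0, 4, epsilon=0.1)  # smoothed target

print("smoothed target:", soft)
print("loss (one-hot): ", cross_entropy(logits, hard))
print("loss (smoothed):", cross_entropy(logits, soft))
```

Because some target mass sits on the non-target classes, the smoothed loss stays higher for a confident prediction, which discourages the network from pushing logits to extremes.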
4
+
5
+ {% include 'code_snippets.md' %}
6
+
7
+ ## How do I train this model?
8
+
9
+ You can follow the [timm recipe scripts](https://rwightman.github.io/pytorch-image-models/scripts/) for training a new model afresh.
10
+
11
+ ## Citation
12
+
13
+ ```BibTeX
14
+ @article{DBLP:journals/corr/SzegedyVISW15,
15
+ author = {Christian Szegedy and
16
+ Vincent Vanhoucke and
17
+ Sergey Ioffe and
18
+ Jonathon Shlens and
19
+ Zbigniew Wojna},
20
+ title = {Rethinking the Inception Architecture for Computer Vision},
21
+ journal = {CoRR},
22
+ volume = {abs/1512.00567},
23
+ year = {2015},
24
+ url = {http://arxiv.org/abs/1512.00567},
25
+ archivePrefix = {arXiv},
26
+ eprint = {1512.00567},
27
+ timestamp = {Mon, 13 Aug 2018 16:49:07 +0200},
28
+ biburl = {https://dblp.org/rec/journals/corr/SzegedyVISW15.bib},
29
+ bibsource = {dblp computer science bibliography, https://dblp.org}
30
+ }
31
+ ```
32
+
33
+ <!--
34
+ Type: model-index
35
+ Collections:
36
+ - Name: Inception v3
37
+ Paper:
38
+ Title: Rethinking the Inception Architecture for Computer Vision
39
+ URL: https://paperswithcode.com/paper/rethinking-the-inception-architecture-for
40
+ Models:
41
+ - Name: inception_v3
42
+ In Collection: Inception v3
43
+ Metadata:
44
+ FLOPs: 7352418880
45
+ Parameters: 23830000
46
+ File Size: 108857766
47
+ Architecture:
48
+ - 1x1 Convolution
49
+ - Auxiliary Classifier
50
+ - Average Pooling
51
+ - Average Pooling
52
+ - Batch Normalization
53
+ - Convolution
54
+ - Dense Connections
55
+ - Dropout
56
+ - Inception-v3 Module
57
+ - Max Pooling
58
+ - ReLU
59
+ - Softmax
60
+ Tasks:
61
+ - Image Classification
62
+ Training Techniques:
63
+ - Gradient Clipping
64
+ - Label Smoothing
65
+ - RMSProp
66
+ - Weight Decay
67
+ Training Data:
68
+ - ImageNet
69
+ Training Resources: 50x NVIDIA Kepler GPUs
70
+ ID: inception_v3
71
+ LR: 0.045
72
+ Dropout: 0.2
73
+ Crop Pct: '0.875'
74
+ Momentum: 0.9
75
+ Image Size: '299'
76
+ Interpolation: bicubic
77
+ Code: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/inception_v3.py#L442
78
+ Weights: https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
79
+ Results:
80
+ - Task: Image Classification
81
+ Dataset: ImageNet
82
+ Metrics:
83
+ Top 1 Accuracy: 77.46%
84
+ Top 5 Accuracy: 93.48%
85
+ -->