|
 |
|
 |
|
 |
|
|
|
# Learned Optimizer |
|
|
|
Code for [Learned Optimizers that Scale and Generalize](https://arxiv.org/abs/1703.04813). |
|
|
|
## Requirements |
|
|
|
* Bazel ([install](https://bazel.build/versions/master/docs/install.html)) |
|
* TensorFlow >= v1.3 |
|
* Python 2.7.x |
|
|
|
## Training a Learned Optimizer |
|
|
|
## Code Overview |
|
In the top-level directory, ```metaopt.py``` contains the code to train and test a learned optimizer. ```metarun.py``` packages the actual training procedure into a |
|
single file, defining and exposing many flags to tune the procedure, from selecting the optimizer type and problem set to more fine-grained hyperparameter settings. |
|
There is no testing binary; testing can be done ad-hoc via ```metaopt.test_optimizer``` by passing an optimizer object and a directory with a checkpoint. |
|
|
|
The ```optimizer``` directory contains a base ```trainable_optimizer.py``` class and a number of extensions, including the ```hierarchical_rnn``` optimizer used in |
|
the paper, a ```coordinatewise_rnn``` optimizer that more closely matches previous work, and a number of simpler optimizers to demonstrate the basic mechanics of |
|
a learnable optimizer. |
|
|
|
The ```problems``` directory contains the code to build the problems that were used in the meta-training set. |
|
|
|
### Binaries |
|
```metarun.py```: meta-training of a learned optimizer |
|
|
|
### Command-Line Flags |
|
The flags most relevant to meta-training are defined in ```metarun.py```. The default values will meta-train a HierarchicalRNN optimizer with the hyperparameter |
|
settings used in the paper. |
|
|
|
### Using a Learned Optimizer as a Black Box |
|
The ```trainable_optimizer``` inherits from ```tf.train.Optimizer```, so a properly instantiated version can be used to train any model in any APIs that accept |
|
this class. There are just 2 caveats: |
|
|
|
1. If using the Hierarchical RNN optimizer, the apply_gradients return type must be changed (see comments inline for what exactly must be removed) |
|
|
|
2. Care must be taken to restore the variables from the optimizer without overriding them. Optimizer variables should be loaded manually using a pretrained checkpoint |
|
and a ```tf.train.Saver``` with only the optimizer variables. Then, when constructing the session, ensure that any automatic variable initialization does not |
|
re-initialize the loaded optimizer variables. |
|
|
|
## Contact for Issues |
|
|
|
* Olga Wichrowska (@olganw), Niru Maheswaranathan (@nirum) |
|
|