![No Maintenance Intended](https://img.shields.io/badge/No%20Maintenance%20Intended-%E2%9C%95-red.svg)
![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)

# Filtering Variational Objectives

This folder contains a TensorFlow implementation of the algorithms from

Chris J. Maddison\*, Dieterich Lawson\*, George Tucker\*, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. "Filtering Variational Objectives." NIPS 2017.

[https://arxiv.org/abs/1705.09279](https://arxiv.org/abs/1705.09279)

This code implements 3 different bounds for training sequential latent variable models: the evidence lower bound (ELBO), the importance weighted auto-encoder bound (IWAE), and our bound, the filtering variational objective (FIVO).
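
For intuition, all three bounds can be written in terms of particle log-weights. The NumPy sketch below is illustrative only and is not the repository's implementation (that lives in `fivo/bounds.py` and `fivo/smc.py`); it assumes `log_w[t, k]` holds the log incremental importance weight of sample `k` at timestep `t` for a single sequence, and it omits the resampling step that distinguishes FIVO from IWAE.

```
# Illustrative sketch of the three bounds; not the code in fivo/bounds.py.
# log_w is assumed to be a [num_timesteps, num_samples] array of log
# incremental importance weights for one sequence.
import numpy as np
from scipy.special import logsumexp

def elbo(log_w):
  # ELBO: average of the K full-sequence log-weights.
  return log_w.sum(axis=0).mean()

def iwae(log_w):
  # IWAE: log of the average of the K full-sequence importance weights.
  num_samples = log_w.shape[1]
  return logsumexp(log_w.sum(axis=0)) - np.log(num_samples)

def fivo(log_w):
  # FIVO: log of an SMC-style marginal likelihood estimate, i.e. the sum
  # over time of the log average incremental weight. In the real code the
  # weights at each step come from the resampled particle system.
  num_samples = log_w.shape[1]
  return (logsumexp(log_w, axis=1) - np.log(num_samples)).sum()
```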

Additionally, it contains several sequential latent variable model implementations:

* Variational recurrent neural network (VRNN)
* Stochastic recurrent neural network (SRNN)
* Gaussian hidden Markov model with linear conditionals (GHMM)

The VRNN and SRNN can be trained for sequence modeling of pianoroll and speech data. The GHMM is trainable on a synthetic dataset, useful as a simple example of an analytically tractable model.

#### Directory Structure
The important parts of the code are organized as follows.

```
run_fivo.py         # main script, contains flag definitions
fivo
├─smc.py            # a sequential Monte Carlo implementation
├─bounds.py         # code for computing each bound, uses smc.py
├─runners.py        # code for VRNN and SRNN training and evaluation
├─ghmm_runners.py   # code for GHMM training and evaluation
├─data
| ├─datasets.py     # readers for pianoroll and speech datasets
| ├─calculate_pianoroll_mean.py  # preprocesses the pianoroll datasets
| └─create_timit_dataset.py      # preprocesses the TIMIT dataset
└─models
  ├─base.py   # base classes used in other models
  ├─vrnn.py   # VRNN implementation
  ├─srnn.py   # SRNN implementation
  └─ghmm.py   # Gaussian hidden Markov model (GHMM) implementation
bin
├─run_train.sh            # an example script that runs training
├─run_eval.sh             # an example script that runs evaluation
├─run_sample.sh           # an example script that runs sampling
├─run_tests.sh            # a script that runs all tests
└─download_pianorolls.sh  # a script that downloads pianoroll files
```
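
As background for `smc.py`, here is a generic bootstrap-style sequential Monte Carlo sketch in NumPy. It is illustrative only (the repository's implementation is written in TensorFlow), and `propose` and `log_weight` are hypothetical callables standing in for the model's proposal and incremental-weight computations.

```
# Generic SMC sketch; illustrative only, not the code in fivo/smc.py.
# propose(t, particle) and log_weight(t, particle) are hypothetical
# callables supplied by the model.
import numpy as np

def smc_log_marginal(propose, log_weight, num_timesteps, num_particles, seed=0):
  rng = np.random.default_rng(seed)
  log_p_hat = 0.0
  particles = [None] * num_particles
  for t in range(num_timesteps):
    # Propagate each particle and compute its log incremental weight.
    particles = [propose(t, p) for p in particles]
    log_w = np.array([log_weight(t, p) for p in particles])
    # Accumulate the log of the average incremental weight.
    log_norm = np.logaddexp.reduce(log_w)
    log_p_hat += log_norm - np.log(num_particles)
    # Multinomial resampling in proportion to the normalized weights.
    probs = np.exp(log_w - log_norm)
    idx = rng.choice(num_particles, size=num_particles, p=probs)
    particles = [particles[i] for i in idx]
  return log_p_hat
```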

### Pianorolls

Requirements before we start:

* TensorFlow (see [tensorflow.org](http://tensorflow.org) for how to install)
* [scipy](https://www.scipy.org/)
* [sonnet](https://github.com/deepmind/sonnet)


#### Download the Data

The pianoroll datasets are encoded as pickled sparse arrays and are available at [http://www-etud.iro.umontreal.ca/~boulanni/icml2012](http://www-etud.iro.umontreal.ca/~boulanni/icml2012). You can use the script `bin/download_pianorolls.sh` to download the files into a directory of your choosing.
```
export PIANOROLL_DIR=~/pianorolls
mkdir $PIANOROLL_DIR
sh bin/download_pianorolls.sh $PIANOROLL_DIR
```

#### Preprocess the Data

The script `calculate_pianoroll_mean.py` loads a pianoroll pickle file, calculates the mean of the training split, adds it to the pickle under the key `train_mean`, and writes the file back to disk in place. Do this for every pianoroll dataset you wish to train on.

```
python data/calculate_pianoroll_mean.py --in_file=$PIANOROLL_DIR/piano-midi.de.pkl
python data/calculate_pianoroll_mean.py --in_file=$PIANOROLL_DIR/nottingham.de.pkl
python data/calculate_pianoroll_mean.py --in_file=$PIANOROLL_DIR/musedata.pkl
python data/calculate_pianoroll_mean.py --in_file=$PIANOROLL_DIR/jsb.pkl
```
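
If you are curious what this step does, here is a hypothetical sketch. It is not the actual `data/calculate_pianoroll_mean.py`, and it assumes the pickle holds a dict whose `train` entry is a list of `[time, 88]` binary pianoroll arrays; the real on-disk format may differ.

```
# Hypothetical sketch only; the real script is data/calculate_pianoroll_mean.py.
# Assumes the pickle is a dict whose 'train' entry is a list of [time, 88]
# binary pianoroll arrays; the real on-disk format may differ.
import pickle
import numpy as np

def add_train_mean(path):
  with open(path, "rb") as f:
    data = pickle.load(f)
  # Stack every timestep of every training sequence, take the mean note
  # activation, and store it under the 'train_mean' key, in place.
  frames = np.concatenate(
      [np.asarray(seq, dtype=np.float64) for seq in data["train"]], axis=0)
  data["train_mean"] = frames.mean(axis=0)
  with open(path, "wb") as f:
    pickle.dump(data, f)
```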

#### Training

Now we can train a model. Here is the command for a standard training run, taken from `bin/run_train.sh`:
```
python run_fivo.py \
  --mode=train \
  --logdir=/tmp/fivo \
  --model=vrnn \
  --bound=fivo \
  --summarize_every=100 \
  --batch_size=4 \
  --num_samples=4 \
  --learning_rate=0.0001 \
  --dataset_path="$PIANOROLL_DIR/jsb.pkl" \
  --dataset_type="pianoroll"
```

You should see output that looks something like this (with extra logging cruft):

```
Saving checkpoints for 0 into /tmp/fivo/model.ckpt.
Step 1, fivo bound per timestep: -11.322491
global_step/sec: 7.49971
Step 101, fivo bound per timestep: -11.399275
global_step/sec: 8.04498
Step 201, fivo bound per timestep: -11.174991
global_step/sec: 8.03989
Step 301, fivo bound per timestep: -11.073008
```
#### Evaluation

You can also evaluate saved checkpoints. The `eval` mode loads a model checkpoint, evaluates it on every item in the chosen dataset split, and reports the log-likelihood averaged over the dataset. For example, here is a command, taken from `bin/run_eval.sh`, that evaluates a JSB model on the test set:

```
python run_fivo.py \
  --mode=eval \
  --split=test \
  --alsologtostderr \
  --logdir=/tmp/fivo \
  --model=vrnn \
  --batch_size=4 \
  --num_samples=4 \
  --dataset_path="$PIANOROLL_DIR/jsb.pkl" \
  --dataset_type="pianoroll"
```

You should see output like this:
```
Restoring parameters from /tmp/fivo/model.ckpt-0
Model restored from step 0, evaluating.
test elbo ll/t: -12.198834, iwae ll/t: -11.981187 fivo ll/t: -11.579776
test elbo ll/seq: -748.564789, iwae ll/seq: -735.209206 fivo ll/seq: -710.577141
```
The evaluation script prints log-likelihood in both nats per timestep (ll/t) and nats per sequence (ll/seq) for all three bounds.

#### Sampling

You can also sample from trained models. The `sample` mode loads a model checkpoint, conditions the model on a prefix of a randomly chosen datapoint, samples a sequence of outputs from the conditioned model, and writes the samples and prefix to a `.npz` file in `logdir`. For example, here is a command, taken from `bin/run_sample.sh`, that samples from a model trained on JSB:
```
python run_fivo.py \
  --mode=sample \
  --alsologtostderr \
  --logdir="/tmp/fivo" \
  --model=vrnn \
  --bound=fivo \
  --batch_size=4 \
  --num_samples=4 \
  --split=test \
  --dataset_path="$PIANOROLL_DIR/jsb.pkl" \
  --dataset_type="pianoroll" \
  --prefix_length=25 \
  --sample_length=50
```

Here `num_samples` denotes the number of samples used when conditioning the model as well as the number of trajectories to sample for each prefix.

You should see very little output.
```
Restoring parameters from /tmp/fivo/model.ckpt-0
Running local_init_op.
Done running local_init_op.
```

Loading the samples with `np.load` confirms that we conditioned the model on 4
prefixes of length 25 and sampled 4 sequences of length 50 for each prefix.
```
>>> import numpy as np
>>> x = np.load("/tmp/fivo/samples.npz")
>>> x[()]['prefixes'].shape
(25, 4, 88)
>>> x[()]['samples'].shape
(50, 4, 4, 88)
```

### Training on TIMIT

The TIMIT speech dataset is available at the [Linguistic Data Consortium website](https://catalog.ldc.upenn.edu/LDC93S1), but is unfortunately not free. These instructions will proceed assuming you have downloaded the TIMIT archive and extracted it into the directory `$RAW_TIMIT_DIR`.

#### Preprocess TIMIT

We preprocess TIMIT (as described in our paper) and write it out to a series of TFRecord files. To prepare the TIMIT dataset, use the script `create_timit_dataset.py`:
```
export TIMIT_DIR=~/timit_dataset
mkdir $TIMIT_DIR
python data/create_timit_dataset.py \
  --raw_timit_dir=$RAW_TIMIT_DIR \
  --out_dir=$TIMIT_DIR
```
You should see this exact output:
```
4389 train / 231 valid / 1680 test
train mean: 0.006060  train std: 548.136169
```

#### Training on TIMIT
This is very similar to training on pianoroll datasets, with just a few flags switched.
```
python run_fivo.py \
  --mode=train \
  --logdir=/tmp/fivo \
  --model=vrnn \
  --bound=fivo \
  --summarize_every=100 \
  --batch_size=4 \
  --num_samples=4 \
  --learning_rate=0.0001 \
  --dataset_path="$TIMIT_DIR/train" \
  --dataset_type="speech"
```
Evaluation and sampling are similar.

### Tests
This codebase comes with a number of tests to verify correctness, runnable via `bin/run_tests.sh`. The tests also serve as examples of how to use the code.

### Contact

This codebase is maintained by Dieterich Lawson. For questions and problems, please open an issue on the tensorflow/models issue tracker and assign it to @dieterichlawson.