
In part 1, we introduced a simple RNN for time-series data. This article continues by applying a deeper RNN to a real dataset to predict monthly milk production.

### The data

Monthly milk production: pounds per cow. Jan 1962 – Dec 1975. You can download the data using this link.

Download: CSV file

The data contains the production of 168 months (14 years). We will use an RNN to predict the 12 months of 1975 and compare the result with the real data. First, let’s have a look at the data.

```python
# Import libraries
%tensorflow_version 2.x
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow.keras as keras
%matplotlib inline
```

Read the data to a pandas’ data frame and check the first few rows.

```python
dataURL = 'monthly-milk-production.csv'
milk = pd.read_csv(dataURL, index_col='Month')
milk.head(5)
```

Month | Milk Production |
---|---|
1962-01-01 01:00:00 | 589.0 |
1962-02-01 01:00:00 | 561.0 |
1962-03-01 01:00:00 | 640.0 |
1962-04-01 01:00:00 | 656.0 |
1962-05-01 01:00:00 | 727.0 |

Make the index a time series by using `to_datetime` and plot the data for all 14 years:

```python
milk.index = pd.to_datetime(milk.index)
plt.plot(milk, label='monthly milk product')
plt.legend()
```

We can clearly see the trend and pattern of each year. Now, we will visualize the pattern by averaging the values of each month.

```python
# Monthly average: one row per year, one column per month
marray = milk.to_numpy(dtype=np.int32).reshape(len(milk) // 12, 12)
ma = np.mean(marray, axis=0)
plt.plot(ma, label='monthly average Jan-Dec')
plt.legend()
```

Production peaks in the summer (the middle of the year) and bottoms out in the winter.
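To see why the reshape gives a per-month average, here is a minimal sketch on made-up numbers (two "years" of a toy series, not the milk data). Values stored in chronological order are folded into one row per year, so averaging down the columns yields one mean per calendar month. The names `toy` and `by_year` are illustrative, and the trick assumes the series starts in January and its length is a multiple of 12.

```python
import numpy as np

# Toy series: 2 years x 12 months, in chronological order (made-up values).
toy = np.arange(24, dtype=np.float64)

# One row per year, one column per calendar month (Jan..Dec).
by_year = toy.reshape(-1, 12)

# Averaging over axis 0 collapses the years and keeps the 12 months.
monthly_mean = by_year.mean(axis=0)

print(by_year.shape)              # (2, 12)
print(monthly_mean[:3].tolist())  # [6.0, 7.0, 8.0]
```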

### Preprocess the training data

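We now split the data into a training set and a test set. We don’t want a random train/test split: the test set should cover the last 12 months of data, with everything before it used for training. The test slice below keeps 13 rows rather than 12, because predicting 12 months needs 12 input values plus the same window shifted one month ahead as the true values.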

```python
milk_train = milk.iloc[:-12]
milk_test = milk.iloc[-13:]
milk_train.shape, milk_test.shape
# ((156, 1), (13, 1))
```
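The slicing can be checked on a stand-in series. The sketch below uses month indices 0 to 167 instead of the real values, so it is easy to see which months end up as inputs and which as targets; the names `inputs` and `targets` are illustrative and do not appear in the article's code.

```python
import numpy as np

# Stand-in for the 168 monthly values (month indices instead of real data).
series = np.arange(168)

train = series[:-12]   # months 0..155
test = series[-13:]    # months 155..167: 13 values, the first one overlaps with train

inputs = test[:-1]     # 12 values: months 155..166
targets = test[1:]     # 12 values: months 156..167, i.e. the held-out year

print(len(train), len(test))              # 156 13
print(int(inputs[0]), int(targets[0]))    # 155 156
```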

Next, we use `MinMaxScaler` from `sklearn.preprocessing` to scale the data. Remember to call `fit_transform` only on the training data, then apply `transform` (not `fit_transform`) to the test data, so that the test set is scaled with the minimum and maximum of the training set.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(milk_train)
test_scaled = scaler.transform(milk_test)  # reuse the scaling fitted on the training data
```
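The reason for fitting on the training data only is easier to see when the arithmetic is written out. The sketch below mirrors what min-max scaling to the default range [0, 1] computes; it is not scikit-learn's implementation, the helper names `fit_min_max` and `apply_min_max` are hypothetical, and the numbers are made up.

```python
import numpy as np

def fit_min_max(train):
    """Return the (min, max) of the training data only."""
    train = np.asarray(train, dtype=np.float64)
    return train.min(), train.max()

def apply_min_max(values, lo, hi):
    """Map values to [0, 1] relative to the training range [lo, hi]."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)

# Made-up numbers, not the milk data.
train_toy = [500.0, 600.0, 700.0]
test_toy = [650.0, 750.0]

lo, hi = fit_min_max(train_toy)
train_toy_scaled = apply_min_max(train_toy, lo, hi)
test_toy_scaled = apply_min_max(test_toy, lo, hi)

print(train_toy_scaled.tolist())  # [0.0, 0.5, 1.0]
print(test_toy_scaled.tolist())   # [0.75, 1.25]
```

Test values can fall outside [0, 1], as 1.25 does here. That is expected: refitting the scaler on the test set would hide the fact that the test data exceeds the training range and would put the two sets on different scales.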

### Create batch training data

We create a helper function to generate all the training data at once. Note that pre-generating the training data like this is **not really efficient** computationally. We will cover the use of a data generator with Keras in the next post.

```python
def generate_training_batches(training_data, n_samples=400, steps=12):
    """Sample n_samples random windows; y is x shifted one step ahead."""
    data = np.asarray(training_data, dtype=np.float32).ravel()
    x = np.empty((n_samples, steps), dtype=np.float32)
    y = np.empty((n_samples, steps), dtype=np.float32)
    for i in range(n_samples):
        # Grab a random starting point that leaves room for steps + 1 values
        rand_start = np.random.randint(0, len(data) - steps)
        window = data[rand_start:rand_start + steps + 1]
        x[i] = window[:-1]  # values at steps t .. t+steps-1
        y[i] = window[1:]   # the same window shifted one step ahead
    return x, y
```

So for every window of 12 data points starting at step `t`, the target is the 12 data points starting at step `t+1` (the true values).
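A small illustration of one such pair, using a made-up series and a shorter window so the shift is visible at a glance. The start index is fixed here for readability, whereas the helper picks it at random; `x_window` and `y_window` are illustrative names.

```python
import numpy as np

steps = 4                # window length (the article uses 12)
series = np.arange(10)   # made-up series: 0, 1, ..., 9
start = 3                # fixed start for illustration

window = series[start:start + steps + 1]  # steps + 1 consecutive values
x_window = window[:-1]   # inputs
y_window = window[1:]    # targets: the inputs shifted one step ahead

print(x_window.tolist())  # [3, 4, 5, 6]
print(y_window.tolist())  # [4, 5, 6, 7]
```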

```python
trainbx, trainby = generate_training_batches(train_scaled)
trainbx = trainbx.reshape(400, 1, 12)
trainby = trainby.reshape(400, 1, 12)
```

### Setting up the RNN model

This time, we implement a deeper version of RNN with Long Short Term Memory (LSTM). LSTM has feedback connections. It can not only process single data points (such as images), but also entire sequences of data.

```python
model = keras.Sequential([
    keras.layers.LSTM(40, return_sequences=True, input_shape=[None, 12]),
    keras.layers.BatchNormalization(),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.Dense(12)
])
model.compile(optimizer='adam', loss='mse')
```

We can perform the training and plot the progress.

```python
history = model.fit(trainbx, trainby, epochs=200, verbose=False)
```
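The call above trains silently and draws nothing by itself. One possible way to plot the progress is sketched below, assuming `history` is the object returned by `model.fit`, in which Keras stores the per-epoch loss under `history.history['loss']`. The helper name `plot_loss` is hypothetical, and the loss values in the usage line are made up so that the block runs on its own.

```python
import matplotlib.pyplot as plt

def plot_loss(loss_values):
    """Plot one loss value per epoch on a log-scaled y axis and return the axes."""
    fig, ax = plt.subplots()
    epochs = range(1, len(loss_values) + 1)
    ax.plot(epochs, loss_values, label='training loss (mse)')
    ax.set_yscale('log')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss')
    ax.legend()
    return ax

# Made-up values for illustration; with the trained model,
# pass history.history['loss'] instead.
ax = plot_loss([0.2, 0.05, 0.02, 0.01])
```

The log scale is a design choice: the loss usually drops quickly in the first epochs, and a linear axis would flatten the rest of the curve.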

### The milk prediction

After training, we can predict the twelve months’ production.

```python
milkpredict = model.predict(test_scaled[:-1].reshape(1, 1, 12))
plt.plot(milkpredict.reshape(12, 1), label='prediction')
plt.plot(test_scaled[1:], label='real values')
plt.legend()
```
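The plot compares values in scaled units. To read the error in pounds per cow, the scaling has to be undone first; with the fitted scikit-learn scaler this is what `scaler.inverse_transform` does. The sketch below writes the same arithmetic by hand on made-up numbers, so `inverse_min_max`, `lo`, `hi` and all values are assumptions for illustration, not results from the model.

```python
import numpy as np

def inverse_min_max(scaled, lo, hi):
    """Undo min-max scaling: map values from [0, 1] back to [lo, hi]."""
    return np.asarray(scaled, dtype=np.float64) * (hi - lo) + lo

# Made-up numbers: lo and hi play the role of the training minimum and maximum.
lo, hi = 500.0, 900.0
pred_scaled = np.array([0.25, 0.5, 0.75])
real_scaled = np.array([0.5, 0.5, 0.5])

pred = inverse_min_max(pred_scaled, lo, hi)
real = inverse_min_max(real_scaled, lo, hi)

# Mean absolute error in the original units.
mae = float(np.mean(np.abs(pred - real)))

print(pred.tolist())   # [600.0, 700.0, 800.0]
print(round(mae, 2))   # 66.67
```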

### To sum up

The deep RNN produced pretty good results. It can capture the pattern of monthly milk production. However, the training process can be improved by using a data generator rather than a pre-generated set.