In part 1, we introduced a simple RNN for time-series data. To continue, this article applies a deep RNN to a real dataset to predict monthly milk production.
Monthly milk production: pounds per cow. Jan 1962 – Dec 1975. You can download the data using this link.
Download: CSV file
The data contains the production of 168 months (14 years). We will use an RNN to predict the 12 months of 1975 and compare the predictions with the real data. First, let's have a look at the data.
# import libraries
%tensorflow_version 2.x
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow.keras as keras
%matplotlib inline
Read the data into a pandas DataFrame and check the first few rows.
dataURL = 'monthly-milk-production.csv'
milk = pd.read_csv(dataURL, index_col='Month')
milk.head(5)
Make the index a time series by using to_datetime and plot the data for 14 years:
milk.index = pd.to_datetime(milk.index)
plt.plot(milk, label='monthly milk product')
plt.legend()
We can clearly see the trend and pattern of each year. Now, we will visualize the pattern by averaging the values of each month.
# monthly average
marray = milk.to_numpy(dtype=np.int32).reshape(int(len(milk)/12), 12)
ma = np.mean(marray, axis=0)
plt.plot(ma, label='monthly average Jan-Dec')
plt.legend()
Production peaks in the summer (the middle of each year) and bottoms out in the winter.
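The reshape-and-average step is easier to see on a tiny example. This is only a sketch with made-up numbers (two "years" of 12 values each), not the milk data:

```python
import numpy as np

# Two made-up "years" of 12 monthly values each (0..11 and 12..23).
toy = np.arange(24, dtype=np.float64)

# One row per year, one column per calendar month.
by_year = toy.reshape(-1, 12)

# Averaging over axis 0 collapses the years and leaves one mean per month.
monthly_mean = by_year.mean(axis=0)

print(by_year.shape)
print(monthly_mean)
```

Each entry of monthly_mean is the average of the same calendar month across all years, which is exactly what the plot above shows for the 14 years of milk data.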
Preprocess the training data
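We now split the data into two parts, a training set and a test set. We don’t want a random train/test split: the test set should cover the last 12 months of data, with everything before it used for training. The test frame below keeps 13 rows rather than 12 because the month just before the test period is needed as the first input when predicting one step ahead.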
milk_train = milk.iloc[:-12]
milk_test = milk.iloc[-13:]
milk_train.shape, milk_test.shape
# ((156, 1), (13, 1))
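To see how the 156 and 13 rows fit together, here is a small sketch on a stand-in series of 168 integers (the names series, inputs and targets are only for illustration):

```python
import numpy as np

# Stand-in for the 168 monthly values: 0, 1, ..., 167.
series = np.arange(168)

train = series[:-12]   # first 156 months
test = series[-13:]    # last 13 months (shares one month with train)

# For one-step-ahead prediction, the inputs are the first 12 test rows
# and the targets are the same rows shifted forward by one month.
inputs, targets = test[:-1], test[1:]

print(len(train), len(test))
print(inputs[0], targets[0], targets[-1])
```

The first test row is also the last training row, so the 12 targets are exactly the final 12 months of the series.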
Next, we use sklearn.preprocessing to scale the data using the MinMaxScaler. Remember to fit the scaler only on the training data, then use it to transform the test data.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# fit on the training data only, then reuse the same scaling for the test data
train_scaled = scaler.fit_transform(milk_train)
test_scaled = scaler.transform(milk_test)
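To make the "fit on train only" rule concrete, here is a NumPy-only sketch of what min-max scaling to the default [0, 1] range does. The helper names fit_min_max and apply_min_max are hypothetical, and the numbers are made up rather than taken from the milk dataset:

```python
import numpy as np

def fit_min_max(train):
    """Return the (min, max) of the training values only."""
    train = np.asarray(train, dtype=np.float64)
    return train.min(), train.max()

def apply_min_max(values, lo, hi):
    """Scale values with statistics learned from the training set."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)

# Made-up numbers for illustration, not taken from the milk dataset.
toy_train = np.array([600.0, 650.0, 700.0, 800.0])
toy_test = np.array([780.0, 850.0])

lo, hi = fit_min_max(toy_train)
toy_train_scaled = apply_min_max(toy_train, lo, hi)
toy_test_scaled = apply_min_max(toy_test, lo, hi)

print(toy_train_scaled)  # training values span exactly [0, 1]
print(toy_test_scaled)   # test values may fall outside [0, 1]
```

Because the test values are scaled with the training minimum and maximum, a test value above the training maximum ends up above 1. That is expected: refitting the scaler on the test set would hide this and put the two sets on different scales.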
Create batch training data
We create a helper function to generate all training data at once. Note that pre-generating the training data like this is not computationally efficient. We will cover the use of a data generator with Keras in the next post.
def generate_training_batches(training_data, n_samples=400, steps=12):
    x, y = [], []
    for _ in range(n_samples):
        # Grab a random starting point for each sample
        rand_start = np.random.randint(0, len(training_data) - steps)
        # Take steps+1 consecutive points: the first `steps` are the input,
        # the same window shifted by one step is the target
        window = np.asarray(training_data[rand_start:rand_start + steps + 1]).reshape(steps + 1)
        x.append(window[:-1])
        y.append(window[1:])
    # Both arrays have shape (n_samples, steps)
    return np.stack(x), np.stack(y)
So for every 12 data points starting at step t (the input), we have another 12 data points starting at step t+1 (the true values).
trainbx, trainby = generate_training_batches(train_scaled)
trainbx = trainbx.reshape(400, 1, 12)
trainby = trainby.reshape(400, 1, 12)
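The relation between an input window and its target is easiest to check on a series whose values equal their positions. A minimal sketch, where make_window is a hypothetical helper and toy_series stands in for the scaled milk data:

```python
import numpy as np

def make_window(series, start, steps):
    """Return (x, y): `steps` inputs and the same window shifted one step ahead."""
    window = np.asarray(series[start:start + steps + 1])
    return window[:-1], window[1:]

# 0, 1, ..., 19 stands in for the scaled milk data.
toy_series = np.arange(20)
x, y = make_window(toy_series, start=3, steps=12)

print(x)  # the 12 values at positions 3..14
print(y)  # the 12 values at positions 4..15
```

Every element of y is the value that follows the corresponding element of x, so the network is trained to predict the series one month ahead.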
Setting Up The RNN Model
This time, we implement a deeper RNN built from Long Short-Term Memory (LSTM) layers. Unlike a feedforward network, an LSTM has feedback connections, so it can process not only single data points (such as images) but also entire sequences of data.
model = keras.Sequential([
    keras.layers.LSTM(40, return_sequences=True, input_shape=[None, 12]),
    keras.layers.BatchNormalization(),
    keras.layers.LSTM(20, return_sequences=True),
    keras.layers.Dense(12)
])
model.compile(optimizer='adam', loss='mse')
We can perform the training and plot the progress.
history = model.fit(trainbx, trainby, epochs=200, verbose=False)
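The fit call returns a Keras History object whose history attribute is a dict of per-epoch values, so the training progress can be plotted with a few lines. A minimal sketch, where plot_loss is a hypothetical helper name and the numbers passed in are placeholders rather than real training results:

```python
import matplotlib.pyplot as plt

def plot_loss(loss_by_name, ax=None):
    """Plot one curve per entry of a {name: list of per-epoch values} dict."""
    if ax is None:
        _, ax = plt.subplots()
    for name, values in loss_by_name.items():
        # Number the epochs from 1 so the x-axis matches the training log.
        ax.plot(range(1, len(values) + 1), values, label=name)
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss')
    ax.legend()
    return ax

# Placeholder values only; pass history.history from model.fit instead.
ax = plot_loss({'loss': [0.30, 0.12, 0.05, 0.03]})
```

With the model above, calling plot_loss(history.history) should draw a single curve for the training MSE, since no validation data or extra metrics were passed to fit.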
The milk prediction
After training, we can predict the twelve months’ production.
milkpredict = model.predict(test_scaled[:-1].reshape(1, 1, 12))
plt.plot(milkpredict.reshape(12, 1), label='prediction')
plt.plot(test_scaled[1:], label='real values')
plt.legend()
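The plot compares the curves in scaled units. To put a number on the comparison in pounds per cow, one option is to undo the min-max scaling and compute the mean absolute error and root mean squared error. The sketch below assumes the default [0, 1] scaling range; forecast_errors is a hypothetical helper and the arrays are made-up values, not actual model output:

```python
import numpy as np

def forecast_errors(pred_scaled, true_scaled, data_min, data_max):
    """Undo min-max scaling, then return (MAE, RMSE) in the original units."""
    span = data_max - data_min
    pred = np.asarray(pred_scaled, dtype=np.float64).ravel() * span + data_min
    true = np.asarray(true_scaled, dtype=np.float64).ravel() * span + data_min
    errors = pred - true
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return mae, rmse

# Made-up scaled values for illustration, not actual model output.
toy_pred = np.array([0.50, 0.75, 1.00])
toy_true = np.array([0.55, 0.75, 0.90])
mae, rmse = forecast_errors(toy_pred, toy_true, data_min=600.0, data_max=800.0)
print(mae, rmse)
```

With the names used in this article, the call would look like forecast_errors(milkpredict, test_scaled[1:], scaler.data_min_[0], scaler.data_max_[0]), where data_min_ and data_max_ are the statistics MinMaxScaler stored when it was fitted on the training data.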
To sum up
The deep RNN produced pretty good results. It can capture the pattern of monthly milk production. However, the training process can be improved by using a data generator rather than a pre-generated set.