
Time Series Analysis - Neural Networks with multiple predictors

1 Introduction

Neural networks are not limited to univariate time series: they also let us incorporate additional predictors into the model. This is what this post is about.

For this post the dataset Metro_Interstate_Traffic_Volume from the statistics platform “Kaggle” was used. You can download it from my “GitHub Repository”.

2 Import the libraries and the data

import pandas as pd
import numpy as np

from sklearn import preprocessing
from sklearn import metrics
import tensorflow as tf
from xgboost import XGBRegressor

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')

print(df.shape)
df.head()

The variable ‘traffic_volume’ will be our target variable again.

3 Definition of required functions

def mean_absolute_percentage_error_func(y_true, y_pred):
    '''
    Calculate the mean absolute percentage error as a metric for evaluation
    
    Args:
        y_true (float64): Y values for the dependent variable (test part), numpy array of floats 
        y_pred (float64): Predicted values for the dependent variable (test part), numpy array of floats
    
    Returns:
        Mean absolute percentage error 
    '''    
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
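A quick sanity check of this helper on toy numbers (note that the formula divides by y_true, so it is undefined whenever the actual values contain zeros; this is not an issue for traffic volumes):

```python
import numpy as np

def mean_absolute_percentage_error_func(y_true, y_pred):
    # Same helper as above: mean of the absolute relative errors, in percent
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Two observations, each off by exactly 10% -> MAPE of 10.0
mape = mean_absolute_percentage_error_func([100, 200], [110, 180])
print(mape)  # 10.0
```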
def timeseries_evaluation_metrics_func(y_true, y_pred):
    '''
    Calculate the following evaluation metrics:
        - MSE
        - MAE
        - RMSE
        - MAPE
        - R²
    
    Args:
        y_true (float64): Y values for the dependent variable (test part), numpy array of floats 
        y_pred (float64): Predicted values for the dependent variable (test part), numpy array of floats
    
    Returns:
        MSE, MAE, RMSE, MAPE and R² 
    '''    
    print('Evaluation metric results: ')
    print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE is : {mean_absolute_percentage_error_func(y_true, y_pred)}')
    print(f'R2 is : {metrics.r2_score(y_true, y_pred)}',end='\n\n')
def multiple_data_prep_func(predictors, target, start, end, window, horizon):
    '''
    Prepare multivariate data (predictors plus target) in a sliding-window format suitable for time series forecasting
    
    Args:
        predictors (float64): Scaled values for the predictors, numpy array of floats 
        target (float64): Scaled values for the target variable, numpy array of floats       
        start (int): Start point of range, integer
        end (int): End point of range, integer
        window (int): Number of units to be viewed per step, integer
        horizon (int): Number of units to be predicted, integer
    
    Returns:
        X (float64): Generated X-values for each step, numpy array of floats
        y (float64): Generated y-values for each step, numpy array of floats
    '''   
    X = []
    y = []

    start = start + window
    if end is None:
        end = len(predictors) - horizon

    for i in range(start, end):
        indices = range(i-window, i)
        X.append(predictors[indices])
        indices_y = range(i+1, i+1+horizon)
        y.append(target[indices_y])
    return np.array(X), np.array(y)
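To make the windowing logic concrete, here is a minimal sketch that runs the function above on a toy series (the predictors would normally be a 2-D array of scaled features; a 1-D range is enough to see the indexing):

```python
import numpy as np

def multiple_data_prep_func(predictors, target, start, end, window, horizon):
    # Identical logic to the function above, reproduced to be self-contained
    X, y = [], []
    start = start + window
    if end is None:
        end = len(predictors) - horizon
    for i in range(start, end):
        X.append(predictors[range(i - window, i)])
        y.append(target[range(i + 1, i + 1 + horizon)])
    return np.array(X), np.array(y)

series = np.arange(10)
X, y = multiple_data_prep_func(series, series, 0, None, window=3, horizon=2)

print(X.shape, y.shape)  # (5, 3) (5, 2)
print(X[0], y[0])        # [0 1 2] [4 5]
```

Each X sample holds `window` past steps, and each y sample holds the following `horizon` target values; note that the target window begins at i+1, one step after the end of the input window.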

4 Data pre-processing

4.1 Drop Duplicates

# keep=False removes every row whose date_time occurs more than once
df = df.drop_duplicates(subset=['date_time'], keep=False)

df.shape

4.2 Feature Encoding

We have three categorical variables (‘holiday’, ‘weather_main’ and ‘weather_description’) that need to be encoded. We use the get_dummies function for this, which does the same as scikit-learn's One-Hot Encoding.

# Encode feature 'holiday'
dummy_holiday = pd.get_dummies(df['holiday'], prefix="holiday")
column_name = df.columns.values.tolist()
column_name.remove('holiday')
df = df[column_name].join(dummy_holiday)

# Encode feature 'weather_main'
dummy_weather_main = pd.get_dummies(df['weather_main'], prefix="weather_main")
column_name = df.columns.values.tolist()
column_name.remove('weather_main')
df = df[column_name].join(dummy_weather_main)


# Encode feature 'weather_description'
dummy_weather_description = pd.get_dummies(df['weather_description'], prefix="weather_description")
column_name = df.columns.values.tolist()
column_name.remove('weather_description')
df = df[column_name].join(dummy_weather_description)

# Print final dataframe
print()
print('Shape of new dataframe: ' + str(df.shape))

This increases the number of features in our dataset from 9 to 60.
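The claim that get_dummies does the same as scikit-learn's One-Hot Encoding can be verified on a toy column (the weather_main name and values below are reused purely for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

toy = pd.DataFrame({'weather_main': ['Clouds', 'Rain', 'Clear', 'Rain']})

# pandas: one indicator column per category, named with a prefix
dummies = pd.get_dummies(toy['weather_main'], prefix='weather_main')

# scikit-learn: the same indicator matrix, as a (sparse) array
ohe = OneHotEncoder().fit_transform(toy[['weather_main']]).toarray()

print(list(dummies.columns))
# ['weather_main_Clear', 'weather_main_Clouds', 'weather_main_Rain']
print(np.array_equal(dummies.to_numpy(dtype=float), ohe))  # True
```

Both encoders order the indicator columns by sorted category name, which is why the two matrices line up column for column.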

4.3 Check for Feature Importance

Since not all features are relevant, we can check the feature importance at this point. We use XGBoost for this, since this algorithm performs very well on problems like ours.

column_names_predictors = df.columns.values.tolist()

# Exclude target variable and date_time
column_names_predictors.remove('traffic_volume')
column_names_predictors.remove('date_time')

column_name_criterium = 'traffic_volume'

print('Length of remaining predictors: ' + str(len(column_names_predictors)))
print()
print('Target Variable: ' + str(column_name_criterium))

model = XGBRegressor()
model.fit(df[column_names_predictors],df[column_name_criterium])

Let’s output the features, with their corresponding score values, that XGBoost has retained.

feature_important = model.get_booster().get_score(importance_type='gain')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by = "score", ascending=False)
data.plot(kind='barh')

print()
print('Length of remaining predictors after XGB: ' + str(len(data)))

How the score is calculated depends on the importance_type. Here is an overview of the available calculation types:

  • weight - the number of times a feature is used to split the data across all trees.
  • gain - the average gain across all splits the feature is used in.
  • cover - the average coverage across all splits the feature is used in.
  • total_gain - the total gain across all splits the feature is used in.
  • total_cover - the total coverage across all splits the feature is used in.
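get_score returns a plain dict mapping feature names to scores, and features that are never used in any split are simply absent from it, which is why the number of predictors shrinks after this step. The DataFrame construction above can be sketched with hypothetical scores (the feature names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical output of model.get_booster().get_score(importance_type='gain')
feature_important = {'temp': 3.1, 'rain_1h': 1.4, 'holiday_None': 7.9}

# Same pattern as above: dict -> DataFrame, sorted by score, descending
data = pd.DataFrame(data=list(feature_important.values()),
                    index=list(feature_important.keys()),
                    columns=['score']).sort_values(by='score', ascending=False)

print(list(data.index))  # ['holiday_None', 'temp', 'rain_1h']
```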

Let’s create our final dataframe:

# Get column names of remaining predictors after XGB
features_to_keep = list(data.index)
# Append name of target variable
features_to_keep.append(column_name_criterium)

# Create final dataframe
final_df = df[features_to_keep]

print()
print('Length of features_to_keep: ' + str(len(features_to_keep)))
print('(includes 44 predictors and the target variable)')
print()
print('Shape of final dataframe: ' + str(final_df.shape))

4.4 Generate Test Set

Of course, we again need a test set that the neural networks have not seen in any way.

test_data = final_df.tail(10)

final_df = final_df.drop(final_df.tail(10).index)

final_df.shape
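The tail/drop pattern used here splits off the most recent rows without shuffling, which is essential for time series. On a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({'traffic_volume': range(15)})

test_data = toy.tail(10)            # the 10 most recent observations
toy = toy.drop(toy.tail(10).index)  # remove them from the training data

print(toy.shape, test_data.shape)   # (5, 1) (10, 1)
print(list(test_data['traffic_volume'])[:3])  # [5, 6, 7]
```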

4.5 Feature Scaling

scaler_x = preprocessing.MinMaxScaler()
scaler_y = preprocessing.MinMaxScaler()

# Here we scale the predictors
x_scaled = scaler_x.fit_transform(final_df.drop(column_name_criterium, axis=1))
# Here we scale the criterium
y_scaled = scaler_y.fit_transform(final_df[[column_name_criterium]])
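One detail worth keeping in mind with MinMaxScaler: fit_transform learns the min/max from the data it is given, while transform reuses the previously learned parameters. New data (such as our test window later on) must therefore go through transform only. A minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

train = np.array([[0.0], [5.0], [10.0]])
scaled_train = scaler.fit_transform(train)  # learns min=0, max=10
print(scaled_train.ravel())                 # [0.  0.5 1. ]

# New data must be scaled with transform, NOT fit_transform,
# otherwise the scaling parameters silently change
new = np.array([[2.0], [20.0]])
print(scaler.transform(new).ravel())        # [0.2 2. ]

# The target scaler can later map predictions back to the original units
print(scaler.inverse_transform([[0.5]]))    # [[5.]]
```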

4.6 Train-Validation Split

In the last post about time series analysis with neural networks I presented two methods:

  • Single Step Style
  • Horizon Style

The Single Step Style is not possible for neural networks with multiple predictors. Why not? See here:

With a univariate time series, the Single Step Style lets us feed each prediction back in as input for the one that follows.

With multiple predictors, we can still predict the next value of the target variable, but we have no predicted values for the predictors on which the subsequent predictions would have to be based.

For this reason, we must limit ourselves to the Horizon Style at this point.

# Here we allow the model to see / train on the last 48 observations
multi_hist_window_hs = 48
# Here we try to predict the following 10 observations
# Must be the same length as the test set!
horizon_hs = 10
train_split_hs = 30000

x_train_multi_hs, y_train_multi_hs = multiple_data_prep_func(x_scaled, y_scaled, 
                                                             0, train_split_hs, 
                                                             multi_hist_window_hs, horizon_hs)

x_val_multi_hs, y_val_multi_hs= multiple_data_prep_func(x_scaled, y_scaled, 
                                                        train_split_hs, None, 
                                                        multi_hist_window_hs, horizon_hs)
print ('Length of first Single Window:')
print (len(x_train_multi_hs[0]))
print()
print ('Target horizon:')
print (y_train_multi_hs[0])

4.7 Prepare training and validation data using tf

BATCH_SIZE_hs = 256
BUFFER_SIZE_hs = 150

train_multi_hs = tf.data.Dataset.from_tensor_slices((x_train_multi_hs, y_train_multi_hs))
train_multi_hs = train_multi_hs.cache().shuffle(BUFFER_SIZE_hs).batch(BATCH_SIZE_hs).repeat()

validation_multi_hs = tf.data.Dataset.from_tensor_slices((x_val_multi_hs, y_val_multi_hs))
validation_multi_hs = validation_multi_hs.batch(BATCH_SIZE_hs).repeat()

5 Neural Networks with multiple predictors

In the following, I will again use several types of neural networks that are suitable for time series analysis, to check which type of network fits our data best.

The following networks will be used:

  • LSTM
  • Bidirectional LSTM
  • GRU
  • Encoder Decoder LSTM
  • CNN

To save myself some lines of code later, I’ll set a few parameters for model training at this point:

n_steps_per_epoch = 117
n_validation_steps = 20
n_epochs = 100
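These values are not arbitrary: with 30,000 training rows, a window of 48 and a batch size of 256, one pass over the training windows takes 117 batches, and since the tf.data pipeline repeats indefinitely, steps_per_epoch has to tell Keras where an epoch ends. A quick check of the arithmetic:

```python
# Number of training windows produced by multiple_data_prep_func:
# the loop runs from `window` to `train_split`, i.e. train_split - window samples
train_split = 30000
window = 48
batch_size = 256

n_train_windows = train_split - window
print(n_train_windows)                # 29952
print(n_train_windows // batch_size)  # 117 -> matches n_steps_per_epoch
```

n_validation_steps is chosen the same way from the size of the validation split.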

5.1 LSTM

Define Layer Structure

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(100, input_shape=x_train_multi_hs.shape[-2:],return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(units=100,return_sequences=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units=horizon_hs)])

model.compile(loss='mse',
              optimizer='adam')

Fit the model

model_path = 'model/lstm_model_multi.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                                    min_delta=0, patience=10, 
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', 
                                                      save_best_only=True, 
                                                      mode='min', verbose=0)]
history = model.fit(train_multi_hs, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_multi_hs, validation_steps=n_validation_steps, verbose =1,
                    callbacks = keras_callbacks)

Validate the model

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)


plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Test the model

trained_lstm_model_multi = tf.keras.models.load_model(model_path)
df_temp = final_df.drop(column_name_criterium, axis=1)
test_horizon = df_temp.tail(multi_hist_window_hs)
test_history = test_horizon.values


# Use transform here (not fit_transform), so that the scaling
# parameters learned on the training data are reused
test_scaled = scaler_x.transform(test_history)
test_scaled = test_scaled.reshape(1, test_scaled.shape[0], test_scaled.shape[1])

# Generate predictions with the reloaded best model
predicted_results = trained_lstm_model_multi.predict(test_scaled)
predicted_results

predicted_inv_trans = scaler_y.inverse_transform(predicted_results)
predicted_inv_trans

timeseries_evaluation_metrics_func(test_data[column_name_criterium], predicted_inv_trans[0])

rmse_lstm_model_multi = np.sqrt(metrics.mean_squared_error(test_data[column_name_criterium], predicted_inv_trans[0]))
plt.plot(list(test_data[column_name_criterium]))
plt.plot(list(predicted_inv_trans[0]))
plt.title("Actual vs Predicted")
plt.ylabel("Traffic volume")
plt.legend(('Actual','Predicted'))
plt.show()

5.2 Bidirectional LSTM

Define Layer Structure

model = tf.keras.models.Sequential([
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150, return_sequences=True), 
                                  input_shape=x_train_multi_hs.shape[-2:]),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50)),
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units=horizon_hs)])

model.compile(loss='mse',
              optimizer='adam')

Fit the model

model_path = 'model/bi_lstm_model_multi.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                                    min_delta=0, patience=10, 
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', 
                                                      save_best_only=True, 
                                                      mode='min', verbose=0)]
history = model.fit(train_multi_hs, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_multi_hs, validation_steps=n_validation_steps, verbose =1,
                    callbacks = keras_callbacks)

Validate the model

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)


plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Test the model

trained_bi_lstm_model_multi = tf.keras.models.load_model(model_path)
df_temp = final_df.drop(column_name_criterium, axis=1)
test_horizon = df_temp.tail(multi_hist_window_hs)
test_history = test_horizon.values


# Use transform here (not fit_transform), so that the scaling
# parameters learned on the training data are reused
test_scaled = scaler_x.transform(test_history)
test_scaled = test_scaled.reshape(1, test_scaled.shape[0], test_scaled.shape[1])

# Generate predictions with the reloaded best model
predicted_results = trained_bi_lstm_model_multi.predict(test_scaled)
predicted_results

predicted_inv_trans = scaler_y.inverse_transform(predicted_results)
predicted_inv_trans

timeseries_evaluation_metrics_func(test_data[column_name_criterium], predicted_inv_trans[0])

rmse_bi_lstm_model_multi = np.sqrt(metrics.mean_squared_error(test_data[column_name_criterium], predicted_inv_trans[0]))
plt.plot(list(test_data[column_name_criterium]))
plt.plot(list(predicted_inv_trans[0]))
plt.title("Actual vs Predicted")
plt.ylabel("Traffic volume")
plt.legend(('Actual','Predicted'))
plt.show()

5.3 GRU

Define Layer Structure

model = tf.keras.models.Sequential([
    tf.keras.layers.GRU(100, input_shape=x_train_multi_hs.shape[-2:],return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.GRU(units=50,return_sequences=False),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units=horizon_hs)])

model.compile(loss='mse',
              optimizer='adam')

Fit the model

model_path = 'model/gru_model_multi.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                                    min_delta=0, patience=10, 
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', 
                                                      save_best_only=True, 
                                                      mode='min', verbose=0)]
history = model.fit(train_multi_hs, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_multi_hs, validation_steps=n_validation_steps, verbose =1,
                    callbacks = keras_callbacks)

Validate the model

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)


plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Test the model

trained_gru_model_multi = tf.keras.models.load_model(model_path)
df_temp = final_df.drop(column_name_criterium, axis=1)
test_horizon = df_temp.tail(multi_hist_window_hs)
test_history = test_horizon.values


# Use transform here (not fit_transform), so that the scaling
# parameters learned on the training data are reused
test_scaled = scaler_x.transform(test_history)
test_scaled = test_scaled.reshape(1, test_scaled.shape[0], test_scaled.shape[1])

# Generate predictions with the reloaded best model
predicted_results = trained_gru_model_multi.predict(test_scaled)
predicted_results

predicted_inv_trans = scaler_y.inverse_transform(predicted_results)
predicted_inv_trans

timeseries_evaluation_metrics_func(test_data[column_name_criterium], predicted_inv_trans[0])

rmse_gru_model_multi = np.sqrt(metrics.mean_squared_error(test_data[column_name_criterium], predicted_inv_trans[0]))
plt.plot(list(test_data[column_name_criterium]))
plt.plot(list(predicted_inv_trans[0]))
plt.title("Actual vs Predicted")
plt.ylabel("Traffic volume")
plt.legend(('Actual','Predicted'))
plt.show()

5.4 Encoder Decoder LSTM

Define Layer Structure

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(40, input_shape=x_train_multi_hs.shape[-2:], return_sequences=True),
    tf.keras.layers.LSTM(units=20,return_sequences=True),
    tf.keras.layers.LSTM(units=15),
    tf.keras.layers.RepeatVector(y_train_multi_hs.shape[1]), 
    tf.keras.layers.LSTM(units=40,return_sequences=True),
    tf.keras.layers.LSTM(units=25,return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(units=1))])

model.compile(loss='mse',
              optimizer='adam')

Fit the model

model_path = 'model/ed_lstm_model_multi.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                                    min_delta=0, patience=10, 
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', 
                                                      save_best_only=True, 
                                                      mode='min', verbose=0)]
history = model.fit(train_multi_hs, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_multi_hs, validation_steps=n_validation_steps, verbose =1,
                    callbacks = keras_callbacks)

Validate the model

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)


plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Test the model

trained_ed_lstm_model_multi = tf.keras.models.load_model(model_path)
df_temp = final_df.drop(column_name_criterium, axis=1)
test_horizon = df_temp.tail(multi_hist_window_hs)
test_history = test_horizon.values


# Use transform here (not fit_transform), so that the scaling
# parameters learned on the training data are reused
test_scaled = scaler_x.transform(test_history)
test_scaled = test_scaled.reshape(1, test_scaled.shape[0], test_scaled.shape[1])

# Generate predictions with the reloaded best model
predicted_results = trained_ed_lstm_model_multi.predict(test_scaled)
predicted_results

predicted_inv_trans = scaler_y.inverse_transform(predicted_results.reshape(-1,1))
predicted_inv_trans

timeseries_evaluation_metrics_func(test_data[column_name_criterium], predicted_inv_trans)

rmse_ed_lstm_model_multi = np.sqrt(metrics.mean_squared_error(test_data[column_name_criterium], predicted_inv_trans))
plt.plot(list(test_data[column_name_criterium]))
plt.plot(list(predicted_inv_trans))
plt.title("Actual vs Predicted")
plt.ylabel("Traffic volume")
plt.legend(('Actual','Predicted'))
plt.show()

5.5 CNN

Define Layer Structure

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', 
                                 input_shape=(x_train_multi_hs.shape[1], x_train_multi_hs.shape[2])))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(30, activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(units=horizon_hs))

model.compile(loss='mse',
              optimizer='adam')

Fit the model

model_path = 'model/cnn_model_multi.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                                    min_delta=0, patience=10, 
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path,monitor='val_loss', 
                                                      save_best_only=True, 
                                                      mode='min', verbose=0)]
history = model.fit(train_multi_hs, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_multi_hs, validation_steps=n_validation_steps, verbose =1,
                    callbacks = keras_callbacks)

Validate the model

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)


plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Test the model

trained_cnn_model_multi = tf.keras.models.load_model(model_path)
df_temp = final_df.drop(column_name_criterium, axis=1)
test_horizon = df_temp.tail(multi_hist_window_hs)
test_history = test_horizon.values


# Use transform here (not fit_transform), so that the scaling
# parameters learned on the training data are reused
test_scaled = scaler_x.transform(test_history)
test_scaled = test_scaled.reshape(1, test_scaled.shape[0], test_scaled.shape[1])

# Generate predictions with the reloaded best model
predicted_results = trained_cnn_model_multi.predict(test_scaled)
predicted_results

predicted_inv_trans = scaler_y.inverse_transform(predicted_results)
predicted_inv_trans

timeseries_evaluation_metrics_func(test_data[column_name_criterium], predicted_inv_trans[0])

rmse_cnn_model_multi = np.sqrt(metrics.mean_squared_error(test_data[column_name_criterium], predicted_inv_trans[0]))
plt.plot(list(test_data[column_name_criterium]))
plt.plot(list(predicted_inv_trans[0]))
plt.title("Actual vs Predicted")
plt.ylabel("Traffic volume")
plt.legend(('Actual','Predicted'))
plt.show()

6 Get the Best Model

Let’s see which model performs best:

column_names = ["Model", "RMSE"]

# Note: DataFrame.append was removed in pandas 2.0, so we collect
# the results and build the dataframe in one step
results = [('lstm_model_multi', rmse_lstm_model_multi),
           ('bi_lstm_model_multi', rmse_bi_lstm_model_multi),
           ('gru_model_multi', rmse_gru_model_multi),
           ('ed_lstm_model_multi', rmse_ed_lstm_model_multi),
           ('cnn_model_multi', rmse_cnn_model_multi)]

df = pd.DataFrame(results, columns=column_names)

df

best_model = df.sort_values(by='RMSE', ascending=True)
best_model

As we can see, the CNN model fits best and outperforms the other models by far.

However, it should be mentioned at this point that the same neural network architectures performed even better on the univariate time series than with the use of multiple predictors.

7 Conclusion & Overview

In this post, I showed how to do time series analysis using neural networks with the inclusion of multiple predictors.

Looking back, I would like to give a summary of the different posts on the topic of time series analysis:

References

The content of this post was inspired by:

Kaggle: Time Series Analysis using LSTM Keras by Hassan Amin

Chollet, F. (2018). Deep learning with Python (Vol. 361). New York: Manning.

Vishwas, B. V., & Patel, A. (2020). Hands-on Time Series Analysis with Python. New York: Apress. DOI: 10.1007/978-1-4842-5992-4

Medium: Time Series Forecast Using Deep Learning by Rajaram Suryanarayanan