This post introduces the use of a linear autoencoder for dimensionality reduction with TensorFlow and Keras.
What is a linear autoencoder
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”.

Autoencoders consist of two main parts: an encoder and a decoder (figure 1). They work by encoding the input data, optionally applying an activation function, and finally decoding the data to reconstruct the input. A bottleneck of some sort (the h layer(s)) is imposed on the input features, compressing them into fewer dimensions. Thus, if some inherent structure exists within the data, the autoencoder model will identify and leverage it to produce the output.
A linear autoencoder uses only linear activation functions (i.e., no non-linear activations) in its layers.
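To make the encoder/bottleneck/decoder structure concrete, here is a minimal sketch using the Keras functional API; the layer sizes are arbitrary and chosen only for illustration (the full worked example follows below):

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(8,))            # original input features
h = Dense(2)(inputs)                  # bottleneck layer h: compressed representation
outputs = Dense(8)(h)                 # reconstruction of the input

autoencoder = Model(inputs, outputs)  # trained to reproduce its own input
encoder = Model(inputs, h)            # the encoder alone maps data to the 2-D code
```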
Other variations
- Denoising AutoEncoders: Another regularization technique in which we train on a modified version of our input values, with some of the input values randomly set to 0 (see the sketch after this list).
- Sparse AutoEncoders: The hidden layer is larger than the input layer, but a regularization technique is applied to reduce overfitting. A constraint is added to the loss function, preventing the autoencoder from using all of its nodes at once.
- Contractive AutoEncoders: Adds a penalty to the loss function to prevent overfitting and copying of values when the hidden layer is greater than the input layer.
- Stacked AutoEncoders: When you add another hidden layer, you get a stacked autoencoder. It has 2 stages of encoding and 1 stage of decoding.
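As a concrete illustration of the denoising idea mentioned above, the corruption step can be as simple as randomly zeroing out a fraction of the input values before feeding them to the network. This is only a sketch; the drop probability of 0.3 is an arbitrary choice:

```python
import numpy as np

def corrupt(x, drop_prob=0.3, seed=None):
    """Randomly set a fraction of the input values to 0 (denoising-autoencoder style)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= drop_prob  # keep each value with probability 1 - drop_prob
    return x * mask

# A denoising autoencoder is then trained to reconstruct the clean input:
# model.fit(x=corrupt(scaled_data), y=scaled_data, ...)
```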
When to use
- Dimensionality reduction/Feature detection
- Building powerful recommendation systems
- Encoding features in massive datasets
Note that, under certain conditions, the solutions found by a linear autoencoder span the same subspace as those provided by PCA. We will cover PCA in another post. For an autoencoder on graph data, follow this link.
Example with Keras and TF 2.x
We first generate two-class data with 3 dimensions. Then we use a linear autoencoder to encode (compress) the input data into 2 dimensions.
```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
```
The data is generated with sklearn.datasets.
```python
from sklearn.datasets import make_blobs
data = make_blobs(n_samples=100, n_features=3, centers=2, random_state=101)
```
We now scale the data with MinMaxScaler. For each value in a feature, MinMaxScaler subtracts the minimum value of that feature and then divides by the range, where the range is the difference between the original maximum and the original minimum.
```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data[0])  # only the features (not the labels)
```
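As an optional sanity check (not part of the original walkthrough), the same scaling can be reproduced manually with NumPy, which makes the subtract-minimum/divide-by-range formula explicit:

```python
# x_scaled = (x - x_min) / (x_max - x_min), computed per feature (column)
raw = data[0]
manual = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))
print(np.allclose(scaled_data, manual))  # True
```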
Now, we plot the data:
```python
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

data_x = scaled_data[:, 0]  # column 1
data_y = scaled_data[:, 1]  # column 2
data_z = scaled_data[:, 2]  # column 3

ax.scatter(data_x, data_y, data_z, c=data[1])
```

Build an autoencoder model
```python
num_inputs = 3              # input dimensions
num_hidden = 2              # output dimensions of the hidden layer h
num_outputs = num_inputs    # output and input have the same dimension
```
Next, we build the model from the defined parameters. Since this is a linear autoencoder, we do not specify any activation function (Keras Dense layers default to a linear activation).
```python
learning_rate = 0.001  # Adam's default learning rate
inputs = Input(shape=(num_inputs,))
hidden = Dense(num_hidden)
outputs = Dense(num_outputs)

model = Sequential([inputs, hidden, outputs], name='autoencoder')
optimizer = tf.keras.optimizers.Adam(learning_rate)
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mse'])
model.summary()
```
```
Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_15 (Dense)             (None, 2)                 8
_________________________________________________________________
dense_16 (Dense)             (None, 3)                 9
=================================================================
Total params: 17
Trainable params: 17
Non-trainable params: 0
```
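The parameter counts in the summary follow directly from the layer dimensions: the hidden layer maps 3 inputs to 2 units (3 × 2 weights + 2 biases = 8 parameters) and the output layer maps 2 units back to 3 outputs (2 × 3 weights + 3 biases = 9 parameters), for 17 trainable parameters in total.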
Train the model
```python
history = model.fit(x=scaled_data, y=scaled_data, epochs=1000, verbose=False)

plt.plot(history.history['loss'], 'r-', label='loss')
plt.plot(history.history['mse'], 'bo', label='mse')
plt.legend()
```

Get the encoded data
After training, we can pass the data through the encoder, i.e., take the output of the hidden layer.
```python
encoded = hidden(scaled_data)
encoded.shape  # TensorShape([100, 2])
```
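Equivalently, and assuming the `model` and `scaled_data` variables defined above, the input and hidden layers can be wrapped into a standalone encoder model, which is convenient when the encoder is reused elsewhere; this is a sketch, not part of the original example:

```python
# Build a standalone encoder that reuses the trained weights of the hidden layer.
encoder = tf.keras.Model(inputs=model.input, outputs=model.layers[0].output)
encoded_np = encoder.predict(scaled_data)  # NumPy array of shape (100, 2), same values as `encoded`
```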
We can see that the output of the hidden layer has only 2 dimensions. Now we can check if the data is still linearly separable after dimensionality reduction.
```python
plt.scatter(encoded[:, 0], encoded[:, 1], c=data[1])
```
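Besides the visual check, one optional way to verify linear separability numerically (this snippet is not part of the original walkthrough) is to fit a linear classifier on the 2-dimensional codes:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(np.asarray(encoded), data[1])           # train on the 2-D codes and the original labels
print(clf.score(np.asarray(encoded), data[1]))  # close to 1.0 if the classes remain linearly separable
```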
