Previously, we trained our models on pre-generated datasets, for example in the recommender system and recurrent neural network posts. In this article, we demonstrate how to use a generator to produce data on the fly while training a model. **Keras Data Generator with Sequence.** There are a couple of ways to create a data generator. However, TensorFlow Keras provides a base class to fit a dataset as a...
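Below is a minimal sketch of such a generator. It assumes `tensorflow.keras.utils.Sequence` as the base class. The class name `RandomBatchGenerator` and its synthetic data are hypothetical and do not come from the original post. A `Sequence` subclass implements `__len__` (the number of batches per epoch) and `__getitem__` (returns one batch), and may override `on_epoch_end`. The import falls back to `object` so the sketch also runs without TensorFlow installed.

```python
import math

import numpy as np

try:
    # Base class provided by TensorFlow Keras for batch generators
    from tensorflow.keras.utils import Sequence
except ImportError:
    # Assumption: fall back to a plain base class so this sketch runs without
    # TensorFlow; it then only mimics the Sequence interface
    Sequence = object


class RandomBatchGenerator(Sequence):
    """Hypothetical generator that builds every batch on the fly."""

    def __init__(self, n_samples, n_features, batch_size=32, shuffle=True):
        super().__init__()
        self.n_samples = n_samples
        self.n_features = n_features
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(n_samples)

    def __len__(self):
        # Batches per epoch; the last batch may be smaller
        return math.ceil(self.n_samples / self.batch_size)

    def _make_sample(self, i):
        # Synthetic sample derived from its index (stands in for reading a file)
        return np.random.default_rng(i).normal(size=self.n_features)

    def __getitem__(self, idx):
        # Sample indices that belong to batch number `idx`
        batch_ids = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = np.stack([self._make_sample(i) for i in batch_ids]).astype("float32")
        # Toy target: the sum of the features of each sample
        y = x.sum(axis=1, keepdims=True)
        return x, y

    def on_epoch_end(self):
        # Reshuffle the sample order between epochs
        if self.shuffle:
            np.random.shuffle(self.indices)
```

With TensorFlow installed, an instance can be passed directly to `model.fit(gen, epochs=...)`. Each batch is built inside `__getitem__`, so only one batch needs to be held in memory at a time.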

## Data Wrangling quick note

Data wrangling (munging), like most data analytics processes, is iterative: the practitioner needs to carry out these steps repeatedly to produce the desired results. There are six broad steps to data wrangling:

1. Discovering: in this step, the goal is to understand the data more deeply. Before implementing methods to clean it, you...

## One-hot encoding matrices demonstration

This post demonstrates one-hot encoding for a rating matrix, such as the MovieLens dataset. **One-hot encoding.** Previously, we introduced a quick note on one-hot encoding. It represents categorical variables as binary vectors: a group of bits in which the only legal combinations of values are those with a single high (1) bit and all the others low (0). **Rating matrix.** If you are...
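This excerpt does not show the post's exact procedure. As an assumption, here is a minimal NumPy sketch of one common approach. For each `(user_id, movie_id, rating)` triple, as in MovieLens, it concatenates a one-hot user vector and a one-hot movie vector. The toy ratings and the `one_hot` helper are hypothetical.

```python
import numpy as np

# Hypothetical toy ratings in MovieLens style: (user_id, movie_id, rating)
ratings = np.array([
    [0, 1, 4.0],
    [0, 2, 3.0],
    [1, 0, 5.0],
    [2, 2, 1.0],
])


def one_hot(ids, num_classes):
    """Return a (len(ids), num_classes) matrix with a single 1 per row."""
    ids = np.asarray(ids, dtype=int)
    out = np.zeros((len(ids), num_classes), dtype=np.float32)
    out[np.arange(len(ids)), ids] = 1.0
    return out


users = ratings[:, 0].astype(int)
movies = ratings[:, 1].astype(int)
n_users = users.max() + 1
n_movies = movies.max() + 1

# Each row: a one-hot user block followed by a one-hot movie block
X = np.hstack([one_hot(users, n_users), one_hot(movies, n_movies)])
y = ratings[:, 2]  # the ratings become the targets
```

Each row of `X` has exactly two high bits: one in the user block and one in the movie block.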

## The intuition of Principal Component Analysis

As PCA and the linear autoencoder are closely related, this post revisits PCA as a powerful dimensionality reduction tool while skipping many mathematical proofs. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly...
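To make the orthogonal transformation concrete, here is a minimal NumPy sketch (an illustration, not the post's code). It computes PCA from the singular value decomposition of the centered data. The `pca` function and the toy data are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T  # coordinates in the new basis
    # Sample variance captured by each component
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


rng = np.random.default_rng(0)
# Toy correlated 2-D data: the second feature is roughly twice the first
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2 * x1 + 0.1 * rng.normal(size=200)])
scores, components, var = pca(X, n_components=2)
```

The rows of `components` are orthonormal. The columns of `scores` are mutually uncorrelated, and their variances decrease from the first component to the last.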

## Deep learning: Linear Autoencoder with Keras

This post introduces the linear autoencoder for dimensionality reduction using TensorFlow and Keras. **What is a linear autoencoder?** An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network...
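The post builds its model with TensorFlow and Keras. As an assumption, here is a NumPy-only sketch of what a linear autoencoder computes; it is not the post's code.

- An encoder matrix maps each input to a lower-dimensional code.
- A decoder matrix maps the code back to the input space.
- Both matrices are fit by gradient descent on the mean squared reconstruction error, with no activation functions.

In Keras this would roughly correspond to two `Dense` layers with `activation="linear"`. The function name, learning rate, and toy data are hypothetical.

```python
import numpy as np


def train_linear_autoencoder(X, code_dim, lr=0.05, epochs=2000, seed=0):
    """Fit X ~ X @ W_enc @ W_dec by full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, code_dim))  # encoder weights
    W_dec = rng.normal(scale=0.1, size=(code_dim, n_features))  # decoder weights
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc          # codes: the low-dimensional representation
        X_hat = Z @ W_dec      # linear reconstruction of the input
        err = X_hat - X
        losses.append(np.mean(err ** 2))
        # Gradients of the mean squared error with respect to each weight matrix
        grad_out = 2 * err / err.size
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


rng = np.random.default_rng(1)
# Toy data: 5 observed features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
X -= X.mean(axis=0)  # center, since this sketch uses no bias terms

W_enc, W_dec, losses = train_linear_autoencoder(X, code_dim=2)
codes = X @ W_enc  # 2-D encoding of each sample
```

Every layer is linear, so once training reaches the optimum, the learned 2-D code spans the same subspace as the first two principal components. This is the close relation to PCA mentioned above.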