Data wrangling (munging), like most data analytics processes, is an iterative one – the practitioner will need to carry out these steps repeatedly in order to produce the results he desires. There are six broad steps to data wrangling, which are: 1. Discovering In this step, the data is to be understood more deeply. Before implementing methods to clean it, you...
One-hot encoding matrices demonstration
This post will demonstrate onehot encoding for a rating matrix, such as movie lens dataset. One-hot encoding Previously, we introduced a quick note for one-hot encoding. It is a representation of categorical variables as binary vectors. It is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0) Rating matrix If you are...
The intuition of Principal Component Analysis
As PCA and linear autoencoder have a close relation, this post introduces again PCA as a powerful dimension reduction tool while skipping many mathematical proofs. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly...
deep learning: Linear Autoencoder with Keras
This post introduces using linear autoencoder for dimensionality reduction using TensorFlow and Keras. What is a linear autoencoder An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network...
Recurrent neural network – time-series data- part 1
If you are human and curious about your future, then the recurrent neural network (RNN) is definitely a tool to consider. Part 1 will demonstrate some simple RNNs using TensorFlow 2.0 and Keras functional API. What is RNN An RNN is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence (time series). This...