In this article, we review some common metrics and their uses for the two main ML problems: regression and classification. Regression Metrics: Most blog posts focus on classification metrics such as precision, recall, and AUC. For a change, I wanted to explore all kinds of metrics, including those used in regression. MAE and RMSE are the two most popular metrics for continuous...
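As a rough illustration of the two regression metrics named in the excerpt above, here is a minimal sketch in plain Python. The function names `mae` and `rmse` and the sample values are assumptions made for this example; they are not taken from the article itself.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors, ignoring sign."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squares the errors before averaging,
    so large errors weigh more heavily than in MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Hypothetical targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

On the same data, RMSE is never smaller than MAE, and the gap grows when a few errors are much larger than the rest.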
Advanced Keras – Custom loss functions
When working on machine learning problems, you sometimes want to construct your own custom loss function(s). This article introduces the abstract Keras backend for that purpose. Keras loss functions: According to the Keras loss documentation, there are several built-in loss functions, e.g. mean_absolute_percentage_error, cosine_proximity, kullback_leibler_divergence etc. When compiling a Keras model, we often...
Latent Dirichlet Allocation (LDA) and Topic Modelling in Python
Topic modelling is a type of statistical modelling for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to a particular topic. It builds a topic-per-document model and a words-per-topic model, both modelled as Dirichlet distributions. Here, we are...
K-Means vs K-Nearest neighbours quick note
These are completely different methods in machine learning. The fact that they both have the letter K in their name is a coincidence. K-means is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification. The typical k-means...
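To make the contrast concrete, here is a minimal sketch in plain Python of both methods on a toy 2-D dataset. The function names, the fixed starting centroids, and the data are assumptions made for this example; the k-means loop is the standard assign-then-update iteration, not necessarily the exact variant the note goes on to describe.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, centroids, n_iter=10):
    """Unsupervised: group unlabeled points around K centroids."""
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

def knn_predict(train, query, k=3):
    """Supervised: label a query by majority vote of its K nearest
    labeled training points."""
    neighbours = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# K-means sees only the points; the starting centroids are an assumption.
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])

# K-nearest neighbours needs a label for every training point.
train = [(p, "a") for p in points[:3]] + [(p, "b") for p in points[3:]]
print(centroids)
print(knn_predict(train, (2.0, 2.0)))
```

The key difference shows up in the inputs: `k_means` receives bare points and invents the grouping, while `knn_predict` cannot run without labels supplied in advance.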
Lasso vs Ridge vs Elastic Net – Machine learning
Lasso, Ridge, and Elastic Net are excellent methods to improve the performance of your linear model. This post summarises the usage of these regularization techniques. Bias: Biases are the underlying assumptions that a model makes to simplify the target function. Bias helps us generalize better and makes the model less sensitive to single data points. It also decreases the...
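As a small illustration of how the three penalties act on a coefficient, here is a sketch for the simplest possible case: one feature, no intercept, and the objective 0.5 * sum((y - w*x)^2) + l1 * |w| + 0.5 * l2 * w^2. That objective, the function names, and the data are assumptions made for this example, not the post's own formulation; in this special case the minimiser has a closed form.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, and return exactly 0.0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_1d(x, y, l1=0.0, l2=0.0):
    """Closed-form coefficient for the one-feature objective above.
    l1 > 0 alone gives lasso, l2 > 0 alone gives ridge, both give elastic net."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, l1) / (sxx + l2)

# Hypothetical data lying roughly on y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

ols = fit_1d(x, y)                          # no penalty
ridge = fit_1d(x, y, l2=10.0)               # shrinks, but never to exactly zero
lasso = fit_1d(x, y, l1=10.0)               # shrinks by a fixed amount
lasso_strong = fit_1d(x, y, l1=100.0)       # strong enough to zero the weight
elastic = fit_1d(x, y, l1=10.0, l2=10.0)    # combines both effects

print(ols, ridge, lasso, lasso_strong, elastic)
```

Ridge divides the coefficient by a larger denominator, so it shrinks smoothly; lasso subtracts from the numerator, which is why a large enough penalty sets the coefficient to exactly zero.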