A.I, Data and Software Engineering

Math for ML – Vector norms quick note


Vector norms are used in many machine learning and computer science problems. This article covers some common norms and related applications.

From a high school entrance exam…

Remember the day (?/?/1998) when I took an exam to a high school, there was a problem of finding the shortest path from A to B knowing that the person can only go left/right or up/down given the following grid of m x n blocks.


You may find the shortest path is m +n, simply the person should go down and right only. Accordingly, they add up to the sum of the length and the width of the grid. Also, the shortest path is not unique.

This is a practical problem in our real world to find the shortest routes since in many cases, we just can not follow a straight line connecting A and B. For example, finding the way to your nearest train station by foot/car, not the aeroplane.

So, we can see that there is more than one way to measure the distance from A to B depending on the (movement/constraint) condition between 2 points. The solution of the example is known as manhattan distance and it is closely related to the concept of vector norms which we introduce below.

Vector norms

Definition: A norm on a vector space V is a function:

\(\|\cdot\|: V \to R \\\hspace{1cm} x \to \|x\| \)

which asigns each vector \(\vec{x}\) its length \(\|x\| \in R\), such that for all \(lambda \in R\) and $\atex x, y \in V$ the following hold:

  • Absolutely homogeneous: \(\|\lambda x\| = |\lambda|\|x\|\)
  • Triangle inequality: \(\|x + y \| \le \|x\| + \|y\|\)
  • Positive definite: \(\|x\| \ge 0 \text{ and }\|x\| = 0 \leftrightarrow x = 0 \)
triangle inequality
triangle inequality

Manhattan norm – l1

The manhattan norm on \(R^n\) is defined for \(x \in R^n\) as

$$\|x\|_1 := \sum_{i=1}^{n}|x_i|,$$

where \(|\cdot|\) is the absolute value. The left panel of the equation shows all vectors \(x \in R^2\) with \(\|x\|_1 = 1\). The Manhattan norm is also called \(l_1 norm\).


The Euclidean norm of \(x \in R^n\) is defined as

$$\|x\|_2 := \sqrt{\sum_{i=1}^{n}x_i^2} = \sqrt{x^Tx}$$

and computes the Euclidean distance of x from the origin. The right panel of the equation shows all vectors \(x \in R^2\) with \(\|x\|_2 = 1\).

The Euclidean norm is also called l2 norm.

p-norm (\(L^p\) spaces)

Many may wonder where the letter l come from. It is after the name of mathematician Henri Lebesgue. For higher-order norm, we call them \(l^p\) norm (Lebesgue spaces).

Let \(p \ge 1\) e a real number. The p-norm (also called \(l_p-norm\) of vector \(x=(x_1, \dots, x_n)\) is

$$\|x\|_p := (\sum_{i=1}^{n}|x_i|^p)^{1/p}$$

Frobenius norm

It is a common norm for matrix.

$$\|A\|_\text{F} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2} = \sqrt{{trace}\left(A^* A\right)} = \sqrt{\sum_{i=1}^{\min\{m, n\}} \sigma_i^2(A)},$$

where \(\sigma _{i}(A)\)  are the singular values of matrix A.

Quick note

\(\|x\|_0 \): zero norm is not a norm in the usual sense as it lacks the homogeneity property. It is the number of non-zero elements of x.

\( \|x\|_1\) and \( \|x\|_2\) (or Frobenius norm in case of matrices) are commonly used as l1 and l2 regularization in machine learning to reduce the overfitting problem (see this article).

\(\|x\|_\infty:=max(|x|, \dots, |x_n|)\): the max element in x.


Math for machine learning, https://mml-book.github.io/book/mml-book.pdf

Add comment

A.I, Data and Software Engineering

PetaMinds focuses on developing the coolest topics in data science, A.I, and programming, and make them so digestible for everyone to learn and create amazing applications in a short time.

Pin It on Pinterest


You have successfully subscribed to the newsletter

There was an error while trying to send your request. Please try again.

Petaminds will use the information you provide on this form to be in touch with you and to provide updates.