Bias vs. Variance: A Quick Note

Whenever we discuss prediction models, it's important to understand two key components of prediction error: bias and variance. A proper understanding of these concepts helps us not only build accurate models but also avoid over-fitting and under-fitting.
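For squared-error loss this can be made precise. Assume the data follow y = f(x) + ε, where ε is zero-mean noise with variance σ² that is independent of the training set, and write f̂ for a model fitted on a randomly drawn training set. The expected error at a point x then splits into bias, variance, and irreducible noise (expectations are taken over training sets and noise):

```latex
\mathbb{E}\left[(y - \hat f(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat f(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat f(x) - \mathbb{E}[\hat f(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Only the first two terms depend on the model; the noise term is a floor that no model can get below.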

Let's quickly explain the two concepts using the following illustration.

Suppose a man is trying to hit the bull's eye. His shooting skill can be considered the prediction model, and the shots he fires are the model's predictions.

What is bias?

Bias is the difference between the average prediction and the correct value.
If the shots land, on average, far away from the bull's eye, the bias is high; if they are centered on the bull's eye, the bias is low.
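To make the analogy concrete, here is a minimal Python sketch that treats each shot as an (x, y) point and measures bias as the distance between the average shot and the bull's eye. The shot coordinates and the helper names (mean_point, bias) are made up for illustration.

```python
import math

BULLS_EYE = (0.0, 0.0)  # target position (hypothetical coordinates)

def mean_point(shots):
    """Average position of all shots, i.e. the 'average prediction'."""
    n = len(shots)
    return (sum(x for x, _ in shots) / n, sum(y for _, y in shots) / n)

def bias(shots, target=BULLS_EYE):
    """Distance between the average shot and the target."""
    mean_x, mean_y = mean_point(shots)
    return math.hypot(mean_x - target[0], mean_y - target[1])

off_center = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (3.1, 3.0)]    # grouped up and to the right
centered = [(-1.0, 0.5), (1.0, -0.5), (0.5, 1.0), (-0.5, -1.0)]  # scattered around the target

print(round(bias(off_center), 3))  # 4.278 -> high bias
print(bias(centered))              # 0.0   -> no bias
```

Note that the centered group is quite scattered yet has zero bias: bias only looks at where the shots land on average.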

Some causes of high bias:

Oversimplified model
Not taking all the key features into account
Not enough data
Wrong model selection
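A tiny illustration of an oversimplified (or wrongly chosen) model: the sketch below fits a straight line by ordinary least squares to made-up, noise-free points that actually lie on the curve y = x². The data and the fit_line helper are hypothetical; what matters is the pattern of the errors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Noise-free points on a parabola: the "true" pattern is curved.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
train_mse = sum(r ** 2 for r in residuals) / len(residuals)

print(slope, intercept)  # 0.0 4.0
print(residuals)         # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(train_mse)         # 12.0
```

The best line is flat at y = 4, and the residuals are positive at both ends and negative in the middle. That systematic pattern is bias: a straight line cannot bend, so adding more points from the same curve would not remove it.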

What is variance?

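Variance shows how spread out the model's predictions are: the variability of the model's prediction for a given data point, for example when the model is trained on different training sets.
In the shooting analogy, it is how widely the shots scatter around their own average position, regardless of where that average lies.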
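Continuing the shooting analogy, a minimal sketch of variance as spread: the mean squared distance of the shots from their own average position. The two hypothetical groups show that spread and bias are independent of each other: a tight group can still sit far from the bull's eye, and a scattered group can still be centered on it.

```python
def spread(shots):
    """Mean squared distance of the shots from their own average position."""
    n = len(shots)
    mean_x = sum(x for x, _ in shots) / n
    mean_y = sum(y for _, y in shots) / n
    return sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / n

# Hypothetical shots; the bull's eye is at (0, 0).
tight = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (3.1, 3.0)]        # low variance, off-center
scattered = [(-1.0, 0.5), (1.0, -0.5), (0.5, 1.0), (-0.5, -1.0)]  # high variance, centered

print(round(spread(tight), 4))  # 0.0175
print(spread(scattered))        # 1.25
```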

Some causes of high variance:

Noisy training dataset
Sparse dataset
The algorithm fails to generalize: it fits the noise in the training data instead of capturing the underlying patterns
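The model-level definition of variance can be checked by simulation: draw many independent training sets, fit the same model on each, and measure how much its prediction at one fixed point moves. The sketch below assumes a hypothetical true relationship y = 2x plus Gaussian noise, and uses 1-nearest-neighbour regression as a deliberately flexible model that memorizes its training data; all names and numbers are illustrative.

```python
import random
import statistics

def true_f(x):
    # Hypothetical underlying relationship (an assumption for this demo).
    return 2.0 * x

def make_training_set(rng, n, noise_sd):
    """Draw n points with x uniform on [0, 10] and noisy labels."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def one_nn_predict(xs, ys, x0):
    """1-nearest-neighbour regression: return the label of the closest training x."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def prediction_variance(n_train, noise_sd, x0=5.0, repeats=500, seed=0):
    """Variance of the prediction at x0 across independently drawn training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(repeats):
        xs, ys = make_training_set(rng, n_train, noise_sd)
        preds.append(one_nn_predict(xs, ys, x0))
    return statistics.pvariance(preds)

for n, sd in [(200, 1.0), (5, 1.0), (200, 3.0)]:
    print(f"n={n:>3}, noise_sd={sd}: variance at x0 = {prediction_variance(n, sd):.2f}")
```

Under these assumptions the dense, low-noise case should come out close to the noise variance (about 1), while the sparse training set (5 points) and the noisier one (noise_sd = 3) should both give clearly larger values, matching the first two causes above.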

Overfitting and Underfitting

Under-fitting: often high bias + low variance; the model performs poorly on both the training and the test dataset
Over-fitting: often low bias + high variance; the model performs well on the training dataset but poorly on the test dataset
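The training-versus-test symptom can be reproduced with k-nearest-neighbour regression, used here purely as an illustration (the sine-shaped data, the seed, and the helper names are assumptions). With k = 1 the model memorizes the training set (over-fitting), while k equal to the size of the training set always predicts the overall mean (under-fitting); k = 5 sits in between.

```python
import math
import random

def make_data(rng, n, noise_sd=0.3):
    """Hypothetical data: y = sin(x) plus Gaussian noise, x uniform on [0, 2*pi]."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, noise_sd)))
    return points

def knn_predict(train, x0, k):
    """Average the labels of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error over data of the k-NN model built on train."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
train = make_data(rng, 100)
test = make_data(rng, 500)

results = {}
for k in (1, 5, len(train)):  # k = len(train) always predicts the training mean
    results[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k:>3}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

Expect k = 1 to show zero training error but a clearly larger test error, and k = 100 to show similar, high errors on both sets.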


Bob Vu says:
June 20, 2019 at 12:41 am

1. Under-fit (high bias): More training data doesn't help, so don't waste time collecting more data.
2. Over-fit (high variance): Getting more training data is likely to help.
Choosing a reasonable number of features, the degree of the polynomial, and an appropriate regularization parameter (lambda) is the key to keeping a balance between over-fitting and under-fitting.
Splitting the data into a training set (60%), a cross-validation set (20%), and a test set (20%) is helpful in choosing the best polynomial degree and regularization parameter.
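The 60/20/20 recipe from the comment can be sketched as follows. Instead of a polynomial degree or a regularization parameter, this hypothetical example tunes k in k-nearest-neighbour regression (a swapped-in hyperparameter, chosen to keep the code short): k is picked on the cross-validation set, and the test set is used only once, at the end.

```python
import math
import random

def make_data(rng, n, noise_sd=0.3):
    """Hypothetical data: y = sin(x) plus Gaussian noise, x uniform on [0, 2*pi]."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, noise_sd)))
    return points

def split_60_20_20(data, rng):
    """Shuffle, then cut into training (60%), cross-validation (20%) and test (20%) sets."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def knn_predict(train, x0, k):
    """Average the labels of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error over data of the k-NN model built on train."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

rng = random.Random(7)
data = make_data(rng, 500)
train, val, test = split_60_20_20(data, rng)

# Choose the hyperparameter using the cross-validation set only.
candidates = [1, 3, 5, 10, 25, 50, len(train)]
best_k = min(candidates, key=lambda k: mse(train, val, k))

# Report the chosen model's error on the untouched test set.
test_mse = mse(train, test, best_k)
print(f"split sizes: {len(train)}/{len(val)}/{len(test)}")
print(f"best k = {best_k}, test MSE = {test_mse:.3f}")
```

Keeping the test set out of the selection step is what makes its error a fair estimate of performance on new data.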
