Whenever we discuss prediction models, it’s important to understand two sources of prediction error: bias and variance. A proper understanding of these concepts helps us not only to build accurate models but also to avoid overfitting and underfitting.
Let’s quickly explain the two concepts with the following illustration.
Suppose that a man is trying to hit the bull’s-eye of a target. His shooting skill plays the role of the prediction model, and the spots where his shots land are the model’s predictions.
What is bias?
- Bias is the difference between the model’s average prediction and the correct value.
- If the shots land, on average, far from the bull’s-eye, the bias is high; if they are centered on it, the bias is low.
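To make the definition concrete, here is a minimal Python sketch of the shooting analogy. It assumes (for illustration only) that the bull’s-eye sits at (0, 0) and that each shot lands at the shooter’s aim point plus Gaussian scatter; the helper names `shoot` and `bias` are hypothetical. Bias is measured as the distance between the average shot and the bull’s-eye.

```python
import random
import statistics

BULLS_EYE = (0.0, 0.0)  # assumed position of the bull's-eye

def shoot(n_shots, aim, scatter, seed=0):
    """Simulate shots landing at the aim point plus Gaussian scatter."""
    rng = random.Random(seed)
    return [(aim[0] + rng.gauss(0, scatter), aim[1] + rng.gauss(0, scatter))
            for _ in range(n_shots)]

def bias(shots, target=BULLS_EYE):
    """Distance between the average shot position and the target."""
    mean_x = statistics.fmean(x for x, _ in shots)
    mean_y = statistics.fmean(y for _, y in shots)
    return ((mean_x - target[0]) ** 2 + (mean_y - target[1]) ** 2) ** 0.5

accurate = shoot(1000, aim=(0.0, 0.0), scatter=0.5)    # aims at the center
off_target = shoot(1000, aim=(2.0, 1.0), scatter=0.5)  # aim is systematically off
print(f"bias, accurate shooter:   {bias(accurate):.2f}")
print(f"bias, off-target shooter: {bias(off_target):.2f}")
```

The off-target shooter’s bias comes out close to √5 ≈ 2.24, the distance of his aim point from the center, while the accurate shooter’s bias is near zero. Firing more shots does not fix a biased aim; it only pins down the average more precisely.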
Some causes of high bias:
- An oversimplified model
- Key features left out of the model
- Not enough data
- The wrong choice of model
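The first and last causes (an oversimplified model, the wrong choice of model) can be seen in a small experiment. This sketch assumes a hypothetical quadratic ground truth y = x² and fits a straight line to it with closed-form least squares. The data are noise-free and plentiful, so any remaining error comes from the model’s shape rather than from the data.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x0: a + b * x0

rng = random.Random(1)
xs = [rng.uniform(-3, 3) for _ in range(500)]
ys = [x ** 2 for x in xs]  # hypothetical noise-free quadratic ground truth

line = fit_line(xs, ys)
train_mse = statistics.fmean((line(x) - y) ** 2 for x, y in zip(xs, ys))
print(f"training MSE: {train_mse:.2f}")
print(f"prediction at x=0: {line(0):.2f} (true value 0.00)")
```

The line predicts roughly 3 at x = 0 instead of 0, and its training error stays high no matter how many points are added, because no straight line can follow a parabola.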
What is variance?
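- Variance is the variability of the model’s prediction for a given data point, i.e. how much that prediction changes when the model is trained on different samples of data.
- If the shots are scattered widely around their average position, the variance is high; if they are tightly grouped, the variance is low.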
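Variance can be estimated directly from this definition: retrain the model many times on fresh samples drawn from the same process and measure how much its prediction at one fixed point moves. The sketch below uses a hypothetical setup (quadratic ground truth, Gaussian noise, 20 training points per round) to compare two deliberately extreme models: a 1-nearest-neighbor regressor, which is very flexible, and a model that always predicts the training mean, which is very rigid. All function names are illustrative.

```python
import random
import statistics

def true_fn(x):
    return x ** 2  # hypothetical ground truth

def sample_training_set(rng, n=20, noise=2.0):
    """Draw a fresh noisy training set from the same (assumed) process."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def nearest_neighbor_model(xs, ys):
    """Flexible model: predict the label of the closest training point (1-NN)."""
    return lambda x0: ys[min(range(len(xs)), key=lambda i: abs(xs[i] - x0))]

def mean_model(xs, ys):
    """Rigid model: always predict the average training label."""
    m = statistics.fmean(ys)
    return lambda x0: m

def bias_and_variance(make_model, x0=1.0, n_rounds=500, seed=0):
    """Retrain on n_rounds fresh samples; return (bias, variance) of predictions at x0."""
    rng = random.Random(seed)
    preds = [make_model(*sample_training_set(rng))(x0) for _ in range(n_rounds)]
    return statistics.fmean(preds) - true_fn(x0), statistics.pvariance(preds)

for name, make_model in [("1-NN", nearest_neighbor_model), ("mean", mean_model)]:
    b, v = bias_and_variance(make_model)
    print(f"{name}: bias {b:+.2f}, variance {v:.2f}")
```

The 1-nearest-neighbor model is nearly unbiased at x = 1, but its prediction jumps around from one training set to the next; the mean model barely moves but is consistently off by about 2. This tension is what underfitting and overfitting describe.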
Some causes of high variance:
- Noisy training dataset
- Sparse dataset
- An algorithm that fails to generalize, fitting noise instead of capturing the underlying patterns
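The two data-related causes, noise and sparsity, can be isolated with an experiment in which the model has exactly the right form, so it has no bias. This minimal sketch assumes a hypothetical linear ground truth y = 2x + 1, fits a line to it by least squares, and measures the variance of the prediction at x = 2.5 across repeated retrainings while varying the number of training points and the noise level.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x0: a + b * x0

def prediction_variance(n_points, noise, x0=2.5, n_rounds=500, seed=0):
    """Variance of the fitted line's prediction at x0 across n_rounds retrainings."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_rounds):
        xs = [rng.uniform(-3, 3) for _ in range(n_points)]
        ys = [2 * x + 1 + rng.gauss(0, noise) for x in xs]  # hypothetical linear truth
        preds.append(fit_line(xs, ys)(x0))
    return statistics.pvariance(preds)

for n_points, noise in [(200, 0.5), (200, 3.0), (5, 0.5), (5, 3.0)]:
    print(f"n={n_points:>3}, noise={noise}: variance {prediction_variance(n_points, noise):.3f}")
```

More noise and fewer training points both inflate the variance of the prediction, even though the model itself is correctly specified.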
Overfitting and Underfitting
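- Underfitting: typically high bias and low variance; the model performs poorly on both the training and the test dataset.
- Overfitting: typically low bias and high variance; the model performs well on the training dataset but poorly on the test dataset.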
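The two failure modes show up clearly when training and test errors are compared. This sketch uses k-nearest-neighbor regression (a technique chosen here for illustration, not one the text prescribes) on a hypothetical noisy quadratic dataset. The number of neighbors k controls flexibility: k = 1 memorizes the training set, while k equal to the full training size of 200 always predicts the overall mean.

```python
import random
import statistics

def true_fn(x):
    return x ** 2  # hypothetical ground truth

def make_data(rng, n, noise=1.5):
    """Sample n noisy (x, y) pairs from the assumed process."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0, noise)) for x in xs]

def knn_predict(train, x0, k):
    """k-nearest-neighbor regression: average the labels of the k closest points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return statistics.fmean(y for _, y in nearest)

def mse(train, data, k):
    """Mean squared error of the k-NN model (fit on train) evaluated on data."""
    return statistics.fmean((knn_predict(train, x, k) - y) ** 2 for x, y in data)

rng = random.Random(42)
train_set = make_data(rng, 200)
test_set = make_data(rng, 1000)

results = {}
for k, label in [(1, "overfit"), (15, "middle"), (200, "underfit")]:
    results[k] = (mse(train_set, train_set, k), mse(train_set, test_set, k))
    print(f"k={k:>3} ({label}): train MSE {results[k][0]:.2f}, test MSE {results[k][1]:.2f}")
```

With k = 1 the training error is exactly zero, since each training point is its own nearest neighbor, yet the test error is much higher: overfitting. With k = 200 the training and test errors are both high: underfitting. An intermediate k such as 15 does better on the test set than either extreme.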