There is a lot of confusion among machine learning beginners. One frequently asked question is the difference between a predictor and an estimator. Let's take some notes:

### Different usage

“Prediction” and “estimation” indeed are sometimes used interchangeably in non-technical writing and they seem to function similarly, but there is a sharp distinction between them in the standard model of a statistical problem. **An estimator uses current data to guess at some fact (or a parameter) from the data while a predictor uses the data to guess at some random value that is not part of the dataset.**

**Estimator example:**

You can also think of an **estimator** as the rule that creates an estimate. For **example**, the **sample mean** (x̄) is an **estimator** for the population mean, **μ**.
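As a minimal sketch of this idea, we can simulate a population with a known mean μ and apply the sample-mean estimator to a drawn sample (the population parameters here are made up for illustration):

```python
import random

# A minimal sketch: the sample mean as an estimator of the population mean.
# We simulate a population with a known mean mu = 5.0 (an assumption for
# illustration) so we can see how close the estimate lands.
random.seed(0)
mu = 5.0                                            # true population mean
sample = [random.gauss(mu, 2.0) for _ in range(1000)]

def sample_mean(x):
    """Estimator t(x): maps the data x to an estimate of mu."""
    return sum(x) / len(x)

estimate = sample_mean(sample)
print(estimate)  # should land close to 5.0
```

In real problems μ is unknown; the simulation only lets us check how well the rule recovers it.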

In this example, the data are assumed to constitute a (possibly multivariate) observation **x** of a random variable **X** whose distribution is known only to lie within a definite set of possible distributions, the "*states of nature*". An *estimator* **t** is a mathematical procedure that assigns to each possible value of **x** some property **t(x)** of a state of nature **θ**, such as its mean **μ(θ)**. Thus **an estimate is a guess about the true state of nature.** We can tell how good an estimate is by comparing t(x) to μ(θ).

As another example, for a given sample x, the "error" of the estimator \(\widehat {\theta }\) is defined as

\(e(x)=\widehat {\theta }(x) -\theta \),

where \({\theta }\) is the parameter being estimated. The error, e, depends not only on the estimator (the estimation formula or procedure) but also on the sample.

#### Predictor example:

Given data on the housing market in New Zealand, say 1000 houses, create a predictor to predict the price of a house that is, of course, **not** in the dataset.

A *predictor* **p(x)** concerns the independent observation of another random variable **Z** whose distribution is related to the true state of nature. **A prediction is a guess about another random value.** We can tell how good a particular prediction is only by comparing **p(x)** to the value realized by **Z**. We hope that *on average* the agreement will be good (in the sense of averaging over all possible outcomes **x** *and* simultaneously over all possible values of **Z**).
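A tiny sketch of such a predictor, fitting a least-squares line to hypothetical housing data (the areas and prices below are invented for illustration, not real New Zealand figures):

```python
# A minimal predictor sketch on hypothetical housing data:
# floor area (m^2) -> price in $1000s. The fitted line p(x) is used to
# guess the price of a house that is NOT in the dataset.
areas  = [80, 95, 110, 120, 140, 160]    # hypothetical observed houses
prices = [400, 460, 520, 560, 640, 720]  # hypothetical prices ($1000s)

n = len(areas)
mean_a = sum(areas) / n
mean_p = sum(prices) / n

# Ordinary least-squares slope and intercept
slope = (sum((a - mean_a) * (p - mean_p) for a, p in zip(areas, prices))
         / sum((a - mean_a) ** 2 for a in areas))
intercept = mean_p - slope * mean_a

def p(x):
    """Predictor: guesses the (random) price Z of an unseen house of area x."""
    return intercept + slope * x

print(p(130))  # prediction for a 130 m^2 house not in the dataset -> 600.0
```

Note the distinction: p(x) does not estimate a fixed parameter; it guesses the realized value of a new random variable Z, so even a perfectly fitted line carries the irreducible randomness of the unseen house's price.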

### Tips

**In practice, you can distinguish estimators from predictors in two ways:**

- *purpose*: an estimator seeks to know a property of the true state of nature, while a predictor seeks to guess the outcome of a random variable; and
- *uncertainty*: a predictor usually has larger uncertainty than a related estimator, due to the added uncertainty in the outcome of that random variable. Well-documented and well-described predictors therefore usually come with uncertainty bands, called prediction intervals, that are wider than the uncertainty bands of estimators, known as confidence intervals. A characteristic feature of prediction intervals is that they can (hypothetically) shrink as the dataset grows, but they will not shrink to zero width, because the uncertainty in the random outcome is "irreducible". The widths of confidence intervals, in contrast, tend to shrink to zero, corresponding to our intuition that the precision of an estimate can become arbitrarily good with sufficient data.
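This shrinking behavior can be sketched numerically. Assuming a normal model with known σ, the 95% confidence-interval half-width for the mean is z·σ/√n, while the prediction-interval half-width for a new observation is z·σ·√(1 + 1/n):

```python
import math

# Sketch of the tip above, assuming a normal model with known sigma:
# the confidence-interval half-width for the mean shrinks like 1/sqrt(n),
# while the prediction-interval half-width for a new observation levels
# off at z * sigma -- the irreducible uncertainty.
sigma = 2.0
z = 1.96  # approximate 95% normal quantile

for n in (10, 100, 10_000, 1_000_000):
    ci_half = z * sigma / math.sqrt(n)            # confidence interval
    pi_half = z * sigma * math.sqrt(1 + 1 / n)    # prediction interval
    print(n, round(ci_half, 4), round(pi_half, 4))

# ci_half -> 0 as n grows; pi_half -> z * sigma = 3.92
```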