Estimators
We create estimators from statistics of the sample. These estimators have some good properties, and sometimes we can even know whether they are the best.
- Statistic
A statistic is a function $T(X_1, \dots, X_n)$ of the sample which does not depend directly on the unknown parameter $\theta$.
- Estimator
An estimator $\hat{\theta}$ of $\theta$ is a random variable which is measurable and computable from the sample $X_1, \dots, X_n$.
- Example
The estimator of the mean is the sample mean:
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
An estimator is not only for estimating parameters of a law! It can be any function which is a statistic. But when it does estimate a parameter $\theta$, we denote the estimator $\hat{\theta}$.
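As a quick sketch (using NumPy, with a made-up simulated sample), the sample mean is computed from the data alone, without knowing the true parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=1000)  # X_1, ..., X_n i.i.d. N(2, 1)

# The sample mean is a statistic: it depends only on the observed data,
# not on the unknown parameter, yet it estimates the mean of the law.
mean_hat = sample.mean()
print(mean_hat)  # close to the true mean 2.0
```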
- Sufficient
A statistic $T$ is sufficient for $\theta$ when the conditional law of the sample given $T$ does not depend on $\theta$. That means the statistic gives all the information we need about $\theta$.
- Example
Suppose $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{B}(p)$:
- $T = X_1$ is a statistic
- $S_n = \sum_{i=1}^{n} X_i$ is a sufficient statistic
- $\bar{X}_n = S_n / n$ is an estimator of $p$
- Proposition
A statistic $T$ is sufficient for $\theta$ iff the density of the sample factorizes as
$$f_\theta(x_1, \dots, x_n) = g_\theta\big(T(x_1, \dots, x_n)\big) \, h(x_1, \dots, x_n)$$
- Bias
The bias of an estimator is
$$b(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$
The estimator is unbiased when $b(\hat{\theta}) = 0$
- Variance
The variance of an estimator is
$$\mathrm{Var}(\hat{\theta}) = \mathbb{E}\left[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}]\big)^2\right]$$
- Risk
The risk of an estimator is $R(\hat{\theta}, \theta) = \mathbb{E}\left[\ell(\hat{\theta}, \theta)\right]$ with $\ell$ a loss function.
- Example
The Mean Squared Error (MSE) is the risk with the loss function $\ell(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$:
$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \mathrm{Var}(\hat{\theta}) + b(\hat{\theta})^2$$
For any minimization of the risk, you have a bias-variance trade-off to deal with.
Proof for the MSE:
$$\mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \mathbb{E}\left[\big(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta\big)^2\right] = \mathrm{Var}(\hat{\theta}) + 2\, b(\hat{\theta})\, \mathbb{E}\left[\hat{\theta} - \mathbb{E}[\hat{\theta}]\right] + b(\hat{\theta})^2 = \mathrm{Var}(\hat{\theta}) + b(\hat{\theta})^2$$
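The decomposition can be checked numerically; here is a minimal Monte Carlo sketch (NumPy, made-up numbers) using a deliberately biased estimator of a Gaussian mean:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, n_rep = 3.0, 20, 50000  # true mean, sample size, repetitions

# A deliberately biased estimator: shrink the sample mean toward 0.
estimates = 0.9 * rng.normal(theta, 1.0, size=(n_rep, n)).mean(axis=1)

mse = np.mean((estimates - theta) ** 2)
bias = estimates.mean() - theta
var = estimates.var()

# The empirical MSE equals variance + bias^2 (it is an algebraic identity).
print(mse, var + bias ** 2)
```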
Asymptotic
- Consistency
- $\hat{\theta}_n$ is weakly consistent when it converges to $\theta$ in probability, denoted $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta$
- $\hat{\theta}_n$ is strongly consistent (or almost surely consistent) when it converges to $\theta$ almost surely, denoted $\hat{\theta}_n \xrightarrow{\text{a.s.}} \theta$
- $\hat{\theta}_n$ is consistent in distribution when $\mathbb{E}\left[f(\hat{\theta}_n)\right] \to f(\theta)$ for every $f$ continuous and bounded, denoted $\hat{\theta}_n \xrightarrow{d} \theta$
- $\hat{\theta}_n$ is consistent in risk when $R(\hat{\theta}_n, \theta) \to 0$
For risks of the form $R(\hat{\theta}_n, \theta) = \mathbb{E}\left[|\hat{\theta}_n - \theta|^p\right]$ with $p \geq 1$, we denote $\hat{\theta}_n \xrightarrow{L^p} \theta$
- Proposition
$$\hat{\theta}_n \xrightarrow{\text{a.s.}} \theta \implies \hat{\theta}_n \xrightarrow{\mathbb{P}} \theta \implies \hat{\theta}_n \xrightarrow{d} \theta \quad \text{and} \quad \hat{\theta}_n \xrightarrow{L^p} \theta \implies \hat{\theta}_n \xrightarrow{\mathbb{P}} \theta$$
Asymptotic laws
- Law of large numbers
If $X_1, \dots, X_n$ i.i.d. with $\mathbb{E}\left[|X_1|\right] < \infty$. Then,
$$\bar{X}_n \xrightarrow{\text{a.s.}} \mathbb{E}[X_1]$$
- Central Limit Theorem
If $X_1, \dots, X_n$ i.i.d. with $\mathbb{E}[X_1] = \mu$ and $\mathrm{Var}(X_1) = \sigma^2 < \infty$. Then,
$$\sqrt{n} \left( \bar{X}_n - \mu \right) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$$
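A quick simulation sketch (NumPy, arbitrary parameters) illustrating the statement: the rescaled deviation of the sample mean has standard deviation close to $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_rep = 200, 10000
mu, sigma = 1.0, 2.0

# sqrt(n) * (X̄_n - mu) should be approximately N(0, sigma^2) for large n.
means = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)
z = np.sqrt(n) * (means - mu)

print(z.std())  # close to sigma = 2
```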
- Delta Method
If $X_1, \dots, X_n$ i.i.d. with $\mathbb{E}[X_1] = \mu$, $\mathrm{Var}(X_1) = \sigma^2 < \infty$, and $g$ a function differentiable at $\mu$. Then,
$$\sqrt{n} \left( g(\bar{X}_n) - g(\mu) \right) \xrightarrow{d} \mathcal{N}\left(0, g'(\mu)^2 \sigma^2\right)$$
- General Delta Method
If $a_n (Z_n - \mu) \xrightarrow{d} Z$ with $a_n \to \infty$ and $g$ a function differentiable at $\mu$. Then,
$$a_n \left( g(Z_n) - g(\mu) \right) \xrightarrow{d} g'(\mu)\, Z$$
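As an illustration (a sketch with NumPy and made-up parameters): estimating the rate $\lambda$ of an exponential law via $g(x) = 1/x$ applied to the sample mean. The delta method predicts an asymptotic standard deviation of $|g'(1/\lambda)| \cdot \mathrm{sd}(X) = \lambda$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_rep = 400, 10000
lam = 2.0  # X_i ~ Exp(lam), so E[X] = 1/lam and Var(X) = 1/lam^2

# g(x) = 1/x maps the sample mean to an estimator of lam.
means = rng.exponential(scale=1 / lam, size=(n_rep, n)).mean(axis=1)
z = np.sqrt(n) * (1 / means - lam)

# Delta method: asymptotic variance = g'(1/lam)^2 * Var(X)
#             = (lam^2)^2 * (1 / lam^2) = lam^2
print(z.std())  # close to lam = 2
```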
Methods for creating estimators
Method of Moments
- Moment
The moment $k$ of $X$ is $m_k = \mathbb{E}\left[X^k\right]$
- Empirical Moment
We deduce the empirical moment $k$ of $X$ from the sample: $\hat{m}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$
- Plugin Method
The aim is to describe the parameter $\theta$ with the moments of $X$ and then plug in the empirical moments to get $\hat{\theta}$.
- Describe the parameters with the moments $m_1, \dots, m_k$. You may have a system to solve if $\dim(\theta) > 1$.
- Solve the system for $\theta = g(m_1, \dots, m_k)$.
- Plug in the $\hat{m}_k$ to get $\hat{\theta} = g(\hat{m}_1, \dots, \hat{m}_k)$.
- Example 1
Let's find the parameter of $\mathcal{E}(\lambda)$: $m_1 = 1/\lambda$, so $\lambda = 1/m_1$ and $\hat{\lambda} = 1/\hat{m}_1$.
- Example 2
Let's find the parameters of $\mathcal{N}(\mu, \sigma^2)$.
- $m_1 = \mu$ and $m_2 = \sigma^2 + \mu^2$
- Already solved: $\mu = m_1$ and $\sigma^2 = m_2 - m_1^2$
- $\hat{\mu} = \hat{m}_1$ and $\hat{\sigma}^2 = \hat{m}_2 - \hat{m}_1^2$
For complicated distributions, it often happens that we can't compute the moments in closed form. We can try to evaluate them numerically.
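A sketch of the plug-in method (NumPy, made-up true values), assuming a Gaussian model $\mathcal{N}(\mu, \sigma^2)$ whose moment equations give $\mu = m_1$ and $\sigma^2 = m_2 - m_1^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
sample = rng.normal(loc=1.5, scale=0.5, size=10000)  # true mu = 1.5, sigma^2 = 0.25

# Steps 1-2 were done on paper: mu = m_1, sigma^2 = m_2 - m_1^2.
# Step 3: plug in the empirical moments.
m1_hat = sample.mean()
m2_hat = (sample ** 2).mean()

mu_hat = m1_hat
sigma2_hat = m2_hat - m1_hat ** 2
print(mu_hat, sigma2_hat)  # close to 1.5 and 0.25
```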
Maximum Likelihood Estimation
- Maximum likelihood estimation (MLE)
The maximum likelihood estimator is given by
$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta; X_1, \dots, X_n)$$
where $L(\theta; X_1, \dots, X_n) = \prod_{i=1}^{n} f_\theta(X_i)$ is the likelihood of an i.i.d. sample.
- Method
The method is the classic method to find a maximum, usually applied to the log-likelihood $\ell(\theta) = \log L(\theta)$:
- Compute the gradient $\nabla_\theta \ell(\theta)$.
- Find all critical points by solving $\nabla_\theta \ell(\theta) = 0$.
- Find all the maxima by checking that the Hessian $\nabla^2_\theta \ell(\theta)$ is negative definite at each critical point.
- Choose a global maximum as $\hat{\theta}_{\text{MLE}}$.
Be careful: in real life, the likelihood often has a lot of local extrema! Moreover, if the model is not regular, the method has to be modified.
- Example
Let's estimate the parameters of $\mathcal{N}(\mu, \sigma^2)$. The log-likelihood is
$$\ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2$$
Setting the gradient to zero gives
$$\hat{\mu} = \bar{X}_n \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$
- Checking the Hessian: OK, it is a maximum!
For complicated models, it often happens that we can't maximize the likelihood in closed form. We can try to do it numerically, but we have to try our best not to fall into local extrema.
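A minimal sketch of numerical maximization (assuming NumPy and SciPy are available; the model, data, and starting points are made up). Restarting from several initial points is a cheap guard against local extrema:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
sample = rng.normal(loc=2.0, scale=1.5, size=5000)

def neg_log_likelihood(params):
    # Parametrize by log(sigma) so that sigma stays positive.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((sample - mu) / sigma) ** 2) + sample.size * log_sigma

# Restart from several points and keep the best optimum found.
best = min(
    (minimize(neg_log_likelihood, x0=[x0, 0.0]) for x0 in (-5.0, 0.0, 5.0)),
    key=lambda res: res.fun,
)
mu_hat, sigma_hat = best.x[0], np.exp(best.x[1])
print(mu_hat, sigma_hat)  # close to 2.0 and 1.5
```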
- Proposition
If $T$ is a sufficient statistic for $\theta$, then $\hat{\theta}_{\text{MLE}}$ is a function of $T$.
But $\hat{\theta}_{\text{MLE}}$ does not have to be sufficient.
- Theorem
If $\hat{\theta}_{\text{MLE}}$ is the MLE of $\theta$, then $g(\hat{\theta}_{\text{MLE}})$ is the MLE of $g(\theta)$ (invariance of the MLE).
- Proposition
Suppose:
- $(H_1)$: the model is identifiable
- $(H_2)$: $\Theta$ is compact and $\theta \mapsto \log f_\theta(x)$ is continuous
- $(H_3)$: $\mathbb{E}\left[\sup_{\theta \in \Theta} \left|\log f_\theta(X_1)\right|\right] < \infty$
Then, $\hat{\theta}_{\text{MLE}}$ is strongly consistent
- Theorem
If $\hat{\theta}_{\text{MLE}}$ is consistent, the model is regular and $I(\theta)$ is invertible.
Then,
$$\sqrt{n} \left( \hat{\theta}_{\text{MLE}} - \theta \right) \xrightarrow{d} \mathcal{N}\left(0, I(\theta)^{-1}\right)$$
So with the delta method,
$$\sqrt{n} \left( g(\hat{\theta}_{\text{MLE}}) - g(\theta) \right) \xrightarrow{d} \mathcal{N}\left(0, \nabla g(\theta)^\top I(\theta)^{-1} \nabla g(\theta)\right)$$
If you want to know where the $I(\theta)^{-1}$ comes from, check the Fisher Information section
Other methods
There are a lot of different methods; here is a non-exhaustive list:
- M-estimator
- Estimating Equation
- Empirical Likelihood, a nonparametric framework
- Bayes Estimation (see Appendix for more information about the Bayes framework)
- Maximum a posteriori estimation
You can find others in the Estimation Theory wiki page.
What is a good estimator?
In general
There is no universal way to say "I have the best estimator". Most of the time, it depends on what you are looking for. Sometimes you can't accept bias, sometimes you need strong consistency, and sometimes you just want to minimize your risk.
But you know that mathematicians don't like the answer "it depends". So they created a function that defines a characteristic of the model: the Fisher Information. Sometimes it can help to find the best estimator!
With the Fisher Information
- Score
The score is the vector
$$S_n(\theta) = \nabla_\theta \log L(\theta; X_1, \dots, X_n)$$
- Proposition
The score is centered, i.e. $\mathbb{E}_\theta\left[S_n(\theta)\right] = 0$
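A Monte Carlo sanity check of this proposition (NumPy, a made-up Gaussian model): for $X \sim \mathcal{N}(\mu, 1)$, the score with respect to $\mu$ is simply $x - \mu$, and its average over many draws is near zero:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, n_rep = 1.0, 200000

# Score of one N(mu, 1) observation w.r.t. mu: d/dmu log f(x) = x - mu
x = rng.normal(mu, 1.0, size=n_rep)
score = x - mu

print(score.mean())  # close to 0: the score is centered
```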
- Proposition
For a regular model with an i.i.d. sample, the score is additive, i.e. $S_n(\theta) = \sum_{i=1}^{n} \nabla_\theta \log f_\theta(X_i)$
- Fisher Information
The Fisher Information is the variance matrix of the score:
$$I_n(\theta) = \mathrm{Var}_\theta\left(S_n(\theta)\right)$$
Fisher's information is related to the precision with which the parameter is estimated.
If the model is i.i.d., we denote $I(\theta) = I_1(\theta)$ the Fisher Information of a single observation.
- Proposition
Each observation gives the same information, i.e. $I_n(\theta) = n\, I(\theta)$
- Proposition
For any statistic $T$, $I_T(\theta) \preceq I_n(\theta)$: a statistic cannot carry more information than the whole sample.
- Proposition
If the model is regular, then the Fisher Information is symmetric, positive semi-definite and
$$I_n(\theta) = -\mathbb{E}_\theta\left[\nabla^2_\theta \log L(\theta; X_1, \dots, X_n)\right]$$
- Fréchet-Darmois-Cramér-Rao Bound
If the model is regular and $I_n(\theta)$ is invertible.
Then, for any unbiased estimator $\hat{\theta}$,
$$\mathrm{Var}(\hat{\theta}) \succeq I_n(\theta)^{-1}$$
And with $g$ a function, for any unbiased estimator $\hat{g}$ of $g(\theta)$,
$$\mathrm{Var}(\hat{g}) \succeq \nabla g(\theta)^\top I_n(\theta)^{-1} \nabla g(\theta)$$
The lower bound $I_n(\theta)^{-1}$ is the Cramér-Rao bound!
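A numeric sketch (NumPy, made-up parameters) for the Bernoulli model, where $I(p) = 1/(p(1-p))$: the sample mean is unbiased for $p$ and its variance matches the bound $(n\, I(p))^{-1} = p(1-p)/n$:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, n_rep = 0.3, 100, 50000

# Fisher information of one Bernoulli(p) observation: I(p) = 1 / (p (1 - p)),
# so the Cramér-Rao bound for unbiased estimators is p (1 - p) / n.
cramer_rao = p * (1 - p) / n

# The sample mean is unbiased for p; its variance attains the bound.
p_hats = rng.binomial(1, p, size=(n_rep, n)).mean(axis=1)
print(p_hats.var(), cramer_rao)  # both close to 0.0021
```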
- Efficiency
An unbiased estimator is efficient when its variance attains the Cramér-Rao bound.
- Theorem
$\hat{\theta}$ is efficient iff the family of laws is an exponential family (i.e. of the form $f_\theta(x) = h(x) \exp\left(\eta(\theta)\, T(x) - A(\theta)\right)$) with $\hat{\theta}$ a suitable function of the sufficient statistic $T$.
Efficiency is the way to say "my estimator is the best" (among the unbiased ones)!
You can prove that your sufficient statistic is the best among the best (i.e. total, complete, etc.) with some properties (Lehmann-Scheffé, etc.) but to be honest I never used it in practice so I skip it 😅