Tests and Confidence Intervals
A test or a confidence interval is a way to quantify the error we make in our estimation because of the limited amount of data.
Tests
Tests are the backbone of statistical modeling. You can confirm or invalidate the totality of your work in a second with them. But to be transparent, tests are still quite fuzzy for me, so I will explain what I understood:
In order to choose between a hypothetical model named the null hypothesis $H_0$ and another model named the alternative hypothesis $H_1$, you can apply a methodological framework: a test. For that, you can use either a method with a quantile or with a p-value.
- Test with quantile
    - Define a statistical model: $(X_1, \dots, X_n)$ i.i.d. with law $P_\theta$, $\theta \in \Theta$
    - Define:
        - Null hypothesis $H_0$: $\theta \in \Theta_0$
        - Alternative hypothesis $H_1$: $\theta \in \Theta_1$
    - Create a statistic $T_n$ that helps distinguish between $H_0$ and $H_1$. Under $H_0$, the distribution of $T_n$ is denoted by $\mathcal{L}_0$ (i.e., the law of $T_n$ under $H_0$).
    - Fix a significance level $\alpha$ (e.g., $\alpha = 5\%$) and use the quantile of order $1 - \alpha$ of the distribution $\mathcal{L}_0$, denoted by $q_{1-\alpha}$ and defined by: $P_{H_0}(T_n \le q_{1-\alpha}) = 1 - \alpha$. This means that, under $H_0$, the probability that $T_n$ exceeds $q_{1-\alpha}$ is exactly $\alpha$.
    (Note: this is for a one-sided test with large values of $T_n$ as evidence against $H_0$; adjust accordingly for two-sided or left-tailed tests.)
    - Compute the observed value of the test statistic from the data: $t_{\mathrm{obs}} = T_n(x_1, \dots, x_n)$.
    - If $t_{\mathrm{obs}} > q_{1-\alpha}$, we are in a rare event under $H_0$ ⇒ reject $H_0$.
    - Otherwise, we do not reject $H_0$.
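The steps above can be sketched in a few lines. This is a minimal illustration assuming a right-tailed test whose statistic is standard normal under $H_0$; the statistic, its law, and the helper name `quantile_test` are illustrative choices, not from the text:

```python
from scipy import stats

def quantile_test(t_obs, alpha=0.05, law=stats.norm):
    """Reject H0 when the observed statistic exceeds the (1 - alpha) quantile of its H0 law."""
    q = law.ppf(1 - alpha)   # quantile of order 1 - alpha under H0
    return bool(t_obs > q)   # True => rare event under H0 => reject H0

# Under H0, T ~ N(0, 1) and q_0.95 ~ 1.645:
print(quantile_test(2.3, alpha=0.05))  # True  -> reject H0
print(quantile_test(0.8, alpha=0.05))  # False -> do not reject H0
```

Passing the law as a parameter makes it easy to swap the standard normal for a Student or $\chi^2$ law later.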
- Exercise
    Give the statistical test of level 10% to check whether the normal law is centered at 2 or not, given $n$ i.i.d. samples $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$.
- Tips
    It is a two-sided test: $H_0$: $\mu = 2$ and $H_1$: $\mu \neq 2$, with $\alpha = 10\%$.
- Result
    - $H_0$: $\mu = 2$ and $H_1$: $\mu \neq 2$, with $\alpha = 10\%$
    - Take $T_n = \sqrt{n}\,\frac{\bar{X}_n - 2}{\sigma}$, which by the CLT (TCL in French) is approximately $\mathcal{N}(0, 1)$ under $H_0$; compute its observed value $t_{\mathrm{obs}}$ from the data.
    - We reject $H_0$ if $|t_{\mathrm{obs}}| > q_{1-\alpha/2} = q_{0.95} \approx 1.645$, and it is the case here, so we reject $H_0$!
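This exercise can be sketched in Python. The original sample values are not given in the text, so the data here are simulated: a hypothetical sample of size 100 drawn with true mean 2.5, with $\sigma = 1$ assumed known:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.5, scale=1.0, size=100)      # made-up sample; true mean 2.5

alpha = 0.10
sigma = 1.0                                        # assumed known for this sketch
t_obs = np.sqrt(len(x)) * (x.mean() - 2) / sigma   # ~ N(0, 1) under H0: mu = 2 (CLT)
q = stats.norm.ppf(1 - alpha / 2)                  # two-sided: q_0.95 ~ 1.645

print(abs(t_obs) > q)  # True: the sample was drawn with mean 2.5, so H0 is rejected
```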
- Test with p-value
    - Define a statistical model: $(X_1, \dots, X_n)$ i.i.d. with law $P_\theta$, $\theta \in \Theta$
    - Define:
        - Null hypothesis $H_0$: $\theta \in \Theta_0$
        - Alternative hypothesis $H_1$: $\theta \in \Theta_1$
    - Construct a statistic $T_n$, a function of the data that helps distinguish between $H_0$ and $H_1$. Under $H_0$, the distribution (or "law") of $T_n$ is denoted: $\mathcal{L}_0$
    - Compute the observed value of the statistic: $t_{\mathrm{obs}} = T_n(x_1, \dots, x_n)$
    - Compute the p-value. The p-value is the probability, under the null hypothesis, of observing a value of the test statistic as extreme or more extreme than the one observed: $\text{p-value} = P_{H_0}(T_n \ge t_{\mathrm{obs}})$
    (Note: this is for a one-sided test with large values of $T_n$ as evidence against $H_0$; adjust accordingly for two-sided or left-tailed tests.)
    - Choose a significance level $\alpha$, typically $\alpha = 5\%$.
    - If p-value $\le \alpha$: reject $H_0$
    - If p-value $> \alpha$: do not reject $H_0$
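The p-value route can be sketched the same way. This is a minimal illustration for a two-sided test whose statistic is standard normal under $H_0$ (an illustrative choice, not from the text):

```python
from scipy import stats

def p_value_two_sided(t_obs):
    """P(|T| >= |t_obs|) under H0, for T ~ N(0, 1)."""
    return 2 * (1 - stats.norm.cdf(abs(t_obs)))

alpha = 0.05
p = p_value_two_sided(2.3)
print(p <= alpha)  # True => reject H0 at level 5%
```

A convenient property of the p-value: it can be compared to any level $\alpha$ after the fact, whereas the quantile method fixes $\alpha$ up front.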
- Exercise
    Give the statistical test of level 10% to check whether the normal law is centered at 2 or not, given $n$ i.i.d. samples $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$.
- Tips
    It is a two-sided test: $H_0$: $\mu = 2$ and $H_1$: $\mu \neq 2$, with $\alpha = 10\%$.
- Result
    - $H_0$: $\mu = 2$ and $H_1$: $\mu \neq 2$, with $\alpha = 10\%$
    - Take $T_n = \sqrt{n}\,\frac{\bar{X}_n - 2}{\sigma}$, compute $t_{\mathrm{obs}}$, then the two-sided p-value: $\text{p-value} = P_{H_0}(|T_n| \ge |t_{\mathrm{obs}}|) = 2(1 - \Phi(|t_{\mathrm{obs}}|))$
    - The p-value is below $10\%$, so we reject $H_0$!
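The same exercise can be sketched via the p-value route, again on made-up data (the original sample values are not given in the text; here a hypothetical sample of size 100 with true mean 2.5, $\sigma = 1$ assumed known):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.5, scale=1.0, size=100)      # made-up sample; true mean 2.5

t_obs = np.sqrt(len(x)) * (x.mean() - 2) / 1.0    # sigma = 1 assumed known
p_value = 2 * (1 - stats.norm.cdf(abs(t_obs)))    # two-sided p-value
print(p_value < 0.10)  # True => reject H0 at level 10%
```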
As you saw, tests really depend on the statistic (and the associated law) that you choose! As I said, I am not fluent in tests, but I have to point out that a lot of different statistics/laws can be found: Wald, Fisher, Student, $\chi^2$, likelihood-ratio, etc. Some of them are really useful in a specific context. For example, when:
- you don't know the variance, use a Student law.
- $n$ is large (asymptotic regime), use a Wald or a likelihood-ratio statistic (which follows a $\chi^2$ law).
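For the unknown-variance case, scipy ships the one-sample Student t-test; a sketch on made-up data (the sample is simulated here, nothing in the text specifies it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=30)  # hypothetical sample; variance unknown to the analyst

# H0: mu = 2 vs H1: mu != 2 (two-sided by default). The Student law replaces the
# normal one because sigma is estimated from the data.
t_obs, p_value = stats.ttest_1samp(x, popmean=2.0)
print(f"t = {t_obs:.3f}, p-value = {p_value:.3f}")  # then compare the p-value to the chosen level
```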
Also, you have to be careful to use the right side of your density to build a good rejection region. Some mathematicians have tried to automate this (Neyman-Pearson, etc.), but that complicates the process for not much gain.
If your aim is to understand machine learning, you will see that we can go really far with tests.
TODO: power of a test