Introduction
- Feature Space
The feature space is denoted $\mathcal{X}$. Usually $\mathcal{X} \subseteq \mathbb{R}^d$, but features can also be qualitative (categorical) variables.
- Target Space
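The target space is denoted $\mathcal{Y}$. It can be either compact (e.g. an interval of $\mathbb{R}$ in regression) or countable (e.g. a finite set of labels in classification).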
- Dataset
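A dataset is a collection of couples $D_n = \{(X_i, Y_i)\}_{i=1}^{n}$, where the $(X_i, Y_i)$ are i.i.d. random variables drawn from a distribution $P$ on $\mathcal{X} \times \mathcal{Y}$ (feature space times target space).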
A dataset can be split into a training set and a test set, or into training, validation, and test sets.
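The split described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `split_dataset`, the 60/20/20 proportions, and the toy data are assumptions, not part of the notes.

```python
import numpy as np

def split_dataset(X, y, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the indices and split (X, y) into train / validation / test parts."""
    n = len(X)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                  # random order of the n indices
    n_train = int(round(frac_train * n))
    n_val = int(round(frac_val * n))
    idx_train = perm[:n_train]
    idx_val = perm[n_train:n_train + n_val]
    idx_test = perm[n_train + n_val:]          # the remainder goes to the test set
    return ((X[idx_train], y[idx_train]),
            (X[idx_val], y[idx_val]),
            (X[idx_test], y[idx_test]))

# Toy data (illustrative only): 10 points in R^2 with binary targets.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10) % 2
train, val, test = split_dataset(X, y)
```

Setting `frac_val=0.0` gives the simpler train/test split.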
- Hypothesis class
The hypothesis class is a set $\mathcal{H} \subseteq \{h : \mathcal{X} \to \mathcal{Y}\}$, where each $h \in \mathcal{H}$ is a predictor.
- Learning rule
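A learning rule is a mapping from training data to hypotheses in a given hypothesis class $\mathcal{H}$, i.e. $\mathcal{A} : \bigcup_{n \ge 1} (\mathcal{X} \times \mathcal{Y})^n \to \mathcal{H}$, $D_n \mapsto \hat{h}_{D_n}$.
By convention, we omit the dependence of the learned predictor on the dataset and write $\hat{h}$ instead of $\hat{h}_{D_n}$.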
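To make the idea of a learning rule concrete, here is a minimal sketch in which the hypothesis class is assumed to be the affine functions $h(x) = w \cdot x + b$ and the rule is ordinary least squares. The name `least_squares_rule` and the toy data are hypothetical; any other rule (e.g. nearest neighbours) would fit the same "dataset in, predictor out" shape.

```python
import numpy as np

def least_squares_rule(X, y):
    """Learning rule: maps a dataset (X, y) to a predictor h_hat in the class
    of affine functions h(x) = w . x + b, chosen by least squares."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])    # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]

    def h_hat(X_new):
        # The returned predictor depends on the training data only through (w, b).
        return X_new @ w + b

    return h_hat

# Toy dataset generated from y = 2x + 1 without noise (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
h_hat = least_squares_rule(X, y)
```

Calling `h_hat(np.array([[4.0]]))` then evaluates the learned predictor on a new feature vector.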
- Loss function
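A loss function is a map $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$, where $\ell(h(x), y)$ measures the cost of predicting $h(x)$ when the true target is $y$.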
- Risk
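The risk of a predictor $h$ is its expected loss under the data distribution $P$: $R(h) = \mathbb{E}_{(X, Y) \sim P}\left[\ell(h(X), Y)\right]$.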
Depending on which dataset is used (train, validation, or test), you obtain a different estimate of the risk. The most important is the generalization risk, estimated on the test dataset.
- Empirical risk
Most of the time, the distribution of the data is unknown, so the risk must be estimated by the empirical risk $\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(X_i), Y_i)$.
The risk can also be estimated via cross-validation, the bootstrap, etc.
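A short sketch of both estimators follows: the empirical risk as an average loss over a sample, and a K-fold cross-validation estimate. The helper names (`empirical_risk`, `kfold_risk`, `mean_rule`) and the toy data are assumptions for illustration; the folds are taken in index order, so the data are assumed to be already shuffled.

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    return (y_pred != y_true).astype(float)

def squared_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def empirical_risk(h, X, y, loss):
    """Average loss of the predictor h over the sample (X, y)."""
    return float(np.mean(loss(h(X), y)))

def kfold_risk(rule, X, y, loss, k=5):
    """K-fold cross-validation estimate of the risk of a learning rule."""
    folds = np.array_split(np.arange(len(X)), k)
    risks = []
    for fold in folds:
        mask = np.ones(len(X), dtype=bool)
        mask[fold] = False                      # hold out the current fold
        h = rule(X[mask], y[mask])              # fit on the remaining folds
        risks.append(empirical_risk(h, X[fold], y[fold], loss))
    return float(np.mean(risks))

def mean_rule(X_train, y_train):
    """Hypothetical learning rule: always predict the training mean."""
    m = y_train.mean()
    return lambda X_new: np.full(len(X_new), m)

# Toy check with a fixed threshold predictor h(x) = 1{x > 0.5}.
X = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0, 1, 1, 1])
h = lambda x: (x > 0.5).astype(int)
risk_01 = empirical_risk(h, X, y, zero_one_loss)   # one error out of four points
```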
- Bayes Risk
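The Bayes risk is the best possible risk over all measurable predictors (not only those in the hypothesis class), i.e. $R^* = \inf_{h : \mathcal{X} \to \mathcal{Y}} R(h)$.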
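As a worked illustration, under assumptions that are not part of the definitions above: take binary classification, $\mathcal{Y} = \{0, 1\}$, with the 0-1 loss, and write $\eta(x) = P(Y = 1 \mid X = x)$. The infimum of the risk over all measurable predictors is then attained by the Bayes classifier:

```latex
h^*(x) = \mathbb{1}\{\eta(x) \ge 1/2\},
\qquad
R^* = R(h^*) = \mathbb{E}\left[\min\{\eta(X),\, 1 - \eta(X)\}\right].
```

For instance, if the label is a deterministic function of $X$ that is flipped independently with probability $0.1$, then $\eta(x) \in \{0.1, 0.9\}$ and $R^* = 0.1$.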
- Excess risk
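The excess risk of a predictor $h$ is the gap between its risk and the Bayes risk $R^*$: $\mathcal{E}(h) = R(h) - R^*$.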
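- Decomposition of the Excess Risk
For a learned predictor $\hat{h} \in \mathcal{H}$, with $R^* = \inf_{h} R(h)$ taken over all measurable predictors: $R(\hat{h}) - R^* = \underbrace{R(\hat{h}) - \inf_{h \in \mathcal{H}} R(h)}_{\text{estimation error}} + \underbrace{\inf_{h \in \mathcal{H}} R(h) - R^*}_{\text{approximation error}}$.
This is the approximation/estimation decomposition (analogous to the bias/variance trade-off: approximation error plays the role of bias, estimation error that of variance).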
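The two error terms can be computed exactly on a toy problem. The sketch below assumes $X \sim \mathcal{U}[0, 1]$, a clean label $\mathbb{1}\{X > 0.5\}$ flipped with probability $0.1$, the 0-1 loss, and a finite hypothesis class of four threshold classifiers; all of these choices, and the names `true_risk`, `sample`, and `erm`, are hypothetical.

```python
import numpy as np

FLIP = 0.1                                     # label-noise probability (assumed)
THRESHOLDS = np.array([0.0, 0.3, 0.6, 0.9])    # finite hypothesis class (assumed)

def true_risk(t):
    """Exact 0-1 risk of h_t(x) = 1{x > t} under the toy distribution:
    error rate FLIP where h_t agrees with the clean label, 1 - FLIP elsewhere."""
    return FLIP + (1 - 2 * FLIP) * abs(t - 0.5)

def sample(n, rng):
    x = rng.uniform(0.0, 1.0, size=n)
    clean = (x > 0.5).astype(int)
    flip = rng.random(n) < FLIP
    y = np.where(flip, 1 - clean, clean)
    return x, y

def erm(x, y):
    """Empirical risk minimizer over the finite class of thresholds."""
    emp_risks = [np.mean((x > t).astype(int) != y) for t in THRESHOLDS]
    return THRESHOLDS[int(np.argmin(emp_risks))]

rng = np.random.default_rng(0)
x, y = sample(200, rng)
t_hat = erm(x, y)

bayes_risk = FLIP                              # attained at t = 0.5, which is not in the class
best_in_class = min(true_risk(t) for t in THRESHOLDS)
estimation_error = true_risk(t_hat) - best_in_class
approximation_error = best_in_class - bayes_risk
excess_risk = true_risk(t_hat) - bayes_risk
```

The approximation error here is $0.18 - 0.10 = 0.08$ whatever the sample, because it depends only on the class. The estimation error depends on the draw and shrinks toward zero as `n` grows.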