
Appendix

TODOOOOOOOOOOOOOOO

This appendix goes into more detail on subjects that were only skimmed in this chapter because they have little impact on the overall understanding of the subject.

Although not essential, I encourage you to read this page.

Measure

Bayes

Bayesian statistics is another statistical framework, often contrasted with frequentist statistics. In theoretical statistics, the two schools have long been in conflict. In machine learning, we care mostly about what works in practice, so we don't pick a side: we use both!

Maximum A Posteriori Estimation

The Bayesian framework comes from Bayes' formula, which in its Bayesian notation is written as:

$$
\pi(\theta|x) = \frac{f(x|\theta) \pi(\theta)}{f(x)}=\frac{f(x|\theta) \pi(\theta)}{\int f(x|\theta') \pi(\theta')d\theta'}
$$

where $\pi(\theta)$ is the prior law, $f(x|\theta)$ the likelihood and $\pi(\theta|x)$ the posterior law.
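To make the formula concrete, here is a minimal sketch that approximates the posterior numerically on a grid. It assumes i.i.d. Bernoulli observations and a uniform prior on $[0, 1]$; the function name `grid_posterior`, the grid size and the toy data are illustrative choices, not part of the chapter.

```python
def grid_posterior(data, prior, grid):
    """Approximate pi(theta | x) on an evenly spaced grid of theta values.

    Assumption: the observations are i.i.d. Bernoulli(theta), encoded as 0/1.
    """
    k = sum(data)          # number of successes
    n = len(data)          # number of observations
    # Numerator of Bayes' formula: likelihood f(x | theta) times prior pi(theta).
    unnorm = [(t ** k) * ((1 - t) ** (n - k)) * prior(t) for t in grid]
    # Denominator: Riemann-sum approximation of the integral of f(x|theta') pi(theta').
    step = grid[1] - grid[0]
    evidence = sum(unnorm) * step
    return [u / evidence for u in unnorm]


# Midpoints of 1000 equal cells covering [0, 1].
grid = [(i + 0.5) / 1000 for i in range(1000)]
data = [1, 0, 1, 1, 0, 1, 1, 1]          # toy data: 6 successes out of 8
posterior = grid_posterior(data, lambda t: 1.0, grid)
```

The returned values are posterior densities evaluated at the grid points: multiplied by the grid step they sum to approximately 1, which is exactly the role of the integral in the denominator.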

You can see that frequentists and Bayesians do not view statistics in the same way. The former believe that probability is the long-term frequency of a random event, while the latter believe that probability measures a belief or subjective degree of uncertainty regarding an event or parameter.

note

A large part of learning Bayesian statistics is learning how the prior law and the posterior law are linked when the data come from a given law, i.e. the conjugate laws.
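As an illustration of conjugacy, the Beta law is conjugate to the Bernoulli law: a $\text{Beta}(a, b)$ prior combined with $n$ Bernoulli observations containing $k$ successes gives a $\text{Beta}(a + k, b + n - k)$ posterior. The sketch below applies this update; the helper names `beta_bernoulli_update` and `beta_mean` are hypothetical and chosen only for this example.

```python
def beta_bernoulli_update(a, b, data):
    """Return the Beta posterior parameters after observing 0/1 data.

    Prior:      theta ~ Beta(a, b)
    Likelihood: each x_i ~ Bernoulli(theta), independently
    Posterior:  theta | x ~ Beta(a + k, b + n - k), with k successes in n draws
    """
    k = sum(data)
    n = len(data)
    return a + k, b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) law."""
    return a / (a + b)


a_post, b_post = beta_bernoulli_update(2, 2, [1, 0, 1, 1])
print(a_post, b_post)             # 5 3
print(beta_mean(a_post, b_post))  # 0.625
```

No integral has to be computed here: with a conjugate prior, the posterior stays in the same family and only its parameters change.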

The maximum a posteriori (MAP) estimator is the mode of the posterior law. Since the denominator $f(x)$ does not depend on $\theta$, it can be dropped from the maximization:

$$
\hat \theta_{MAP}:=\arg\max_{\theta \in \Theta}\pi(\theta|x)=\arg\max_{\theta \in \Theta}f(x|\theta) \pi(\theta)
$$

note

When $\pi(\theta)$ is a uniform law, it coincides with the MLE.
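This can be checked numerically. The sketch below maximizes $\log f(x|\theta) + \log \pi(\theta)$ over a grid, which has the same maximizer as $f(x|\theta)\pi(\theta)$ because the logarithm is increasing. It assumes Bernoulli data; the function names, the grid and the $\text{Beta}(2, 2)$ comparison prior are illustrative assumptions, not the chapter's own method.

```python
import math


def log_likelihood(theta, data):
    """Log-likelihood of i.i.d. Bernoulli(theta) observations encoded as 0/1."""
    k = sum(data)
    n = len(data)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)


def map_estimate(data, log_prior, grid):
    """Grid search for argmax over theta of log f(x | theta) + log pi(theta)."""
    return max(grid, key=lambda t: log_likelihood(t, data) + log_prior(t))


grid = [i / 1000 for i in range(1, 1000)]   # open interval (0, 1), avoids log(0)
data = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]       # toy data: 7 successes out of 10


def uniform_log_prior(t):
    return 0.0                               # constant density on (0, 1)


def beta22_log_prior(t):
    return math.log(t) + math.log(1 - t)     # Beta(2, 2), up to an additive constant


theta_mle = sum(data) / len(data)                              # closed-form Bernoulli MLE
theta_map_uniform = map_estimate(data, uniform_log_prior, grid)
theta_map_beta = map_estimate(data, beta22_log_prior, grid)
```

With the uniform prior the grid search lands on the MLE $k/n$, while the $\text{Beta}(2, 2)$ prior pulls the estimate toward $1/2$.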

Let's estimate the parameter $\theta$ of $\mathcal B(\theta)$