Reminders

Suppose $f: K \rightarrow \mathbb R$, with $K \subseteq \mathbb R^n$ an open set.

The differential of $f$ at $x \in K$ is the linear map $d_xf: \mathbb R^n \rightarrow \mathbb R$ such that

$$f(x+h)=f(x)+d_xf(h)+o(\|h\|)$$

where $u(a)=o(g(a)) \Leftrightarrow \lim_{a \rightarrow 0} \frac{u(a)}{g(a)}=0$. When such a map exists, $f$ is said to be differentiable at $x$.

For example, take $f(x)=2x^2$ on $\mathbb R$:

$$f(x+h)=2(x+h)^2=2x^2+4xh+2h^2=f(x)+d_xf(h)+o(|h|)$$

with $d_xf(h)=4xh$ (so $d_xf$ is multiplication by $4x$) and remainder $2h^2=o(|h|)$, since $2h^2/|h|=2|h| \rightarrow 0$.
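A minimal numerical check of this expansion, written as a Python sketch (the helper names `differential` and `remainder_ratio` are hypothetical, introduced only for illustration): the remainder $f(x+h)-f(x)-4xh$ divided by $|h|$ should go to $0$ with $h$, which is exactly the $o(|h|)$ condition.

```python
def f(x: float) -> float:
    """The example function f(x) = 2x^2."""
    return 2 * x**2


def differential(x: float, h: float) -> float:
    """d_x f(h) = 4xh, the linear part of the expansion."""
    return 4 * x * h


def remainder_ratio(x: float, h: float) -> float:
    """|f(x+h) - f(x) - d_x f(h)| / |h|; the o(|h|) condition says this tends to 0."""
    remainder = f(x + h) - f(x) - differential(x, h)
    return abs(remainder) / abs(h)


if __name__ == "__main__":
    x = 3.0
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        # The remainder is 2h^2, so the ratio is 2|h| up to rounding.
        print(f"h={h:g}  ratio={remainder_ratio(x, h):.3g}")
```

Since the remainder is exactly $2h^2$, the printed ratio shrinks linearly with $h$.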

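The directional derivative of $f$ at $x$ in the direction $h$ is

$$\partial_h f(x) = \lim_{t \rightarrow 0} \frac{f(x+th)-f(x)}{t}$$

The partial derivatives are the directional derivatives along the canonical basis vectors: $\partial_{x_i} f = \partial_{e_i} f$.

If $f$ is differentiable at $x$, then $\partial_h f(x)$ exists for every $h$, and $\partial_h f(x) = d_xf(h)$.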
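The limit in the definition can be approximated by evaluating the difference quotient at a single small $t$. A Python sketch, assuming a forward difference with fixed step `t = 1e-7` is accurate enough for illustration; the example function $g(x,y)=x^2y+\sin(y)$ and its hand-computed differential `dg` are hypothetical, not taken from the notes. For a differentiable function such as $g$, the quotient should agree with $d_pg(h)$ up to an error of order $t$.

```python
import math
from typing import Callable, List


def directional_derivative(
    f: Callable[[List[float]], float], x: List[float], h: List[float], t: float = 1e-7
) -> float:
    """Forward difference quotient (f(x + t*h) - f(x)) / t for one small t, approximating ∂_h f(x)."""
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / t


def g(p: List[float]) -> float:
    """Hypothetical smooth example g(x, y) = x^2 * y + sin(y)."""
    x, y = p
    return x**2 * y + math.sin(y)


def dg(p: List[float], h: List[float]) -> float:
    """Differential of g, computed by hand: d_p g(h) = 2xy * h1 + (x^2 + cos y) * h2."""
    x, y = p
    return 2 * x * y * h[0] + (x**2 + math.cos(y)) * h[1]


if __name__ == "__main__":
    p, h = [1.0, 2.0], [3.0, -1.0]
    print(directional_derivative(g, p, h))  # close to dg(p, h)
    print(dg(p, h))  # exact value: 11 - cos(2)
```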

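**Note:** the converse is false: the partial derivatives of a function can exist at a point where the function is not differentiable. Counterexample: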

$f(x,y) = \frac{xy}{\sqrt{x^2+y^2}}$ if $(x,y) \neq (0,0)$, and $f(0,0) = 0$.

In polar coordinates $(x,y)=(r\cos(\theta), r\sin(\theta))$, we get $f(x,y)=\frac{r^2 \cos(\theta)\sin(\theta)}{r}=r \cos(\theta)\sin(\theta)$, so $|f(x,y)| \le r \xrightarrow{r \rightarrow 0} 0$: $f$ is continuous at $(0,0)$.

Its partial derivatives at the origin exist and are zero, because $f$ vanishes on both axes:

$$\partial_x f(0,0)=\lim_{t \rightarrow 0} \frac{f(t,0)-f(0,0)}{t}=0 \quad \text{and} \quad \partial_y f(0,0)=\lim_{t \rightarrow 0} \frac{f(0,t)-f(0,0)}{t}=0$$

For $h=(h_1,h_2) \neq (0,0)$,

$$f(0+h_1,0+h_2)=\frac{(0+h_1)(0+h_2)}{\sqrt{(0+h_1)^2+(0+h_2)^2}}=\frac{h_1h_2}{\sqrt{h_1^2+h_2^2}}$$

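If $f$ were differentiable at $(0,0)$, its differential would be $d_{(0,0)}f(h)=\partial_x f(0,0)\,h_1+\partial_y f(0,0)\,h_2=0$, so we would need $\frac{h_1h_2}{\|h\|}=o(\|h\|)$. This fails: with $h=(1/n,1/n)$ we have $\|h\|^2=2/n^2$, hence $\frac{h_1h_2/\|h\|}{\|h\|}=\frac{1/n^2}{2/n^2}=\frac{1}{2}$ for every $n$, which does not tend to $0$. So $f$ is not differentiable at $(0,0)$, even though its partial derivatives exist there.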
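The counterexample can also be checked numerically. In this Python sketch (the function names are hypothetical), the difference quotients for the partial derivatives at the origin are exactly $0$, while $f(h)/\|h\|$ along $h=(1/n,1/n)$ stays at $1/2$ instead of tending to $0$.

```python
import math
from typing import Tuple


def f(x: float, y: float) -> float:
    """Counterexample from the notes: xy / sqrt(x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)


def partial_quotients(t: float) -> Tuple[float, float]:
    """Difference quotients defining ∂_x f(0,0) and ∂_y f(0,0); f vanishes on the axes, so both are 0."""
    return (f(t, 0.0) - f(0.0, 0.0)) / t, (f(0.0, t) - f(0.0, 0.0)) / t


def remainder_ratio(n: int) -> float:
    """f(h) / ||h|| for h = (1/n, 1/n); differentiability at 0 would force this to tend to 0."""
    h1 = h2 = 1.0 / n
    return f(h1, h2) / math.hypot(h1, h2)


if __name__ == "__main__":
    print(partial_quotients(1e-3))
    for n in (10, 100, 1000):
        print(n, remainder_ratio(n))  # stays at 1/2, up to rounding
```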

The gradient $\nabla f(x)$ is the unique vector such that $d_xf(h)=\langle \nabla f(x) | h \rangle$ for all $h$, i.e.

$$\nabla f(x) = \begin{pmatrix} \partial_{x_1}f(x) \\ \partial_{x_2}f(x) \\ \vdots \\ \partial_{x_n}f(x) \end{pmatrix}$$
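The gradient can be approximated one partial derivative at a time. A Python sketch, assuming central differences with step `t = 1e-6` (a choice made here, not in the notes); the names `gradient`, `inner` and the example $g(x,y)=x^2y+\sin(y)$ are hypothetical. The last print checks $d_xf(h)=\langle \nabla f(x)|h\rangle$ against the differential computed by hand, $d_pg(h)=2xy\,h_1+(x^2+\cos y)\,h_2$.

```python
import math
from typing import Callable, List


def gradient(f: Callable[[List[float]], float], x: List[float], t: float = 1e-6) -> List[float]:
    """Approximate ∇f(x): entry i is ∂_{x_i} f(x), estimated with a central difference of step t."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += t
        minus = list(x)
        minus[i] -= t
        grad.append((f(plus) - f(minus)) / (2 * t))
    return grad


def inner(u: List[float], v: List[float]) -> float:
    """Euclidean inner product <u|v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


def g(p: List[float]) -> float:
    """Hypothetical example g(x, y) = x^2 * y + sin(y); its exact gradient is (2xy, x^2 + cos y)."""
    x, y = p
    return x**2 * y + math.sin(y)


if __name__ == "__main__":
    p, h = [1.0, 2.0], [3.0, -1.0]
    grad = gradient(g, p)
    print(grad)  # approximately (4, 1 + cos 2)
    print(inner(grad, h))  # approximately d_p g(h) = 11 - cos 2
```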

The Jacobian matrix generalizes the gradient to functions with values in a multidimensional space, $f=(f_1,\dots,f_m): K \rightarrow \mathbb R^m$:

$$J_f(x) = \begin{pmatrix} \partial_{x_1}f_1(x) & \partial_{x_2}f_1(x) & \cdots & \partial_{x_n}f_1(x) \\ \partial_{x_1}f_2(x) & \partial_{x_2}f_2(x) & \cdots & \partial_{x_n}f_2(x) \\ \vdots & \vdots & \ddots & \vdots \\ \partial_{x_1}f_m(x) & \partial_{x_2}f_m(x) & \cdots & \partial_{x_n}f_m(x) \end{pmatrix}$$

For $m=1$, $J_f(x)=\nabla f(x)^T$.
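The same finite-difference idea gives the Jacobian column by column. A Python sketch, assuming central differences with step `t = 1e-6`; `jacobian` and the example $F(x,y)=(xy,\ x+y^2,\ \sin x)$ are hypothetical illustrations. Each input direction $x_j$ yields one column $\partial_{x_j}F(x)$, and the result is transposed so that row $i$ holds the partial derivatives of the output component $F_i$, as in $J_f(x)$ above.

```python
import math
from typing import Callable, List


def jacobian(
    F: Callable[[List[float]], List[float]], x: List[float], t: float = 1e-6
) -> List[List[float]]:
    """Approximate the m x n matrix J_F(x); entry [i][j] is ∂_{x_j} F_i(x) (central differences)."""
    n = len(x)
    columns = []  # columns[j] approximates ∂_{x_j} F(x), a vector of length m
    for j in range(n):
        plus = list(x)
        plus[j] += t
        minus = list(x)
        minus[j] -= t
        columns.append([(a - b) / (2 * t) for a, b in zip(F(plus), F(minus))])
    m = len(columns[0])
    # Transpose so that row i corresponds to the i-th output component F_i.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def F(p: List[float]) -> List[float]:
    """Hypothetical example F: R^2 -> R^3, F(x, y) = (xy, x + y^2, sin x)."""
    x, y = p
    return [x * y, x + y**2, math.sin(x)]


if __name__ == "__main__":
    for row in jacobian(F, [1.0, 2.0]):
        print(row)  # approximately [[2, 1], [1, 4], [cos 1, 0]]
```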

The Hessian matrix is the matrix of second-order partial derivatives of $f$, i.e. the Jacobian of the gradient:

$$H_f(x) = \nabla^2 f(x) = \begin{pmatrix} \partial^2_{x_1}f(x) & \partial^2_{x_1, x_2}f(x) & \cdots & \partial^2_{x_1, x_n}f(x) \\ \partial^2_{x_2, x_1}f(x) & \partial^2_{x_2}f(x) & \cdots & \partial^2_{x_2, x_n}f(x) \\ \vdots & \vdots & \ddots & \vdots \\ \partial^2_{x_n,x_1}f(x) & \partial^2_{x_n,x_2}f(x) & \cdots & \partial^2_{x_n}f(x) \end{pmatrix}$$
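A finite-difference sketch of the Hessian in Python, assuming a central stencil with step `t = 1e-4` (larger than for first derivatives, because dividing by $t^2$ amplifies rounding errors); `hessian`, `f_at` and the example $g(x,y)=x^2y+\sin(y)$ are hypothetical. For a function whose second-order partial derivatives are continuous, such as this $g$, Schwarz's theorem says the Hessian is symmetric, and the output reflects this up to rounding.

```python
import math
from typing import Callable, List


def hessian(f: Callable[[List[float]], float], x: List[float], t: float = 1e-4) -> List[List[float]]:
    """Approximate H_f(x)[i][j] = ∂²f/∂x_i∂x_j with a central finite-difference stencil."""
    n = len(x)

    def f_at(i: int, si: int, j: int, sj: int) -> float:
        # Evaluate f at x + si*t*e_i + sj*t*e_j, with si, sj in {+1, -1}.
        p = list(x)
        p[i] += si * t
        p[j] += sj * t
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # For i == j this is (up to rounding) (f(x + 2t e_i) - 2 f(x) + f(x - 2t e_i)) / (4t^2).
            H[i][j] = (
                f_at(i, +1, j, +1) - f_at(i, +1, j, -1) - f_at(i, -1, j, +1) + f_at(i, -1, j, -1)
            ) / (4 * t * t)
    return H


def g(p: List[float]) -> float:
    """Hypothetical example g(x, y) = x^2 * y + sin(y); exact Hessian [[2y, 2x], [2x, -sin y]]."""
    x, y = p
    return x**2 * y + math.sin(y)


if __name__ == "__main__":
    for row in hessian(g, [1.0, 2.0]):
        print(row)  # approximately [[4, 2], [2, -sin 2]]
```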

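**Note:** be careful, $\nabla^2 f$ is sometimes used to denote the Laplacian $\Delta f = \langle \nabla | \nabla f \rangle = \sum_{i=1}^n \partial^2_{x_i} f$ (the trace of the Hessian) rather than the Hessian matrix itself.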