Refer to: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/prml-slides-3.pdf and http://norman3.github.io/prml/docs/chapter03/0
Linear model: $y = wx + b$
Regression: a statistical modeling technique used to describe a continuous response variable as a function of one or more predictor variables
Example: Polynomial Curve Fitting
Polynomial basis functions: $\phi_j(x) = x^j$
Gaussian basis functions: $\phi_j(x) = \exp\left\{-\dfrac{(x-\mu_j)^2}{2s^2}\right\}$
Sigmoidal basis functions: $\phi_j(x) = \sigma(\dfrac{x-\mu_j}{s})$
where $\sigma(a) = \dfrac{1}{1+\exp(-a)}$.
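A minimal NumPy sketch of the three basis functions above; the center `mu` and scale `s` values are illustrative choices, not values from the slides.

```python
import numpy as np

def polynomial_basis(x, j):
    # phi_j(x) = x^j
    return x ** j

def gaussian_basis(x, mu, s):
    # phi_j(x) = exp(-(x - mu)^2 / (2 s^2))
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    # phi_j(x) = sigma((x - mu) / s), with sigma the logistic sigmoid
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1.0, 1.0, 5)
print(polynomial_basis(x, 2))
print(gaussian_basis(x, mu=0.0, s=0.5))
print(sigmoidal_basis(x, mu=0.0, s=0.5))
```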
Assume observations from a deterministic function with added Gaussian noise: $t = y(\textbf{x}, \textbf{w}) + \epsilon$ where $p(\epsilon|\beta) = \mathcal N(\epsilon|0, \beta^{-1})$ (using the inverse precision $\beta^{-1}$ as the variance is a matter of computational convenience)
which is the same as saying, $p(t|\textbf x, \textbf w, \beta) = \mathcal N(t|y(\textbf x, \textbf w), \beta^{-1})$
For a given $x$, the probability distribution over the resulting target $t$ is a Gaussian distribution.
Given observed inputs, $\textbf X = \{\textbf x_1, ..., \textbf x_N\}$, and targets, $\textbf t = [t_1, ..., t_N]^\text T$, we obtain the likelihood function (the probability of observing the sample data): $p(\textbf t|\textbf X, \textbf w, \beta) = \prod_{n=1}^{N} \mathcal N(t_n|y(\textbf x_n, \textbf w), \beta^{-1})$
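As a sketch of this noise model, the snippet below draws targets $t = y(x, \textbf w) + \epsilon$ with noise variance $\beta^{-1}$; the sine curve and $\beta = 25$ are illustrative assumptions in the spirit of PRML's curve-fitting example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 20, 25.0                              # sample size and noise precision

x = rng.uniform(0.0, 1.0, N)
y_true = np.sin(2 * np.pi * x)                  # deterministic function y(x, w)
eps = rng.normal(0.0, np.sqrt(1.0 / beta), N)   # Gaussian noise with variance 1/beta
t = y_true + eps                                # observed targets
```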
Taking the logarithm (with $y(\textbf x, \textbf w) = \textbf w^\text T\boldsymbol\phi(\textbf x)$), we get $\ln p(\textbf t|\textbf w, \beta) = \dfrac{N}{2}\ln\beta - \dfrac{N}{2}\ln(2\pi) - \beta E_D(\textbf w)$,
where $E_D(\textbf w) = \dfrac{1}{2}\sum_{n=1}^{N}\{t_n - \textbf w^\text T\boldsymbol\phi(\textbf x_n)\}^2$ is the sum-of-squares error.
Computing the gradient with respect to $\textbf w$ and setting it to zero yields $\textbf w_\text{ML} = (\Phi^\text T\Phi)^{-1}\Phi^\text T\textbf t$.
This is known as the normal equation of the least squares problem. $\Phi$ is the $N \times M$ matrix with elements $\Phi_{nj} = \phi_j(\textbf x_n)$, called the design matrix.
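A minimal sketch of this solution, assuming a polynomial design matrix built with `np.vander`; `np.linalg.lstsq` is used rather than an explicit matrix inverse for numerical stability, but it computes the same least-squares solution $\textbf w_\text{ML}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, beta = 20, 4, 25.0
x = rng.uniform(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(1.0 / beta), N)

# N x M design matrix for polynomial basis functions: Phi[n, j] = x_n ** j
Phi = np.vander(x, M, increasing=True)

# w_ML = (Phi^T Phi)^{-1} Phi^T t, solved stably via least squares
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w_ml)
```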
Consider $\textbf y = \Phi\textbf w_\text{ML} = [\varphi_1, ..., \varphi_M]\textbf w_\text{ML}$.
The subspace $S$ is spanned by the columns $\varphi_1, ..., \varphi_M$.
$\textbf w_\text{ML}$ minimizes the distance between $\textbf t$ and $\textbf y$; the resulting $\textbf y$ is the orthogonal projection of $\textbf t$ onto $S$.
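This geometric picture can be checked numerically: with $\textbf y = \Phi\textbf w_\text{ML}$, the residual $\textbf t - \textbf y$ should be orthogonal to every column of $\Phi$. A sketch using the same illustrative data as above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, beta = 20, 4, 25.0
x = rng.uniform(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(1.0 / beta), N)

Phi = np.vander(x, M, increasing=True)
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w_ml                  # orthogonal projection of t onto S

# Each inner product phi_j^T (t - y) should be ~0: the residual is
# orthogonal to the subspace spanned by the columns of Phi.
print(Phi.T @ (t - y))
```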