Chapter 0 Bayesian Foundations
1. The Inverse of a Partitioned Matrix
\[ \left ( \begin{matrix} A & B \\ C & D \end{matrix} \right ) ^{-1} = \left ( \begin{matrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} +D^{-1}CMBD^{-1} \end{matrix} \right ) \]
\[ \text{where}\quad M = (A - BD^{-1}C)^{-1} \]
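As a quick sanity check, here is a minimal NumPy sketch of this identity; the block sizes and random seed are arbitrary choices of mine:

```python
# Minimal numerical check of the partitioned-inverse identity above.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.normal(size=(n, n)) + n * np.eye(n)   # keep blocks well-conditioned
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, m)) + m * np.eye(m)

Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)           # M = (A - B D^{-1} C)^{-1}

# Assemble the claimed inverse block by block
top = np.hstack([M, -M @ B @ Dinv])
bot = np.hstack([-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv])
claimed_inv = np.vstack([top, bot])

full = np.block([[A, B], [C, D]])
print(np.allclose(claimed_inv, np.linalg.inv(full)))   # True
```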
2. Conditional Gaussian Distribution
Let \(x, y\) be column vectors of arbitrary dimensions whose joint distribution is Gaussian (so each marginal is Gaussian as well). Assume for now that \(x, y\) are zero-mean; then the joint density satisfies \[ P \left( \begin{matrix} x \\ y \end{matrix} \right) \sim \exp\left\{ -\frac{1}{2} (x^T, y^T) \left( \begin{matrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{matrix} \right) ^{-1} \left( \begin{matrix} x \\ y \end{matrix} \right) \right\} \] For convenience, replace the inverse of the covariance matrix \(\Sigma\) with the precision matrix \(\Lambda\), denoted as follows \[ \left( \begin{matrix} \Lambda_{xx} & \Lambda_{xy} \\ \Lambda_{yx} & \Lambda_{yy} \end{matrix} \right) = \left( \begin{matrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{matrix} \right) ^{-1} \] If \(y\) is observed, then \(x\) follows a new conditional distribution, which is still Gaussian; here \(y\) is no longer a random variable but a fixed value \[ P(x|y) \sim \exp \left\{ -\frac{1}{2}( x^T\Lambda_{xx}x + 2x^T\Lambda_{xy}y + y^T\Lambda_{yy}y ) \right\} \] Comparing this with the canonical form of the Gaussian distribution, and using the partitioned-inverse identity from part 1, we can read off the mean vector and covariance matrix \[ \mu_{x|y} = -\Lambda_{xx}^{-1}\Lambda_{xy}y = \Sigma_{xy}\Sigma_{yy}^{-1}y \\ \Sigma_{x|y} = \Lambda_{xx}^{-1} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \] In the general case, where \(x, y\) have mean values \(\mu_x, \mu_y\) respectively, the result becomes \[ \mu_{x|y} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \\ \Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \]
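The two routes (precision blocks vs. covariance blocks) should agree; below is a minimal NumPy sketch checking this numerically, with an arbitrary positive-definite joint covariance of my own choosing:

```python
# Minimal sketch checking the conditional-Gaussian formulas above.
import numpy as np

rng = np.random.default_rng(1)
dx, dy = 2, 3
S = rng.normal(size=(dx + dy, dx + dy))
Sigma = S @ S.T + (dx + dy) * np.eye(dx + dy)          # joint covariance (positive definite)
Sxx, Sxy = Sigma[:dx, :dx], Sigma[:dx, dx:]
Syx, Syy = Sigma[dx:, :dx], Sigma[dx:, dx:]

Lam = np.linalg.inv(Sigma)                              # joint precision matrix
Lxx, Lxy = Lam[:dx, :dx], Lam[:dx, dx:]

mu_x, mu_y = rng.normal(size=dx), rng.normal(size=dy)
y = rng.normal(size=dy)                                 # an observed value of y

# Covariance route vs. precision route for the conditional moments
mu_cond_cov   = mu_x + Sxy @ np.linalg.solve(Syy, y - mu_y)
mu_cond_prec  = mu_x - np.linalg.solve(Lxx, Lxy @ (y - mu_y))
cov_cond_cov  = Sxx - Sxy @ np.linalg.solve(Syy, Syx)
cov_cond_prec = np.linalg.inv(Lxx)

print(np.allclose(mu_cond_cov, mu_cond_prec))           # True
print(np.allclose(cov_cond_cov, cov_cond_prec))         # True
```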
Something related to the Kalman Filter
Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone, by using Bayesian inference and estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, one of the primary developers of its theory. (From Wikipedia)
What I want to say is that we can use the method in part 2 to derive the Kalman filter equations. I'll complete this part if possible; for now, a minimal illustrative sketch of the measurement update is given below.
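This is not the full derivation, just a sketch of a single Kalman measurement update written directly with the part 2 formulas; the state dimension, measurement model \(H\), and noise covariances below are assumptions of mine:

```python
# One Kalman measurement update via the conditional-Gaussian formulas of part 2.
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2
m = rng.normal(size=n)                      # prior mean of the state x
P = np.eye(n)                               # prior covariance of x
H = rng.normal(size=(k, n))                 # linear measurement model y = Hx + noise
R = 0.1 * np.eye(k)                         # measurement noise covariance
y = rng.normal(size=k)                      # one observed measurement

# Joint Gaussian of (x, y): Sigma_xy = P H^T, Sigma_yy = H P H^T + R, E[y] = H m
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)              # "Kalman gain" = Sigma_xy Sigma_yy^{-1}

m_post = m + K @ (y - H @ m)                # conditional mean (part 2 formula)
P_post = P - K @ H @ P                      # conditional covariance (part 2 formula)
print(m_post, P_post.shape)
```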
3. Bayes' Theorem for Gaussian Variables
Assume two Gaussian variables (vectors) \(x, y\) such that \[ p(x) = \mathcal{N}(x|\mu, \Lambda^{-1}) \\ p(y|x) = \mathcal{N}(y|Ax+b, L^{-1}) \] Following the method of the previous part, we obtain the precision and covariance matrices of the joint vector \((x, y)\) \[ R[x,y] = \left( \begin{matrix} \Lambda + A^TLA & -A^TL \\ -LA & L \end{matrix} \right) \\ cov[x,y] = \left( \begin{matrix} \Lambda^{-1} & \Lambda^{-1}A^T \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^T \end{matrix} \right) \] And the conditional distribution \(p(x|y)\) has mean and covariance given by \[ E[x|y] = (\Lambda + A^TLA)^{-1}\{A^TL(y-b) + \Lambda\mu\} \\ Cov[x|y] = (\Lambda + A^TLA)^{-1} \]
4. Conclusion
\[ \begin{align} p(x) &= \mathcal{N}(x | \mu, \Lambda^{-1}) &(1)\\ p(y|x) &= \mathcal{N}(y | Ax + b, L^{-1}) &(2) \\ p(y) &= \mathcal{N}(y | A\mu + b, L^{-1} + A\Lambda^{-1}A^T) &(3)\\ p(x|y) &= \mathcal{N}(x | \Sigma\{A^TL(y - b) + \Lambda\mu\}, \Sigma) &(4)\\ \text{where}\quad \Sigma &= (\Lambda+ A^TLA)^{-1} \end{align} \]
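As a sanity check, here is a minimal NumPy sketch (dimensions and parameter values are arbitrary choices of mine) confirming that the closed form \((4)\) agrees with conditioning the joint Gaussian built from \((1)\) and \((2)\) using the part 2 formulas:

```python
# Check that formula (4) matches covariance-based conditioning of the joint Gaussian.
import numpy as np

rng = np.random.default_rng(3)
dx, dy = 3, 2
Lam = np.eye(dx) * 2.0                           # prior precision of x
L   = np.eye(dy) * 4.0                           # noise precision of y given x
A   = rng.normal(size=(dy, dx))
b   = rng.normal(size=dy)
mu  = rng.normal(size=dx)
y   = rng.normal(size=dy)                        # an observed value of y

# Formula (4)
Sigma = np.linalg.inv(Lam + A.T @ L @ A)
mean4 = Sigma @ (A.T @ L @ (y - b) + Lam @ mu)

# Same thing via part 2: joint covariance blocks from part 3, then condition on y
Sxx = np.linalg.inv(Lam)
Sxy = Sxx @ A.T
Syy = np.linalg.inv(L) + A @ Sxx @ A.T
mean2 = mu + Sxy @ np.linalg.solve(Syy, y - (A @ mu + b))
cov2  = Sxx - Sxy @ np.linalg.solve(Syy, Sxy.T)

print(np.allclose(mean4, mean2), np.allclose(Sigma, cov2))   # True True
```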
These results are foundational to PRML; many later chapters use them to revisit classic methods from a Bayesian perspective.
Generally, \(x\) is the quantity of interest, for example the parameters of a model, whose prior distribution is Gaussian, and \(y\) is the data. You then use \((1), (2)\) to deduce \((4)\), the posterior distribution of \(x\), and use it as the prior distribution in the next iteration, as the sketch below illustrates.
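A toy illustration of this loop (my own setup: \(x\) a scalar parameter, each observation \(y_t = x + \text{noise}\), i.e. \(A = 1, b = 0\)), applying \((4)\) repeatedly with each posterior fed back in as the next prior:

```python
# Sequential Bayesian updating of a scalar Gaussian parameter.
import numpy as np

rng = np.random.default_rng(4)
x_true, noise_prec = 1.5, 4.0                  # L = 4 (noise std = 0.5), A = 1, b = 0
mu, lam = 0.0, 0.1                             # vague Gaussian prior on x

for t in range(20):
    y = x_true + rng.normal(scale=noise_prec ** -0.5)
    sigma = 1.0 / (lam + noise_prec)           # formula (4) in one dimension
    mu = sigma * (noise_prec * y + lam * mu)   # posterior mean
    lam = 1.0 / sigma                          # posterior becomes the next prior

print(mu, lam)                                 # mu -> x_true, lam grows with the data
```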