PRML Reading Notes (1)

Chapter 0 Bayesian Foundations

1. The Inverse of a Partitioned Matrix

\[ \left ( \begin{matrix} A & B \\ C & D \end{matrix} \right ) ^{-1} = \left ( \begin{matrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} +D^{-1}CMBD^{-1} \end{matrix} \right ) \]

\[ \text{where}\quad M = (A - BD^{-1}C)^{-1} \]
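
As a quick numerical sanity check, here is a minimal NumPy sketch (not from the book; the block sizes and random blocks are arbitrary illustrative choices) that assembles the right-hand side of the identity and compares it with a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random blocks; sizes chosen arbitrarily for the check.
n, m = 3, 2
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, m)) + 5 * np.eye(m)  # keep D well conditioned

Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)  # inverse of the Schur complement of D

block_inv = np.block([
    [M,             -M @ B @ Dinv],
    [-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv],
])

full = np.block([[A, B], [C, D]])
print(np.allclose(block_inv, np.linalg.inv(full)))  # True
```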

2. Conditional Gaussian Distribution

Let column vectors \(x, y\) (of arbitrary dimensions) be jointly Gaussian, so each marginal distribution is Gaussian as well. Assume first that \(x, y\) are zero-mean; then the joint density satisfies

\[ P \left( \begin{matrix} x \\ y \end{matrix} \right) \propto \exp\left\{ -\frac{1}{2} (x^T, y^T) \left( \begin{matrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{matrix} \right) ^{-1} \left( \begin{matrix} x \\ y \end{matrix} \right) \right\} \]

For convenience, replace the inverse of the covariance matrix \(\Sigma\) with the precision matrix \(\Lambda\), denoted as follows:

\[ \left( \begin{matrix} \Lambda_{xx} & \Lambda_{xy} \\ \Lambda_{yx} & \Lambda_{yy} \end{matrix} \right) = \left( \begin{matrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{matrix} \right) ^{-1} \]

If \(y\) is observed, then \(x\) follows a new conditional distribution, which is still Gaussian; here \(y\) is no longer a random variable but a fixed value:

\[ P(x|y) \propto \exp \left\{ -\frac{1}{2}( x^T\Lambda_{xx}x + 2x^T\Lambda_{xy}y + y^T\Lambda_{yy}y ) \right\} \]

Comparing with the canonical form of the Gaussian distribution, and using the partitioned-matrix inverse from part 1, we can read off the mean vector and covariance matrix:

\[ \mu_{x|y} = -\Lambda_{xx}^{-1}\Lambda_{xy}y = \Sigma_{xy}\Sigma_{yy}^{-1}y \\ \Sigma_{x|y} = \Lambda_{xx}^{-1} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \]

In the general case, where \(x, y\) have means \(\mu_x, \mu_y\) respectively, the result becomes

\[ \mu_{x|y} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \\ \Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \]

(Figure: ConditionalGaussian)
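
The two routes to \(\Sigma_{x|y}\) (via \(\Lambda_{xx}^{-1}\) and via the Schur complement) can be checked numerically. Below is a minimal sketch, assuming an arbitrary positive-definite joint covariance built with NumPy; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative joint covariance of (x, y); S S^T + I is positive definite.
dx, dy = 2, 3
S = rng.normal(size=(dx + dy, dx + dy))
Sigma = S @ S.T + np.eye(dx + dy)
Sxx, Sxy = Sigma[:dx, :dx], Sigma[:dx, dx:]
Syx, Syy = Sigma[dx:, :dx], Sigma[dx:, dx:]

mu_x = rng.normal(size=dx)
mu_y = rng.normal(size=dy)
y = rng.normal(size=dy)  # an observed value for y

# Conditional moments from the formulas above.
mu_cond = mu_x + Sxy @ np.linalg.solve(Syy, y - mu_y)
Sigma_cond = Sxx - Sxy @ np.linalg.solve(Syy, Syx)

# Cross-check via the precision matrix: Sigma_{x|y} = Lambda_xx^{-1}.
Lam = np.linalg.inv(Sigma)
print(np.allclose(Sigma_cond, np.linalg.inv(Lam[:dx, :dx])))  # True
```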

Something related to Kalman Filter

Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone, by using Bayesian inference and estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, one of the primary developers of its theory. (From Wikipedia)

What I want to say is that we can use the method in part 2 to derive the formulas of the Kalman filter. I'll complete this part if possible; a sketch of the measurement update follows.
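
As a teaser for that derivation, here is a minimal sketch of just the measurement-update step, assuming the standard linear observation model \(y = Hx + \text{noise}\) with noise covariance \(R\); it is the conditional-Gaussian result of part 2 rewritten with the usual Kalman gain. The names `H`, `R`, `K` are conventional Kalman notation, not from this note:

```python
import numpy as np

def kalman_update(mu, P, y, H, R):
    """One measurement update, read off from the conditional Gaussian.

    Assumed model (illustrative): prior x ~ N(mu, P), observation
    y ~ N(H x, R). Then x|y is Gaussian with the moments below.
    """
    S = H @ P @ H.T + R              # innovation covariance = Sigma_yy
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain = Sigma_xy Sigma_yy^{-1}
    mu_post = mu + K @ (y - H @ mu)  # posterior mean mu_{x|y}
    P_post = P - K @ H @ P           # posterior covariance Sigma_{x|y}
    return mu_post, P_post
```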

3. Bayes' Theorem for Gaussian Variables

Assume two Gaussian variables (vectors) \(x, y\) such that

\[ p(x) = \mathcal{N}(x|\mu, \Lambda^{-1}) \\ p(y|x) = \mathcal{N}(y|Ax+b, L^{-1}) \]

According to the former parts, the joint distribution of \(x, y\) has precision and covariance matrices

\[ R[x,y] = \left( \begin{matrix} \Lambda + A^TLA & -A^TL \\ -LA & L \end{matrix} \right) \\ cov[x,y] = \left( \begin{matrix} \Lambda^{-1} & \Lambda^{-1}A^T \\ A\Lambda^{-1} & L^{-1} + A\Lambda^{-1}A^T \end{matrix} \right) \]

And the conditional distribution \(p(x|y)\) has mean and covariance given by

\[ E[x|y] = (\Lambda + A^TLA)^{-1}\{A^TL(y-b) + \Lambda\mu\} \\ Cov[x|y] = (\Lambda + A^TLA)^{-1} \]
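
These posterior formulas can be cross-checked against the conditioning route of part 2, since both describe the same \(p(x|y)\). A minimal NumPy sketch, with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dimensions and model parameters.
dx, dy = 2, 3
mu = rng.normal(size=dx)
Lam = np.eye(dx) * 2.0            # prior precision of x
A = rng.normal(size=(dy, dx))
b = rng.normal(size=dy)
L = np.eye(dy) * 4.0              # noise precision of y|x
y = rng.normal(size=dy)

# Posterior moments of x given y, per the formulas above.
Sigma_post = np.linalg.inv(Lam + A.T @ L @ A)
mu_post = Sigma_post @ (A.T @ L @ (y - b) + Lam @ mu)

# Cross-check against the joint-covariance route from part 2.
Sigma_xx = np.linalg.inv(Lam)
Sigma_xy = Sigma_xx @ A.T
Sigma_yy = np.linalg.inv(L) + A @ Sigma_xx @ A.T
mu_y = A @ mu + b
print(np.allclose(mu_post, mu + Sigma_xy @ np.linalg.solve(Sigma_yy, y - mu_y)))        # True
print(np.allclose(Sigma_post, Sigma_xx - Sigma_xy @ np.linalg.solve(Sigma_yy, Sigma_xy.T)))  # True
```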

4. Conclusion

\[ \begin{align} p(x) &= \mathcal{N}(x | \mu, \Lambda^{-1}) &(1)\\ p(y|x) &= \mathcal{N}(y | Ax + b, L^{-1}) &(2) \\ p(y) &= \mathcal{N}(y |A\mu + b, L^{-1} + A\Lambda^{-1}A^T) &(3)\\ p(x|y) &= \mathcal{N}(x | \Sigma\{A^TL(y - b) + \Lambda\mu\}, \Sigma) &(4)\\ \text{where}\quad \Sigma &= (\Lambda+ A^TLA)^{-1} \end{align} \]


These results are the foundation of PRML. Many chapters use them to revisit classical methods from a Bayesian perspective.

Generally, \(x\) is the goal, for example the parameters of the model, whose prior distribution is Gaussian, and \(y\) is the data. You then use \((1), (2)\) to deduce \((4)\), the posterior distribution of \(x\), and use it as the prior distribution in the next iteration, as the sketch below shows.
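
A minimal sketch of that iteration, assuming a toy linear model (the function name and parameter values are illustrative, not from the book):

```python
import numpy as np

# Treat x as a model parameter with Gaussian prior, fold in one data
# batch y via eq. (4), and reuse the posterior as the next prior.
def posterior(mu, Lam, A, b, L, y):
    Sigma = np.linalg.inv(Lam + A.T @ L @ A)          # eq. (4)
    mu_new = Sigma @ (A.T @ L @ (y - b) + Lam @ mu)
    return mu_new, np.linalg.inv(Sigma)               # new mean and precision

mu, Lam = np.zeros(2), np.eye(2)                      # prior over x
A, b, L = np.array([[1.0, 0.5]]), np.zeros(1), np.eye(1)
for y in ([1.2], [0.9], [1.1]):                       # successive observations
    mu, Lam = posterior(mu, Lam, A, b, L, np.array(y))
print(mu)  # posterior mean after three updates
```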
