拟牛顿法
记\(p_k = x_{k+1} - x_k, q_k = \nabla f(x_{k+1}) - \nabla f(x_k)\)则 \[q_k \approx \nabla^2f(x_k)p_k\] 令\(H_k = \nabla^2f(x_k)_{-1}\),则 \[H_kq_k \approx p_k\] 修正 \[(H_k+D_k)q_k = p_k\] 记\(H_{k+1} = H_k+D_k\) ### 秩1修正 \(D_k = \alpha_kz_kz_k^T\) 计算过程 \[z_k = \frac{p_k-H_kq_k}{\alpha_kz_k^Tq_k}\] \[q_k^T(p_k-H_kq_k) = \alpha_k(z_k^Tq_k)^2\] 代入得 \[D_k = \frac{(p_k-H_kq_k)(p_k-H_kq_k)^T}{q_k^T(p_k-H_kq_k)}\] \[H_{k+1} = H_k + \frac{(p_k-H_kq_k)(p_k-H_kq_k)^T}{q_k^T(p_k-H_kq_k)}\] 但是秩1修正不能保证正定,有一定缺陷 ### DFP \(D_k = mvv^T + nww^T\) \[v(mv^Tq_k) + w(nw^Tq_k) = p_k - H_kq_k\] 则\(v = p_k, w = H_kq_k, mv^Tq_k = 1, nw^Tq_k = -1\) 得\[H_{k+1} = H_k + \frac{p_kp_k^T}{p_k^Tq_k}-\frac{H_kq_kq_k^TH_k^T}{q_k^TH_kq_k}\] #### 正定性 归纳法证明\(H_k正定\) \[H_k正定,令H_k = \sum_{i=1}^n\lambda_ie_ie_i^T\] \[弱化证明H_k - \frac{H_kq_kq_k^TH_k^T}{q_k^TH_kq_k} \ge 0\] \[x^T(H_k - \frac{H_kq_kq_k^TH_k^T}{q_k^TH_kq_k})x = \sum_{i=1}^n\lambda_i<e_i,x>^2-\frac{(\sum_{i=1}^n\lambda_i<e_i,x>)(\sum_{i=1}^n\lambda_i<e_i,q_k>)}{\sum_{i=1}^n\lambda_i<e_i,q_k>^2}\] 由柯西不等式可知上式成立
BFGS
$H_{k+1} = V^TH_kV + p_kp_k^T $ \(\rho = \frac{1}{q_k^Ts_k}, V = I-\rho q_kt_k^T\) 另一种方法\(B = H^{-1}, q_k = B_{k+1}p_k\) \[B_{k+1} = B_k + \frac{q_kq_k^T}{q_k^Tp_k}-\frac{B_kp_kp_k^TB_k^T}{p_k^TB_kp_k}\] 根据Sherman-Morrison定理,\([A^{-1}+\rho uv^T]^{-1} = A-\frac{\rho}{1+\rho v^TAu}Auv^TA\),则 \[H_{k+1} = H_k + (1+\frac{q^TH_kq_k}{p_k^Tq_k})\frac{p_kp_k^T}{p_k^Tq_k}-\frac{p_kq_k^TH_k+H_kq_k^Tp_k}{p_k^Tq^k}\]
Broden族
\[H_{k+1}^\phi = (1-\phi)H_{k+1}^{DFP}+\phi H_{k+1}^{BFGS}\] #### 问题 1 没有证明那个定理 2 没有代入计算过,有点复杂
L-BFGS
随着迭代,\(H_k\) 矩阵会变得越来越稠密,难以存储,L-BFGS只存储最新的k个更新向量来计算\(H_{k+1}\)
总结
优化方法中一阶的方法有 SGD,Adagrad, Adadelta, RMSPROP, ADAM. 牛顿法是二阶方法,BFGS是目前较好的拟牛顿法,L-BFGS在机器学习中有较大的应用。
参考资料
[1] 陈宝林 最优化理论与算法 [2] 网络参考文章 [3] Shanno D F. Conditioning of quasi-Newton methods for function minimization[J]. Mathematics of computation, 1970, 24(111): 647-656.MLA