## Unsupervised Learning: Variational Autoencoders

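Review

  • Problem: density estimation when the data are only partially observed:

    $$ p(\underbrace{\textcolor{blue}{\bm{x}}}_{\text{observed data}}, \underbrace{\textcolor{red}{\bm{z}}}_{\text{latent variable}}; \theta). $$
  • EM algorithm (Expectation Maximization Algorithm):

    • E-step: compute the expected log-likelihood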

      $$ Q(\theta, \theta^{t-1}) := \sum_{i=1}^N \underset{\bm{z}_i \sim p(\bm{z}|\bm{x}_i; \theta^{t-1})}{\mathbb{E}} \left[ \log p(\bm{x}_i, \bm{z}_i; \theta) \right] $$
    • M-step: maximize the expected log-likelihood

      $$ \theta^t = \underset{\theta}{\text{argmax}} \: Q(\theta, \theta^{t-1}) $$
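As a reminder of how the two steps alternate in practice, here is a minimal sketch of EM for a two-component 1-D Gaussian mixture. The model, the initialisation, and the function name `em_gmm_1d` are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # theta = (mixing weights pi, means mu, variances var); assumed initialisation
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i; theta^{t-1})
        log_joint = (np.log(pi)
                     - 0.5 * np.log(2 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_marg = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        r = np.exp(log_joint - log_marg[:, None])
        log_liks.append(log_marg.sum())  # log-likelihood at theta^{t-1}
        # M-step: closed-form argmax of Q(theta, theta^{t-1})
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, np.array(log_liks)

# Synthetic data from two well-separated clusters (assumed for the demo)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
pi, mu, var, log_liks = em_gmm_1d(x)
```

The recorded `log_liks` should never decrease from one iteration to the next, which is the usual sanity check for an EM implementation.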

Variational Autoencoder (VAE)

EM and VAE share the same core idea

Relationship between the expected log-likelihood and the log-likelihood

$$ \begin{align*} Q(\theta, \hat{\theta}) &= \mathbb{E}_{\bm{z}} \left[\ell (\bm{x}, \bm{z}; \theta) \right] = \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log p(\bm{x}_i, \bm{z}; \theta) \mathrm{d} \bm{z} \\ &\textcolor{red}{\Leftrightarrow} \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{red}{p(\bm{z}|\bm{x}_i; \hat{\theta})}} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{z}| \bm{x}_i; \theta) p(\bm{x}_i; \theta)}{p(\bm{z}|\bm{x}_i; \hat{\theta})} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \left(\log p(\bm{x}_i; \theta) + \log \frac{p(\bm{z}| \bm{x}_i; \theta) }{p(\bm{z}|\bm{x}_i; \hat{\theta})} \right) \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \Big\{\log p(\bm{x}_i; \theta) - \text{KL}(p(\bm{z}|\bm{x}_i; \hat{\theta}) \| p(\bm{z}|\bm{x}_i; \theta)) \Big\} \end{align*} $$

Kullback-Leibler Divergence: $\text{KL}(p\|q) = \int_z p(z) \log \frac{p(z)}{q(z)} \mathrm{d} z$
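The identity in the last line of the derivation can be checked numerically for a discrete latent variable. The sketch below assumes a two-component mixture of unit-variance Gaussians with arbitrarily chosen values for $\hat{\theta}$ and $\theta$; the helper names are hypothetical.

```python
import numpy as np

def log_joint(x, pi, mu):
    """log p(x, z=k; theta) for a mixture of unit-variance Gaussians."""
    return np.log(pi) - 0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2

def posterior(x, pi, mu):
    """p(z | x; theta), obtained by normalising the joint over z."""
    lj = log_joint(x, pi, mu)
    return np.exp(lj - np.logaddexp.reduce(lj))

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

x = 0.7                                                 # one observation
theta_hat = (np.array([0.3, 0.7]), np.array([-1.0, 2.0]))  # assumed old parameters
theta = (np.array([0.6, 0.4]), np.array([0.0, 1.5]))       # assumed new parameters

p_hat = posterior(x, *theta_hat)
# Left side: sum_z p(z|x; theta_hat) * log[ p(x, z; theta) / p(z|x; theta_hat) ]
lhs = float(np.sum(p_hat * (log_joint(x, *theta) - np.log(p_hat))))
# Right side: log p(x; theta) - KL( p(z|x; theta_hat) || p(z|x; theta) )
log_px = float(np.logaddexp.reduce(log_joint(x, *theta)))
rhs = log_px - kl(p_hat, posterior(x, *theta))
```

Both sides agree up to floating-point error, and `lhs` never exceeds `log_px` because the KL term is non-negative.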

From the EM algorithm to VAE

$$ \begin{align*} Q(\theta, \hat{\theta}) &\Leftrightarrow \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{p(\bm{z}|\bm{x}_i; \hat{\theta})} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \Big\{\underbrace{\log p(\bm{x}_i; \theta)}_{\text{marginal log-likelihood}} - \underbrace{\text{KL}(p(\bm{z}|\bm{x}_i; \hat{\theta}) \| p(\bm{z}|\bm{x}_i; \theta))}_{\text{gap, } \ge 0} \Big\} \end{align*} $$

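$\textcircled{\small 1}$ Maximizing $Q$ $\Leftrightarrow$ maximizing a lower bound on the log-likelihood

$\textcircled{\small 2}$ When $\hat{\theta} = \theta$ the KL gap vanishes, so the bound equals the log-likelihood exactly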

ELBO (Evidence Lower Bound)

$$ \theta^t = \mathop{\text{argmax}} \limits_{\theta} \sum_{i=1}^N \int_{\bm{z}} \textcolor{red}{p(\bm{z}|\bm{x}_i; \theta^{t-1})} \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{red}{p(\bm{z}|\bm{x}_i; \theta^{t-1})}} \mathrm{d} \bm{z} $$

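In general the posterior $p(\bm{z}|\bm{x}; \theta)$ is intractable, so it is replaced by a tractable approximation $q(\bm{z}|\bm{x}; \phi)$ with its own parameters $\phi$: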

$$ \text{ELBO:} \quad \sum_{i=1}^N \int_{\bm{z}} \textcolor{blue}{q(\bm{z}|\bm{x}_i; \phi)} \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{blue}{q(\bm{z}|\bm{x}_i; \phi)}} \mathrm{d} \bm{z} $$
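To make the bound concrete, the following sketch estimates the ELBO by Monte Carlo for a toy model where the evidence is known in closed form. The model ($\bm{z} \sim \mathcal{N}(0,1)$, $\bm{x}|\bm{z} \sim \mathcal{N}(\bm{z},1)$), the Gaussian form of $q$, and all function names are assumptions chosen for illustration; under this model $p(\bm{x}) = \mathcal{N}(0,2)$ and the exact posterior is $\mathcal{N}(\bm{x}/2, 1/2)$.

```python
import numpy as np

def log_normal(v, mean, var):
    """Log density of N(mean, var) evaluated at v."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var

def elbo_mc(x, m, s, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z | x)], q = N(m, s^2)."""
    rng = np.random.default_rng(seed)
    z = m + s * rng.standard_normal(n_samples)          # z ~ q(z | x; phi)
    log_p_xz = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
    log_q = log_normal(z, m, s ** 2)
    return float(np.mean(log_p_xz - log_q))

x = 1.2
log_px = float(log_normal(x, 0.0, 2.0))                  # exact log evidence
elbo_loose = elbo_mc(x, m=0.0, s=1.0)                    # q differs from the posterior
elbo_tight = elbo_mc(x, m=x / 2, s=np.sqrt(0.5))         # q equals the posterior
```

With a mismatched $q$ the estimate falls below `log_px` by the KL gap; when $q$ equals the exact posterior, the integrand is constant in $\bm{z}$ and the estimate matches `log_px` up to rounding.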

Variational Autoencoder (VAE)

From VAE to the EM algorithm

$$ \max_{\phi, \theta} \quad \sum_{i=1}^N \int_{\bm{z}} q(\bm{z}|\bm{x}_i; \phi) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{q(\bm{z}|\bm{x}_i; \phi)} \mathrm{d} \bm{z} $$
  • E-step: hold $\theta$ fixed and maximize over $\phi$

    $$ \max_{\textcolor{blue}{\phi}} \quad -\sum_{i=1}^N \text{KL} (q(\bm{z}|\bm{x}_i; \textcolor{blue}{\phi}) \| p(\bm{z}|\bm{x}_i; \theta)) $$
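  • M-step: hold $\phi$ fixed and maximize over $\theta$

    $$ \max_{\textcolor{blue}{\theta}} \quad \sum_{i=1}^N \mathbb{E}_q [\log p(\bm{x}_i| \bm{z}; \textcolor{blue}{\theta})] $$

    (This form assumes the prior $p(\bm{z})$ does not depend on $\theta$; otherwise the integrand is $\log p(\bm{x}_i, \bm{z}; \theta)$.)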
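The two steps above can be sketched as coordinate ascent on the ELBO for a model where both have closed forms. The model here ($\bm{z} \sim \mathcal{N}(0,1)$, $\bm{x}|\bm{z} \sim \mathcal{N}(\theta \bm{z}, 1)$) is an assumption for illustration, as is the function name. Because the exact posterior is Gaussian, the E-step can drive the KL term to zero, which is not possible in general when $q$ is a restricted family.

```python
import numpy as np

def em_linear_gaussian(x, theta=0.5, n_iter=200):
    """Coordinate ascent on the ELBO for z ~ N(0,1), x|z ~ N(theta*z, 1)."""
    for _ in range(n_iter):
        # E-step (theta fixed): the KL term is zero when q is the exact
        # posterior, which is Gaussian here: q(z | x_i) = N(m_i, s2).
        s2 = 1.0 / (1.0 + theta ** 2)
        m = theta * x * s2
        # M-step (phi fixed): maximise sum_i E_q[log p(x_i | z; theta)],
        # i.e. minimise sum_i E_q[(x_i - theta*z)^2]; E_q[z^2] = m^2 + s2.
        theta = np.sum(x * m) / np.sum(m ** 2 + s2)
    return theta

# Synthetic data with an assumed ground-truth parameter
rng = np.random.default_rng(0)
true_theta = 1.5
z = rng.standard_normal(5000)
x = true_theta * z + rng.standard_normal(5000)

theta_em = em_linear_gaussian(x)
# Marginally x ~ N(0, 1 + theta^2), so the MLE satisfies theta^2 = mean(x^2) - 1
theta_mle = np.sqrt(np.mean(x ** 2) - 1.0)
```

The alternating updates converge to the maximum-likelihood value `theta_mle`. In a typical VAE neither step has a closed form, and $\phi$ and $\theta$ are instead updated jointly by stochastic gradient steps on the same objective.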

Summary

  • Expected log-likelihood $\Leftrightarrow$ ELBO $\Leftrightarrow$ lower bound on the log-likelihood

  • From the EM algorithm to VAE

| | EM | VAE |
|---|---|---|
| Objective | ELBO | ELBO |
| Approximate posterior | $p(\bm{z}\vert \bm{x}; \theta^{t-1})$ | $q(\bm{z}\vert \bm{x}; \phi)$ |
| Convergence | | |
| Flexibility | | |

Further study

  • Study the asymmetry of the KL divergence and the Jensen-Shannon (JS) divergence

  • Prove directly via Jensen's inequality that

    $$ \text{ELBO} \le \sum_{i=1}^N \log p(\bm{x}_i; \theta) $$
  • Following the EM $\rightarrow$ ELBO derivation, work backwards from VAE $\rightarrow$ EM
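As a starting point for the first exercise, here is a small numerical illustration of the KL asymmetry and of the symmetric JS divergence. The two discrete distributions are arbitrary assumptions, and the helper names are hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.9, 0.05, 0.05]   # peaked distribution (assumed example)
q = [1 / 3, 1 / 3, 1 / 3]  # uniform distribution (assumed example)

kl_pq, kl_qp = kl(p, q), kl(q, p)   # differ: KL is not symmetric
js_pq, js_qp = js(p, q), js(q, p)   # equal: JS is symmetric
```

Swapping the arguments changes the KL value but leaves the JS value unchanged, and the JS divergence stays between 0 and $\log 2$ (in nats).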

Thanks!