Review
- Problem: density estimation under partial observation:
  $$ p(\underbrace{\textcolor{blue}{\bm{x}}}_{\text{observed data}}, \underbrace{\textcolor{red}{\bm{z}}}_{\text{latent variables}}; \theta). $$
- EM algorithm (Expectation-Maximization algorithm):
  - E-step: compute the expected log-likelihood
    $$ Q(\theta, \theta^{t-1}) := \sum_{i=1}^N \underset{\bm{z}_i \sim p(\bm{z}|\bm{x}_i; \theta^{t-1})}{\mathbb{E}} \left[ \log p(\bm{x}_i, \bm{z}_i; \theta) \right] $$
  - M-step: maximize the expected log-likelihood
    $$ \theta^t = \underset{\theta}{\text{argmax}} \: Q(\theta, \theta^{t-1}) $$
- Variational autoencoder (VAE)
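The E-step and M-step above can be sketched on a concrete model. The example below is a minimal illustration, assuming a two-component 1-D Gaussian mixture (a model chosen here for illustration, not taken from the slides); the function name `em_gmm_1d` and the synthetic data are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200, seed=0):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # theta = (mixing weights, means, variances); crude initialisation
    pi = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    var = np.array([x.var(), x.var()])
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i; theta^{t-1})
        log_joint = (np.log(pi)
                     - 0.5 * np.log(2 * np.pi * var)
                     - 0.5 * (x[:, None] - mu) ** 2 / var)  # log p(x_i, z_i = k)
        log_marg = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        r = np.exp(log_joint - log_marg[:, None])
        log_liks.append(log_marg.sum())  # log-likelihood at theta^{t-1}
        # M-step: closed-form argmax of Q(theta, theta^{t-1})
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, np.array(log_liks)

# Synthetic data: the component label is the unobserved latent variable z.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
pi, mu, var, log_liks = em_gmm_1d(x)
```

The recorded `log_liks` should never decrease from one iteration to the next, which is the convergence property listed for EM in the summary.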
Variational Autoencoder (VAE)
Relationship between the expected log-likelihood and the log-likelihood
$$ \begin{align*} Q(\theta, \hat{\theta}) &= \mathbb{E}_{\bm{z}} \left[\ell (\bm{x}, \bm{z}; \theta) \right] = \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log p(\bm{x}_i, \bm{z}; \theta) \mathrm{d} \bm{z} \\ &\textcolor{red}{\Leftrightarrow} \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{red}{p(\bm{z}|\bm{x}_i; \hat{\theta})}} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{z}| \bm{x}_i; \theta) p(\bm{x}_i; \theta)}{p(\bm{z}|\bm{x}_i; \hat{\theta})} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \left(\log p(\bm{x}_i; \theta) + \log \frac{p(\bm{z}| \bm{x}_i; \theta) }{p(\bm{z}|\bm{x}_i; \hat{\theta})} \right) \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \Big\{\log p(\bm{x}_i; \theta) - \text{KL}(p(\bm{z}|\bm{x}_i; \hat{\theta}) \| p(\bm{z}|\bm{x}_i; \theta)) \Big\} \end{align*} $$
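The derivation above can be checked numerically for a single observation with a discrete latent variable. This is only a sanity-check sketch: the joint tables are random numbers standing in for $p(\bm{x}, \bm{z}; \theta)$ and $p(\bm{x}, \bm{z}; \hat{\theta})$, and the size `K` is an arbitrary assumption. It also shows why the $\Leftrightarrow$ step is harmless: the added term is the entropy of $p(\bm{z}|\bm{x}; \hat{\theta})$, which does not depend on $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of latent states (hypothetical toy size)

def random_joint(rng, K):
    """Return j[k] = p(x, z=k; theta) for one fixed observation x."""
    # The entries sum to p(x; theta), which is below 1 by construction.
    return rng.uniform(0.05, 0.2, size=K)

joint_hat = random_joint(rng, K)        # p(x, z; theta_hat)
joint = random_joint(rng, K)            # p(x, z; theta)

post_hat = joint_hat / joint_hat.sum()  # p(z | x; theta_hat)
post = joint / joint.sum()              # p(z | x; theta)
log_px = np.log(joint.sum())            # log p(x; theta)

# Expected log-likelihood Q and its entropy-shifted version (the ELBO).
q_val = np.sum(post_hat * np.log(joint))
entropy = -np.sum(post_hat * np.log(post_hat))  # constant in theta
lhs = np.sum(post_hat * np.log(joint / post_hat))

# Right-hand side: marginal log-likelihood minus the KL gap.
kl = np.sum(post_hat * np.log(post_hat / post))
rhs = log_px - kl
```

Here `lhs` equals `q_val + entropy` and also equals `rhs`, up to floating-point error.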
From the EM algorithm to VAE
$$ \begin{align*} Q(\theta, \hat{\theta}) &\Leftrightarrow \sum_{i=1}^N \int_{\bm{z}} p(\bm{z}|\bm{x}_i; \hat{\theta}) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{p(\bm{z}|\bm{x}_i; \hat{\theta})} \mathrm{d} \bm{z} \\ &= \sum_{i=1}^N \Big\{\underbrace{\log p(\bm{x}_i; \theta)}_{\text{marginal log-likelihood}} - \underbrace{\text{KL}(p(\bm{z}|\bm{x}_i; \hat{\theta}) \| p(\bm{z}|\bm{x}_i; \theta))}_{\text{gap, } \ge 0} \Big\} \end{align*} $$
ELBO (Evidence Lower Bound)
The EM update maximizes this bound using the posterior from the previous iterate (red):
$$ \theta^t = \mathop{\text{argmax}} \limits_{\theta} \sum_{i=1}^N \int_{\bm{z}} \textcolor{red}{p(\bm{z}|\bm{x}_i; \theta^{t-1})} \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{red}{p(\bm{z}|\bm{x}_i; \theta^{t-1})}} \mathrm{d} \bm{z} $$
VAE replaces that posterior with a variational distribution $q(\bm{z}|\bm{x}_i; \phi)$ (blue):
$$ \text{ELBO:} \quad \sum_{i=1}^N \int_{\bm{z}} \textcolor{blue}{q(\bm{z}|\bm{x}_i; \phi)} \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{\textcolor{blue}{q(\bm{z}|\bm{x}_i; \phi)}} \mathrm{d} \bm{z} $$
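To make the ELBO concrete, the sketch below evaluates it for one observation under an assumed toy model: $p(\bm{z}) = \mathcal{N}(0, 1)$, $p(\bm{x}|\bm{z}) = \mathcal{N}(\bm{z}, 1)$ and $q(\bm{z}|\bm{x}) = \mathcal{N}(m, s^2)$. The model and the function names are illustrative assumptions, not part of the slides. For this model the marginal is $\mathcal{N}(0, 2)$ and the exact posterior is $\mathcal{N}(x/2, 1/2)$, so the bound can be compared against the true evidence.

```python
import numpy as np

def elbo_closed_form(x, m, s):
    """ELBO for p(z)=N(0,1), p(x|z)=N(z,1), q(z|x)=N(m, s^2)."""
    exp_log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s ** 2)
    exp_log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + s ** 2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s ** 2)
    return exp_log_lik + exp_log_prior + entropy

def elbo_monte_carlo(x, m, s, n_samples=200_000, seed=0):
    """Sample-based estimate of the same integral over z."""
    rng = np.random.default_rng(seed)
    z = m + s * rng.standard_normal(n_samples)  # z ~ q(z|x)
    log_joint = -np.log(2 * np.pi) - 0.5 * (x - z) ** 2 - 0.5 * z ** 2
    log_q = -0.5 * np.log(2 * np.pi * s ** 2) - 0.5 * ((z - m) / s) ** 2
    return np.mean(log_joint - log_q)

def log_marginal(x):
    """log p(x) for this model, where x ~ N(0, 2)."""
    return -0.5 * np.log(4 * np.pi) - x ** 2 / 4

x = 1.5
loose = elbo_closed_form(x, m=0.0, s=1.0)                # arbitrary q
tight = elbo_closed_form(x, m=x / 2, s=np.sqrt(0.5))     # q = exact posterior
mc = elbo_monte_carlo(x, m=0.0, s=1.0)
evidence = log_marginal(x)
```

With an arbitrary `q` the ELBO sits strictly below `evidence`; with `q` set to the exact posterior the gap closes, matching the KL term in the decomposition.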
Variational Autoencoder (VAE)
From VAE to the EM algorithm
$$ \max_{\phi, \theta} \quad \sum_{i=1}^N \int_{\bm{z}} q(\bm{z}|\bm{x}_i; \phi) \log \frac{p(\bm{x}_i, \bm{z}; \theta)}{q(\bm{z}|\bm{x}_i; \phi)} \mathrm{d} \bm{z} $$
- E-step: hold $\theta$ fixed and maximize with respect to $\phi$
  $$ \max_{\textcolor{blue}{\phi}} \quad -\sum_{i=1}^N \text{KL} (q(\bm{z}|\bm{x}_i; \textcolor{blue}{\phi}) \| p(\bm{z}|\bm{x}_i; \theta)) $$
- M-step: hold $\phi$ fixed and maximize with respect to $\theta$ (assuming the prior $p(\bm{z})$ does not depend on $\theta$)
  $$ \max_{\textcolor{blue}{\theta}} \quad \sum_{i=1}^N \mathbb{E}_q [\log p(\bm{x}_i| \bm{z}; \textcolor{blue}{\theta})] $$
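The alternating scheme above can be sketched on a model where both steps have closed forms. The example assumes a 1-D linear-Gaussian decoder $p(\bm{x}|\bm{z}; \theta) = \mathcal{N}(\theta \bm{z}, 1)$ with prior $\mathcal{N}(0, 1)$ and a linear-Gaussian encoder $q(\bm{z}|\bm{x}; \phi) = \mathcal{N}(a\bm{x}, s^2)$ with $\phi = (a, s^2)$. This model and the names `elbo` and `alternate` are illustrative assumptions; a real VAE would use neural networks and gradient steps instead of closed-form updates.

```python
import numpy as np

def elbo(x, theta, a, s2):
    """Sum over i of the ELBO with q(z|x_i) = N(a * x_i, s2)."""
    m = a * x
    ez2 = m ** 2 + s2  # E_q[z^2]
    exp_log_lik = (-0.5 * np.log(2 * np.pi)
                   - 0.5 * (x ** 2 - 2 * theta * x * m + theta ** 2 * ez2))
    exp_log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * ez2
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return np.sum(exp_log_lik + exp_log_prior + entropy)

def alternate(x, theta=0.5, n_iter=200):
    history = []
    for _ in range(n_iter):
        # E-step: theta fixed. The KL is zero at the exact posterior
        # N(theta * x / (1 + theta^2), 1 / (1 + theta^2)), which this
        # encoder family contains.
        a, s2 = theta / (1 + theta ** 2), 1 / (1 + theta ** 2)
        # M-step: phi fixed. Maximise sum_i E_q[log p(x_i | z; theta)],
        # a quadratic in theta with a closed-form maximiser.
        m = a * x
        theta = np.sum(x * m) / np.sum(m ** 2 + s2)
        history.append(elbo(x, theta, a, s2))
    return theta, (a, s2), np.array(history)

rng = np.random.default_rng(0)
true_theta = 2.0
z = rng.standard_normal(5000)
x = true_theta * z + rng.standard_normal(5000)
theta, phi, history = alternate(x)
# Closed-form maximum-likelihood estimate, since x ~ N(0, 1 + theta^2).
theta_mle = np.sqrt(np.mean(x ** 2) - 1)
```

Because the encoder family contains the exact posterior, the E-step here drives the KL to zero and the loop coincides with classical EM; `theta` should approach `theta_mle` while `history` never decreases.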
Summary
- Expected log-likelihood $\Leftrightarrow$ ELBO $\Leftrightarrow$ lower bound on the log-likelihood
- From the EM algorithm to VAE

| | EM | VAE |
|---|---|---|
| Objective | ELBO | ELBO |
| Approximate posterior | $p(\bm{z}\vert \bm{x}; \theta^{t-1})$ | $q(\bm{z}\vert \bm{x}; \phi)$ |
| Convergence | ✅ | |
| Flexibility | ❎ | ✅ |
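Further study
- Study the asymmetry of the KL divergence, and the JS (Jensen-Shannon) divergence
- Prove directly via Jensen's inequality that
  $$ \text{ELBO} \le \sum_{i=1}^N \log p(\bm{x}_i; \theta) $$
- Following the EM $\rightarrow$ ELBO derivation, work backwards from VAE $\rightarrow$ EM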
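As a starting point for the first item above, the sketch below computes both directions of the KL divergence and the JS divergence for two discrete distributions. The distributions `p` and `q` are arbitrary illustrative values, and the helpers assume strictly positive entries.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.8, 0.1, 0.1]
q = [0.4, 0.4, 0.2]

kl_pq, kl_qp = kl(p, q), kl(q, p)   # differ: KL is not symmetric
js_pq, js_qp = js(p, q), js(q, p)   # equal: JS is symmetric
```

The two KL values differ, while the JS divergence is symmetric and bounded above by $\log 2$ (in nats).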