Naive Bayes

Naive Bayes: such a delightful algorithm, and somehow I never got around to writing it up.

Principle

  1. Training set:
    $T = \lbrace (X^{(1)}, y_1), (X^{(2)}, y_2), \cdots, (X^{(N)}, y_N) \rbrace$, where
    $ X = (X_1, X_2, \cdots, X_n), y \in \lbrace c_1, c_2, \cdots, c_K \rbrace $

  2. Learning: under the Bayes model, the parameters to estimate are
    $$ P(y = c_k) = \frac{\sum\limits^{N}_{i=1}I(y_i = c_k)}{N} \tag{1}$$
    $$
    \begin{align}
    P(X_j = X^{(i)}_j | y = c_k) & = \frac{P(X_j = X^{(i)}_j, y = c_k)}{P(y = c_k)} \\
    & = \frac{\sum\limits^{N}_{m=1}I(X^{(m)}_j = X^{(i)}_j, y_m = c_k)} {\sum\limits^{N}_{m=1}I(y_m = c_k)}
    \end{align} \tag{2}
    $$
    In addition,
    $$ P(X = X^{(i)} | y = c_k) = \prod\limits^{n}_{j=1} P(X_j = X^{(i)}_j | y = c_k) \tag{3}$$

    Equation (3) uses the naive Bayes conditional independence assumption.

  3. Prediction:
    For a given instance $X = A$,
    $$ \begin{align} P(y = c_k | X = A) &= \frac{P(X = A | y = c_k)P(y = c_k)}{P(X = A)} \\
    & = \frac{P(X = A | y = c_k)P(y = c_k)}{\sum\limits_{k} P(X = A | y = c_k)P(y = c_k)}
    \end{align}$$
    Since the denominator is the same for every $c_k$, the Bayes classifier outputs
    $$ y = \arg\max\limits_{c_k} P(X = A | y = c_k)P(y = c_k) \tag{4}$$
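The estimation in (1)-(2) and the decision rule (4) can be sketched with plain counting. This is a minimal sketch, not any library's API; the `fit`/`predict` names and the dictionary layout are my own choices:

```python
from collections import Counter

def fit(X, y):
    """Count-based estimates: P(y = c_k) per eq. (1), P(X_j = v | y = c_k) per eq. (2)."""
    N = len(y)
    class_count = Counter(y)                   # sum_m I(y_m = c_k)
    prior = {c: n / N for c, n in class_count.items()}
    joint = Counter()                          # sum_m I(X_j^{(m)} = v, y_m = c_k)
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            joint[(j, v, yi)] += 1
    cond = {(j, v, c): n / class_count[c] for (j, v, c), n in joint.items()}
    return prior, cond

def predict(x, prior, cond):
    """arg max over c_k of P(X = x | y = c_k) P(y = c_k), per eqs. (3)-(4)."""
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= cond.get((j, v, c), 0.0)  # unseen (feature, value) pair: probability 0
        if score > best_score:
            best, best_score = c, score
    return best
```

Note how `cond.get(..., 0.0)` makes the zero-probability problem discussed in the next section concrete: one unseen feature value wipes out the whole product.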

Bayesian Estimation

If the training set is small, some $P(X_j = X^{(i)}_j | y = c_k) = 0$ may occur, which makes the whole product $P(X = X^{(i)} | y = c_k) = 0$ and distorts the posterior. Adding a constant to the counts, while keeping the estimates a valid probability distribution, solves this problem:

$$
P_{\lambda}(X_j = X^{(i)}_j | y = c_k) = \frac{\sum\limits^{N}_{m=1}I(X^{(m)}_j = X^{(i)}_j, y_m = c_k) + \lambda}{\sum\limits^{N}_{m=1}I(y_m = c_k) + S_j\lambda}
$$

$$ P(y = c_k) = \frac{\sum\limits^{N}_{i=1}I(y_i = c_k) + \lambda}{N + K\lambda} $$

where $S_j$ is the number of possible values of the $j$-th feature $X_j$, and $K$ is the number of classes. Taking $\lambda = 1$ is known as Laplace smoothing.
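The two smoothed estimates above translate directly into code. A minimal sketch (function names and signatures are my own, not a library API):

```python
def smoothed_prior(y, c, K, lam=1.0):
    """P_lambda(y = c) with K classes: (count(c) + lambda) / (N + K * lambda)."""
    return (sum(1 for yi in y if yi == c) + lam) / (len(y) + K * lam)

def smoothed_cond(X, y, j, v, c, S_j, lam=1.0):
    """P_lambda(X_j = v | y = c), where S_j is the number of values feature j can take."""
    num = sum(1 for xi, yi in zip(X, y) if xi[j] == v and yi == c) + lam
    den = sum(1 for yi in y if yi == c) + S_j * lam
    return num / den
```

With $\lambda = 0$ both functions reduce to the maximum-likelihood estimates (1) and (2); with $\lambda = 1$ every (feature value, class) pair gets a nonzero probability, so no single unseen value can zero out the product in (3).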