Divergence

f-divergence

$$D_f(P||Q) = \int_x q(x)f\left(\frac{p(x)}{q(x)}\right)dx$$

where f(x) is convex and f(1) = 0.

$D_f(P||Q)$ measures the difference between the distributions P and Q. The condition f(1) = 0 guarantees that $D_f = 0$ when p(x) = q(x); the convexity of f guarantees $D_f \geq f(1) = 0$.
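The non-negativity follows from Jensen's inequality applied to the convex f:

$$D_f(P||Q) = E_{X\sim Q}\left[f\left(\frac{p(x)}{q(x)}\right)\right] \geq f\left(E_{X\sim Q}\left[\frac{p(x)}{q(x)}\right]\right) = f\left(\int_x p(x)dx\right) = f(1) = 0$$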

Let $f(x) = x\log x$; then $D_f$ is the KL divergence:

$$D_f = \int_x p(x)\log\left(\frac{p(x)}{q(x)}\right)dx$$

Let $f(x) = -\log x$; then $D_f$ is the reverse KL divergence:

$$D_f = \int_x q(x)\log\left(\frac{q(x)}{p(x)}\right)dx$$

Let $f(x) = (x-1)^2$; then $D_f$ is the chi-square divergence:

$$D_f = \int_x\frac{(p(x)-q(x))^2}{q(x)}dx$$
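A minimal numerical sketch (not from the original notes; the helper name `f_divergence` is my own) that evaluates these three choices of f on a pair of discrete distributions and checks them against the closed forms above:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

# Two discrete distributions over the same support
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

kl         = f_divergence(p, q, lambda x: x * np.log(x))  # f(x) = x log x
reverse_kl = f_divergence(p, q, lambda x: -np.log(x))     # f(x) = -log x
chi_square = f_divergence(p, q, lambda x: (x - 1) ** 2)   # f(x) = (x-1)^2

# Check against the closed-form expressions above
assert np.isclose(kl,         np.sum(p * np.log(p / q)))
assert np.isclose(reverse_kl, np.sum(q * np.log(q / p)))
assert np.isclose(chi_square, np.sum((p - q) ** 2 / q))
```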

KL divergence

Given two distributions P(x) and Q(x) over the same variable, the KL divergence measures how different they are:

$$D_{KL}(P||Q) = E_{X\sim P}\left[\log \frac{P(x)}{Q(x)}\right] = E_{X\sim P}[\log P(x) - \log Q(x)]$$
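A small illustrative check of the expectation form (variable names are hypothetical): averaging $\log P(x) - \log Q(x)$ over samples drawn from P approaches the exact value.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

# Exact KL divergence: sum_x p(x) * (log p(x) - log q(x))
kl_exact = np.sum(p * (np.log(p) - np.log(q)))

# Monte Carlo estimate: E_{X~P}[log P(X) - log Q(X)] with samples drawn from P
samples = rng.choice(len(p), size=100_000, p=p)
kl_mc = np.mean(np.log(p[samples]) - np.log(q[samples]))

print(kl_exact, kl_mc)  # the two values should be close
```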

Cross-entropy

Minimizing the cross-entropy is equivalent to minimizing the KL divergence, because $H(P, Q) = H(P) + D_{KL}(P||Q)$ and the entropy $H(P)$ does not depend on Q.

$$H(P, Q) = -E_{X\sim P} \log Q(x)$$
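A quick numerical check of the decomposition $H(P,Q) = H(P) + D_{KL}(P||Q)$, reusing the same style of discrete p and q (illustrative only):

```python
import numpy as np

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

cross_entropy = -np.sum(p * np.log(q))      # H(P, Q)
entropy_p     = -np.sum(p * np.log(p))      # H(P)
kl            =  np.sum(p * np.log(p / q))  # D_KL(P||Q)

assert np.isclose(cross_entropy, entropy_p + kl)
```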

Jensen-Shannon (JS) divergence

http://blog.sina.com.cn/s/blog_18bdda0da0102xzpw.html
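For reference, the standard definition (not spelled out in these notes, only in the linked post): the JS divergence symmetrizes KL by comparing both P and Q against their mixture $M = \frac{1}{2}(P+Q)$:

$$D_{JS}(P||Q) = \frac{1}{2}D_{KL}(P||M) + \frac{1}{2}D_{KL}(Q||M), \qquad M = \frac{1}{2}(P+Q)$$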
