f-divergence
$$D_f(P||Q)=\int_x q(x)\,f\!\left(\frac{p(x)}{q(x)}\right)dx,$$
where $f(x)$ is convex and $f(1) = 0$.
$D_f(P||Q)$ measures the difference between the distributions $P$ and $Q$.
$f(1) = 0$ ensures that $D_f = 0$ when $p(x) = q(x)$, since the integrand then reduces to $q(x)f(1) = 0$ everywhere.
$f$ being convex ensures $D_f \geq 0$: by Jensen's inequality, $D_f = E_{x\sim Q}[f(p(x)/q(x))] \geq f(E_{x\sim Q}[p(x)/q(x)]) = f(1) = 0$.
Setting $f(x) = x\log x$ gives the KL divergence:
$$D_f=\int_x q(x)\,\frac{p(x)}{q(x)}\log\frac{p(x)}{q(x)}\,dx=\int_x p(x)\log\frac{p(x)}{q(x)}\,dx$$
Setting $f(x) = -\log x$ gives the reverse KL divergence:
$$D_f=\int_x q(x)\log\frac{q(x)}{p(x)}\,dx$$
Setting $f(x) = (x-1)^2$ gives the chi-square divergence:
$$D_f=\int_x q(x)\left(\frac{p(x)-q(x)}{q(x)}\right)^2 dx=\int_x \frac{(p(x)-q(x))^2}{q(x)}\,dx$$
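A quick numerical check of these three special cases, as a minimal NumPy sketch; the helper `f_divergence` and the concrete distributions `p` and `q` are illustrative choices, not from the original notes:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q * f(p / q))

# Illustrative distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl         = f_divergence(p, q, lambda x: x * np.log(x))  # KL(P||Q)
reverse_kl = f_divergence(p, q, lambda x: -np.log(x))     # KL(Q||P)
chi_square = f_divergence(p, q, lambda x: (x - 1) ** 2)

# Each generic form matches the corresponding direct formula.
assert np.isclose(kl, np.sum(p * np.log(p / q)))
assert np.isclose(reverse_kl, np.sum(q * np.log(q / p)))
assert np.isclose(chi_square, np.sum((p - q) ** 2 / q))
```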
KL divergence
Given two distributions $P(x)$ and $Q(x)$, the KL divergence measures the difference between them.
$$D_{KL}(P||Q)=E_{x\sim P}\!\left[\log\frac{P(x)}{Q(x)}\right]=E_{x\sim P}[\log P(x)-\log Q(x)]$$
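The expectation form suggests a simple Monte Carlo estimator: sample from $P$ and average $\log P(x) - \log Q(x)$. A minimal sketch, assuming $P = N(0,1)$ and $Q = N(1,1)$ as a hypothetical example; the closed-form KL for two unit-variance Gaussians is $(\mu_P-\mu_Q)^2/2 = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_p, mu_q = 0.0, 1.0  # assumed means of P and Q, both with unit variance

def log_pdf(x, mu):
    # log density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * np.log(2 * np.pi)

# Monte Carlo estimate of E_{x~P}[log P(x) - log Q(x)].
x = rng.normal(mu_p, 1.0, size=200_000)
kl_estimate = np.mean(log_pdf(x, mu_p) - log_pdf(x, mu_q))
print(kl_estimate)  # close to the analytic value 0.5
```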
cross-entropy
$$H(P,Q)=-E_{x\sim P}[\log Q(x)]$$
Since $H(P,Q)=H(P)+D_{KL}(P||Q)$ and $H(P)$ does not depend on $Q$, minimizing the cross-entropy with respect to $Q$ is equivalent to minimizing the KL divergence.
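A short check of this decomposition, reusing the same illustrative discrete distributions as above:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # target distribution P
q = np.array([0.4, 0.4, 0.2])  # model distribution Q

cross_entropy = -np.sum(p * np.log(q))      # H(P, Q)
entropy       = -np.sum(p * np.log(p))      # H(P)
kl            =  np.sum(p * np.log(p / q))  # D_KL(P||Q)

# H(P, Q) = H(P) + KL(P||Q); H(P) is constant in q, so minimizing
# cross-entropy over q is the same as minimizing KL divergence.
assert np.isclose(cross_entropy, entropy + kl)
```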
Jensen-Shannon (JS) divergence
http://blog.sina.com.cn/s/blog_18bdda0da0102xzpw.html
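For reference, the standard definition symmetrizes KL through the mixture $M = \frac{1}{2}(P+Q)$:
$$D_{JS}(P||Q)=\frac{1}{2}D_{KL}(P||M)+\frac{1}{2}D_{KL}(Q||M)$$
It is symmetric in $P$ and $Q$ and bounded above by $\log 2$. A minimal discrete sketch, again with illustrative distributions:

```python
import numpy as np

def kl(p, q):
    # KL(P||Q) for discrete distributions with full shared support
    return np.sum(p * np.log(p / q))

def js(p, q):
    # D_JS(P||Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), with M = (P+Q)/2
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

assert np.isclose(js(p, q), js(q, p))  # symmetric
assert js(p, q) <= np.log(2)           # bounded by log 2
```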