scipy.stats.energy_distance

scipy.stats.energy_distance(u_values, v_values, u_weights=None, v_weights=None)

Compute the energy distance between two 1D distributions.

New in version 1.0.0.

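Parameters:

u_values, v_values array_like: Values observed in the (empirical) distribution.
u_weights, v_weights array_like, optional: Weight for each value. If unspecified, each value is assigned the same weight. u_weights (resp. v_weights) must have the same length as u_values (resp. v_values). If the weights do not sum to 1, their sum must still be positive and finite so that the weights can be normalized to sum to 1.

Returns:

distance float: The computed distance between the distributions.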
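The weight rule for u_weights and v_weights can be mirrored in a few lines of plain Python. This is a minimal sketch, not SciPy's internal validation code: `normalize_weights` is a hypothetical helper, and rejecting individual negative weights is an assumption added here because the weights act as probability masses.

```python
import math
from typing import Optional, Sequence


def normalize_weights(values: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> list[float]:
    """Return weights rescaled to sum to 1 (hypothetical helper)."""
    if len(values) == 0:
        raise ValueError("at least one value is required")
    if weights is None:
        # No weights given: every observed value gets the same mass.
        return [1.0 / len(values)] * len(values)
    if len(weights) != len(values):
        raise ValueError("weights must have the same length as values")
    # Assumption: individual weights must be non-negative (probability masses).
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = math.fsum(weights)
    # The sum need not be 1, but it must be positive and finite (NaN fails too).
    if not (0 < total < math.inf):
        raise ValueError("weights must sum to a positive, finite value")
    return [w / total for w in weights]


print(normalize_weights([0, 8], [3, 1]))  # [0.75, 0.25]
print(normalize_weights([0, 8], [6, 2]))  # [0.75, 0.25]
```

Because only the normalized weights matter, `[3, 1]` and `[6, 2]` describe the same distribution.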

Notes

The energy distance between two distributions \(u\) and \(v\), whose respective CDFs are \(U\) and \(V\), equals

\[D(u, v) = \left( 2\mathbb E|X - Y| - \mathbb E|X - X'| - \mathbb E|Y - Y'| \right)^{1/2}\]

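where \(X\) and \(X'\) (resp. \(Y\) and \(Y'\)) are independent random variables whose probability distribution is \(u\) (resp. \(v\)).

Sometimes the square of this quantity is referred to as the "energy distance" (e.g. in [2], [4]), but as noted in [1] and [3], only the definition above satisfies the axioms of a distance function (metric).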
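For weighted samples, the expectations in the definition become finite sums over all pairs of support points. The sketch below evaluates the formula that way in plain Python; it is an illustration under that reading of the definition, not SciPy's implementation, `energy_distance_expectation` and its helpers are hypothetical names, and the weights are assumed to already satisfy the rule given for u_weights and v_weights.

```python
import math
from typing import List, Optional, Sequence


def _masses(values: Sequence[float], weights: Optional[Sequence[float]]) -> List[float]:
    # Uniform masses when no weights are given; otherwise rescale to sum to 1.
    if weights is None:
        return [1.0 / len(values)] * len(values)
    total = math.fsum(weights)
    return [w / total for w in weights]


def _mean_abs_diff(xs, ps, ys, qs) -> float:
    # E|X - Y| for independent discrete X ~ (xs, ps) and Y ~ (ys, qs).
    return math.fsum(p * q * abs(x - y)
                     for x, p in zip(xs, ps)
                     for y, q in zip(ys, qs))


def energy_distance_expectation(u_values, v_values,
                                u_weights=None, v_weights=None) -> float:
    p = _masses(u_values, u_weights)
    q = _masses(v_values, v_weights)
    cross = _mean_abs_diff(u_values, p, v_values, q)     # E|X - Y|
    within_u = _mean_abs_diff(u_values, p, u_values, p)  # E|X - X'|
    within_v = _mean_abs_diff(v_values, q, v_values, q)  # E|Y - Y'|
    squared = 2.0 * cross - within_u - within_v
    # Rounding can push the squared distance slightly below zero; clamp it.
    return math.sqrt(max(squared, 0.0))


print(energy_distance_expectation([0], [2]))                        # 2.0
print(energy_distance_expectation([0, 8], [0, 8], [3, 1], [2, 2]))  # 1.0
```

The pairwise sums cost O(nm) time for samples of sizes n and m. Returning `squared` instead of its square root would give the quantity that [2] and [4] call the energy distance.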

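As shown in [2], for one-dimensional real-valued variables, the energy distance is linked to the non-distribution-free version of the Cramér-von Mises distance:

\[D(u, v) = \sqrt{2} l_2(u, v) = \left( 2 \int_{-\infty}^{+\infty} (U(x) - V(x))^2 \, dx \right)^{1/2}\]

Note that the common Cramér-von Mises criterion uses the distribution-free version of the distance. See [2] (section 2) for more details about both versions of the distance.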
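When both inputs are weighted sums of point masses, \(U\) and \(V\) are step functions, so \(U - V\) is constant between consecutive support points and zero outside them; the integral therefore reduces to a finite sum. The sketch below assumes that reading; `energy_distance_cdf` is a hypothetical name, and it re-evaluates each CDF with a linear scan for clarity rather than speed.

```python
import math
from typing import Optional, Sequence


def energy_distance_cdf(u_values: Sequence[float], v_values: Sequence[float],
                        u_weights: Optional[Sequence[float]] = None,
                        v_weights: Optional[Sequence[float]] = None) -> float:
    # Unnormalized weights; each CDF below divides by its own total.
    u_w = [1.0] * len(u_values) if u_weights is None else list(u_weights)
    v_w = [1.0] * len(v_values) if v_weights is None else list(v_weights)
    u_total, v_total = math.fsum(u_w), math.fsum(v_w)

    def cdf(values, weights, total, x):
        # Normalized mass of the point masses located at or below x.
        return math.fsum(w for val, w in zip(values, weights) if val <= x) / total

    # U - V is constant on [left, right) between consecutive support points
    # and zero outside [min, max], so the integral becomes a finite sum.
    points = sorted(set(u_values) | set(v_values))
    integral = 0.0
    for left, right in zip(points, points[1:]):
        diff = cdf(u_values, u_w, u_total, left) - cdf(v_values, v_w, v_total, left)
        integral += diff * diff * (right - left)
    return math.sqrt(2.0 * integral)


print(energy_distance_cdf([0], [2]))                        # 2.0
print(energy_distance_cdf([0, 8], [0, 8], [3, 1], [2, 2]))  # 1.0
```

It reproduces the values in the Examples section. Replacing the linear scans with a sorted pass over cumulative weight sums would bring the cost down to O((n + m) log(n + m)).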

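The input distributions can be empirical, in which case they come from samples whose values are passed directly to the function, or they can be viewed as generalized functions, in which case they are weighted sums of Dirac delta functions located at the specified values.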
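Under the generalized-function view, a weighted input is \(\sum_i w_i \delta_{x_i}\) with the \(w_i\) normalized to sum to 1, and its CDF jumps by \(w_i\) at each \(x_i\). The sketch below, built around the hypothetical helper `dirac_mixture_cdf`, illustrates that integer weights behave like repeated observations: values `[0, 8]` with weights `[3, 1]` have the same CDF as the sample `[0, 0, 0, 8]`.

```python
import math
from typing import Callable, Optional, Sequence


def dirac_mixture_cdf(values: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> Callable[[float], float]:
    """CDF of sum_i w_i * delta(x - values[i]), weights normalized to 1."""
    if weights is None:
        # An empirical sample: each observation is a delta of equal mass.
        weights = [1.0] * len(values)
    total = math.fsum(weights)
    masses = [(v, w / total) for v, w in zip(values, weights)]

    def cdf(x: float) -> float:
        # Step function: rises by each normalized weight at its location.
        return math.fsum(m for v, m in masses if v <= x)

    return cdf


weighted = dirac_mixture_cdf([0, 8], [3, 1])
repeated = dirac_mixture_cdf([0, 0, 0, 8])
print([weighted(x) for x in (-1, 0, 4, 8)])  # [0.0, 0.75, 0.75, 1.0]
print([repeated(x) for x in (-1, 0, 4, 8)])  # [0.0, 0.75, 0.75, 1.0]
```

Since the Cramér-von Mises form in these Notes depends on the inputs only through \(U\) and \(V\), the two representations yield the same energy distance.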

References

[1]

Rizzo, Szekely “Energy distance.” Wiley Interdisciplinary Reviews: Computational Statistics, 8(1):27-38 (2015).

[2]

Szekely “E-statistics: The energy of statistical samples.” Bowling Green State University, Department of Mathematics and Statistics, Technical Report 02-16 (2002).

[3]

“Energy distance”, https://en.wikipedia.org/wiki/Energy_distance

[4]

Bellemare, Danihelka, Dabney, Mohamed, Lakshminarayanan, Hoyer, Munos “The Cramer Distance as a Solution to Biased Wasserstein Gradients” (2017). arXiv:1705.10743.

Examples

>>> from scipy.stats import energy_distance
>>> energy_distance([0], [2])
2.0000000000000004
>>> energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
1.0000000000000002
>>> energy_distance([0.7, 7.4, 2.4, 6.8], [1.4, 8. ],
...                 [2.1, 4.2, 7.4, 8. ], [7.6, 8.8])
0.88003340976158217