scipy.stats.anderson

scipy.stats.anderson(x, dist='norm')

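Anderson-Darling test for data coming from a particular distribution.

The Anderson-Darling test tests the null hypothesis that a sample is drawn from a population that follows a particular distribution. The critical values of the test depend on which distribution is being tested against. This function works for the normal, exponential, logistic, weibull_min, and Gumbel (Extreme Value Type I) distributions.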

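Parameters:

x (array_like): Array of sample data.
dist ({'norm', 'expon', 'logistic', 'gumbel', 'gumbel_l', 'gumbel_r', 'extreme1', 'weibull_min'}, optional): The type of distribution to test against. The default is 'norm'. The names 'extreme1', 'gumbel_l' and 'gumbel' are synonyms for the same distribution.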
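The synonym rule for `dist` can be checked directly. The following is a minimal sketch, assuming a synthetic Gumbel-distributed sample and an arbitrary seed (both chosen only for illustration): the three synonymous names should give the same statistic, while 'gumbel_r' tests against a different distribution.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(7)  # arbitrary seed, used only for reproducibility
sample = rng.gumbel(loc=0.0, scale=1.0, size=60)  # synthetic illustration data

# 'gumbel', 'gumbel_l' and 'extreme1' are documented as synonyms;
# 'gumbel_r' is included for contrast.
stats_by_name = {
    name: float(anderson(sample, dist=name).statistic)
    for name in ("gumbel", "gumbel_l", "extreme1", "gumbel_r")
}
for name, value in stats_by_name.items():
    print(name, value)
```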

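Returns:

result (AndersonResult)

An object with the following attributes:

statistic (float): The Anderson-Darling test statistic.
critical_values (list): The critical values for this distribution.
significance_level (list): The significance levels for the corresponding critical values, in percent. The set of significance levels for which critical values are returned depends on the distribution being tested against.
fit_result (FitResult): An object containing the results of fitting the distribution to the data.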
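As a rough illustration of how these attributes fit together, the sketch below pairs each significance level with its critical value. The sample and seed are assumptions made for the example. Because `fit_result` may be missing in older SciPy releases, it is read with `getattr` rather than assumed to exist.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(2024)  # arbitrary seed for reproducibility
sample = rng.normal(size=100)      # synthetic illustration data
res = anderson(sample, dist="norm")

print("statistic:", res.statistic)
# critical_values and significance_level are aligned element by element.
for level, crit in zip(res.significance_level, res.critical_values):
    print(f"{float(level)}% -> critical value {float(crit)}")

# Defensive access: fit_result may not exist in older SciPy versions.
fit = getattr(res, "fit_result", None)
if fit is not None:
    print("fitted parameters:", fit.params)
```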

See also

kstest: The Kolmogorov-Smirnov test for goodness of fit.

Notes

Critical values are provided for the following significance levels:

normal/exponential: 15%, 10%, 5%, 2.5%, 1%
logistic: 25%, 10%, 5%, 2.5%, 1%, 0.5%
gumbel_l / gumbel_r: 25%, 10%, 5%, 2.5%, 1%
weibull_min: 50%, 25%, 15%, 10%, 5%, 2.5%, 1%, 0.5%
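The sets of levels listed above can be read back from the result object. This is a small sketch, assuming a synthetic positive-valued sample and an arbitrary seed; 'weibull_min' is left out because it is only available in more recent SciPy releases and its fit can fail on some data.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(12345)  # arbitrary seed for reproducibility
sample = rng.normal(loc=10.0, scale=2.0, size=50)  # synthetic illustration data

# Collect the significance levels (in percent) reported for each distribution.
levels = {}
for dist in ("norm", "expon", "logistic", "gumbel_l", "gumbel_r"):
    res = anderson(sample, dist=dist)
    levels[dist] = [float(v) for v in res.significance_level]
    print(dist, levels[dist])
```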

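If the returned statistic is larger than one of these critical values, then the null hypothesis that the data come from the chosen distribution can be rejected at the corresponding significance level. The returned statistic is referred to as "A2" in the references.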
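To make the statistic concrete, here is a standard-library sketch of the usual computational form of A2 for the normal case, with the mean and the sample standard deviation (n - 1 denominator) estimated from the data. The function name `anderson_darling_normal` and the data values are hypothetical. It illustrates the textbook formula and is not necessarily how `anderson` is implemented line for line.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """A2 for normality, with mean and standard deviation estimated from the sample."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)  # stdev uses the n - 1 denominator
    std_normal = NormalDist()
    z = sorted((v - m) / s for v in sample)   # standardized order statistics
    cdf = [std_normal.cdf(v) for v in z]
    total = 0.0
    for i in range(n):
        # (2i - 1) with 1-based i equals (2i + 1) with 0-based i.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]  # made-up values
a2 = anderson_darling_normal(data)
print(a2)
```

Because the sample is standardized before the CDF is evaluated, shifting or rescaling the data leaves the statistic unchanged.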

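For weibull_min, maximum likelihood estimation is known to be challenging. If the test returns successfully, then the first-order conditions for a maximum likelihood estimate have been verified and the critical values correspond relatively well to the significance levels, provided that the sample is sufficiently large (>10 observations [7]). However, for some data, especially data with no left tail, anderson is prone to failing with an error message. In this case, consider performing a custom goodness-of-fit test using scipy.stats.monte_carlo_test.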
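The sketch below shows the general shape of such a custom Monte Carlo test. It uses the normal case rather than weibull_min as a simplifying assumption: with location and scale estimated from the data, the null distribution of the statistic does not depend on the true parameters, so standard normal draws are enough. A Weibull version would need its own fitting and simulation steps, which are not shown. The helper name `ad_statistic`, the seed, and the sample are illustrative.

```python
import numpy as np
from scipy.stats import anderson, monte_carlo_test

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
sample = rng.normal(loc=5.0, scale=3.0, size=40)  # synthetic illustration data


def ad_statistic(x):
    # Reuse anderson only to compute the statistic; its tabulated
    # critical values are ignored in this approach.
    return anderson(x, dist="norm").statistic


mc = monte_carlo_test(
    sample,
    rng.standard_normal,    # generates null samples; called with a `size` keyword
    ad_statistic,
    vectorized=False,       # ad_statistic handles one 1-D sample at a time
    n_resamples=999,
    alternative="greater",  # large values of the statistic count against the null
)
print(mc.statistic, mc.pvalue)
```

The resulting p-value is estimated from simulated samples instead of tabulated critical values, so it changes slightly with the seed and with `n_resamples`.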

References

[1]

https://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm

[2]

Stephens, M. A. (1974). EDF Statistics for Goodness of Fit and Some Comparisons, Journal of the American Statistical Association, Vol. 69, pp. 730-737.

[3]

Stephens, M. A. (1976). Asymptotic Results for Goodness-of-Fit Statistics with Unknown Parameters, Annals of Statistics, Vol. 4, pp. 357-369.

[4]

Stephens, M. A. (1977). Goodness of Fit for the Extreme Value Distribution, Biometrika, Vol. 64, pp. 583-588.

[5]

Stephens, M. A. (1977). Goodness of Fit with Special Reference to Tests for Exponentiality, Technical Report No. 262, Department of Statistics, Stanford University, Stanford, CA.

[6]

Stephens, M. A. (1979). Tests of Fit for the Logistic Distribution Based on the Empirical Distribution Function, Biometrika, Vol. 66, pp. 591-595.

[7]

Richard A. Lockhart and Michael A. Stephens, "Estimation and Tests of Fit for the Three-Parameter Weibull Distribution", Journal of the Royal Statistical Society, Series B (Methodological), Vol. 56, No. 3 (1994), pp. 491-500, Table 0.

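Examples

Test the null hypothesis that a random sample was drawn from a normal distribution (with unspecified mean and standard deviation).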

>>> import numpy as np
>>> from scipy.stats import anderson
>>> rng = np.random.default_rng()
>>> data = rng.random(size=35)
>>> res = anderson(data)
>>> res.statistic
0.8398018749744764
>>> res.critical_values
array([0.527, 0.6  , 0.719, 0.839, 0.998])
>>> res.significance_level
array([15. , 10. ,  5. ,  2.5,  1. ])

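The value of the statistic (barely) exceeds the critical value associated with a significance level of 2.5%, so the null hypothesis may be rejected at a significance level of 2.5%, but not at a significance level of 1%.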
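The same reading can be done programmatically. This sketch hard-codes the statistic, critical values, and significance levels printed in the example output above, so it runs without SciPy. The variable names are illustrative.

```python
# Values copied from the example output above.
statistic = 0.8398018749744764
critical_values = [0.527, 0.6, 0.719, 0.839, 0.998]
significance_levels = [15.0, 10.0, 5.0, 2.5, 1.0]  # in percent

# The null hypothesis is rejected at every level whose critical value
# is exceeded by the statistic.
rejected = [
    level
    for level, crit in zip(significance_levels, critical_values)
    if statistic > crit
]
not_rejected = [level for level in significance_levels if level not in rejected]

print("rejected at:", rejected)          # [15.0, 10.0, 5.0, 2.5]
print("not rejected at:", not_rejected)  # [1.0]
```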