scipy.cluster.hierarchy.

fclusterdata

scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)

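Cluster observation data using a given metric.

With the default arguments, this function clusters the original observations in the n-by-m data matrix X (n observations in m dimensions). It uses the Euclidean distance metric to compute distances between observations, performs hierarchical clustering with the single linkage algorithm, and forms flat clusters using the inconsistency method with t as the cut-off threshold.

A 1-D array T of length n is returned. T[i] is the index of the flat cluster to which original observation i belongs.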

Parameters:

X : (N, M) ndarray

N by M data matrix with N observations in M dimensions.

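t : scalar

For criteria 'inconsistent', 'distance' or 'monocrit': this is the threshold to apply when forming flat clusters.
For 'maxclust' or 'maxclust_monocrit' criteria: this is the maximum number of clusters requested.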
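As a rough illustration of how the meaning of t depends on criterion, the sketch below clusters a small toy data set twice. The data and the threshold values are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Toy data (assumed for illustration): four groups of three nearby points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# criterion='distance': t is a threshold on the cophenetic distance.
labels_dist = fclusterdata(X, t=1.5, criterion='distance')

# criterion='maxclust': t is the maximum number of flat clusters.
labels_max = fclusterdata(X, t=4, criterion='maxclust')

n_dist = len(np.unique(labels_dist))
n_max = len(np.unique(labels_max))
print(n_dist, n_max)
```

With 'maxclust', t is an upper bound rather than an exact count, so fewer than t clusters may come back when several merges share the same height.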

criterion : str, optional

Specifies the criterion for forming flat clusters. Valid values are 'inconsistent' (default), 'distance', or 'maxclust'. See fcluster for descriptions.

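metric : str or function, optional

The distance metric for calculating pairwise distances. See distance.pdist for descriptions, and linkage to verify compatibility with the linkage method.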

depth : int, optional

The maximum depth for the inconsistency calculation. See inconsistent for more information.

method : str, optional

The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is 'single'.
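A minimal sketch of combining metric and method, using toy data assumed for illustration. According to the linkage documentation, the 'centroid', 'median', and 'ward' methods are correctly defined only for the Euclidean metric, so the second call keeps metric='euclidean'.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Toy data (assumed for illustration): four groups of three nearby points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Complete linkage on Chebyshev distances, cut at a distance of 1.5.
labels_complete = fclusterdata(X, t=1.5, criterion='distance',
                               metric='chebyshev', method='complete')

# Ward linkage on Euclidean distances, asking for at most 4 clusters.
labels_ward = fclusterdata(X, t=4, criterion='maxclust',
                           metric='euclidean', method='ward')

print(labels_complete)
print(labels_ward)
```

The label numbers themselves are arbitrary; what matters is which observations share a label.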

R : ndarray, optional

The inconsistency matrix. If it is not passed, it is computed as needed.
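The sketch below, assuming toy data and the default single linkage, precomputes the inconsistency matrix with scipy.cluster.hierarchy.inconsistent and passes it as R. The linkage matrix used to build R should come from the same data, metric, and method that fclusterdata will use; otherwise R would not describe the hierarchy being cut.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent, fclusterdata

# Toy data (assumed for illustration): four groups of three nearby points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Build the hierarchy that the defaults imply (Euclidean metric, single linkage).
Z = linkage(pdist(X, metric='euclidean'), method='single')

# Inconsistency statistics over 2 levels: one row per merge, 4 columns.
R = inconsistent(Z, d=2)

labels_default = fclusterdata(X, t=1)                # R computed internally
labels_given_R = fclusterdata(X, t=1, depth=2, R=R)  # R supplied by the caller

print(R.shape)
print(np.array_equal(labels_default, labels_given_R))
```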

Returns:

fclusterdata : ndarray

A vector of length n. T[i] is the flat cluster number to which original observation i belongs.
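To illustrate how the returned vector can be read, the following sketch groups observation indices by their flat cluster label. The toy data and the helper variable members are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Toy data (assumed): two tight pairs and one isolated point.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)

# Cut the single-linkage hierarchy at a cophenetic distance of 2.
T = fclusterdata(X, t=2.0, criterion='distance')

# Map each cluster label to the indices of the observations assigned to it.
members = {int(label): np.flatnonzero(T == label).tolist()
           for label in np.unique(T)}

print(T)
print(members)
```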

See also

scipy.spatial.distance.pdist: pairwise distance metrics

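Notes

This function is similar to the MATLAB function clusterdata.

fclusterdata has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting the environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.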

Library	CPU	GPU
NumPy	✅	n/a
CuPy	n/a	⛔
PyTorch	✅	⛔
JAX	⚠️ no JIT	⛔
Dask	⚠️ computes graph	n/a

See Support for the array API standard for more information.

Examples

>>> from scipy.cluster.hierarchy import fclusterdata

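This is a convenience function that wraps the steps of a typical SciPy hierarchical clustering workflow:

Transform the input data into a condensed distance matrix with scipy.spatial.distance.pdist.
Apply a clustering method.
Obtain flat clusters at a user-defined threshold t using scipy.cluster.hierarchy.fcluster.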

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)

The output here (for the dataset X, threshold t, and the default settings) is four clusters with three data points each.
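The workflow steps listed in this section can also be written out explicitly. The following is a minimal sketch, assuming the default arguments of fclusterdata; it is meant to show the correspondence between the steps and is not a copy of the library's implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix, length n * (n - 1) / 2.
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering; Z has one row per merge.
Z = linkage(Y, method='single')

# Step 3: flat clusters from the hierarchy, using the default criterion.
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2)

# The one-call equivalent.
T_direct = fclusterdata(X, t=1)

same = np.array_equal(T_manual, T_direct)
sizes = np.bincount(T_direct)[1:]  # observations per cluster; labels start at 1
print(same, sizes)
```

Writing the steps out is mainly useful when the intermediate linkage matrix Z is needed as well, for example to draw a dendrogram or to try several thresholds without recomputing the hierarchy.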