level_set_tree.LevelSetTree.get_clusters

LevelSetTree.get_clusters(method='leaf', fill_background=False, **kwargs)

Retrieve cluster labels from the level set tree. A tree can be converted to cluster labels in several ways; the method parameter selects the strategy.

Parameters:

method : {‘leaf’, ‘first-k’, ‘upper-level-set’, ‘k-level’}, optional

Method for obtaining cluster labels from the tree.

  • ‘leaf’: treat each leaf of the tree as a separate cluster.
  • ‘first-k’: find the first k non-overlapping clusters, starting from the roots of the tree.
  • ‘upper-level-set’: cluster by cutting the tree at a specified density or mass level.
  • ‘k-level’: return the clusters at the lowest density level that has k nodes.

fill_background : bool, optional

If True, a label of -1 is assigned to background points, i.e. those instances not otherwise assigned to a high-density cluster. If False, the background points are omitted from the output.

Returns:

labels : 2-dimensional numpy array

Each row corresponds to an observation. The first column is the index of the observation in the original data matrix, and the second column is the index of the LST node to which the observation belongs under the chosen clustering. When fill_background is False, this “foreground” set of observations is typically smaller than the original dataset.

Other Parameters:

k : int

If method is ‘first-k’ or ‘k-level’, this is the desired number of clusters.

threshold : float

If method is ‘upper-level-set’, this is the threshold at which to cut the tree.

form : {‘density’, ‘mass’}

If method is ‘upper-level-set’, this is the vertical scale to which ‘threshold’ refers.

See also

debacl.utils.reindex_cluster_labels, get_leaf_nodes

Examples

>>> import numpy
>>> import debacl
>>> X = numpy.random.rand(100, 2)
>>> tree = debacl.construct_tree(X, k=8, prune_threshold=5)
>>> labels = tree.get_clusters(method='leaf')
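
As a rough sketch of how the returned array might be used, the snippet below expands a two-column labels array into one label per observation, with -1 for background points, and counts the members of each cluster. The labels array and n_obs are made-up stand-ins rather than real get_clusters output, and only NumPy is assumed.

```python
import numpy as np

# Hypothetical stand-in for the output of tree.get_clusters(): column 0 is
# the row index in the original data, column 1 is the LST node index.
n_obs = 8  # assumed number of rows in the original data matrix
labels = np.array([[0, 3], [1, 3], [4, 5], [6, 5], [7, 5]])

# Expand to one label per observation, marking background points with -1.
full_labels = np.full(n_obs, -1, dtype=int)
full_labels[labels[:, 0]] = labels[:, 1]

# Count the members of each foreground cluster, keyed by node index.
nodes, sizes = np.unique(labels[:, 1], return_counts=True)
cluster_sizes = dict(zip(nodes.tolist(), sizes.tolist()))

print(full_labels.tolist())  # [3, 3, -1, -1, 5, -1, 5, 5]
print(cluster_sizes)         # {3: 2, 5: 3}
```

With a real tree, labels would come from tree.get_clusters(...) and n_obs from the number of rows in X. Passing fill_background=True should make the manual expansion unnecessary, since background points are then labeled -1 in the output.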