utils.knn_graph

utils.knn_graph(X, k, method='brute-force', leaf_size=30)

Compute the symmetric k-nearest neighbor graph for a set of points. Euclidean distance is assumed.

Parameters:

X : numpy array or list of numpy arrays

Data points, with each row as an observation.

k : int

The number of points to consider as neighbors of any given observation.

method : {'brute-force', 'kd-tree', 'ball-tree'}, optional

Computing method.

  • 'brute-force': computes the Euclidean distance between all O(n^2) pairs of rows in 'X', then for every point finds the k nearest. It is limited to tens of thousands of observations (depending on available RAM).
  • 'kd-tree': partitions the data into axis-aligned rectangles to avoid computing all O(n^2) pairwise distances. Much faster than 'brute-force', but only works well for data in fewer than about 20 dimensions. Requires the scikit-learn library.
  • 'ball-tree': partitions the data into balls and uses the metric properties of Euclidean distance to avoid computing all O(n^2) distances. Typically much faster than 'brute-force', and works with up to a few hundred dimensions. Requires the scikit-learn library.
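To make the brute-force strategy concrete, here is a minimal NumPy sketch (illustrative only, not DeBaCl's implementation): compute all pairwise squared distances, then keep the k smallest per row.

```python
import numpy as np

def knn_brute_force(X, k):
    """Illustrative brute-force k-nearest neighbors (not DeBaCl's code).

    Computes all O(n^2) pairwise squared Euclidean distances, then keeps
    the k nearest points per row. Each point counts as its own neighbor.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared distances via broadcasting: ||x_i - x_j||^2.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Row indices of the k nearest points for each row (includes the point itself).
    neighbors = np.argsort(sq_dists, axis=1)[:, :k]
    # Distance to the k'th nearest neighbor of each row.
    radii = [np.sqrt(sq_dists[i, neighbors[i, -1]]) for i in range(len(X))]
    return neighbors, radii
```

For large n, `np.argpartition` would avoid the full sort, but the O(n^2) distance matrix remains the memory bottleneck that motivates the tree-based methods.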

leaf_size : int, optional

For the 'kd-tree' and 'ball-tree' methods, the number of observations in the leaf nodes. Leaves are not split further, so distance computations within leaf nodes are done by brute force. 'leaf_size' is ignored for the 'brute-force' method.
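The leaf-size trade-off can be seen directly with scikit-learn's KDTree, which the 'kd-tree' method relies on (this sketch queries the tree directly rather than going through knn_graph):

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.rand(200, 3)
# Larger leaves: fewer tree nodes to traverse, but more brute-force
# distance computations inside each leaf. Smaller leaves: the reverse.
tree = KDTree(X, leaf_size=30)
# dist and ind each have shape (200, 8); column 0 is each point itself.
dist, ind = tree.query(X, k=8)
```

Tuning leaf_size changes query speed and memory use but not the neighbors returned.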

Returns:

neighbors : numpy array

Each row lists the k nearest neighbors of the corresponding row of 'X', given as row indices.

radii : list[float]

For each row of 'X', the distance to its k-th nearest neighbor (each point counts as its own neighbor).

See also

epsilon_graph

Examples

>>> import numpy
>>> import debacl
>>> X = numpy.random.rand(100, 2)
>>> knn, radii = debacl.utils.knn_graph(X, k=8, method='kd-tree')