utils.knn_graph

utils.knn_graph(X, k, method='brute-force', leaf_size=30)

Compute the symmetric k-nearest neighbor graph for a set of points. The Euclidean distance metric is assumed.
Parameters: X : numpy array | list of numpy arrays
Data points, with each row as an observation.
k : int
The number of points to consider as neighbors of any given observation.
method : {'brute-force', 'kd-tree', 'ball-tree'}, optional
Computing method.
- 'brute-force': computes the (Euclidean) distance between all O(n^2) pairs of rows in 'X', then finds the k nearest for every point. Limited to a few tens of thousands of observations (depending on available RAM).
- 'kd-tree': partitions the data into axis-aligned rectangles to avoid computing all O(n^2) pairwise distances. Much faster than 'brute-force', but only effective for data in fewer than about 20 dimensions. Requires the scikit-learn library.
- 'ball-tree': partitions the data into balls and uses the metric properties of Euclidean distance to avoid computing all O(n^2) distances. Typically much faster than 'brute-force', and works with up to a few hundred dimensions. Requires the scikit-learn library.
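The brute-force strategy described above can be sketched directly in NumPy. This is a minimal illustration of the idea, not the library's implementation; the function name `brute_force_knn` is hypothetical.

```python
import numpy as np

def brute_force_knn(X, k):
    """Hypothetical sketch of the brute-force method: all O(n^2) distances."""
    # Pairwise Euclidean distances via broadcasting (O(n^2) memory).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row by distance; each point is its own nearest neighbor
    # (distance zero), matching the "including itself" convention below.
    order = np.argsort(dists, axis=1)
    neighbors = order[:, :k]                           # indices of the k nearest
    radii = dists[np.arange(len(X)), order[:, k - 1]]  # distance to the k'th neighbor
    return neighbors, radii

X = np.random.rand(100, 2)
neighbors, radii = brute_force_knn(X, k=8)
print(neighbors.shape)  # (100, 8)
```

The quadratic memory use of the broadcasted distance matrix is exactly why this method is limited to tens of thousands of observations.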
leaf_size : int, optional
For the 'kd-tree' and 'ball-tree' methods, the number of observations in the leaf nodes. Leaves are not split further, so distance computations within leaf nodes are done by brute force. 'leaf_size' is ignored for the 'brute-force' method.
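As a rough sketch of the role 'leaf_size' plays, assuming scikit-learn's `KDTree` (the documentation above says the tree-based methods require scikit-learn); this is an illustration of the underlying query, not the library's own code:

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.rand(500, 3)
k = 8

# Leaves hold on the order of `leaf_size` points; within a leaf, distances
# are computed by brute force, so leaf_size trades tree depth against
# per-leaf work without changing the query result.
tree = KDTree(X, leaf_size=30)
dists, idx = tree.query(X, k=k)  # k nearest per point, self included
radii = dists[:, -1]             # distance to each point's k'th neighbor
print(idx.shape)  # (500, 8)
```

Varying `leaf_size` affects build and query speed only; the returned neighbors and radii are identical for any setting.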
Returns: neighbors : numpy array
Each row contains the nearest neighbors of the corresponding row in 'X', indicated by row indices.
radii : list[float]
For each row of 'X', the distance to its k'th nearest neighbor (each point counts as its own first neighbor).
Examples
>>> import numpy
>>> import debacl
>>> X = numpy.random.rand(100, 2)
>>> knn, radii = debacl.utils.knn_graph(X, k=8, method='kd-tree')