# utils.knn_density¶

utils.knn_density(k_radius, n, p, k)

Compute the kNN density estimate for a set of points.

Note that this density estimator is highly susceptible to the “curse of dimensionality”. If the dimension p of the data is larger than about 340, the density estimates will all be infinite or undefined. If your ultimate goal is to estimate the level set tree, consider setting the dimension here to p = 1. DeBaCl needs only the relative order of the estimated densities, not their values, to accurately estimate the shape of the LST. Please see the notes for more details on the numerical problems that occur with high-dimensional data.

Parameters:

• k_radius : 1-dimensional numpy array of floats
  The distance to each point’s k’th nearest neighbor.
• n : int
  The number of points.
• p : int
  The dimension of the data.
• k : int
  The number of observations considered neighbors of each point.

Returns:

• fhat : 1D numpy array of floats
  Estimated density for the points corresponding to the entries of ‘k_radius’.

Notes

The k-nearest neighbor density estimate is

$$\hat{f}(x) = \frac{k}{n \cdot v_p \cdot r_k^p(x)}$$

where:

• $x$ is any point in the feature space,
• $n$ is the total number of observations,
• $p$ is the dimension of the feature space,
• $k$ is the number of neighbors considered for $x$,
• $v_p$ is the volume of the $p$-dimensional unit ball, and
• $r_k(x)$ is the distance from $x$ to its $k$’th nearest neighbor.
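The formula above can be transcribed directly. This is a minimal sketch, not DeBaCl's implementation; the function name is illustrative, and the unit-ball volume $v_p = \pi^{p/2} / \Gamma(p/2 + 1)$ is evaluated through `math.lgamma` for numerical stability:

```python
import math
import numpy as np

def knn_density_sketch(k_radius, n, p, k):
    # Direct transcription of fhat(x) = k / (n * v_p * r_k(x)^p),
    # where v_p = pi^(p/2) / Gamma(p/2 + 1) is the p-dimensional
    # unit-ball volume, computed here on the log scale via lgamma.
    v_p = math.exp((p / 2.0) * math.log(math.pi) - math.lgamma(p / 2.0 + 1.0))
    r = np.asarray(k_radius, dtype=float)
    return k / (n * v_p * r ** p)

radii = np.array([0.2, 0.4])
print(knn_density_sketch(radii, n=100, p=2, k=8))
# For p = 2, v_2 = pi, so the first entry is 8 / (100 * pi * 0.04)
```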

As the dimension $p$ grows past about 340 (on most systems), the computed volume of the unit ball $v_p$ underflows to 0.0. At the same time, if the radius $r_k(x)$ is large enough, $r_k^p(x)$ overflows to infinity. The result is either an infinite or undefined density estimate.
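The breakdown is easy to reproduce in double precision. In the sketch below (a stand-alone illustration, not library code), $\Gamma(p/2 + 1)$ is accumulated as a running float product for even $p$, so overflow shows up as `inf` and the computed volume collapses to zero:

```python
import math

def unit_ball_volume(p):
    # Volume of the p-dimensional unit ball: pi^(p/2) / Gamma(p/2 + 1).
    # For even p, Gamma(p/2 + 1) = (p/2)!, computed as a running float
    # product so that overflow yields inf (math.gamma would instead
    # raise OverflowError once its argument exceeds ~171).
    g = 1.0
    for j in range(1, p // 2 + 1):
        g *= j
    return math.pi ** (p // 2) / g

print(unit_ball_volume(2))    # 3.141592653589793 (area of the unit disk)
print(unit_ball_volume(342))  # 0.0: Gamma(172) overflows to inf
```

Dividing the constant $k / n$ by a zero (or infinite) denominator is what produces the infinite or NaN density estimates described above.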

For the purpose of level set tree estimation, a simple workaround is to set p to 1. Clearly this results in incorrect values of the density estimate, but that does not matter for estimating the shape of a level set tree; only the relative order of the density estimates matters.
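Why the workaround preserves the ranking: with p = 1 the unit-ball “volume” is $v_1 = 2$ (the interval $[-1, 1]$), so the estimate reduces to $k / (2 n r_k)$, a strictly decreasing function of the neighbor radius. The sketch below (hypothetical helper name, not DeBaCl's API) checks that the resulting ordering matches the ordering of the radii themselves:

```python
import numpy as np

def knn_density_p1(k_radius, n, k):
    # k-NN density with the dimension forced to p = 1: v_1 = 2, so
    # fhat = k / (n * 2 * r_k). Strictly decreasing in r_k, hence the
    # point ranking is the same as for the true-dimension estimate.
    return k / (2.0 * n * np.asarray(k_radius, dtype=float))

radii = np.array([0.5, 0.1, 0.9, 0.3])
fhat = knn_density_p1(radii, n=100, k=8)
# Smallest radius -> largest density: ranking by fhat (descending)
# agrees with ranking by radius (ascending).
print(np.argsort(-fhat))  # [1 3 0 2]
print(np.argsort(radii))  # [1 3 0 2]
```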

References

• Kpotufe, S. and von Luxburg, U. (2011). Pruning Nearest Neighbor Cluster Trees. Proceedings of the 28th International Conference on Machine Learning, pp. 225-232.

Examples

>>> import numpy
>>> import debacl
>>> X = numpy.random.rand(100, 2)
>>> knn, radii = debacl.utils.knn_graph(X, k=8, method='kd-tree')
>>> density = debacl.utils.knn_density(radii, n=100, p=2, k=8)