utils.knn_density

utils.knn_density(k_radius, n, p, k)

Compute the kNN density estimate for a set of points.

Note that this density estimator is highly susceptible to the “curse of dimensionality”. If the dimension p of the data is larger than about 340, the density estimates will all be infinite or undefined. If your ultimate goal is to estimate the level set tree (LST), consider setting the dimension here to p = 1; DeBaCl requires only the relative order of the density estimates to recover the shape of the LST, not the density values themselves. Please see the Notes section for more details on the numerical problems that occur with high-dimensional data.

Parameters:

k_radius : 1-dimensional numpy array of floats

The distance from each point to its k’th nearest neighbor.

n : int

The number of points.

p : int

The dimension of the data.

k : int

The number of observations considered neighbors of each point.

Returns:

fhat : 1-dimensional numpy array of floats

Estimated density for the points corresponding to the entries of ‘k_radius’.

See also

knn_graph

Notes

The k-nearest neighbor density estimate is

\[\hat{f}(x) = \frac{k}{n \cdot v_p \cdot r_k^p(x)}\]
where:
  • \(x\) is any point in the feature space,
  • \(n\) is the total number of observations,
  • \(p\) is the dimension of the feature space,
  • \(k\) is the number of neighbors to consider for \(x\),
  • \(v_p\) is the volume of a \(p\)-dimensional unit sphere, and
  • \(r_k(x)\) is the distance from \(x\) to its \(k\)’th nearest neighbor.
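
For concreteness, here is a minimal sketch that transcribes this formula directly; the function name is hypothetical and the code is illustrative, not necessarily identical to the internals of knn_density:

>>> import numpy
>>> from scipy.special import gamma
>>> def knn_density_sketch(k_radius, n, p, k):
...     # Volume of the p-dimensional unit ball: pi^(p/2) / Gamma(p/2 + 1).
...     v_p = numpy.pi ** (p / 2.) / gamma(p / 2. + 1.)
...     return 1. * k / (n * v_p * k_radius ** p)

Given a ‘radii’ array like the one in the Examples section, knn_density_sketch(radii, 100, 2, 8) should agree with the output of knn_density.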

As the dimension \(p\) grows past about 340 (on most systems), the computed volume of the unit ball, \(v_p = \pi^{p/2} / \Gamma(p/2 + 1)\), underflows to 0.0 because the Gamma term overflows double precision. At the same time, if the radius \(r_k(x)\) is large enough, \(r_k^p(x)\) overflows to infinity. The result is either an infinite or undefined density estimate.
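
This failure is easy to reproduce directly from the unit-ball volume formula; the helper below is purely illustrative and not part of the DeBaCl API:

>>> import numpy
>>> from scipy.special import gamma
>>> def unit_ball_volume(p):
...     return numpy.pi ** (p / 2.) / gamma(p / 2. + 1.)
>>> print(unit_ball_volume(2))    # area of the unit disk
3.141592653589793
>>> print(unit_ball_volume(342))  # Gamma(172) overflows to inf, so the ratio is 0.0
0.0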

For the purpose of level set tree estimation, a simple workaround is to set \(p\) to 1. Clearly this results in incorrect values of the density estimate, but this does not matter for estimating the shape of a level set tree; only the relative order of the density estimates matters. The order is preserved because, for any choice of \(p\), \(\hat{f}(x)\) is a strictly decreasing function of \(r_k(x)\), so every value of \(p\) ranks the points identically.
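
As a sketch of this workaround (assuming numpy and debacl are imported as in the Examples section, with arbitrary sizes chosen for illustration), high-dimensional data yields unusable estimates with the true dimension but finite, correctly ordered estimates with p = 1:

>>> X = numpy.random.rand(100, 500)
>>> knn, radii = debacl.utils.knn_graph(X, k=8, method='kd-tree')
>>> bad = debacl.utils.knn_density(radii, n=100, p=500, k=8)  # entries infinite or undefined
>>> fhat = debacl.utils.knn_density(radii, n=100, p=1, k=8)   # finite, same relative order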

References

  • Kpotufe, S. and von Luxburg, U. (2011). Pruning nearest neighbor cluster trees. Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 225-232.

Examples

>>> import numpy
>>> import debacl
>>> X = numpy.random.rand(100, 2)
>>> knn, radii = debacl.utils.knn_graph(X, k=8, method='kd-tree')
>>> density = debacl.utils.knn_density(radii, n=100, p=2, k=8)