Multivariate Online Kernel Density Estimation
We propose a novel approach for online estimation of probability density functions. Our approach is based on the kernel density estimation (KDE) and produces models that enable online adaptation, which at the same time maintain a low (or bounded) complexity that scales sublinearly with the observed samples. As a result we get an Online Gaussian Mixture Model.
Key idea 1: maintain a non-parametric model of the data itself in a form of a sample distribution ps(x) and use this model to calculate the kernel density estimate pkde(x) of the target distribution when required.
Key idea 2: each new observation as a distribution in a form of a Dirac-delta function and we model the sample distribution by a mixture of Gaussian and Dirac-delta functions.
Key idea 3: each component carries with it its detailed model in a form of a two-component mixture. This is used during online operation to detect whether a component is oversmoothing the distribution. In that case, the component is replaced by its detailed model.
Update in a nutshell: (1) we update the sample model with the observed data-point. (2) the updated model is used to recalculate the optimal bandwidth for the KDE. (3) the sample distribution is refined and compressed.
Research Issues:
(1) how to calculate the bandwidth without having access to the previously observed individual samples?
(2) how to devise an optimization which would efficiently compress the sample model?
(3) how to determine the extent of the allowed compression?
(4) how to recover from early compressions that lead to oversmoothing?
(5) how to address all of the above issues without preprocessing all data?
Some videos demonstrating the oKDE:
1. Estimation of a nonstationary distribution:
(the distribution changes at some point in time):
2. Estimation of a 3D spiral distribution with small and large compression:
3. Effects of data ordering:
The proposed oKDE, random order | The proposed oKDE, sorted data from center outward |
Online EM, random order | Online EM, sorted data from center outward |
4. The Old Faithful dataset:
(The model was initialized from the first two samples, the rest were added one at a time)
oKDE with Dth = 0.05 |
oKDE with Dth = 0.1 |
5. Estimation of N-D distributios:
3D distributions can be fully visualized, or via sets of 2D projections. The image below shows a 3D reference distributions from which we drew 10k samples (left). The KDE estimated from these and the respective projections are also shown (right).
7. Mode detection on N-D distributions:
The oKDE toolbox contains some potprocessing tools for analysis of the estimated density. Below we show an example of mode detection in the oKDE and reclustering of the components that form each of the detected modes. The reclustering is performed either by moment matching (matching first two moments of a Gaussian to the subset of components in GMM) or by L2 distance minimization between (minimizing the l2 distance between a Gaussian and the subset of components in the GMM).
Published work:
Supplement derivations of the multivariate bandwidth selector: Media:Supplement_derivations_KristanPR11.pdf
Downloads:
The Java implementation of the oKDE (implemented by Jonas Luthke) is available from the following GIT hub repository:
- Github project: httpss://github.com/joluet/okde-java
- library - https://github.com/joluet/okde-java
- example - https://github.com/joluet/okde-java-example
The C++ implementation of a faster and stable oKDE by Fereira, de Matos nad Riberio is available from the following repository:
- Link to source and installation: httpss://goo.gl/wGuZ2E
Download the Matlab code for online Gaussian Mixture Models using the Online Kernel Density Estimation:
Download the Matlab code for batch Multivariate Bandwidth: