Estimating distributions using an integral identity

Overview

In molecular simulations, we often wish to compute the distribution of some quantity of interest. This is usually done by accumulating a histogram and taking the normalized histogram as an approximation of the distribution. A common problem is that, without enough data, the resulting distribution is noisy. A cheap solution is to increase the bin size of the histogram, which gives each bin more data points, but a larger bin averages the distribution over a wider range and so risks introducing too much systematic bias.

Adib and Jarzynski (2005) found an elegant solution to this problem, which was later extended by Basner and Jarzynski (2008). They observed that a molecular dynamics simulation has to compute the force anyway, and that the force can be averaged to form a mean force, which can be used to estimate the derivative of a typical distribution. Then, by a clever integral identity, they derived a correction term from the mean force and obtained an unbiased estimate of the distribution density. The technique is efficient because it uses data from many neighboring bins, that is, from a fairly large window instead of a single bin.
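To make the idea concrete, here is a minimal one-dimensional sketch of an identity of this kind. It is an illustration under stated assumptions, not the code from the papers: the samples are drawn from a standard normal distribution (potential U(x) = x^2/2 at inverse temperature beta = 1, so the force is -x), and the names phi and estimate_density are hypothetical. The weight phi(x) vanishes at both edges of the window (a, b) and drops by one at the target point x*, so integrating d(phi rho)/dx over the window gives rho(x*) as the average of 1/(b - a) + beta f(x) phi(x) over the samples that fall in the window. The first term is the ordinary histogram estimate; the second is the additive correction from the force.

```python
import math
import random

def phi(x, x_star, a, b):
    """Piecewise-linear weight: zero at both window edges, unit drop at x_star."""
    edge = a if x < x_star else b
    return (x - edge) / (b - a)

def estimate_density(samples, force, x_star, a, b, beta=1.0):
    """Return (histogram estimate, force-corrected estimate) of rho(x_star)."""
    hist_sum = 0.0
    corr_sum = 0.0
    for x in samples:
        if a < x < b:
            hist_sum += 1.0 / (b - a)                           # plain histogram term
            corr_sum += beta * force(x) * phi(x, x_star, a, b)  # additive correction
    n = len(samples)
    return hist_sum / n, (hist_sum + corr_sum) / n

# Assumed test system: standard normal samples, force f(x) = -x.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
hist, corrected = estimate_density(samples, lambda x: -x, 0.0, -1.0, 1.0)
exact = 1.0 / math.sqrt(2.0 * math.pi)
print(f"histogram: {hist:.4f}  corrected: {corrected:.4f}  exact: {exact:.4f}")
```

With a window as wide as (-1, 1), the plain histogram averages the density over the whole window and underestimates the peak, while the corrected estimate stays close to the exact value even though it uses the same samples.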

In practice, the technique has a minor drawback: it sometimes yields a negative estimate. This is because the correction from the mean force is additive; where the distribution is small, statistical noise in the correction can push the estimate below zero.

The problem can be fixed by replacing the additive correction with a multiplicative one (Zhang and Ma, 2012). The output is then always nonnegative. An additional benefit is that the method gives a reasonable estimate of the best window size.
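The following sketch shows one way a multiplicative form can arise; whether it matches the estimator of the paper in every detail is an assumption. It rests on the exact identity rho(x*) = [integral of rho over (a, b)] / [integral of rho(x)/rho(x*) over (a, b)], where the ratio rho(x)/rho(x*) is the exponential of the integrated log-derivative of rho (beta times the mean force). The numerator is a fraction of samples and the denominator is an integral of exponentials, so the result cannot be negative. The function name ratio_estimate is hypothetical, and for brevity the log-derivative is supplied analytically (-x for a standard normal) rather than estimated from averaged forces as it would be in a simulation.

```python
import math
import random

def ratio_estimate(samples, log_deriv, x_star, a, b, n_grid=2000):
    """Estimate rho(x_star) as (fraction in window) / integral of rho(x)/rho(x_star)."""
    frac = sum(1 for x in samples if a < x < b) / len(samples)

    def log_ratio(x, n_steps=50):
        # ln[rho(x)/rho(x_star)]: midpoint-rule integral of log_deriv from x_star to x
        h = (x - x_star) / n_steps
        return sum(log_deriv(x_star + (k + 0.5) * h) for k in range(n_steps)) * h

    # Midpoint-rule integral of rho(x)/rho(x_star) over the window (a, b)
    dx = (b - a) / n_grid
    denom = sum(math.exp(log_ratio(a + (i + 0.5) * dx)) for i in range(n_grid)) * dx
    return frac / denom

# Assumed test system: standard normal samples, d(ln rho)/dx = -x.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
center = ratio_estimate(samples, lambda x: -x, 0.0, -1.0, 1.0)
tail = ratio_estimate(samples, lambda x: -x, 5.0, 4.5, 5.5)
print(f"center: {center:.4f}  tail: {tail:.2e}")
```

In the tail, where few or no samples fall in the window, this form returns a small value or zero, whereas an additive correction could return a negative number.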

Downloads

Program	Description
dsprog.zip	Example code from the paper (Zhang and Ma, 2012).

References

Adib and Jarzynski, Unbiased estimators for spatial distribution functions of classical fluids, J. Chem. Phys. 122, 014114 (2005).
Basner and Jarzynski, Binless estimation of the potential of mean force, J. Phys. Chem. B 112 (40), 12722-12729 (2008).
Zhang and Ma, Estimating statistical distributions using an integral identity, J. Chem. Phys. 136, 204113 (2012); preprint arXiv:1005.0170.