What is Approximate Bayesian Computation


Introduction to Approximate Bayesian Computation

Bayesian statistics is a popular and powerful approach that provides a way to make probabilistic statements about model parameters and hypotheses by updating prior beliefs in light of observed data. However, in some cases the likelihood function is intractable, making direct application of Bayesian methods difficult or impossible.

Approximate Bayesian Computation (ABC) addresses this challenge by simulating data sets from the model and comparing them to the observed data in order to approximate the posterior distribution.

What is Approximate Bayesian Computation?

Approximate Bayesian Computation (ABC) is a statistical framework that allows Bayesian inference in models where the likelihood function is difficult to specify or to evaluate. ABC generates approximate samples from the posterior distribution by simulating data sets from the model and comparing them to the observed data.

The basic idea behind ABC is that a candidate parameter set is accepted or rejected based on its ability to generate a simulated data set that is similar to the observed data. ABC uses a distance metric to compare summary statistics of the observed and simulated data sets; candidates whose distance falls below a threshold are accepted. The accepted parameter values then form an approximate sample from the posterior distribution.
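The comparison step can be made concrete with a small sketch. This is an illustrative toy example, not part of any particular ABC library: the summary statistics (sample mean and standard deviation) and the Euclidean distance are common but arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def summaries(x):
    # Reduce a data set to a low-dimensional vector of summary statistics.
    return np.array([x.mean(), x.std()])

def distance(s_obs, s_sim):
    # Euclidean distance between the two summary vectors.
    return np.linalg.norm(s_obs - s_sim)

# Toy example: two Gaussian samples with slightly different means.
observed = rng.normal(loc=2.0, scale=1.0, size=200)
simulated = rng.normal(loc=2.1, scale=1.0, size=200)

d = distance(summaries(observed), summaries(simulated))
```

In a full ABC run, `d` would be compared to a threshold to decide whether the parameters that produced `simulated` are accepted.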

History of Approximate Bayesian Computation

The roots of ABC go back to 1984, when Rubin described a simulation-based approach that approximates the posterior distribution by comparing observed and simulated data sets. In the 1990s the technique was developed further by Tavaré and colleagues, who applied it to population genetic data. The modern form of ABC, combining rejection sampling with summary statistics, was introduced by Pritchard and colleagues in 1999, and Marjoram and colleagues subsequently (2003) introduced Markov chain Monte Carlo (MCMC) techniques for sampling from the approximate posterior distribution.

Since then, ABC has been used in various fields of application including genetics, ecology, epidemiology, finance, and more.

The ABC algorithm

The ABC algorithm consists of four main steps:

  • Step 1: Specify the prior distribution for the model parameters.
  • Step 2: Draw candidate parameters from the prior and generate a simulated data set from the model.
  • Step 3: Calculate a distance metric between summary statistics of the observed and simulated data sets.
  • Step 4: Accept or reject the candidate parameters by comparing the distance to a threshold.

The algorithm repeats steps 2-4 until a sufficient number of accepted samples are obtained to approximate the posterior distribution.
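The steps above can be sketched in a few lines of code. The following is a minimal toy example, assuming a Normal(mu, 1) model with a Uniform(-5, 5) prior on mu, the sample mean as the summary statistic, and an illustrative threshold of 0.1; none of these choices come from a specific application.

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data; the "true" mean (1.5) is unknown to the algorithm.
observed = rng.normal(loc=1.5, scale=1.0, size=100)
s_obs = observed.mean()                  # summary statistic

def rejection_abc(n_accept=500, eps=0.1):
    accepted = []
    while len(accepted) < n_accept:
        mu = rng.uniform(-5, 5)          # Steps 1-2: draw from the prior
        sim = rng.normal(loc=mu, scale=1.0, size=100)  # simulate data
        if abs(sim.mean() - s_obs) < eps:  # Steps 3-4: distance + accept
            accepted.append(mu)
    return np.array(accepted)

# The accepted parameter values approximate the posterior over mu.
posterior = rejection_abc()
```

The mean of `posterior` should land close to the sample mean of the observed data, as expected under a flat prior.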

There are different variants of the ABC algorithm, including rejection ABC, importance sampling ABC, MCMC ABC, and sequential ABC.

ABC variants

Rejection ABC: The rejection ABC algorithm is the simplest form of ABC. It involves generating a large number of candidate parameters from the prior distribution and accepting those whose simulated data sets are close to the observed data. Since most draws from the prior produce simulations that are not close to the observed data, it can be computationally expensive.

Importance sampling ABC: Importance sampling ABC draws candidate parameters from a proposal distribution rather than directly from the prior, and corrects for this with importance weights, calculated as the ratio of the prior density to the proposal density for each accepted candidate. A proposal concentrated in high-posterior regions wastes fewer simulations, so this approach can be more computationally efficient than rejection ABC.
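A minimal sketch of the weighting, reusing the toy Gaussian model from before (Uniform(-5, 5) prior, sample mean as summary). The proposal, a normal centred at the observed summary with a scale of 0.5, is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)

observed = rng.normal(loc=1.5, scale=1.0, size=100)
s_obs = observed.mean()

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def importance_abc(n=20000, eps=0.1):
    samples, weights = [], []
    for _ in range(n):
        # Draw from a proposal centred near the observed summary,
        # not from the flat prior.
        t = rng.normal(loc=s_obs, scale=0.5)
        sim = rng.normal(loc=t, scale=1.0, size=100)
        if -5 <= t <= 5 and abs(sim.mean() - s_obs) < eps:
            samples.append(t)
            # Importance weight: prior density / proposal density.
            weights.append(0.1 / normal_pdf(t, s_obs, 0.5))
    w = np.array(weights)
    return np.array(samples), w / w.sum()

samples, weights = importance_abc()
posterior_mean = np.sum(samples * weights)
```

Because the proposal concentrates simulations where acceptance is likely, far fewer draws are wasted than under the flat prior, and the weights correct the resulting bias.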

MCMC ABC: MCMC ABC embeds the ABC accept/reject step inside a Markov chain Monte Carlo algorithm. It proposes new parameters, simulates data under them, and accepts the proposal only if the simulated data fall within the distance threshold and a Metropolis-Hastings acceptance ratio (based on the prior and proposal densities) is satisfied. Because the chain concentrates in high-posterior regions, MCMC ABC generally requires fewer simulated data sets per accepted sample and can provide better approximations to the posterior.
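A compact sketch of ABC-MCMC for the same toy Gaussian model (flat Uniform(-5, 5) prior, sample mean as summary). With a flat prior and a symmetric random-walk proposal, the Metropolis-Hastings ratio reduces to 1 inside the prior's support, so acceptance hinges on the simulation matching; the step size and threshold are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

observed = rng.normal(loc=1.5, scale=1.0, size=100)
s_obs = observed.mean()

def log_prior(mu):
    # Flat prior on [-5, 5].
    return 0.0 if -5 <= mu <= 5 else -np.inf

def abc_mcmc(n_steps=5000, eps=0.1, step=0.5):
    # Find a starting point whose simulation already matches the data.
    mu = 0.0
    while True:
        sim = rng.normal(loc=mu, scale=1.0, size=100)
        if abs(sim.mean() - s_obs) < eps:
            break
        mu = rng.uniform(-5, 5)
    chain = []
    for _ in range(n_steps):
        prop = mu + rng.normal(0, step)  # symmetric random-walk proposal
        sim = rng.normal(loc=prop, scale=1.0, size=100)
        # Accept only if the simulation matches and the prior allows it
        # (flat prior + symmetric proposal => MH ratio is 1 in-support).
        if np.isfinite(log_prior(prop)) and abs(sim.mean() - s_obs) < eps:
            mu = prop
        chain.append(mu)  # on rejection, the current state is repeated
    return np.array(chain)

chain = abc_mcmc()
```

After discarding an initial burn-in portion, the chain's values approximate draws from the ABC posterior.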

Sequential ABC: Sequential ABC (also known as ABC-SMC) propagates a population of candidate parameters through a sequence of progressively tighter distance thresholds, perturbing and reweighting the population at each stage. Because each generation starts from parameters that already fit the data reasonably well, it can be more computationally efficient than the other variants for complex models or large data sets.
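The shrinking-threshold idea can be sketched as follows, again for the toy Gaussian model with a Uniform(-5, 5) prior. Note this simplified sketch omits the importance weighting that a full ABC-SMC algorithm uses to correct for the perturbation kernel; the threshold schedule and perturbation scale are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(11)

observed = rng.normal(loc=1.5, scale=1.0, size=100)
s_obs = observed.mean()

def simulate_summary(mu):
    return rng.normal(loc=mu, scale=1.0, size=100).mean()

def abc_smc(n_particles=300, thresholds=(1.0, 0.5, 0.2, 0.1)):
    # Generation 0: plain rejection ABC with the loosest threshold.
    particles = []
    while len(particles) < n_particles:
        mu = rng.uniform(-5, 5)
        if abs(simulate_summary(mu) - s_obs) < thresholds[0]:
            particles.append(mu)
    particles = np.array(particles)
    # Later generations: perturb survivors under tighter thresholds.
    for eps in thresholds[1:]:
        new = []
        while len(new) < n_particles:
            mu = rng.choice(particles) + rng.normal(0, 0.2)
            if -5 <= mu <= 5 and abs(simulate_summary(mu) - s_obs) < eps:
                new.append(mu)
        particles = np.array(new)
    return particles

particles = abc_smc()
```

Each generation proposes from parameters that already passed the previous, looser threshold, so far fewer simulations are wasted than running rejection ABC directly at the final tolerance.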

Applications of ABC

ABC has been used in a variety of fields of application, including:

  • Population genetics: ABC has been used to estimate parameters of evolutionary models and infer demographic history based on genetic data. It has also been used in the analysis of ancient DNA.
  • Ecology: ABC has been used to estimate parameters of population models and meta-population models from ecological data.
  • Epidemiology: ABC has been used to model the spread of infectious diseases and infer the effectiveness of interventions.
  • Finance: ABC has been used to model stock returns and option prices.
  • Machine learning: ABC has been used for model selection and parameter estimation in machine learning models, such as neural networks.

Advantages and limitations of ABC

Advantages:

  • ABC can be used in models where the likelihood function is intractable or unknown.
  • ABC provides a way to estimate the posterior distribution without ever evaluating the likelihood function; only the ability to simulate from the model is required.
  • ABC is flexible and can be adapted to different models and data types.

Limitations:

  • ABC can be computationally expensive, especially for complex models or large data sets.
  • ABC requires the specification of summary statistics, which may not always capture all the relevant information present in the data.
  • ABC can be sensitive to the choice of the distance metric and threshold.

Conclusion

ABC is a powerful and flexible statistical framework that provides a way to estimate the posterior distribution in models where the likelihood function is difficult to specify or to evaluate. Its advantages make it an attractive option for a range of applications in fields such as genetics, ecology, epidemiology, finance, and more. However, its limitations mean that careful consideration must be given to the choice of summary statistics, distance metric, and threshold. Overall, ABC represents an important advance in probabilistic modeling and Bayesian inference.
