library(ggplot2)
Computer lab: Bayesian analysis
Methods of Scientific Working (3502-450)
Background
The purpose of this computer lab is to simulate an urn experiment to demonstrate the key principles of a Bayesian analysis.
Then, as an application of the theory we will apply Bayesian reasoning to estimate a parameter (allele frequency) in a simple population genetic model.
The third examples is a Bayesian test of a hypothesis given the data observed in a sample. We will test the hypothesis that that genotype frequencies of a genetic locus are in Hardy Weinberg equilibrium.
Marble experiment
We first start with a simple marble experiment in which we compare two hypotheses about the number of blue and white marbles in an urn.
Design of the experiment
The experiment is set up by flipping a fair coin to select one of two possible hypothesis as true hypothesis.
Depending on the outcome, we put the marbles into the urn:
- If head, place in an urn 1 white and 3 blue marbles
- If tail, place in an urn 3 white and 1 blue marble
From this, we generate two hypotheses, we will test:
- H\(_B\): 1 white and 3 blue marbles (WBBB)
- H\(_W\): 3 white and 1 blue marble (WWWB)
The goal of the Bayesian analysis is to test which of the hypotheses is probably true (or more likely true) given the data.
To find out, we conduct an experiment:
- Mix the marbles in the urn
- Draw a marble and record its color
- Put it back into the urn
- Repeat the procedure as necessary
Such an experiment can be easily set up with R
The goal of the setup is to find out which of the two hypotheses have a higher probability given the data. We flip the coin first to randomly choose between one of the hypotheses. Because we do not know the outcome of the coin toss, we assume that both hypotheses are equally likely and for this reason the prior probability is 1:1 for both hypotheses.
<- c("head", "tail")
coin <- sample(coin, 1)
draw if (draw == "head") {
<- c('W', 'B', 'B', 'B')
urn else {
} <- c('W', 'W', 'W', 'B')
urn }
We have now randomly constructed an urn, from which we can sample. To run our experiment, we need to calculate the posterior probability. Since we are comparing two hypotheses, we can use the ratio form of Bayes’s rule,
\[\frac{P(H_B | D)}{P(H_W | D)} = \frac{P(D | H_B)}{P(D | H_W)} \times \frac{P(H_B)}{P(H_W)}\]
The first term describes the ratio of posterior probabilities, the second the likelihood odds ratio, and the third the ratio of the prior probabilities.
It is now possible to construct the likelihoods. If hypothesis \(H_B\) is true, the probability of obtaining a blue draw is 0.75 (three out of four marbles are blue). Under hypothesis \(H_W\) it is 0.25. Therefore the likelihood odds ratio is
\[\frac{P(D|H_B)}{P(D|H_W)} = \frac{0.75}{0.25} = \frac{3}{1}.\]
If we draw a white marble, the likelihood odds ratio is
\[\frac{P(D|H_B)}{P(D|H_W)} = \frac{0.25}{0.75} = \frac{1}{3}.\]
For the first draw, the prior probability of both hypotheses is \(P(H_B) = P(H_W) = 0.5\), and we can calculate the posterior probabilities if we drew a blue marble as
\[\frac{P(H_B | D)}{P(H_W | D)} = \frac{0.75}{0.25} \times \frac{0.5}{0.5} = \frac{3}{1}\]
The posterior probability of the first draw is the prior probability of the second draw. Since we are only interested in the ratio of the new prior probabilities, we can just set
\[\frac{P(H_B)}{P(H_W)}=\frac{3}{1}\]
as the new prior ratio and start from new.
If the second draw is a white marble, the posterior ratio is then
\[\frac{P(H_B | D)}{P(H_W | D)} = \frac{0.25}{0.75} \times \frac{3}{1} = \frac{0.75}{0.75} = \frac{1}{1}\]
and so on. We can implement this experiment now in R:
<- c('W','B','B','B')
urnB <- c('W','W','W','B')
urnW
<- 0.5 # prior probability of hypothesis B
priorB <- 0.5 # prior probability of hypothesis W
priorW
# randomly select the urn (i.e., the true hypothesis)
<- sample(c('H_B', 'H_W'), 1)
hypothesis if (hypothesis == 'H_B') {
<- c('W','B','B','B')
urn else if (hypothesis == 'H_W') {
} <- c('W','W','W','B')
urn
}print(urn)
[1] "W" "W" "W" "B"
<- 0.999999
stoppingrule
# set up experiment
<- 0
i <- c(i)
draws <- c("(Prior)")
results <- c("1:1")
ratio <- c(priorB)
posteriorB <- c(priorW)
posteriorW
while (priorB < stoppingrule && priorW < stoppingrule) {
<- sample(urn, 1) # Randomly choose the true hypothesis
draw if (draw == 'B') {
<- 0.75
likelihoodB <- 0.25
likelihoodW else {
} <- 0.25
likelihoodB <- 0.75
likelihoodW
}
# Bayes' theorem for odds ratios
<- (likelihoodB / likelihoodW) * (priorB / priorW)
posteriorRatio <- paste(as.integer(posteriorRatio), ": 1")
posteriorInteger <- posteriorRatio / (posteriorRatio + 1) #see footnote
priorB <- 1 - priorB
priorW
= i + 1 # increase the counter for the next draw
i
<- c(draws, i)
draws <- c(results, draw)
results <- c(ratio, posteriorInteger)
ratio <- c(posteriorB, priorB)
posteriorB <- c(posteriorW, priorW)
posteriorW
}
# put the data into a dataframe and plot the outcome
= data.frame(draws, H_B = posteriorB, H_W = posteriorW)
df
# Melting the dataframe to make it suitable for ggplot
<- reshape2::melt(df, id.vars = 'draws', measure.vars = c('H_W', 'H_B'))
df_long
# Plotting
ggplot(df_long, aes(x = draws, y = value, color = variable)) +
geom_line() +
geom_point() +
labs(title = paste("True hypothesis:", hypothesis, "; Stopping rule: Posterior probability >", stoppingrule), x = "Number of draws until stoping rule", y = "Posterior probability", color = "Hypothesis") +
ylim(0, 1) +
scale_color_manual(values = c("H_B" = "blue", "H_W" = "black"))
Excercises
- Run a single simulation run using this code and inspect the output
- Repeat the simulation run. How does the output change?
- Change the prior to check out how much the experiment depends on the prior used in the early stage of the experiment
- The ratio of the posterior probabilities give the relative believe in each of the hypotheses. Check how the ratio is effected when the true hypothesis differs from the one which are tested.
Study questions
- Why can we use the posterior probability of the last experiment as the prior of the next experiment?
- How would the calculations be changed if we have three different marble colors instead of two different marbles?