We are going to build Brownian motion simulator. The idea is simple, start from a given value (here 0), and at each generation, add a random value drawn from a normal distribution. It’s a commonly used approximation to model ecological and evolutionary processes. Over time, the value is going on a “random walk”.
The function we are going to write won’t be particularly elegant or efficient but will illustrate key programming concepts:
for
loops work?We will also use this example to get familiar with basic plotting functions.
sim_brownian <- function() {
seq_len(10)
}
The curly brackets delimit the begining and the end of the function. More generally, the curly brackets delimit a chunk of code that is meaningful in itself.
with an argument:
sim_brownian <- function(ngen) {
seq_len(ngen)
}
with an argument and a default value:
sim_brownian <- function(ngen=10) {
seq_len(ngen)
}
for
loopsim_brownian <- function(ngen=10) {
for (j in seq_len(ngen)) {
message("j is equal to ", j)
}
}
sim_brownian <- function(ngen=10) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
message("j is equal to ", j)
tmp_gen[j] <- j * 2
message("tmp_gen[j] is equal to ", tmp_gen[j])
}
tmp_gen
}
The function rnorm()
draws random numbers from a Normal distribution. For
instance, rnorm(1, mean=2, sd=3)
draws 1 number from a Normal distribution
with a mean 2 (default=0) and a standard deviation of 3 (default=1).
So let’s modify our loop so that the vector we return holds the previous value + a random value drawn from a Normal distribution:
sim_brownian <- function(ngen=10) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
message("j is equal to ", j)
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1)
message("tmp_gen[j] is equal to ", tmp_gen[j])
}
tmp_gen
}
Oops… it’s not working. Why?
We need to tell our function that it needs to treat the first element of our vector differently that the rest of it. That’s where conditional statements come to the rescue.
In plain English, we would say, “if this is the first element, give it the value 0, else give it the previous value + a value drawn from a Normal distribution”.
In R these are written like this:
if (a condition) {
what to do if the condition is true
} else {
what to do if the condition is false
}
let’s modify our function:
sim_brownian <- function(ngen=10) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
message("j is equal to ", j)
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1)
}
message("tmp_gen[j] is equal to ", tmp_gen[j])
}
tmp_gen
}
Modify the function so that the user can specify the standard deviation of the Normal distribution from which the random number is drawn from.
Because this is an illustration of a random process, if we want to understand and visualize the impact of changing the standard deviation on the simulations, we need to generate replicates under the same conditions.
To do so, we are going to use nested loops. We are going to use a first loop to create our replicates, and within each replicates, another loop to simulate the random walk (just like before).
sim_brownian <- function(ngen=10, nrep=10, sd) {
## create an empty list to store each replicate
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
rep_lst
}
sim_brownian <- function(ngen=10, nrep=10, sd) {
## create an empty list to store each replicate
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
plot(seq_len(ngen), rep_lst[[1]], type="l")
rep_lst
}
draw.plot
) to make the drawing
of the plot optional.lines()
hint:
sim_brownian <- function(ngen=10, nrep=10, sd) {
## create an empty list to store each replicate
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
## create a vector to hold the results
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
plot(seq_len(ngen), rep_lst[[1]], type="l")
lines(seq_len(ngen), rep_lst[[2]])
rep_lst
}
We need to identify the minimum and maximum values across all replicates.
Using plot(..., type="n")
draws an empty plot which is useful to subsequently
draw multiple lines (or other low level functions).
sim_brownian <- function(ngen=100, nrep=10, sd=1, draw.plot=TRUE) {
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
if (draw.plot) {
max_y <- max(sapply(rep_lst, max))
min_y <- min(sapply(rep_lst, min))
plot(seq_len(ngen), rep_lst[[1]], lwd=2, type="n",
ylim=c(min_y, max_y))
for (i in seq_len(nrep)) {
lines(seq_len(ngen), rep_lst[[i]])
}
}
rep_lst
}
sim_brownian <- function(ngen=100, nrep=10, sd=1, draw.plot=TRUE) {
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
if (draw.plot) {
max_y <- max(sapply(rep_lst, max))
min_y <- min(sapply(rep_lst, min))
plot(seq_len(ngen), rep_lst[[1]], lwd=2, type="n",
ylim=c(min_y, max_y))
for (i in seq_len(nrep)) {
lines(seq_len(ngen), rep_lst[[i]], col="gray")
}
}
rep_lst
}
if/else
so that the last line is drawn using a different color (e.g., “red”)main.col
and an argument for the other lines bg.col
).Let’s use the ...
to specify the labels:
sim_brownian <- function(ngen=100, nrep=10, sd=1, draw.plot=TRUE, ...) {
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
if (draw.plot) {
max_y <- max(sapply(rep_lst, max))
min_y <- min(sapply(rep_lst, min))
plot(seq_len(ngen), rep_lst[[1]], lwd=2, type="n",
ylim=c(min_y, max_y), ...)
for (i in seq_len(nrep)) {
lines(seq_len(ngen), rep_lst[[i]], col="gray")
}
}
rep_lst
}
This might be unlikely to occur (maybe you are using the function as part of
your workflow and calling it from another function that has a bug in it), but
what would happen if you tried to use the function using nrep=0
? How about the
number of generations?
sim_brownian(nrep=0, plot=FALSE)
## Error in max(sapply(rep_lst, max)): invalid 'type' (list) of argument
sim_brownian(ngen=0, nrep=1, plot=FALSE)
## Warning in FUN(X[[1L]], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(X[[1L]], ...): no non-missing arguments to min; returning
## Inf
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in plot.window(...): "plot" is not a graphical parameter
## Error in plot.window(...): need finite 'xlim' values
Hmm… not so clear what the issue is. Let’s create an error message that is going to be a little more informative in this case.
Also, we may want to warn the user that if less than 10 replicates are used, the values taken are not going to be very representative of the possible range.
sim_brownian <- function(ngen=100, nrep=10, main.col="red",
bg.col="gray50", sd=1, draw.plot=TRUE, ...) {
if (nrep < 1) {
stop("We need at least one replicate.")
}
if (ngen < 2) {
stop("We need at least 2 generations.")
}
if (nrep < 10) {
warning("With less than 10 replicates, the possible range",
"of values may not be very representative.")
}
rep_lst <- vector("list", nrep)
for (i in seq_len(nrep)) {
tmp_gen <- numeric(ngen)
for (j in seq_len(ngen)) {
if (j == 1) {
tmp_gen[j] <- 0
} else {
tmp_gen[j] <- tmp_gen[j - 1] + rnorm(1, sd=sd)
}
}
rep_lst[[i]] <- tmp_gen
}
if (draw.plot) {
max_y <- max(sapply(rep_lst, max))
min_y <- min(sapply(rep_lst, min))
plot(seq_len(ngen), rep_lst[[1]], col=main.col, lwd=2,
type="n", ylim=c(min_y, max_y), ...)
for (i in seq_len(nrep)) {
if (i == max(nrep)) {
lines(seq_len(ngen), rep_lst[[i]], col=main.col,
lwd=2)
} else {
lines(seq_len(ngen), rep_lst[[i]], col=bg.col)
}
}
}
return(rep_lst)
}
match.arg