# Homework 5: Genetic algorithm functions II

## Part 1: Explain code

Your first task is to examine the two versions of the mateVectors function that I wrote and add your own detailed comments explaining each step. Explain what each line and each section or loop does. My functions are shown here for reference, and I have also provided a standalone R script containing the functions and tests: hw05-code.R. You only need to add comments to the functions, not to the testing code included in the script.

```r
# Function: mateVectors
# Args:
#    - x, y = Two vectors each representing a "genotype". The vectors should be
#             atomic vectors, but this is not enforced. These vectors are the
#             two parent genotypes and must be of the same length and type.
#    - r    = The per-locus recombination rate (scalar). A rate between 0 and 0.5
#             which indicates the probability of a crossover event between any
#             two adjacent genes (vector elements).
# Returns:
#    - vector of the same length and type as the parents.
mateVectors1 <- function(x, y, r) {
  stopifnot(length(x) == length(y))

  child <- y
  usingX <- runif(1) <= 0.5

  for (i in 1:length(x)) {
    if (usingX) {
      child[i] <- x[i]
    }

    if (runif(1) <= r) {
      usingX <- !usingX
    }
  }
  return(child)
}

# vectorized implementation of above without explicit loops
mateVectors2 <- function(x, y, r) {
  stopifnot(length(x) == length(y))

  crossovers <- runif(length(x)) <= r
  useFirstParent <- cumsum(crossovers) %% 2 == 0

  if (runif(1) <= 0.5) {
    child <- ifelse(useFirstParent, x, y)
  } else {
    child <- ifelse(useFirstParent, y, x)
  }
  return(child)
}
```
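As a starting point for your comments, it may help to run the key vectorized expressions on a fixed input instead of random draws. The sketch below is only an illustration: the hard-coded crossovers vector and the "A"/"B" parent genotypes are made up for this example and are not part of hw05-code.R.

```r
# Illustrative values only: pretend the random draws produced crossovers
# at positions 3 and 5.
crossovers <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Running count of crossovers seen so far (logicals are summed as 0/1).
crossoverCount <- cumsum(crossovers)            # 0 0 1 1 2 2

# An even count means we are still (or again) copying from the first parent.
useFirstParent <- crossoverCount %% 2 == 0      # TRUE TRUE FALSE FALSE TRUE TRUE

# Made-up parents that make it easy to see where each allele came from.
x <- rep("A", 6)
y <- rep("B", 6)

# Pick element-wise from x where useFirstParent is TRUE, otherwise from y.
child <- ifelse(useFirstParent, x, y)           # "A" "A" "B" "B" "A" "A"
print(child)
```

Each TRUE in crossovers flips which parent is being copied, so the parity of the running count identifies the source parent at every position.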


## Part 2: Adding mutation

For your second task, we will continue to build the functions needed for a simple simulation of evolution.

You will modify our mateVectors() function to include an optional argument, mu, which is the per-locus mutation rate. So far, the function has accepted vectors of any type as long as both parents had the same type. Adding mutation means we must specify how to mutate a locus, so we need to fix how the genotype is represented; otherwise there would be no way to know how to change an allele when a mutation occurs. We will represent a genotype as a vector of numbers between 1 and 100. During mutation, an allele is randomly assigned a value between 1 and 100. We can keep things simple and assign only integers (e.g., sample(1:100, 1)). It is good practice to avoid literal numbers in your main code body and instead define constants such as MIN.ALLELE and MAX.ALLELE at the top of your script.

```r
# mateNumericVectors(x, y, r, mu=0):
#
# Returns a child vector created by "mating" two parent vectors x and y and
# supplying a per-adjacent-locus recombination rate and a per-locus mutation
# rate. During mutation, an allele is randomly assigned a value between 1 and
# 100, inclusive.
#
#   Args:
#    - x, y = Two vectors each representing a "genotype". The vectors should be
#             atomic vectors containing integer values between 1 and 100
#             inclusive. These vectors are the two parent genotypes and must be
#             of the same length.
#    - r =    The per-locus recombination rate. A rate between 0 and 0.5 which
#             indicates the probability of a crossover event between any two
#             adjacent genes (vector elements).
#    - mu =   The per-locus mutation rate. A probability between 0 and 1.
#             Defaults to 0 (no mutation).
```
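The mutation step on its own might look like the sketch below. It covers only mutation, not recombination, and the helper name mutateVector is hypothetical: it is not part of the required interface, and you are free to structure your solution differently. The constants follow the MIN.ALLELE and MAX.ALLELE suggestion above.

```r
# Constants defined once at the top of the script, as suggested above.
MIN.ALLELE <- 1
MAX.ALLELE <- 100

# Hypothetical helper (an assumption for illustration, not the required
# function): mutate each locus independently with probability mu.
mutateVector <- function(genotype, mu = 0) {
  stopifnot(mu >= 0, mu <= 1)

  # One uniform draw per locus; TRUE marks the loci that mutate.
  mutated <- runif(length(genotype)) < mu

  # Replace every mutated locus with a random integer allele. Sampling with
  # replacement lets two loci receive the same new value.
  genotype[mutated] <- sample(MIN.ALLELE:MAX.ALLELE, sum(mutated),
                              replace = TRUE)
  return(genotype)
}

parent <- rep(50, 10)
print(mutateVector(parent, mu = 0))   # unchanged: no locus mutates
print(mutateVector(parent, mu = 1))   # every locus is redrawn from 1..100
```

Note that a "mutated" locus can be assigned the value it already had, so the number of visibly changed alleles can be slightly lower than the number of mutation events.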


## Part 3: Defining "fitness"

Your final task is to write a function that returns the "fitness" of a vector representing a chromosome. This function will test a chromosome vector (a vector of numbers, presumably integers between 1 and 100) against a second vector representing the optimal genotype. We will base fitness on the sum of the squared deviations from the optimum (sse): fitness is one over that sum plus 1, where the 1 is added to avoid dividing by zero when the genotype matches the optimum perfectly. In other words, your code will calculate the sum of squared deviations from the optimum and return $$\frac{1}{sse+1}$$.

```r
# testFitness(genotype, targetGenotype):
#
# Returns a scalar number representing how well the genotype (a vector of
# numbers) matches the target genotype. The matching is measured as the
# reciprocal of one plus the sum of the squared deviations from the target
# (1 / (sse + 1)).
#
#   Args:
#    - genotype =       The chromosome to be tested.
#    - targetGenotype = The optimum genotype. Both arguments must be numeric
#                       vectors of the same length.
#
#   Returns:
#    - A scalar: the reciprocal of one plus the sum of the squared deviations
#      of the genotype from the target (1 / (sse + 1)).
```
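To check your understanding of the fitness formula, here is a small worked calculation. The two vectors are made-up values chosen to keep the arithmetic easy; wrapping the calculation in testFitness() with appropriate argument checks is left to you.

```r
# Made-up example vectors (not from the assignment).
genotype <- c(10, 20, 30)
target   <- c(12, 20, 27)

# Deviations are -2, 0 and 3, so sse = 4 + 0 + 9 = 13.
sse <- sum((genotype - target)^2)

# Fitness is the reciprocal of (sse + 1): here 1/14, roughly 0.0714.
fitness <- 1 / (sse + 1)
print(fitness)

# A perfect match has sse = 0, so fitness reaches its maximum of 1.
perfectFitness <- 1 / (sum((target - target)^2) + 1)
print(perfectFitness)
```

Fitness therefore always lies in the interval (0, 1], with larger values indicating a closer match to the optimum.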