Week 2 Molecular Markers

2.1 Paths

For this course, I will upload the datasets you need to this website and you will download them to your own computer. You will then need to figure out how tell R where to find them. You can find out what directory you are currently in using the getwd() command:

getwd()
## [1] "/Users/rachaelbay/Documents/Courses/Molecular Ecology/2021/Website"

Directories are hierarchical: the represent folders within folders just like you view them when you navigate with your mouse. You can also set which directory R will use to look for files. This is called your ‘working directory’ and you can set it using setwd(). For example, if I create a folder called "EVE109 inside my documents folder where I want to store all my materials for this class, I can then tell R I want it to look in that directory for files:

setwd("~/Documents/EVE109")

If you get an error that your file cannot be found, a good first step is to check your working directory.

     

2.2 Reading in data from spreadsheets

Today we will be working with data from the paper we read in class. You can download the dataset here

A lot of data we use is stored in spreadsheets. However, when we store them in a normal Excel format (.xlsx) they come with a lot of extra formatting that R has trouble reading. One way to avoid this is to store them in a comma separated value format (.csv). Open “wombats.csv” in a text editor and look at it. It’s not easy for us to read, but you can see that it is a very simple format where each entry is separated by a comma. You were able to open it in a spreadsheet program and R can read it easily. If you are making your own data, you can save spreadsheets as .csv files from most spreadsheet programs.

Now we want to read that data into RStudio. We can do that using the command read.csv. Remember do assign the resulting data frame to an object:

data <- read.csv("data/wombats.csv") # Read in a file

Remember, we can look at the top of the dataframe using head:

head(data)
##   No.captures total males females
## 1           1    25    18       7
## 2           2    11     7       4
## 3           3    10     7       3
## 4           4    10     6       4
## 5           5     9     6       3
## 6           6    11     6       5

This is data from the paper we read this week by Banks et al. (2003) using genetic analysis to monitor northern hairy-nosed wombats. The table represents the data shown in Figure 1, so individuals have already been identified and we’re looking at the number of times each was recaptured.

     

2.3 Simple capture recapture example

Next we will learn a simple way to make population size estimations. We will use the capwire package to do this. Remember how to install and load a package?

install.packages("capwire")
library(capwire)

The manual for capwire can be found online here

 

Okay, now lets make up some data. Luckily, capwire has a function that allows you to simulate data. Look up the function simCapture. Notice there are three different arguments we need to specify (we can ignore return.cap.probs because it has a default that we do not want to change). Using this function we’ll simulate a population of 300, from which we have 50 samples. This distribution function just means that every individual has an equal probability of being captured.

sim <- simCapture(n=300,s=50,dist.func=drawCapRatesUnif(0.1,1)) # Simulate capture data
sim
##   capture.class No.Ind
## 1             1     44
## 2             2      3

Now that we have simulated data, we can use one of the fitEcm function to estimate population size. What arguments do we need for that function? Notice that the help page tells you what format your data should be in.

ecm <- fitEcm(data=sim,max.pop=500) # estimate population size

The fitEcm page also describes that output. We are most interested in the population size estimate, which is called ml.pop.size. We can extract that using:

ecm$ml.pop.size
## [1] 392

How much uncertainty is there in our estimate? We can use bootstrap resampling to create confidence intervals. Look up the command boostrapCapwire

boot <- bootstrapCapwire(x=ecm,bootstraps=1000,CI=c(0.025,0.975)) # estimate confidence intervals
boot
## $ml.pop.size
## [1] 392
## 
## $conf.int
##  2.5% 97.5% 
##   187   500

Using this 95% confidence interval means there is a 95% chance the real answer is within that range. Why is it so large?! What happens if we have more samples?

     

2.4 Homework

Now that we know how to estimate population sizes, we will use the wombat data from before. Start a new script for this.

2.4.1 Homework 2: Write a script that does the following:

  1. Read in “wombats.csv”

  2. Estimate the total population size.

  3. Estimate confidence intervals for the total population size.

  4. Estimate male and female population sizes.

  5. What is the ratio of males to females in this population?