11 Genome Environment Association
All of the code presented in this weeks class comes modified from a former Marine Genomics student and recent UC Davis graduate, Camille Rumberger! Thank you Camille!
11.1 Download the data
For this week we are again using data from the wonderful Xuereb et al. 2018 paper here And consistes of a vcf file with 3966 Snps from 685 pacific sea cucumbers.
cd
wget https://raw.githubusercontent.com/BayLab/MarineGenomicsData/main/week11_semester.tar.gz
tar -xzvf week11_semester.tar.gz
mv MarineGenomics/Week11 MarineGenomics/Week10
11.2 install a compiler for Unix/bash
Run this command below in the terminal from the home directory. It’s necessary to install several R packages.
Make sure to respond with a lower case “y” and not “Y” when prompted.
sudo apt update
sudo apt-get install libgdal-dev
sudo apt-get install libudunits2-dev
11.3 install R packages
Make sure to run all of the commands below (without the #) to install and load programs we will be using. You may get warnings during the installation but feel free to ignore them.
# install.packages('devtools')
# install.packages("sf", dependencies = T)
# install.packages(c('lfmm','vegan','data.table','sdmpredictors', 'rgdal','raster'), dependencies = T)
# install.packages("psych", dependencies = T)
# install.packages('foreach')
# devtools::install_github("bcm-uga/lfmm")
# install.packages("BiocManager")
# BiocManager::install("LEA", force=T) ## choose (a) to update
library(lfmm)
library(psych)
library(vegan)
library(LEA)
library(data.table)
library(sdmpredictors)
library(leaflet)
library(ggplot2)
library(rgdal)
library(raster)
11.4 Get the Environmental Data
Now we need to download the environmental data for our study. Make sure to go into the correct working directory in R (MarineGenomics/Week10).
We’re just going to look at a few variables but there are many others that you could choose from here: https://www.bio-oracle.org/explore-data.php
Note that this step is rather time consuming, thus it’s best to download the data once and save it as an environment file, which you can upload each time you want to modify your script.
# load the environmental data and store it as objects that we can use later
# This will give you warning messages, but feel free to ignore them.
<- load_layers(layercodes = c("BO_chlomean","BO_ph", "BO_salinity","BO_sstmax","BO_sstmean")) environ
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : download from 'https://bio-oracle.org/data/1.0/BO_chlomean.zip' failed
#now put each layer into it's own object
<-load_layers("BO_chlomean") # mean chlorophyll chlomean
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : download from 'https://bio-oracle.org/data/1.0/BO_chlomean.zip' failed
<-load_layers("BO_ph") # mean pH ph
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : cannot open URL 'https://bio-oracle.org/data/1.0/BO_ph.zip'
<-load_layers("BO_salinity") # mean salinity salinity
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : download from 'https://bio-oracle.org/data/1.0/BO_salinity.zip' failed
<-load_layers("BO_sstmax") # maximum sea surface temperature sst_max
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : cannot open URL 'https://bio-oracle.org/data/1.0/BO_sstmax.zip'
<-load_layers("BO_sstmean") # mean sea surface temperature sst_mean
## Error in utils::download.file(layer_url, path, method = "auto", quiet = FALSE, : download from 'https://bio-oracle.org/data/1.0/BO_sstmean.zip' failed
Now we’ll read in our metadata, which has lat and lon for each sample
<-read.csv("californicus_metadata.csv")
meta
#make a new dataframe that just contains the lat and lon and label them "lat" and "long"
#notice that the original metafile mislabels lat and lon, oops!
<-cbind("lat" = meta$LONG, "lon" = meta$LAT)
sites
head(sites)
## lat lon
## [1,] -126.5646 50.5212
## [2,] -126.5646 50.5212
## [3,] -126.5646 50.5212
## [4,] -126.5646 50.5212
## [5,] -126.5646 50.5212
## [6,] -126.5646 50.5212
11.5 Extract Site Specific Information from our Envirornmental Layers
The data we downloaded
<- data.frame(Name=sites, extract(environ,sites)) sites_environ
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'extract': object 'environ' not found
head(sites_environ)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'sites_environ' not found
That produces for us a site specific environment file. We now need to convert this file into a format that we can save it as a matrix and export it as an environment file (i.e., in the format that the next step need it to be in which we do below)
#remove lat and lon and convert to matrix
<-as.matrix(sites_environ[,-c(1,2)]) sites_environ_matrix
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.matrix': object 'sites_environ' not found
#remove any Na's
<-na.omit(sites_environ_matrix) sites_environ_matrix_nas
## Error in na.omit(sites_environ_matrix): object 'sites_environ_matrix' not found
#write the file to an env file
write.env(sites_environ_matrix_nas, "sites_environ_matrix.env")
## Error in write.env(sites_environ_matrix_nas, "sites_environ_matrix.env"): object 'sites_environ_matrix_nas' not found
11.6 Make a Map of our environmental data
We now have enough information to make a nice map of one of our environmental parameters.
#what are are min amd max lat and lon
range(sites_environ$Name.lon)
## Error in eval(expr, envir, enclos): object 'sites_environ' not found
range(sites_environ$Name.lat)
## Error in eval(expr, envir, enclos): object 'sites_environ' not found
# define the area we want to map
<-extent(-140, -120, 40, 60)
ne.pacific
# crop the map we are interested in (here it is maximum sea surface temp)
<-crop(sst_max, ne.pacific) sst_max.crop
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'crop': object 'sst_max' not found
#make a nice color ramp and plot the map
= colorRampPalette(c("#5E85B8","#EDF0C0","#C13127"))
my.colors plot(sst_max.crop,col=my.colors(1000),axes=FALSE, box=FALSE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'sst_max.crop' not found
title(cex.sub = 1.25, sub = "Maximum temperature at the sea surface (ºC)")
## Error in title(cex.sub = 1.25, sub = "Maximum temperature at the sea surface (ºC)"): plot.new has not been called yet
11.7 Load in genetic data
Now we need to load in our genetic data. We have data in a vcf file format and the first few steps will be to convert it to the format needed by the lfmm package.
# convert our vcf file to lfmm
vcf2lfmm('filtered_3699snps_californicus_685inds.recode.vcf', 'filtered_3699snps_californicus.lfmm')
##
## - number of detected individuals: 685
## - number of detected loci: 3699
##
## For SNP info, please check ./filtered_3699snps_californicus_685inds.recode.vcfsnp.
##
## 0 line(s) were removed because these are not SNPs.
## Please, check ./filtered_3699snps_californicus_685inds.recode.removed file, for more informations.
##
##
## - number of detected individuals: 685
## - number of detected loci: 3699
## [1] "./filtered_3699snps_californicus_685inds.recode.lfmm"
<-read.lfmm('filtered_3699snps_californicus_685inds.recode.lfmm')
sea_cuc_lfmm
# and convert to geno
lfmm2geno('filtered_3699snps_californicus_685inds.recode.lfmm', 'filtered_3699snps_californicus_685inds.recode.geno')
##
## - number of detected individuals: 685
## - number of detected loci: 3699
## [1] "filtered_3699snps_californicus_685inds.recode.geno"
# read in geno file
<-read.geno('filtered_3699snps_californicus_685inds.recode.geno') sea_cuc_geno
# create a snmf object
<- snmf("filtered_3699snps_californicus_685inds.recode.geno", K = 1:10, entropy=T, ploidy = 2, project = "new") sea_cuc200.snmf
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] 941514363
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -s (seed random init) 941514363
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -o (output file in .geno format) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
##
## Write genotype file with masked data, /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 1
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 562950894935675
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 472064.927614
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 1
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K1/run1/filtered_3699snps_californicus_685inds.recode_r1.1.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.334567
## Cross-Entropy (masked data): 0.339706
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 2
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 18302628886575210107
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=============]
## Number of iterations: 34
##
## Least-square error: 466665.312183
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 2
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K2/run1/filtered_3699snps_californicus_685inds.recode_r1.2.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.328714
## Cross-Entropy (masked data): 0.336246
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 3
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===================]
## Number of iterations: 52
##
## Least-square error: 463294.754856
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 3
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K3/run1/filtered_3699snps_californicus_685inds.recode_r1.3.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.324817
## Cross-Entropy (masked data): 0.334397
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 4
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=====================]
## Number of iterations: 56
##
## Least-square error: 461854.890300
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 4
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K4/run1/filtered_3699snps_californicus_685inds.recode_r1.4.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.323667
## Cross-Entropy (masked data): 0.335366
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 5
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===================================================================]
## Number of iterations: 179
##
## Least-square error: 460008.997985
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 5
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K5/run1/filtered_3699snps_californicus_685inds.recode_r1.5.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.322707
## Cross-Entropy (masked data): 0.336741
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 6
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 9570150149676667
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========================================================================]
## Number of iterations: 200
##
## Least-square error: 459367.949539
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 6
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K6/run1/filtered_3699snps_californicus_685inds.recode_r1.6.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.321801
## Cross-Entropy (masked data): 0.337255
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 7 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 7
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [======================]
## Number of iterations: 60
##
## Least-square error: 457972.178305
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 7
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K7/run1/filtered_3699snps_californicus_685inds.recode_r1.7.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.321229
## Cross-Entropy (masked data): 0.337651
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 8 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 8
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 941514363
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========================================================================]
## Number of iterations: 200
##
## Least-square error: 456642.352389
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 8
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K8/run1/filtered_3699snps_californicus_685inds.recode_r1.8.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.320441
## Cross-Entropy (masked data): 0.33836
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 9 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 9
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=============================================]
## Number of iterations: 120
##
## Least-square error: 455318.160000
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 9
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K9/run1/filtered_3699snps_californicus_685inds.recode_r1.9.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.319607
## Cross-Entropy (masked data): 0.338609
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 10 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 10
## -x (input file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## -q (individual admixture file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.Q
## -g (ancestral frequencies file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.G
## -i (number max of iterations) 200
## -a (regularization parameter) 10
## -s (seed random init) 140420602289787
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===========================================================]
## Number of iterations: 157
##
## Least-square error: 454563.274582
## Write individual ancestry coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.Q: OK.
## Write ancestral allele frequency coefficient file /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of ancestral pops) 10
## -x (genotype file) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.geno
## -q (individual admixture) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.Q
## -g (ancestral frequencies) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/K10/run1/filtered_3699snps_californicus_685inds.recode_r1.10.G
## -i (with masked genotypes) /Users/ericaspotswoodnielsen/Desktop/PD_stuffies/teaching/MarineGenomics2022/data/Week10_gea/filtered_3699snps_californicus_685inds.recode.snmf/masked/filtered_3699snps_californicus_685inds.recode_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.319525
## Cross-Entropy (masked data): 0.339157
## The project is saved into :
## filtered_3699snps_californicus_685inds.recode.snmfProject
##
## To load the project, use:
## project = load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
=load.snmfProject("filtered_3699snps_californicus_685inds.recode.snmfProject")
project
#plot values of cross-entropy criteron of k
plot(sea_cuc200.snmf)
This plot tells us which value of K we should use when running the next analyses. We’re looking for a region where the cross-entropy is lowest and ideally it plateaus. We don’t get a plateau in our case but you can see our lowest value is at 3. So we would use a K of 3 in the next step.
However, for the purposes of time we’re going to do the next step with a k=1 and a rep=1.
# Genome scan for selection using environmental variables
<- lfmm("filtered_3699snps_californicus_685inds.recode.lfmm", "sites_environ_matrix.env", K=1, rep=1, project = "new") sea_cuc_lfmm
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## [1] "********************************"
## [1] "* K = 1 repetition 1 d = 1 *"
## [1] "********************************"
## Summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of latent factors) 1
## -o (output file) filtered_3699snps_californicus_685inds.lfmm/K1/run1/filtered_3699snps_californicus_685inds.recode_r1
## -i (number of iterations) 10000
## -b (burnin) 5000
## -s (seed random init) 140420984577842
## -p (number of processes (CPU)) 1
## -x (genotype file) filtered_3699snps_californicus_685inds.recode.lfmm
## -v (variable file) sites_environ_matrix.env
## -D (number of covariables) 5
## -d (the dth covariable) 1
##
## Read variable file:
## sites_environ_matrix.env OK.
##
## Read genotype file:
## filtered_3699snps_californicus_685inds.recode.lfmm OK.
##
## <<<<
## Analyse for variable 1
##
## Start of the Gibbs Sampler algorithm.
##
## [ ]
## [===========================================================================]
##
## End of the Gibbs Sampler algorithm.
##
## ED:2533818.161 DIC: 2533819.214
##
## The statistics for the run are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run1/filtered_3699snps_californicus_685inds.recode_r1_s1.1.dic.
##
## The zscores for variable 1 are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run1/filtered_3699snps_californicus_685inds.recode_r1_s1.1.zscore.
## The columns are: zscores, -log10(p-values), p-values.
##
## -------------------------
## The execution for variable 1 worked without error.
## >>>>
##
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## [1] "********************************"
## [1] "* K = 1 repetition 1 d = 2 *"
## [1] "********************************"
## Summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of latent factors) 1
## -o (output file) filtered_3699snps_californicus_685inds.lfmm/K1/run2/filtered_3699snps_californicus_685inds.recode_r2
## -i (number of iterations) 10000
## -b (burnin) 5000
## -s (seed random init) 140420984577842
## -p (number of processes (CPU)) 1
## -x (genotype file) filtered_3699snps_californicus_685inds.recode.lfmm
## -v (variable file) sites_environ_matrix.env
## -D (number of covariables) 5
## -d (the dth covariable) 2
##
## Read variable file:
## sites_environ_matrix.env OK.
##
## Read genotype file:
## filtered_3699snps_californicus_685inds.recode.lfmm OK.
##
## <<<<
## Analyse for variable 2
##
## Start of the Gibbs Sampler algorithm.
##
## [ ]
## [===========================================================================]
##
## End of the Gibbs Sampler algorithm.
##
## ED:2533818.053 DIC: 2533819.044
##
## The statistics for the run are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run2/filtered_3699snps_californicus_685inds.recode_r2_s2.1.dic.
##
## The zscores for variable 2 are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run2/filtered_3699snps_californicus_685inds.recode_r2_s2.1.zscore.
## The columns are: zscores, -log10(p-values), p-values.
##
## -------------------------
## The execution for variable 2 worked without error.
## >>>>
##
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## [1] "********************************"
## [1] "* K = 1 repetition 1 d = 3 *"
## [1] "********************************"
## Summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of latent factors) 1
## -o (output file) filtered_3699snps_californicus_685inds.lfmm/K1/run3/filtered_3699snps_californicus_685inds.recode_r3
## -i (number of iterations) 10000
## -b (burnin) 5000
## -s (seed random init) 1323802418
## -p (number of processes (CPU)) 1
## -x (genotype file) filtered_3699snps_californicus_685inds.recode.lfmm
## -v (variable file) sites_environ_matrix.env
## -D (number of covariables) 5
## -d (the dth covariable) 3
##
## Read variable file:
## sites_environ_matrix.env OK.
##
## Read genotype file:
## filtered_3699snps_californicus_685inds.recode.lfmm OK.
##
## <<<<
## Analyse for variable 3
##
## Start of the Gibbs Sampler algorithm.
##
## [ ]
## [===========================================================================]
##
## End of the Gibbs Sampler algorithm.
##
## ED:2533818.108 DIC: 2533819.087
##
## The statistics for the run are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run3/filtered_3699snps_californicus_685inds.recode_r3_s3.1.dic.
##
## The zscores for variable 3 are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run3/filtered_3699snps_californicus_685inds.recode_r3_s3.1.zscore.
## The columns are: zscores, -log10(p-values), p-values.
##
## -------------------------
## The execution for variable 3 worked without error.
## >>>>
##
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## [1] "********************************"
## [1] "* K = 1 repetition 1 d = 4 *"
## [1] "********************************"
## Summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of latent factors) 1
## -o (output file) filtered_3699snps_californicus_685inds.lfmm/K1/run4/filtered_3699snps_californicus_685inds.recode_r4
## -i (number of iterations) 10000
## -b (burnin) 5000
## -s (seed random init) 140420984577842
## -p (number of processes (CPU)) 1
## -x (genotype file) filtered_3699snps_californicus_685inds.recode.lfmm
## -v (variable file) sites_environ_matrix.env
## -D (number of covariables) 5
## -d (the dth covariable) 4
##
## Read variable file:
## sites_environ_matrix.env OK.
##
## Read genotype file:
## filtered_3699snps_californicus_685inds.recode.lfmm OK.
##
## <<<<
## Analyse for variable 4
##
## Start of the Gibbs Sampler algorithm.
##
## [ ]
## [===========================================================================]
##
## End of the Gibbs Sampler algorithm.
##
## ED:2533818.034 DIC: 2533819.034
##
## The statistics for the run are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run4/filtered_3699snps_californicus_685inds.recode_r4_s4.1.dic.
##
## The zscores for variable 4 are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run4/filtered_3699snps_californicus_685inds.recode_r4_s4.1.zscore.
## The columns are: zscores, -log10(p-values), p-values.
##
## -------------------------
## The execution for variable 4 worked without error.
## >>>>
##
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## [1] "********************************"
## [1] "* K = 1 repetition 1 d = 5 *"
## [1] "********************************"
## Summary of the options:
##
## -n (number of individuals) 685
## -L (number of loci) 3699
## -K (number of latent factors) 1
## -o (output file) filtered_3699snps_californicus_685inds.lfmm/K1/run5/filtered_3699snps_californicus_685inds.recode_r5
## -i (number of iterations) 10000
## -b (burnin) 5000
## -s (seed random init) 140420984577842
## -p (number of processes (CPU)) 1
## -x (genotype file) filtered_3699snps_californicus_685inds.recode.lfmm
## -v (variable file) sites_environ_matrix.env
## -D (number of covariables) 5
## -d (the dth covariable) 5
##
## Read variable file:
## sites_environ_matrix.env OK.
##
## Read genotype file:
## filtered_3699snps_californicus_685inds.recode.lfmm OK.
##
## <<<<
## Analyse for variable 5
##
## Start of the Gibbs Sampler algorithm.
##
## [ ]
## [===========================================================================]
##
## End of the Gibbs Sampler algorithm.
##
## ED:2533818.048 DIC: 2533819.073
##
## The statistics for the run are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run5/filtered_3699snps_californicus_685inds.recode_r5_s5.1.dic.
##
## The zscores for variable 5 are registered in:
## filtered_3699snps_californicus_685inds.lfmm/K1/run5/filtered_3699snps_californicus_685inds.recode_r5_s5.1.zscore.
## The columns are: zscores, -log10(p-values), p-values.
##
## -------------------------
## The execution for variable 5 worked without error.
## >>>>
##
## The project is saved into :
## filtered_3699snps_californicus_685inds.lfmmProject
##
## To load the project, use:
## project = load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
##
## To remove the project, use:
## remove.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject")
# load project
= load.lfmmProject("filtered_3699snps_californicus_685inds.lfmmProject") project
Now we can check and see which of our environmental variables is associated with genomic variation
#get the z scores from each env variable
<- z.scores(sea_cuc_lfmm, K=1, d=1)
zs1 <- z.scores(sea_cuc_lfmm, K=1, d=2)
zs2 <- z.scores(sea_cuc_lfmm, K=1, d=3)
zs3 <- z.scores(sea_cuc_lfmm, K=1, d=4)
zs4 <- z.scores(sea_cuc_lfmm, K=1, d=5)
zs5
# compute the genomic inflation factor lambda
=median(zs1^2)/qchisq(0.5, df=1)
lambda1=median(zs2^2)/qchisq(0.5, df=1)
lambda2=median(zs3^2)/qchisq(0.5, df=1)
lambda3=median(zs4^2)/qchisq(0.5, df=1)
lambda4=median(zs5^2)/qchisq(0.5, df=1)
lambda5
#then compute adjusted p-values
<-pchisq(zs1^2/lambda1, df=1, lower=F)
adj.p.val1<-pchisq(zs2^2/lambda2, df=1, lower=F)
adj.p.val2<-pchisq(zs3^2/lambda3, df=1, lower=F)
adj.p.val3<-pchisq(zs4^2/lambda4, df=1, lower=F)
adj.p.val4<-pchisq(zs5^2/lambda5, df=1, lower=F)
adj.p.val5
#then plot that p-value
hist(adj.p.val1, col="blue")
hist(adj.p.val2, col="blue")
hist(adj.p.val3, col="blue")
hist(adj.p.val4, col="blue")
hist(adj.p.val5, col="blue")
# control of false discoveries
# to correct for multiple testing, we can apply the Benjamini-Hochberg algorithm
# L is number of loci
=3699
L#fdr level q
= 0.1
q = which(sort(adj.p.val1) < q * (1:L)/L)
w # candidates are then
.1 = order(adj.p.val1)[w]
candidates
length(candidates.1)
## [1] 86
#These are the candidate loci related to our first variable Mean Chlorphyl
11.8 Exercises
- Pick your favorite environmental variable from the bio-oracle website.
For marine variables you can use the command list_layers(c("Bio-ORACLE","MARSPEC"))$layer_code
to view the different Bio-Oracle and MARSPEC environmental layers. For Terrestrial layers use: list_layers("WorldClim")$layer_code
Then download the layer you choose and store it as an object with load_layers()
Now decide which region you want to plot. If you’re interested in getting most of California you can use these coords with the function extent(-125, -115, 30, 40).
Crop your environmental layer and make a plot similar to the one we made in class.
Solution
library(sdmpredictors)
#get env data
<-load_layers("WC_bio15")
precip_seasonality
<-load_layers("WC_bio13")
precip_wettest
#make range to crop for california
<-extent(-130, -116, 30, 40)
cali
#crop it
<-crop(precip_wettest, cali)
precip_w.crop
#plot it
= colorRampPalette(c("#5E85B8","#EDF0C0","#C13127"))
my.colors plot(precip_w.crop,col=my.colors(1000),axes=FALSE, box=FALSE)
title(cex.sub = 1.25, sub = "Precipitation of the Wettest Month")
- Using the script for determining the number of candidates for the first environmental variable, find the number of candidates for the other four variables. Are these the same or different SNPs? (hint: have a look at this webpage on how to compare lists in R)
Solution
#We'll use this script below
#for variable 2 (PH)
# L is number of loci
=3699
L#fdr level q
= 0.1
q = which(sort(adj.p.val2) < q * (1:L)/L)
w # candidates are then
.2 = order(adj.p.val2)[w]
candidates
length(candidates.2)
## [1] 63
#variable 3
= which(sort(adj.p.val3) < q * (1:L)/L)
w # candidates are then
.3 = order(adj.p.val3)[w]
candidates
length(candidates.3)
## [1] 77
#variable 4
= which(sort(adj.p.val4) < q * (1:L)/L)
w # candidates are then
.4 = order(adj.p.val4)[w]
candidates
length(candidates.4)
## [1] 66
#variable 5
= which(sort(adj.p.val5) < q * (1:L)/L)
w # candidates are then
.5 = order(adj.p.val5)[w]
candidates
length(candidates.5)
## [1] 106
11.10 Download the data
You’ll download one fasta file that contains the 17 regions in which we found our Fst outlier SNPs.This will automatically go into the directory called ‘MarineGenomicsData.’
cd
wget https://raw.githubusercontent.com/BayLab/MarineGenomicsData/main/week10.tar.gz
tar -xzvf week10.tar.gz
11.11 Run NCBI-BLAST
NCBI-BLAST is commonly called “blast” and stands for Basic Local Alignment Search Tool. This program is often a part of the annotation process as it compares your sequences (or query) to those contained in a database. As such blast results are highly dependent on the database in which you search for results and whether your species of interest (or a close relative) is present in that database. This program can be run in the command line, but in my testing I found it to be incredibly slow on jetstream. We’re going to run our file on the NCBI-BLAST web browser which should only take a few minutes.
Find the fasta file we just downloaded in our MarineGenomicsData/Week 10 folder on the desktop and open it (this may take a few seconds). Navigate to the web browser on jetstream (the default is firefox). Search for NCBI-Blast.
We’ll do a translated nucleotide to protein search (blastx). Copy and paste the nucleotide sequence of our first genomic region of interest. In the organism box, you can filter your search to only match with other sea cucumber sequences.
11.12 What did we find?
We got some hits! Many of those that found a hit in Blast were for the other sea cucumber species. Many of these sequences are listed as “uncharacterized” or “hypothetical” protein meaning that there has not yet been any downstream classification of these sequences. This is a pretty common result for marine organisms, as they are rarely as highly studied as model organisms such as Drosophila or zebra fish.
11.13 The importance of Genomic Architecture
Even though we often don’t get satisfying answers about which genes are doing what in Marine Genomics, we can learn a lot about trait evolution from using outlier detection tests. These tests give us some idea of the number of genes and their effect on a given trait and that in turn has big implications on how that trait can respond to evolution. And that in turn has implications for climate change, conservation, and biodiversity.