Occupancy modeling, maximum contig size probabilities and designing metagenomics experiments

Stephen A. Stanhope
University of Chicago, Biological Sciences Division
June 14, 2010
sstanhop@bsd.uchicago.edu

These R codes accompany "Occupancy modeling, maximum contig size probabilities and designing metagenomics experiments," Stephen A. Stanhope (2010). They enable the user to replicate most of the results presented in the paper and to perform the experimental designs described therein. Each script is intended to be edited for the desired number of genomes, genome length, number of reads, etc., and then run via the "source" command in an R terminal or with "Rscript" on the command line. They should be considered research codes and assume reasonable proficiency in R. Please contact the author with any questions.
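
As an illustration of this workflow, the following is a minimal sketch of running an edited script from within R. The helper name run_design_script is hypothetical and is not part of the distributed codes; it also assumes that the script can be sourced into a separate environment, which keeps the script's variables out of the global workspace.

```r
# Hypothetical helper (not part of the distributed codes): run one of the
# scripts after its parameters have been edited by hand.
run_design_script <- function(path) {
  if (!file.exists(path)) {
    stop("script not found: ", path)
  }
  # Evaluate the script in its own environment and return that environment,
  # so the values it computes can be inspected afterwards.
  env <- new.env()
  source(path, local = env)
  invisible(env)
}

# Example, assuming the archive was unpacked into the working directory:
#   env <- run_design_script("design_single_novel.R")
#   ls(env)   # list the objects the script created
# The command-line equivalent is:
#   Rscript design_single_novel.R
```

Calling source("design_single_novel.R") directly works equally well; the wrapper only adds a check that the file exists and keeps the results together in one place.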

Included are the following:

maximum_contig_length_simulation.R - This code performs simulations of genome assemblies and related Wendl and expected overlap tiling discretizations, and reports distributions of maximum contig sizes over a number of iterations. Additionally, it produces the Poisson approximation of the maximum contig size. These results are described in "Largest contig size probabilities for a single genome."
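
To give a sense of what such a simulation involves, here is a minimal sketch of a direct read-placement simulation. It is not the code in maximum_contig_length_simulation.R and does not implement the Wendl or expected overlap tiling discretizations or the Poisson approximation. It assumes a linear genome, reads of equal length with uniformly distributed start positions, and that two reads join into one contig when they share at least min_overlap bases. All function and parameter names are illustrative.

```r
# Largest contig length given read start positions (1-based) and a common
# read length. Reads are merged when consecutive reads overlap by at least
# min_overlap bases.
max_contig_from_starts <- function(starts, read_len, min_overlap = 1) {
  starts <- sort(starts)
  ends <- starts + read_len - 1
  n <- length(starts)
  # A read opens a new contig when it overlaps the previous read by fewer
  # than min_overlap bases. With equal read lengths, the previous read in
  # sorted order always has the furthest end seen so far.
  new_contig <- c(TRUE, starts[-1] > ends[-n] - min_overlap + 1)
  contig_id <- cumsum(new_contig)
  contig_start <- tapply(starts, contig_id, min)
  contig_end <- tapply(ends, contig_id, max)
  max(contig_end - contig_start + 1)
}

# One simulated assembly: n_reads reads placed uniformly at random on a
# genome of length genome_len.
simulate_max_contig <- function(genome_len, n_reads, read_len, min_overlap = 1) {
  starts <- sample.int(genome_len - read_len + 1, n_reads, replace = TRUE)
  max_contig_from_starts(starts, read_len, min_overlap)
}

# Empirical distribution of the maximum contig size over many iterations.
# The parameter values below are arbitrary and chosen only for illustration.
set.seed(1)
sizes <- replicate(200, simulate_max_contig(genome_len = 10000,
                                            n_reads = 100,
                                            read_len = 100))
print(quantile(sizes, c(0.05, 0.5, 0.95)))
```

Separating the contig computation from the random placement makes the merging rule easy to check by hand on a few fixed start positions before running many iterations.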

design_single_novel.R - This code computes experimental designs as described in the "Detecting a single novel species in a pool of known species" subsection.

design_fixed_pool.R - This code computes experimental designs described in "Obtaining contigs representative of a pool of species."

design_fixed_pool_distributed_size_abundance.R - This code computes experimental designs described in "Fixed pool sizes with distributed genome sizes and abundances."

design_stochastic_pool_distributed_size_abundance.R - This code computes experimental designs described in "Stochastic pools with distributed genome sizes and abundances."

metagenome_assembly_simulation.R - This code simulates whole metagenome assemblies. It is used to verify the experimental designs computed in "Stochastic pools with distributed genome sizes and abundances."

metagenomics_experimental_design.tar.gz