Abstract

Whole-genome sequencing (WGS) of bacterial isolates has become standard practice in many laboratories. Applications of WGS analysis include phylogeography and molecular epidemiology, which use single nucleotide polymorphisms (SNPs) as the unit of evolution. NASP was developed as a reproducible method that scales to the hundreds or thousands of WGS datasets typically used in comparative genomics applications. In this study, we compare NASP with other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results show that NASP produces results similar to, and often better than, those of other pipelines, while being much more flexible in terms of data input types, job management systems, diversity of supported tools and output formats. We also demonstrate how results differ depending on the choice of reference genome and on whether phylogenies are inferred from concatenated SNPs or from alignments that include monomorphic positions. NASP is a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.
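The abstract contrasts phylogenies inferred from concatenated SNPs with those inferred from alignments that also retain monomorphic positions. The sketch below illustrates that distinction on hypothetical toy data. It assumes the genomes are already aligned to equal length, and it treats a position as "core" only when every sample has an unambiguous A/C/G/T call. That filtering rule, and the function name `build_matrices`, are illustrative assumptions, not NASP's actual implementation.

```python
from typing import Dict, Tuple


def build_matrices(alignment: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split an aligned set of genomes into two character matrices.

    Returns (core, snps):
      core: every position where all samples have an A/C/G/T call,
            monomorphic positions included
      snps: only the core positions that are polymorphic (concatenated SNPs)
    The 'all samples called' rule is an illustrative assumption, not NASP's filter.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if not seqs:
        return {}, {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep a column only if every sample has a definite nucleotide call there.
    core_idx = [i for i in range(length) if all(seq[i] in "ACGT" for seq in seqs)]
    # Of those columns, keep only the ones with more than one distinct base (SNPs).
    snp_idx = [i for i in core_idx if len({seq[i] for seq in seqs}) > 1]

    core = {n: "".join(s[i] for i in core_idx) for n, s in zip(names, seqs)}
    snps = {n: "".join(s[i] for i in snp_idx) for n, s in zip(names, seqs)}
    return core, snps


if __name__ == "__main__":
    # Hypothetical toy data: a reference and two isolates; 'N' marks a missing call.
    toy = {
        "ref": "ACGTACGTAC",
        "s1": "ACGTTCGTAC",
        "s2": "ACGAACGTNC",
    }
    core, snps = build_matrices(toy)
    print(core)  # {'ref': 'ACGTACGTC', 's1': 'ACGTTCGTC', 's2': 'ACGAACGTC'}
    print(snps)  # {'ref': 'TA', 's1': 'TT', 's2': 'AA'}
```

Both matrices carry the same two informative sites, but the core matrix also records the seven invariant positions. Because the SNP-only matrix discards those invariant sites, maximum-likelihood tree inference from it is commonly paired with an ascertainment-bias correction; otherwise branch lengths can be distorted.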

Keywords: bioinformatics, phylogeography and SNPs
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Microbial Genomics 2(8). DOI: 10.1099/mgen.0.000074
2016-08-25
2024-03-29
Supplements

Supplementary File 1 (PDF)