
Abstract

Bacteria are highly diverse, even within a species; consequently, many studies classify a single species into multiple types and analyze the genetic differences between them. Whole-genome sequencing (WGS) has recently become popular for these analyses, and identifying single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed after WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a large number of false-positive SNPs relative to the limited number of true SNPs among the isolates. However, the performance of the various SNP callers in this situation has not been sufficiently validated. Here, we present realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when the target isolates are closely related. As an alternative, we developed a novel pipeline, BactSNP, which uses both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP can also call SNPs among isolates when the reference genome is a draft or even when the user does not supply a reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.
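
The effect of false positives on closely related isolates can be illustrated with a small calculation. The sketch below is not part of BactSNP or of the benchmarks reported here; the function name `snp_call_metrics` and all numbers are hypothetical, chosen only to show how the same count of false-positive calls affects precision very differently when true SNPs are scarce.

```python
def snp_call_metrics(true_snps, called_snps):
    """Return (precision, recall) for called SNP positions versus a truth set."""
    true_snps, called_snps = set(true_snps), set(called_snps)
    tp = len(true_snps & called_snps)   # correctly called SNPs
    fp = len(called_snps - true_snps)   # false-positive calls
    fn = len(true_snps - called_snps)   # missed SNPs
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical scenario: a caller finds every true SNP but also
# reports the same 20 false-positive positions in both data sets.
false_positives = set(range(1_000_000, 1_000_020))

close_true = set(range(10))         # 10 true SNPs (outbreak-like isolates)
distant_true = set(range(10_000))   # 10,000 true SNPs (divergent isolates)

close = snp_call_metrics(close_true, close_true | false_positives)
distant = snp_call_metrics(distant_true, distant_true | false_positives)

print(f"closely related: precision={close[0]:.3f}, recall={close[1]:.3f}")
print(f"distantly related: precision={distant[0]:.3f}, recall={distant[1]:.3f}")
```

Under these assumed numbers, recall is perfect in both cases, but precision drops to one third for the closely related isolates while staying above 0.99 for the divergent ones, which is why a modest false-positive rate can distort a phylogeny inferred from an outbreak data set.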

  • This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
/content/journal/mgen/10.1099/mgen.0.000261
2019-05-17
2024-03-29
