Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes

doi:10.1099/mgen.0.000294

Volume 5, Issue 9

Research Article

Open Access

Nicola De Maio¹†, Liam P. Shaw¹†, Alasdair Hubbard², Sophie George¹,³, Nicholas D. Sanderson¹, Jeremy Swann¹, Ryan Wick⁴, Manal AbuOun⁵, Emma Stubberfield⁵, Sarah J. Hoosdally¹, Derrick W. Crook¹,³, Timothy E. A. Peto¹,³, Anna E. Sheppard¹,³, Mark J. Bailey⁶, Daniel S. Read⁶, Muna F. Anjum⁵, A. Sarah Walker¹,³, Nicole Stoesser¹, on behalf of the REHAB consortium

Affiliations:
¹ Nuffield Department of Medicine, University of Oxford, Oxford, UK
² Department of Tropical Disease Biology, Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
³ NIHR Health Protection Research Unit (HPRU) in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
⁴ Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, Australia
⁵ Department of Bacteriology, Animal and Plant Health Agency, Addlestone, Surrey, KT15 3NB, UK
⁶ Centre for Ecology & Hydrology, Benson Lane, Crowmarsh Gifford, Wallingford, OX10 8BB, UK

† These authors contributed equally to this work
Published: 01 September 2019 https://doi.org/10.1099/mgen.0.000294

Abstract

Illumina sequencing allows rapid, cheap and accurate whole-genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT sequencing platform. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies, and also compared these with long-read-only assembly with Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.
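The two workflows compared in the abstract can be sketched as command lines. The Python snippet below is an illustrative sketch only, not the authors' pipeline: it builds, but does not run, a Unicycler hybrid-assembly command and a Flye plus Pilon command pair. All file names, the thread count and the function names are hypothetical. It assumes Pilon is available as a `pilon` wrapper on the PATH and that the Illumina reads have already been aligned to the Flye assembly and stored as a sorted, indexed BAM file. Some Flye releases also require a `--genome-size` argument, which is omitted here.

```python
import shlex


def unicycler_cmd(short_1, short_2, long_reads, out_dir, threads=4):
    """Hybrid assembly: paired Illumina reads plus ONT or PacBio long reads."""
    return [
        "unicycler",
        "-1", str(short_1),      # Illumina forward reads
        "-2", str(short_2),      # Illumina reverse reads
        "-l", str(long_reads),   # long reads
        "-o", str(out_dir),      # output directory
        "-t", str(threads),
    ]


def flye_cmd(long_reads, out_dir, platform="ont", threads=4):
    """Long-read-only assembly; the read-type flag depends on the platform."""
    read_flag = {"ont": "--nano-raw", "pacbio": "--pacbio-raw"}[platform]
    return [
        "flye", read_flag, str(long_reads),
        "--out-dir", str(out_dir),
        "--threads", str(threads),
    ]


def pilon_cmd(assembly, short_read_bam, out_prefix, out_dir):
    """Short-read polishing of a long-read assembly.

    short_read_bam is assumed to hold Illumina reads aligned to `assembly`.
    """
    return [
        "pilon",                 # assumption: wrapper script on PATH
        "--genome", str(assembly),
        "--frags", str(short_read_bam),
        "--output", str(out_prefix),
        "--outdir", str(out_dir),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = [
        unicycler_cmd("iso1_R1.fastq.gz", "iso1_R2.fastq.gz",
                      "iso1_ont.fastq.gz", "iso1_unicycler"),
        flye_cmd("iso1_ont.fastq.gz", "iso1_flye", platform="ont"),
        pilon_cmd("iso1_flye/assembly.fasta", "iso1_illumina.bam",
                  "iso1_polished", "iso1_pilon"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without the tools installed. In practice each list could be passed to `subprocess.run(cmd, check=True)`.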

Received: 31/01/2019
Accepted: 19/08/2019
Published Online: 01/09/2019

Keyword(s): bacterial genomics, Enterobacteriaceae, hybrid assembly, long-read sequencing, plasmid assembly

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

M Gen 5, e000294 (2019); https://doi.org/10.1099/mgen.0.000294
