
Abstract

doi: 10.1099/mgen.0.000122.001.

As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows–Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium–human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
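
The recommendation above (one combined reference, 23 nt seed length) could be set up roughly as in the following sketch. It is illustrative only and is not the authors' pipeline: the helper names `combine_references` and `bwa_commands` and all file names are hypothetical. The only bwa options assumed are `bwa index`, and `bwa mem` with `-k` (minimum seed length) and `-t` (threads). The sketch only builds the command lines; it does not run bwa.

```python
from pathlib import Path


def combine_references(fasta_paths, combined_path):
    """Concatenate several FASTA files into one combined reference file.

    Lines are streamed so large genomes are never held fully in memory.
    A newline is added where an input file lacks a trailing one, so the
    next file's first header does not fuse with the previous sequence.
    """
    combined_path = Path(combined_path)
    with combined_path.open("w") as out:
        for path in fasta_paths:
            with Path(path).open() as src:
                for line in src:
                    out.write(line if line.endswith("\n") else line + "\n")
    return combined_path


def bwa_commands(reference, reads_1, reads_2, seed_length=23, threads=1):
    """Build (but do not run) the bwa index and bwa mem command lines.

    seed_length is passed to bwa mem's -k option (minimum seed length);
    23 follows the value suggested in the abstract.
    """
    index_cmd = ["bwa", "index", str(reference)]
    mem_cmd = [
        "bwa", "mem",
        "-k", str(seed_length),
        "-t", str(threads),
        str(reference), str(reads_1), str(reads_2),
    ]
    return index_cmd, mem_cmd
```

To use it, one would pass the majority and minority genomes (for example a Plasmodium FASTA and a human FASTA, names assumed) to `combine_references`, then hand each returned command to `subprocess.run`, redirecting the `bwa mem` standard output to a SAM file.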

2017-07-08
2024-03-29
