Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing

BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 685 ◽  
Author(s):  
Yu Liu ◽  
Mehmet Koyutürk ◽  
Sean Maxwell ◽  
Min Xiang ◽  
Martina Veigl ◽  
...  
2011 ◽  
Vol 32 (6) ◽  
pp. E2246-E2258 ◽  
Author(s):  
Paola Benaglio ◽  
Terri L. McGee ◽  
Leonardo P. Capelli ◽  
Shyana Harper ◽  
Eliot L. Berson ◽  
...  

2015 ◽  
Vol 60 (3) ◽  
pp. 1249-1257 ◽  
Author(s):  
Hajime Kanamori ◽  
Christian M. Parobek ◽  
David J. Weber ◽  
David van Duin ◽  
William A. Rutala ◽  
...  

Next-generation sequencing (NGS) analysis has emerged as a promising molecular epidemiological method for investigating health care-associated outbreaks. Here, we used NGS to investigate a 3-year outbreak of multidrug-resistant Acinetobacter baumannii (MDRAB) at a large academic burn center. A reference genome from the index case was generated using de novo assembly of PacBio reads. Forty-six MDRAB isolates were analyzed by pulsed-field gel electrophoresis (PFGE) and sequenced using an Illumina platform. After mapping to the index case reference genome, four samples were excluded due to low coverage, leaving 42 samples for further analysis. Multilocus sequence types (MLST) and the presence of acquired resistance genes were also determined from the sequencing data. A transmission network was inferred from genomic and epidemiological data using a Bayesian framework. Based on single-nucleotide variant (SNV) differences, this MDRAB outbreak represented three sequential outbreaks caused by distinct clones. The first and second outbreaks were caused by sequence type 2 (ST2), while the third outbreak was caused by ST79. For the second outbreak, the MLST and PFGE results were discordant. However, NGS-based SNV typing detected a recombination event and consequently enabled a more accurate phylogenetic analysis. The distribution of resistance genes varied among the three outbreaks. The first- and second-outbreak strains possessed a blaOXA-23-like group, while the third-outbreak strains harbored a blaOXA-40-like group. NGS-based analysis demonstrated the superior resolution of outbreak transmission networks for MDRAB and provided insight into the mechanisms of strain diversification between sequential outbreaks through recombination.
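
The SNV-based comparison of isolates described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each isolate has already been reduced to a consensus sequence aligned to the same reference (so all sequences have equal length), and the function names `snv_distance` and `pairwise_snv_distances` and the toy isolate names are hypothetical. Positions where either isolate has an ambiguous base (anything other than A, C, G, T) are skipped rather than counted as differences.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snv_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different A/C/G/T bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snv_distances(isolates):
    """Return {(name1, name2): SNV count} for every pair of isolates.

    `isolates` maps a (hypothetical) isolate name to its aligned consensus.
    """
    named = sorted(isolates.items())
    return {
        (n1, n2): snv_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(named, 2)
    }


# Toy data for illustration only; real inputs would be genome-length consensus calls.
toy_isolates = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "TCGAACGA",
}
distances = pairwise_snv_distances(toy_isolates)
```

A distance table of this kind is only a starting point. Inferring a transmission network, as the abstract describes, additionally requires epidemiological data and handling of recombinant regions, neither of which this sketch attempts.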


2009 ◽  
Vol 70 ◽  
pp. S107
Author(s):  
Martha B. Ladner ◽  
Gordon Bentley ◽  
Damian Goodridge ◽  
Henry A. Erlich ◽  
Elizabeth Trachtenberg

PLoS ONE ◽  
2011 ◽  
Vol 6 (1) ◽  
pp. e15292 ◽  
Author(s):  
Quan Long ◽  
Daniel C. Jeffares ◽  
Qingrun Zhang ◽  
Kai Ye ◽  
Viktoria Nizhynska ◽  
...  

2014 ◽  
Vol 15 (12) ◽  
Author(s):  
Seyed Yahya Anvar ◽  
Lusine Khachatryan ◽  
Martijn Vermaat ◽  
Michiel van Galen ◽  
Irina Pulyakhina ◽  
...  

2020 ◽  
Vol 32 (2) ◽  
pp. 163
Author(s):  
M. Okada ◽  
Y. Nagai ◽  
S. Matoba ◽  
Y. Sakuraba ◽  
S. Sugimura

It has been suggested that bovine IVF embryos have a higher frequency of chromosomal abnormalities than in vivo-fertilised embryos, which may explain low pregnancy success, but the details have not been clarified (Yao et al. 2018 Sci. Rep. 8, 7460). In this study, chromosomal aneuploidy in blastocysts of bovine IVF and in vivo-fertilised embryos was analysed by copy number variations (CNVs) based on next-generation sequencing. The IVF bovine embryos were cultured in well-of-the-well culture dishes (LinKID micro25: Dai Nippon Printing) containing 125 µL of CR1aa supplemented with 5% calf serum at 38.5°C in 5% O2 and 5% CO2 for 8 days after insemination. In vitro development of embryos was monitored using time-lapse cinematography (Sugimura et al. 2010 Biol. Reprod. 83, 970-78). In vivo embryos were collected from a superstimulated Japanese Black cow. Embryos that reached the blastocyst stage were divided into inner cell mass (ICM) and trophectoderm (TE) fractions by a micromanipulator with a blade. The TE and ICM samples were biopsied individually from 10 IVF and 4 in vivo-derived embryos, and extracted DNA was amplified using the SurePlex DNA amplification System (Illumina). The whole-genome amplified DNA libraries were sequenced using MiSeq (Illumina). The sequencing reads were mapped onto the Bos taurus reference genome ARS-UCD1.2, obtained from the National Center for Biotechnology Information. In all 29 autosomal chromosomes and the X chromosome, CNV analysis was performed by CNV-seq (Xie and Tammi 2009 BMC Bioinformatics 10, 80). Male or female Japanese Black cattle DNA sequence was used for the reference genome. CNV-seq was run with P-value=0.001, log2=0.6, and window size=1M. Four IVF embryos showed chromosomal duplications or deletions in either ICM- or TE-cell samples (4/10, 40%). The CNV loci between ICM and TE cells were relatively similar in each embryo. One of them was a code 1-expanded blastocyst with normal cleavage. Interestingly, CNV was not identified in another code 1-expanded blastocyst that underwent direct cleavage from 1 cell to 3 or more cells. Among the in vivo embryos, only one had a CNV (1/4, 25%). The CNVs observed in both IVF and in vivo embryos were segmental duplications or deletions within a chromosome. Hence, to improve pregnancy success in bovine IVF embryos, cytogenetic evaluation, in addition to morphological scoring, may be useful for assessing the quality of embryos that are prone to chromosomal abnormalities.
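
The windowed read-depth comparison behind the parameters quoted in this abstract (log2=0.6, window size=1M) can be sketched roughly as follows. This is a simplified illustration, not CNV-seq itself: CNV-seq also applies a statistical model to obtain P-values, which is omitted here. The function names, the input format (a list of mapped read start positions on one chromosome), and the normalisation by total read count are all assumptions made for the example.

```python
import math
from collections import Counter


def window_counts(positions, window_size):
    """Count mapped read start positions falling in each fixed-size window."""
    return Counter(pos // window_size for pos in positions)


def log2_ratios(test_positions, ref_positions, window_size=1_000_000):
    """Per-window log2 ratio of test vs reference depth, normalised by library size.

    Windows with no reads in either sample are skipped in this sketch.
    """
    test = window_counts(test_positions, window_size)
    ref = window_counts(ref_positions, window_size)
    n_test, n_ref = len(test_positions), len(ref_positions)
    ratios = {}
    for window in sorted(set(test) & set(ref)):
        test_fraction = test[window] / n_test
        ref_fraction = ref[window] / n_ref
        ratios[window] = math.log2(test_fraction / ref_fraction)
    return ratios


def flag_windows(ratios, threshold=0.6):
    """Label windows whose absolute log2 ratio reaches the threshold."""
    return {
        window: ("gain" if ratio > 0 else "loss")
        for window, ratio in ratios.items()
        if abs(ratio) >= threshold
    }


# Toy data: 10 windows of 1 Mb; the test sample has doubled depth in window 3.
ref_reads = [w * 1_000_000 + i * 1000 for w in range(10) for i in range(100)]
test_reads = [
    w * 1_000_000 + i * 1000
    for w in range(10)
    for i in range(200 if w == 3 else 100)
]
flagged = flag_windows(log2_ratios(test_reads, ref_reads))
```

In practice, adjacent flagged windows would be merged into segments before being reported as duplications or deletions.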


2020 ◽  
Author(s):  
Phillip A. Richmond ◽  
Alice M. Kaye ◽  
Godfrain Jacques Kounkou ◽  
Tamar V. Av-Shalom ◽  
Wyeth W. Wasserman

Abstract: Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a “reverse mapping” approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach.
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.

Author Summary: In the past 15 years, next generation sequencing technology has revolutionized our capacity to process and analyze DNA sequencing data. From agriculture to medicine, this technology is enabling a deeper understanding of the blueprint of life. Next generation sequencing data is composed of short sequences of DNA, referred to as “reads”, which are often shorter than 200 base pairs, making them many orders of magnitude smaller than the entirety of a human genome. Gaining insights from this data has typically leveraged a reference-guided mapping approach, where the reads are aligned to a reference genome and then post-processed to gain actionable information such as presence or absence of genomic sequence, or variation between the reference genome and the sequenced sample. Many experts in the field of genomics have concluded that selecting a single, linear reference genome for mapping reads against is limiting, and several current research endeavors are focused on exploring options for improved analysis methods to unlock the full utility of sequencing data. Among these improvements are the usage of sex-matched genomes, population-specific reference genomes, and emergent graph-based reference pan-genomes. However, advanced methods that use raw DNA sequencing data to inform the choice of reference genome and guide the alignment of reads to enriched reference genomes are needed. Here we develop a method termed FlexTyper, which creates a searchable index of the short read data and enables flexible, user-guided queries to provide valuable insights without the need for reference-guided mapping.
We demonstrate the utility of our method by identifying sample ancestry and sex in human whole genome sequencing data, detecting viral pathogen reads in RNA-seq data, African-enriched genome regions absent from the global reference, and HLA alleles that are complex to discern using standard read mapping. We anticipate early adoption of FlexTyper within analysis pipelines as a pre-mapping component, and further envision the bioinformatics and genomics community will leverage the tool for creative uses of sequence queries from unmapped data.
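
The "reverse mapping" idea described in this abstract, querying k-mers against an index of the reads rather than mapping reads to a reference, can be illustrated with a small Python sketch. It is not FlexTyper's implementation or API: FlexTyper builds an FM-index, whereas this example substitutes a plain in-memory k-mer count table, and the function names (`index_reads`, `count_query`, `genotype_snp`) and the toy reads are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_reads(reads, k):
    """Build a k-mer count table over all reads (a stand-in for an FM-index)."""
    index = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            index[read[i:i + k]] += 1
    return index


def count_query(index, kmer):
    """Count occurrences of a query k-mer on either strand of the reads."""
    rc = reverse_complement(kmer)
    if rc == kmer:  # palindromic k-mer: avoid counting it twice
        return index[kmer]
    return index[kmer] + index[rc]


def genotype_snp(index, ref_kmer, alt_kmer):
    """Report read support for the reference and alternate allele k-mers."""
    return {
        "ref": count_query(index, ref_kmer),
        "alt": count_query(index, alt_kmer),
    }


# Toy reads: two support the reference allele (one on the reverse strand),
# and one supports the alternate allele.
toy_reads = ["GGACGTAGG", "CCTACGTCC", "TTACCTATT"]
toy_index = index_reads(toy_reads, k=5)
support = genotype_snp(toy_index, ref_kmer="ACGTA", alt_kmer="ACCTA")
```

A count table like this must be rebuilt for each value of k, whereas an FM-index supports queries of arbitrary length against the same index. That is one reason the abstract highlights FM-index generation as the step that future adoption depends on.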

