JuLI: accurate detection of DNA fusions in clinical sequencing for precision oncology

2019 ◽  
Author(s):  
Hyun-Tae Shin ◽  
Nayoung K. D. Kim ◽  
Jae Won Yun ◽  
Boram Lee ◽  
Sungkyu Kyung ◽  
...  

ABSTRACT: Accurate detection of genomic fusions by high-throughput sequencing in clinical samples with inadequate tumor purity and formalin-fixed paraffin-embedded (FFPE) tissue is an essential task in precision oncology. We developed the fusion detection algorithm Junction Location Identifier (JuLI), optimized for high-depth clinical sequencing. We implemented novel filtering steps to minimize false positives and a joint calling function to increase sensitivity in clinical settings. We comprehensively validated the algorithm using high-depth sequencing data from cancer cell lines and clinical samples and whole genome sequencing data from NA12878. We showed that JuLI outperformed state-of-the-art fusion callers in cases with high-depth clinical sequencing and rescued a driver fusion that would otherwise have been a false negative in plasma cell-free DNA. JuLI is freely available via GitHub (https://github.com/sgilab/JuLI).

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
...  

Abstract: Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.


2019 ◽  
Author(s):  
Ronan M. Doyle ◽  
Denise M. O’Sullivan ◽  
Sean D. Aller ◽  
Sebastian Bruchmann ◽  
Taane Clark ◽  
...  

Abstract
Background: Antimicrobial resistance (AMR) poses a threat to public health. Clinical microbiology laboratories typically rely on culturing bacteria for antimicrobial susceptibility testing (AST). As the implementation costs and technical barriers fall, whole-genome sequencing (WGS) has emerged as a ‘one-stop’ test for epidemiological and predictive AST results. Few published comparisons exist for the myriad analytical pipelines used for predicting AMR. To address this, we performed an inter-laboratory study providing sets of participating researchers with identical short-read WGS data sequenced from clinical isolates, allowing us to assess the reproducibility of the bioinformatic prediction of AMR between participants and identify problem cases and factors that lead to discordant results.
Methods: We produced ten WGS datasets of varying quality from cultured carbapenem-resistant organisms obtained from clinical samples sequenced on either an Illumina NextSeq or HiSeq instrument. Nine participating teams (‘participants’) were provided these sequence data without any other contextual information. Each participant used their own pipeline to determine the species, to identify resistance-associated genes, and to predict susceptibility or resistance to amikacin, gentamicin, ciprofloxacin and cefotaxime.
Results: Individual participants predicted different numbers of AMR-associated genes and different gene variants from the same clinical samples. The quality of the sequence data, choice of bioinformatic pipeline and interpretation of the results all contributed to discordance between participants. Although much of the inaccurate gene variant annotation did not affect genotypic resistance predictions, we observed low specificity when compared to phenotypic AST results, but this improved in samples with higher read depths. Had the results been used to predict AST and guide treatment, a different antibiotic would have been recommended for each isolate by at least one participant.
Conclusions: We found that participants produced discordant predictions from identical WGS data. These challenges, at the final analytical stage of using WGS to predict AMR, suggest the need for refinements when using this technology in clinical settings. Comprehensive public resistance sequence databases and standardisation in the comparisons between genotype and resistance phenotypes will be fundamental before AST prediction using WGS can be successfully implemented in standard clinical microbiology laboratories.


2019 ◽  
Author(s):  
William L Hamilton ◽  
Roberto Amato ◽  
Rob W van der Pluijm ◽  
Christopher G Jacob ◽  
Huynh Hong Quang ◽  
...  

Summary
Background: A multidrug resistant co-lineage of Plasmodium falciparum malaria, named KEL1/PLA1, spread across Cambodia c. 2008-2013, causing high rates of treatment failure with the frontline combination therapy dihydroartemisinin-piperaquine. Here, we report on the evolution and spread of KEL1/PLA1 in subsequent years.
Methods: We analysed whole genome sequencing data from 1,673 P. falciparum clinical samples collected in 2008-2018 from northeast Thailand, Laos, Cambodia and Vietnam. By investigating genome-wide relatedness between parasites, we inferred patterns of shared ancestry in the KEL1/PLA1 population.
Findings: KEL1/PLA1 spread rapidly from 2015 into all of the surveyed countries and now exceeds 80% of the P. falciparum population in several regions. These parasites maintained a high level of genetic relatedness reflecting their common origin. However, several genetic subgroups have recently emerged within this co-lineage with diverse geographical distributions. Some of these emerging KEL1/PLA1 subgroups carry recent mutations in the chloroquine resistance transporter (crt) gene, which arise on a specific genetic background comprising multiple genomic regions.
Interpretation: After emerging and circulating for several years within Cambodia, the P. falciparum KEL1/PLA1 co-lineage diversified into multiple subgroups and acquired new genetic features including novel crt mutations. These subgroups have rapidly spread into neighbouring countries, suggesting enhanced fitness. These findings highlight the urgent need for elimination of this increasingly drug-resistant parasite co-lineage, and the importance of genetic surveillance in accelerating elimination efforts.
Funding: Wellcome Trust, Bill & Melinda Gates Foundation, UK Medical Research Council, UK Department for International Development.
Research in context
Evidence before this study: This study updates our previous work describing the emergence and spread of a multidrug resistant P. falciparum co-lineage (KEL1/PLA1) within Cambodia up to 2013. Since then, a regional genetic surveillance project, GenRe-Mekong, has reported that markers of dihydroartemisinin-piperaquine (DHA-PPQ) resistance have increased in frequency in neighbouring countries. A PubMed search (terms: “artemisinin”, “piperaquine”, “resistance”, “southeast asia”) for articles listed since our previous study (from 30/10/2017 to 05/01/2019) yielded 28 results, including reports of a recent sharp decline in DHA-PPQ clinical efficacy in Vietnam; the spread of genetic markers of DHA-PPQ resistance into neighbouring countries by Imwong and colleagues; and multiple reports associating mutations in the crt gene with piperaquine resistance, including newly emerging crt variants in Southeast Asia.
Added value of this study: We analysed P. falciparum whole genomes collected up to early 2018 from Eastern Southeast Asia (Cambodia and surrounding regions), describing the fine-scale epidemiology of multiple KEL1/PLA1 genetic subgroups that have spread out from Cambodia since 2015 and taken over indigenous parasite populations in northeastern Thailand, southern and central Vietnam and parts of southern Laos. Several newly emerging crt mutations accompanied the spread and expansion of KEL1/PLA1 subgroups, suggesting an active proliferation of biologically fit, multidrug resistant parasites.
Implications of all the available evidence: The problem of P. falciparum multidrug resistance has dramatically worsened in Eastern Southeast Asia since previous reports. KEL1/PLA1 has diversified and spread widely across Eastern Southeast Asia since 2015, becoming the predominant parasite group in several regions. This may have been fuelled by continued parasite exposure to DHA-PPQ, resulting in sustained selection after KEL1/PLA1 became established. Continued drug pressure enabled the acquisition of further mutations, resulting in higher levels of resistance. These data demonstrate the value of pathogen genetic surveillance and the urgent need to eliminate these dangerous parasites.


2021 ◽  
Vol 12 ◽  
Author(s):  
Shengzhe Bian ◽  
Yangyang Jia ◽  
Qiuyao Zhan ◽  
Nai-Kei Wong ◽  
Qinghua Hu ◽  
...  

Vibrio parahaemolyticus has emerged as a significant enteropathogen in human and marine habitats worldwide, notably in regions where aquaculture products constitute a major nutritional source. It is a growing cause of diseases including gastroenteritis, wound infections, and septicemia. Serotyping assays use commercially available antisera to identify V. parahaemolyticus strains, but this approach is limited by high costs, complicated procedures, cross-immunoreactivity, and often subjective interpretation. By leveraging high-throughput sequencing technologies, we developed an in silico method based on comparison of gene clusters for lipopolysaccharide (LPSgc) and capsular polysaccharide (CPSgc) by firstly using the unique-gene strategy. The algorithm, VPsero, which exploits serogroup-specific genes as markers, covers 43 K and all 12 O serogroups in serotyping assays. VPsero is capable of predicting serotypes from assembled draft genomes, outputting LPSgc/CPSgc sequences, and recognizing possible novel serogroups or populations. Our tool displays high specificity and sensitivity in prediction toward V. parahaemolyticus strains, with an average sensitivity in serogroup prediction of 0.910 for O and 0.961 for K serogroups and a corresponding average specificity of 0.990 for O and 0.998 for K serogroups.
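The averaged sensitivity and specificity figures quoted above can be read as per-serogroup metrics averaged across serogroups. The following is a minimal sketch of that arithmetic, not VPsero's own evaluation code; the `Counts` class, the function names and the toy counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Counts:
    """Confusion-matrix counts for one serogroup (hypothetical container)."""
    tp: int  # strains of this serogroup predicted as this serogroup
    fp: int  # strains of other serogroups predicted as this serogroup
    tn: int  # strains of other serogroups not predicted as this serogroup
    fn: int  # strains of this serogroup predicted as something else


def sensitivity(c: Counts) -> float:
    """True positive rate: TP / (TP + FN)."""
    if c.tp + c.fn == 0:
        raise ValueError("no positive examples for this serogroup")
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """True negative rate: TN / (TN + FP)."""
    if c.tn + c.fp == 0:
        raise ValueError("no negative examples for this serogroup")
    return c.tn / (c.tn + c.fp)


def average_metric(per_group: Dict[str, Counts],
                   metric: Callable[[Counts], float]) -> float:
    """Unweighted mean of a metric over all serogroups."""
    values = [metric(c) for c in per_group.values()]
    return sum(values) / len(values)


# Toy counts, invented for illustration only (not data from the study).
toy = {
    "O1": Counts(tp=9, fp=1, tn=89, fn=1),
    "O3": Counts(tp=18, fp=0, tn=80, fn=2),
}
avg_sens = average_metric(toy, sensitivity)
avg_spec = average_metric(toy, specificity)
```

An unweighted mean treats rare and common serogroups equally; whether the reported averages were weighted by serogroup size is not stated in the abstract.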


2018 ◽  
Author(s):  
Matthew Z. DeMaere ◽  
Aaron E. Darling

Abstract: Most microbes inhabiting the planet cannot be easily grown in the lab. Metagenomic techniques provide a means to study these organisms, and recent advances in the field have enabled the resolution of individual genomes from metagenomes, so-called Metagenome Assembled Genomes (MAGs). In addition to expanding the catalog of known microbial diversity, the systematic retrieval of MAGs stands as a tenable divide-and-conquer reduction of metagenome analysis to the simpler problem of single genome analysis. Many leading approaches to MAG retrieval depend upon time-series or transect data, whose effectiveness is a function of community complexity, target abundance and depth of sequencing. Without the need for time-series data, promising alternative methods are based upon the high-throughput sequencing technique called Hi-C. The Hi-C technique produces read-pairs which capture in-vivo DNA-DNA proximity interactions (contacts). The physical structure of the community modulates the signal derived from these interactions and a hierarchy of interaction rates exists (intra-chromosomal > inter-chromosomal > inter-cellular). We describe an unsupervised method that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs from a single time-point. Next, as a quantitative demonstration, we validate the method against the ground truth of a simulated human faecal microbiome. Lastly, we directly compare our method against a recently announced proprietary service, ProxiMeta, which also performs MAG retrieval using Hi-C data. bin3C has been implemented as a simple open-source pipeline and makes use of the unsupervised community detection algorithm Infomap (https://github.com/cerebis/bin3C).


2021 ◽  
Vol 43 (3) ◽  
pp. 1937-1949
Author(s):  
Laura A. E. Van Poelvoorde ◽  
Mathieu Gand ◽  
Marie-Alice Fraiture ◽  
Sigrid C. J. De Keersmaecker ◽  
Bavo Verhaegen ◽  
...  

The worldwide emergence and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) since 2019 has highlighted the importance of rapid and reliable diagnostic testing to prevent and control viral transmission. However, false negatives (FN) caused by polymorphisms or point mutations arising during virus evolution may compromise the accuracy of the diagnostic tests. Therefore, PCR-based SARS-CoV-2 diagnostics should be evaluated and evolve together with the rapidly increasing number of new variants appearing around the world. However, even with a large collection of samples, laboratories are not able to test a collection that represents the level of diversity that is continuously evolving worldwide. In the present study, we proposed a methodology based on an in silico and in vitro analysis. First, we used all information offered by available whole-genome sequencing data for SARS-CoV-2 to select two PCR assays targeting two different regions in the genome, and to monitor the possible impact of virus evolution on the specificity of the primers and probes of the PCR assays during and after the development of the assays. Besides this first essential in silico evaluation, a minimal set of tests was proposed to generate experimental evidence on the method performance, such as specificity, sensitivity and applicability. Therefore, a duplex reverse-transcription droplet digital PCR (RT-ddPCR) method was evaluated in silico by using 154 489 whole-genome sequences of SARS-CoV-2 strains that were representative of the circulating strains around the world. The RT-ddPCR platform was selected as it presented several advantages to detect and quantify SARS-CoV-2 RNA in clinical samples and wastewater. Next, the assays were successfully experimentally evaluated for their sensitivity and specificity. A preliminary evaluation of the applicability of the developed method was performed using both clinical and wastewater samples.
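The in silico step described above asks, in essence, what fraction of available genomes still contain a binding site for each primer or probe. The sketch below illustrates that idea under strong simplifying assumptions: it scans the forward strand only, ignores degenerate bases and reverse complements, and uses invented toy sequences. The function names `min_mismatches` and `inclusivity` are hypothetical and are not taken from the study.

```python
from typing import Iterable


def min_mismatches(primer: str, genome: str) -> int:
    """Smallest number of mismatches between the primer and any
    same-length window of the genome (forward strand only)."""
    k = len(primer)
    if k == 0 or len(genome) < k:
        raise ValueError("primer must be non-empty and no longer than the genome")
    return min(
        sum(a != b for a, b in zip(primer, genome[i:i + k]))
        for i in range(len(genome) - k + 1)
    )


def inclusivity(primer: str, genomes: Iterable[str],
                max_mismatches: int = 0) -> float:
    """Fraction of genomes with at least one site that matches the
    primer with no more than `max_mismatches` mismatches."""
    genomes = list(genomes)
    hits = sum(1 for g in genomes if min_mismatches(primer, g) <= max_mismatches)
    return hits / len(genomes)


# Toy sequences, invented for illustration (not real SARS-CoV-2 data).
toy_primer = "ACGTAC"
toy_genomes = [
    "TTACGTACGG",  # contains an exact binding site
    "TTACGAACGG",  # best site has one mismatch
    "GGGGGGGGGG",  # no plausible binding site
]
exact_fraction = inclusivity(toy_primer, toy_genomes, max_mismatches=0)
relaxed_fraction = inclusivity(toy_primer, toy_genomes, max_mismatches=1)
```

Re-running such a check as new genomes are deposited is one way to monitor whether virus evolution is eroding primer or probe matches; a realistic pipeline would also weigh mismatch position, since mismatches near the 3' end of a primer tend to matter more.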


2016 ◽  
Author(s):  
Thomas Willems ◽  
Dina Zielinski ◽  
Assaf Gordon ◽  
Melissa Gymrek ◽  
Yaniv Erlich

Abstract: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.
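To make the notion of an STR allele concrete, the sketch below counts the longest uninterrupted run of a repeat motif in each read and tallies the observed repeat counts. This is a deliberately naive illustration and not HipSTR's haplotype-based method: it ignores sequencing errors, PCR stutter, flanking-sequence alignment and phasing, which are among the reasons STRs are hard to genotype. The function names and toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Number of motif copies in the longest uninterrupted tandem run."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    # Try every start position so that runs of self-overlapping motifs
    # (for example ATA in ATATAATA) are not missed.
    for start in range(len(sequence) - k + 1):
        count = 0
        pos = start
        while sequence.startswith(motif, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


def allele_counts(reads: Iterable[str], motif: str) -> Counter:
    """Tally how many reads support each repeat count."""
    return Counter(longest_repeat_run(read, motif) for read in reads)


# Toy reads, invented for illustration only.
toy_reads = ["TTCAGCAGCAGTT", "ACAGCAGCAGG", "TCAGCAGT"]
toy_tally = allele_counts(toy_reads, "CAG")
```

In the toy tally, two reads support three copies of CAG and one supports two; a naive caller could not tell whether the minority count reflects a second allele or an artifact, which is the kind of ambiguity a model-based genotyper is designed to resolve.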


2018 ◽  
Author(s):  
Arda Soylev ◽  
Thong Le ◽  
Hajar Amini ◽  
Can Alkan ◽  
Fereydoun Hormozdiari

Abstract
Motivation: Several algorithms have been developed that use high throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions, and short inversions. However, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; and duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of simpler variants.
Results: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing data sets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real data sets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only a 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate for our predictions of tandem, direct and inverted interspersed segmental duplications on CHM1 (less than 5% for the top 50 predictions).
Availability: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/
Contact: [email protected] and [email protected]
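Sensitivity and false discovery rate for an SV call set are usually computed by matching predicted intervals against a truth set. The sketch below uses a 50% reciprocal-overlap rule on a single chromosome; this matching rule, the threshold, the function names and the toy intervals are all assumptions made for illustration and are not described in the abstract as TARDIS's evaluation procedure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) with end > start


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval's length, i.e. the
    smaller of the two overlap fractions."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def evaluate(calls: List[Interval], truth: List[Interval],
             min_overlap: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) for a call set."""
    if not calls or not truth:
        raise ValueError("calls and truth must both be non-empty")
    # A truth event is recovered if any call overlaps it sufficiently.
    recovered = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    # A call is a true discovery if it sufficiently overlaps any truth event.
    true_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    sensitivity = recovered / len(truth)
    fdr = 1 - true_calls / len(calls)
    return sensitivity, fdr


# Toy intervals, invented for illustration only.
toy_truth = [(100, 200), (500, 700), (1000, 1100)]
toy_calls = [(110, 210), (520, 690), (3000, 3100), (5000, 5050)]
toy_sensitivity, toy_fdr = evaluate(toy_calls, toy_truth)
```

With the toy data, two of three truth events are recovered and two of four calls have no match. A "top 50 predictions" figure like the one quoted above would correspond to ranking calls by a confidence score and evaluating only the highest-ranked subset.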


2020 ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
Sonia García Ruiz ◽  
...  

ABSTRACT: Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript/s to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate the importance of human-lineage-specific sequences in brain development and neurological disease. We release our annotation through vizER (https://snca.atica.um.es/browser/app/vizER).

