OMSV enables accurate and comprehensive identification of large structural variations from nanochannel-based single-molecule optical maps

2017 ◽  
Author(s):  
Le Li ◽  
Tsz-Piu Kwok ◽  
Alden King-Yung Leung ◽  
Yvonne Y. Y. Lai ◽  
Iris K. Pang ◽  
...  

Abstract Human genomes contain structural variations (SVs) that are associated with various phenotypic variations and diseases. SV detection by sequencing is incomplete due to limited read length. Nanochannel-based optical mapping (OM) allows direct observation of SVs up to hundreds of kilobases in size on individual DNA molecules, making it a promising alternative technology for identifying large SVs. SV detection from optical maps is non-trivial due to complex types of error present in OM data, and no existing methods can simultaneously handle all these complex errors and the wide spectrum of SV types. Here we present a novel method, OMSV, for accurate and comprehensive identification of SVs from optical maps. OMSV detects both homozygous and heterozygous SVs, SVs of various types and sizes, and SVs with and without creating/destroying restriction sites. In an extensive series of tests based on real and simulated data, OMSV achieved both high sensitivity and specificity, with clear performance gains over the latest existing method. Applying OMSV to a human cell line, we identified hundreds of SVs >2 kbp, with 65% of them missed by sequencing-based callers. Independent experimental validations confirmed the high accuracy of these SVs. We also demonstrate how OMSV can incorporate sequencing data to determine precise SV break points and novel sequences in the SVs not contained in the reference. We provide OMSV as open-source software to facilitate systematic studies of large SVs.

2019 ◽  
Author(s):  
Jiajun Wang ◽  
Meng-Yin Li ◽  
Jie Yang ◽  
Ya-Qian Wang ◽  
Xue-Yuan Wu ◽  
...  

DNA lesions such as methylcytosine (<sup>m</sup>C), 8-oxo-guanine (<sup>O</sup>G) and inosine (I) can cause genetic diseases. Identifying the many varieties of lesion bases is usually beyond the capability of conventional DNA sequencing, which is mainly designed to discriminate only the four canonical bases. Lesion detection therefore remains a challenge, owing to the large number of lesion types and the poorly distinguishable readouts produced by minor structural variations. Moreover, standard amplification and labelling hardly work for DNA lesion detection. Herein, we designed a single-molecule interface from the mutant K238Q aerolysin, whose confined sensing region is highly compatible with capturing each base lesion and directly converting it into a distinguishable current readout. Compared with previous single-molecule sensing interfaces, the resolution of the K238Q aerolysin nanopore is enhanced by two orders of magnitude. The novel K238Q nanopore can directly discriminate at least three types of lesion (<sup>m</sup>C, <sup>O</sup>G, I) without labelling and quantify modification sites in mixed hetero-composition oligonucleotides. Such a nanopore could be further applied to diagnose genetic diseases with high sensitivity.


2021 ◽  
Author(s):  
Fei Ge ◽  
Jingtao Qu ◽  
Peng Liu ◽  
Lang Pan ◽  
Chaoying Zou ◽  
...  

Heretofore, little is known about the mechanism underlying the genotype-dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of a maize inbred line with high EC induction ratio, A188, which is assembled from single-molecule sequencing and optical genome mapping. We assembled a 2,210 Mb genome with a scaffold N50 size of 11.61 million bases (Mb), compared to those of 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations to B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 large structural variation genes and 4 differentially expressed genes playing potential roles in EC induction.
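The scaffold N50 quoted above is a length-weighted summary statistic: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size. The following is a minimal Python sketch of that definition. The toy lengths are invented for illustration and are not taken from the A188, B73 or Mo17 assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total = 100, half = 50.
# 40 alone is below 50; 40 + 30 = 70 reaches it, so N50 = 30.
toy_scaffolds = [40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 30
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness.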


2017 ◽  
Author(s):  
Tslil Gabrieli ◽  
Hila Sharim ◽  
Yael Michaeli ◽  
Yuval Ebenstein

Abstract Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.


2016 ◽  
Author(s):  
John P Didion ◽  
Francis S Collins

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves a four-fold increase in trimming accuracy and a decrease in execution time of ~50% (using 16 parallel execution threads). Furthermore, Atropos maintains high accuracy even when trimming simulated data with a high rate of error. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of most current-generation sequencing data sets. Atropos is open source and free software written in Python and available at https://github.com/jdidion/atropos.
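To make the two trimming operations named in this abstract concrete, the Python sketch below removes a 3' adapter and then clips low-quality bases from the 3' end. It is a deliberately naive illustration that uses exact string matching only. It is not Atropos's algorithm, and the function names, the `min_overlap` and `threshold` parameters, and the example read are all assumptions made for this sketch.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter using exact matching (illustrative only).

    Handles two cases: the full adapter occurs inside the read, or the
    read ends with a prefix of the adapter at least min_overlap long.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for a partial adapter at the 3' end, longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_low_quality(read, quals, threshold=20):
    """Clip bases from the 3' end while their quality is below threshold."""
    end = len(read)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return read[:end], quals[:end]


# Hypothetical inputs for illustration.
adapter = "AGATCGGAAGAGC"
print(trim_adapter("ACGTAGATCGGAAGAGCTTT", adapter))  # full adapter  -> ACGT
print(trim_adapter("ACGTACGTAGATCGGAAG", adapter))    # partial at 3' -> ACGTACGT
print(trim_low_quality("ACGTAC", [30, 30, 30, 30, 10, 5]))
```

Exact matching misses adapters that contain sequencing errors, which is one reason practical trimmers rely on error-tolerant alignment instead.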


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Shou Liu ◽  
Wenjian Cao ◽  
Yichi Niu ◽  
Jiayi Luo ◽  
Yanhua Zhao ◽  
...  

ARID1A is one of the most frequently mutated epigenetic regulators in a wide spectrum of cancers. Recent studies have shown that ARID1A deficiency induces global changes in the epigenetic landscape of enhancers and promoters. These broad and complex effects make it challenging to identify the driving mechanisms of ARID1A deficiency in promoting cancer progression. Here, we identified the anti-senescence effect of Arid1a deficiency in the progression of pancreatic intraepithelial neoplasia (PanIN) by profiling the transcriptome of individual PanINs in a mouse model. In a human cell line model, we found that ARID1A deficiency upregulates the expression of Aldehyde Dehydrogenase 1 Family Member A1 (ALDH1A1), which plays an essential role in attenuating the senescence induced by oncogenic KRAS through scavenging reactive oxygen species (ROS). As a subunit of the SWI/SNF chromatin remodeling complex, our ATAC sequencing data showed that ARID1A deficiency increases the accessibility of the enhancer region of ALDH1A1. This study provides the first evidence that ARID1A deficiency promotes pancreatic tumorigenesis by attenuating KRAS-induced senescence through the upregulation of ALDH1A1 expression.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dat Thanh Nguyen ◽  
Quang Thinh Trac ◽  
Thi-Hau Nguyen ◽  
Ha-Nam Nguyen ◽  
Nir Ohad ◽  
...  

Abstract Background Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. Results We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. Conclusions With its improved circRNA detection performance and shorter computation time, Circall facilitates the analysis of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.


2019 ◽  
Vol 21 (6) ◽  
pp. 1971-1986 ◽  
Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Ernesto Picardi ◽  
David S Horner ◽  
Graziano Pesole

Abstract A number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats over the last 5 years. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase of the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods herein tested, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1444
Author(s):  
Nazeefa Fatima ◽  
Anna Petri ◽  
Ulf Gyllensten ◽  
Lars Feuk ◽  
Adam Ameur

Long-read single molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual as compared to SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have a similar performance for SV detection in human whole genome sequencing data, and that both technologies are feasible for population-scale studies.
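The abstract does not state how overlap between SV call sets was defined. One commonly used convention is reciprocal overlap, where two calls match only if the shared interval covers at least a chosen fraction of both. The Python sketch below illustrates that convention under stated assumptions: the `(chrom, start, end)` tuple layout, the 0.5 threshold, and the toy calls are hypothetical and are not the authors' procedure or data.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two half-open intervals (chrom, start, end).

    Returns the smaller of the two fractions shared/len(a) and
    shared/len(b), or 0.0 if the calls do not overlap.
    """
    if a[0] != b[0]:
        return 0.0
    len_a, len_b = a[2] - a[1], b[2] - b[1]
    if len_a <= 0 or len_b <= 0:
        return 0.0  # zero-length calls need a different matching rule
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / len_a, shared / len_b)


def concordant_fraction(calls_a, calls_b, min_ro=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    hits = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)
    )
    return hits / len(calls_a)


# Hypothetical toy call sets.
set_one = [("chr1", 100, 200), ("chr2", 0, 50)]
set_two = [("chr1", 150, 250)]
print(concordant_fraction(set_one, set_two))  # prints 0.5
```

This quadratic comparison is fine for small examples. At the scale of tens of thousands of SVs per sample, an interval index or a sorted sweep would be the more practical choice.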


2021 ◽  
Author(s):  
Chen Yang ◽  
Theodora Lo ◽  
Ka Ming Nip ◽  
Saber Hafezqorani ◽  
René L Warren ◽  
...  

Abstract Background: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, non-uniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical tools, such as microbial abundance estimation and metagenome assembly algorithms. When developing and testing bioinformatics tools and pipelines, the use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to provide a ground truth and assess the performance in a controlled environment. Results: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes, and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. Conclusions: The Meta-NanoSim characterization module investigates read features including chimeric information and abundance levels, while the simulation module simulates large and complex multi-sample microbial communities with different abundance profiles. All trained models and the software are freely accessible on GitHub at https://github.com/bcgsc/NanoSim.


2018 ◽  
Vol 115 (44) ◽  
pp. 11150-11155 ◽  
Author(s):  
Miao-Hsuan Chien ◽  
Mario Brameshuber ◽  
Benedikt K. Rossboth ◽  
Gerhard J. Schütz ◽  
Silvan Schmid

Absorption microscopy is a promising alternative to fluorescence microscopy for single-molecule imaging. So far, molecular absorption has been probed optically via the attenuation of a probing laser or via photothermal effects. The sensitivity of optical probing is not only restricted by background scattering but is also fundamentally limited by laser shot noise, which limits the achievable single-molecule signal-to-noise ratio. Here, we present nanomechanical photothermal microscopy, which overcomes the scattering and shot-noise limit by detecting the photothermal heating of the sample directly with a temperature-sensitive substrate. We use nanomechanical silicon nitride drums, whose resonant frequency detunes with local heating. Individual Au nanoparticles with diameters from 10 to 200 nm and single molecules (Atto 633) are scanned with a heating laser with a peak irradiance of 354 ± 45 µW/µm<sup>2</sup> using a 50× long-working-distance objective. With a stress-optimized drum we reach a sensitivity of 16 fW/Hz<sup>1/2</sup> at room temperature, resulting in a single-molecule signal-to-noise ratio of >70. The high sensitivity combined with the inherent wavelength independence of the nanomechanical sensor presents a competitive alternative to established tools for the analysis and localization of nonfluorescent single molecules and nanoparticles.

