Interpretable prioritization of splice variants in diagnostic next-generation sequencing

2021 ◽  
Author(s):  
Daniel Danis ◽  
Julius O.B. Jacobsen ◽  
Leigh Carmody ◽  
Michael Gargano ◽  
Julie A McMurry ◽  
...  

ABSTRACT: A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5′ and 3′ ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information content (IC) of wildtype and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers, one for the donor site and one for the acceptor site, and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS surpasses previous state-of-the-art accuracy in classifying splice variants, as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize the predicted effects of variants on splicing, making splice variants easier to interpret in diagnostic settings.
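The information-content feature mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common per-position definition of individual information, 2 + log2 f(b, l) bits summed over the site, and uses a made-up toy frequency matrix. The names `TOY_FREQS`, `individual_information`, and `delta_information` are hypothetical, and this is not the SQUIRLS implementation.

```python
import math

# Hypothetical toy position-frequency matrix for a 4-nt site.
# These numbers are illustrative only and are NOT real splice-site data.
TOY_FREQS = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]


def individual_information(seq, freqs):
    """Return the information content of one site in bits.

    Each position contributes 2 + log2(f(base, position)), so a
    uniformly distributed position contributes exactly 0 bits.
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(col[base]) for base, col in zip(seq, freqs))


def delta_information(wildtype, variant, freqs):
    """Change in information content caused by a variant (variant minus wildtype)."""
    return individual_information(variant, freqs) - individual_information(wildtype, freqs)
```

A negative `delta_information` means the variant sequence matches the toy site model less well than the wildtype. In a feature-based classifier, a value of this kind would be one input among several rather than a prediction on its own.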

2019 ◽  
Author(s):  
Tal Einav ◽  
Rob Phillips

Abstract: Although the key promoter elements necessary to drive transcription in Escherichia coli have long been understood, we still cannot predict the behavior of arbitrary novel promoters, hampering our ability to characterize the myriad of sequenced regulatory architectures as well as to design novel synthetic circuits. This work builds on a beautiful recent experiment by Urtecho et al., who measured the gene expression of over 10,000 promoters spanning all possible combinations of a small set of regulatory elements. Using these data, we demonstrate that a central claim in energy matrix models of gene expression (that each promoter element contributes independently and additively to gene expression) contradicts experimental measurements. We propose that a key missing ingredient from such models is the avidity between the -35 and -10 RNA polymerase binding sites and develop what we call a refined energy matrix model that incorporates this effect. We show that the refined energy matrix model can characterize the full suite of gene expression data and explore several applications of this framework, namely, how multivalent binding at the -35 and -10 sites can buffer RNAP kinetics against mutations and how promoters that bind overly tightly to RNA polymerase can inhibit gene expression. The success of our approach suggests that avidity represents a key physical principle governing the interaction of RNA polymerase with its promoter.
Significance Statement: Cellular behavior is ultimately governed by the genetic program encoded in its DNA and through the arsenal of molecular machines that actively transcribe its genes, yet we lack the ability to predict how an arbitrary DNA sequence will perform. To that end, we analyze the performance of over 10,000 regulatory sequences and develop a model that can predict the behavior of any sequence based on its composition. By considering promoters that only vary by one or two elements, we can characterize how different components interact, providing fundamental insights into the mechanisms of transcription.
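The contrast between independent, additive element contributions and an avidity-style coupling can be sketched with generic Boltzmann weights. This is an illustrative assumption and not the authors' fitted model: energies are in units of kT, the four-state partition function is a textbook simplification, and the names `additive_weight`, `avidity_bound_probability`, and `e_int` are hypothetical.

```python
import math


def additive_weight(e35, e10):
    """Boltzmann weight when the -35 and -10 elements contribute additively.

    Energies are in units of kT (assumption). Changing one element rescales
    the weight by a factor that does not depend on the other element.
    """
    return math.exp(-(e35 + e10))


def avidity_bound_probability(e35, e10, e_int):
    """Probability of the doubly bound state in a four-state sketch.

    States: unbound, bound at -35 only, bound at -10 only, and bound at
    both sites with an extra interaction energy e_int (hypothetical
    parameter standing in for avidity).
    """
    w_both = math.exp(-(e35 + e10 + e_int))
    z = 1.0 + math.exp(-e35) + math.exp(-e10) + w_both
    return w_both / z


def fold_change(model, strong, weak, other, *extra):
    """Effect of swapping one element from weak to strong in a given context."""
    return model(strong, other, *extra) / model(weak, other, *extra)
```

Under the additive weight, `fold_change(additive_weight, -2, 0, other)` is the same for every `other`. With the coupled four-state form, the same swap has a context-dependent effect, which is the kind of non-additivity the abstract describes.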


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Paola Frisone ◽  
Davide Pradella ◽  
Anna Di Matteo ◽  
Elisa Belloni ◽  
Claudia Ghigna ◽  
...  

Alterations in expression and/or activity of splicing factors as well as mutations in cis-acting splicing regulatory sequences contribute to cancer phenotypes. Genome-wide studies have revealed more than 15,000 tumor-associated splice variants derived from genes involved in almost every aspect of cancer cell biology, including proliferation, differentiation, cell cycle control, metabolism, apoptosis, motility, invasion, and angiogenesis. In the past decades, several RNA binding proteins (RBPs) have been implicated in tumorigenesis. SAM68 (SRC associated in mitosis of 68 kDa) belongs to the STAR (signal transduction and activation of RNA metabolism) family of RBPs. SAM68 is involved in several steps of mRNA metabolism, from transcription to alternative splicing and then to nuclear export. Moreover, SAM68 participates in signaling pathways associated with cell response to stimuli, cell cycle transitions, and viral infections. Recent evidence has linked this RBP to the onset and progression of different tumors, highlighting misregulation of SAM68-regulated splicing events as a key step in neoplastic transformation and tumor progression. Here we review recent studies on the role of SAM68 in splicing regulation and we discuss its contribution to aberrant pre-mRNA processing in cancer.


2021 ◽  
Vol 4 (3) ◽  
pp. 62
Author(s):  
Giulia Riolo ◽  
Silvia Cantara ◽  
Claudia Ricci

Alternative splicing (AS) is a crucial process to enhance gene expression driving organism development. Interestingly, more than 95% of human genes undergo AS, producing multiple protein isoforms from the same transcript. Any alteration (e.g., nucleotide substitutions, insertions, and deletions) involving consensus splicing regulatory sequences in a specific gene may result in the production of aberrant, improperly functioning proteins. In this review, we introduce the key steps of the splicing mechanism and describe the different types of genomic variants affecting this process (splicing variants in acceptor/donor sites, the branch point, or the polypyrimidine tract, as well as exonic and deep intronic changes). Then, we provide an updated approach to improve splice variant detection. First, we review the main computational tools, including recent machine learning-based algorithms, for the prediction of splice site variants, in order to characterize how a genomic variant interferes with the splicing process. Next, we describe the experimental methods used to validate the predictive analyses, distinguishing between methods testing RNA (transcriptomics analysis) and methods testing proteins (proteomics experiments). For both the prediction and validation steps, the benefits and weaknesses of each tool or procedure are accurately reported, along with suggestions on which approaches are more suitable in diagnostics rather than in clinical research.


2021 ◽  
Author(s):  
Jun Hu ◽  
Shunji Kotsuki ◽  
Yasunori Igarashi ◽  
Mykola Talerko ◽  
Kazuhito Ichii

The Chernobyl Nuclear Power Plant (CNPP) accident in 1986 is the largest source of anthropogenic radionuclides released into the environment in history. Over the past 20 years, climate and land-use changes have increased the frequency of large forest fires in and around the Chernobyl Exclusion Zone. It is critical to extract the burned areas accurately because they are the basis for estimating biomass burning emissions and then analyzing the secondary diffusion of radioactive residue released from the CNPP accident. In this study, we established a burned-area extraction method based on the random forest (RF) algorithm using Moderate Resolution Imaging Spectroradiometer (MODIS) MOD09GA / MYD09GA and Landsat-7 ETM+ / Landsat-8 OLI images. Field observations from 2015 and the MODIS MOD14A1 (thermal anomaly data) product were used to generate sampling points for RF. The difference in near-infrared reflectance and the differences in vegetation indices (NDVI, NBR, NDWI) between pre- and post-fire imagery were used as input data for the RF classifier. Subsequently, the historical burned areas in 2015 and 2020 were detected using the trained RF classifier. The preliminary results for the identified burned areas show good consistency with the MODIS MCD64A1.006 product of NASA and the FireCCI51 product of ESA. It should be noted that our RF algorithm can detect even relatively small fire scars compared to the two existing products, owing to the use of high-resolution Landsat imagery.
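The pre- and post-fire difference features described above can be sketched for a single pixel as follows. This is a minimal illustration and not the authors' pipeline: it assumes the standard normalized-difference definitions NDVI = (NIR - Red) / (NIR + Red) and NBR = (NIR - SWIR2) / (NIR + SWIR2), omits NDWI because the abstract does not say which NDWI variant was used, and uses hypothetical band keys (`red`, `nir`, `swir2`). The random forest itself is not shown.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def spectral_indices(pixel):
    """Compute NIR reflectance, NDVI, and NBR for one pixel.

    `pixel` is a dict of surface reflectances with the hypothetical
    keys 'red', 'nir', and 'swir2'.
    """
    return {
        "nir": pixel["nir"],
        "ndvi": normalized_difference(pixel["nir"], pixel["red"]),
        "nbr": normalized_difference(pixel["nir"], pixel["swir2"]),
    }


def change_features(pre, post):
    """Pre-fire minus post-fire differences, usable as classifier inputs."""
    before, after = spectral_indices(pre), spectral_indices(post)
    return {f"d_{name}": before[name] - after[name] for name in before}


# Illustrative reflectances for a vegetated pixel before and after a fire.
pre_fire = {"red": 0.05, "nir": 0.40, "swir2": 0.10}
post_fire = {"red": 0.10, "nir": 0.15, "swir2": 0.25}
features = change_features(pre_fire, post_fire)
```

Burning typically lowers near-infrared reflectance and raises shortwave-infrared reflectance, so burned pixels tend to show positive `d_nbr` and `d_ndvi` values. A classifier trained on labeled sampling points would take vectors like `features` as input.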


2007 ◽  
Vol 293 (6) ◽  
pp. L1454-L1462 ◽  
Author(s):  
Haishan Xu ◽  
Shijian Chu

Amiloride-sensitive epithelial sodium channel (ENaC) is a major sodium channel in the lung facilitating fluid absorption. ENaC is composed of α-, β-, and γ-subunits, and the α-subunit is indispensable for ENaC function in the lung. In human lungs, the α-subunit is expressed as various splice variants. Among them, α1- and α2-subunits are two major variants with different upstream regulatory sequences that possess similar channel characteristics when tested in Xenopus oocytes. Despite the importance of α-ENaC, little was known about the relative abundance of its variants in lung epithelial cells. Furthermore, lung infection and inflammation are often accompanied by reduced α-ENaC expression, oxidative stress, and pulmonary edema. However, it was not clear how oxidative stress affects expression of α-ENaC variants. In this study, we examined relative expression levels of α-subunit variants in four human lung epithelial cell lines. We also tested the hypothesis that oxidative stress inhibits α-ENaC expression. Our results show that both α1- and α2-ENaC variants are expressed in the cells we tested, but relative abundance varies. In the two monolayer-forming cell lines, H441 and Calu-3, α2-ENaC is the predominant variant. We also show that H2O2 specifically suppresses α1- and α2-ENaC variant expression in H441 and Calu-3 cells in a dose-dependent fashion. This suppression is achieved by inhibition of their promoters and is attenuated by dexamethasone. These data demonstrate the importance of the α2-subunit variant and suggest that glucocorticoids and antioxidants may be useful in correcting infection/inflammation-induced lung fluid imbalance.


2001 ◽  
Vol 120 (5) ◽  
pp. A507-A507
Author(s):  
D KANG ◽  
Y WHANG ◽  
J YOO ◽  
I SONG ◽  
J OH ◽  
...  

1979 ◽  
Author(s):  
Jan Hermans

Measurements of light scattering have given much information about the formation and properties of fibrin. These studies have determined the mass-length ratio of linear polymers (protofibrils) and of fibers, the kinetics of polymerization and of lateral association, and the volume-mass ratio of thick fibers. This ratio is 5 to 1. On the one hand, this high value suggests that the fiber contains channels that allow the diffusion of enzymes such as Factor XIIIa and plasmin; on the other hand, the high value appears paradoxical for a stiff fiber made up of elongated units (fibrin monomers) arranged in parallel. Such a high fiber volume is a property of only a small set out of many high-symmetry models of fibrin, which may be constructed from overlapping three-domain monomers which are arranged into strands, are aligned nearly parallel to the fiber axis and make adequate longitudinal and lateral contacts. These models contain helical protofibrils related to each other by rotation axes parallel to the fiber axis. The protofibrils may contain 2, 3 or 4 monomers per helical turn and there are four possible symmetries. A large specific volume is achieved if the ends of each monomer are slightly displaced from the protofibril axis, either by a shift or by a tilt of the monomer. The fiber containing tilted monomers is more highly interconnected; the two ends of a tilted monomer form lateral contacts with different adjacent protofibrils, whereas the two ends of a non-tilted monomer contact the same adjacent protofibril(s).

