On the relation of gene essentiality to intron structure: a computational and deep learning approach

2021 ◽  
Vol 4 (6) ◽  
pp. e202000951
Author(s):  
Ethan Schonfeld ◽  
Edward Vendrow ◽  
Joshua Vendrow ◽  
Elan Schonfeld

Essential genes have been studied by copy number variants and deletions, both associated with introns. The premise of our work is that introns of essential genes have distinct characteristic properties. We provide support for this by training a deep learning model and demonstrating that introns alone can be used to classify essentiality. The model, limited to first introns, performs at an increased level, implicating first introns in essentiality. We identify unique properties of introns of essential genes, finding that their structure protects against deletion and intron-loss events, especially centered on the first intron. We show that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice site recognition. We find that first introns of essential genes are of remarkably smaller size than their nonessential counterparts, and to protect against common 3′ end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we train a feature-based model using only these features and achieve high performance.


2020 ◽  
Author(s):  
Ethan Schonfeld ◽  
Edward Vendrow ◽  
Joshua Vendrow ◽  
Elan Schonfeld

Abstract Identification and study of human-essential genes has become practically important with the realization that disruption or loss of nearby essential genes can introduce latent vulnerabilities to cancer cells. Essential genes have been studied through copy-number variants and deletion events, both of which are associated with introns. The premise of our work is that introns of essential genes have characteristic properties that are distinct from those of introns of nonessential genes. We provided support for the existence of these characteristic properties by training a deep learning model on introns of essential and nonessential genes and demonstrating that introns alone can be used to classify essential and nonessential genes with high accuracy (AUC of 0.846). We further demonstrated that the same deep learning model, when limited to first introns, performs at an increased level, indicating the critical importance of introns, and particularly first introns, in gene essentiality. Using a computational approach, we identified several novel properties of introns of essential genes, finding that their structure protects against deletion and intron-loss events and that these traits are especially centered on the first intron. We showed that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice-site recognition. Furthermore, we found that first introns of essential genes are remarkably smaller than their nonessential counterparts and that, to protect against common 3′ end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we trained a feature-based model using only information from these features and achieved high accuracy (AUC of 0.787).
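As a rough illustration of the kind of intron-level features the abstract mentions (GC density, first-intron size, and intron count), the following Python sketch computes them for a single gene. The abstract does not enumerate the seven features used by the authors, so the function names `gc_density` and `intron_summary`, the choice of features, and the toy sequences are all assumptions made for illustration, not the authors' feature set.

```python
def gc_density(sequence: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0


def intron_summary(introns: list[str]) -> dict[str, float]:
    """Summarize one gene's introns, assumed to be listed in 5'-to-3' order."""
    if not introns:
        raise ValueError("gene has no introns")
    first = introns[0]
    return {
        "intron_count": len(introns),
        "first_intron_length": len(first),
        "first_intron_gc": gc_density(first),
        "mean_intron_length": sum(len(i) for i in introns) / len(introns),
    }


# Toy sequences invented for illustration; not real intron data.
example = intron_summary(["GTGCGCGCAG", "GTATATATATATATAG"])
print(example)
```

A table of such per-gene summaries could then serve as input to any feature-based classifier; which classifier the authors used is not stated in the abstract.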



2017 ◽  
Author(s):  
Jinfang Zheng ◽  
Xiaoli Zhang ◽  
Xunyi Zhao ◽  
Xiaoxue Tong ◽  
Xu Hong ◽  
...  

Abstract RNA-binding proteins (RBPs) play an important role in cellular processes, and identifying them both computationally and experimentally is essential. Our group recently proposed RBPPred, which predicts RBPs with high performance. However, RBPPred is slow because it generates a position-specific scoring matrix (PSSM) as a feature. Here, we develop a deep learning model called Deep-RBPPred. The model has three advantages over previous models: (1) Deep-RBPPred needs only a few physicochemical properties; (2) Deep-RBPPred runs much faster; and (3) Deep-RBPPred generalizes well. At the same time, its performance remains as good as that of the state-of-the-art method. When tested on the A. thaliana, S. cerevisiae, and H. sapiens proteomes with a score cutoff of 0.5, the MCC (AUC) values are 0.6077 (0.9421), 0.573 (0.9034), and 0.8141 (0.9515), respectively. When validated on Gerstberger-1538, the sensitivity (SN) of our model is 90.38%. The running times on a GPU are 9 s, 7 s, 8 s, and 10 s for H. sapiens, A. thaliana, S. cerevisiae, and Gerstberger-1538, respectively. Deep-RBPPred correctly predicts 94.65% of 299 new RBPs, a sensitivity about 8% higher than that of RBPPred. We also apply Deep-RBPPred to 19 eukaryotic proteomes and 11 bacterial proteomes downloaded from UniProt. The results show that the proportion of RBPs is much higher in eukaryotic proteomes than in bacterial proteomes. Testing on 6 proteomes suggests that many RBPs may remain undiscovered.
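For readers unfamiliar with the reported metrics, the sketch below shows how the Matthews correlation coefficient (MCC) and sensitivity (SN) can be computed from prediction scores at a cutoff of 0.5. It is a minimal illustration under stated assumptions: treating a score equal to the cutoff as positive is an assumption, the helper names are hypothetical, and the scores and labels are toy values rather than results from Deep-RBPPred.

```python
import math


def confusion_counts(scores, labels, cutoff=0.5):
    """Count TP, FP, TN, FN; a score >= cutoff is called positive (assumed)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy scores and labels (1 = RBP, 0 = non-RBP), invented for illustration.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(toy_scores, toy_labels)
print(mcc(tp, fp, tn, fn), sensitivity(tp, fn))
```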



2020 ◽  
Author(s):  
Ting Sun ◽  
Yufei He ◽  
Wendong Li ◽  
Guang Liu ◽  
Lin Li ◽  
...  

Abstract Background IDH wild-type glioblastoma (GBM) is the most aggressive tumor of the central nervous system despite extensive therapies. Neoantigen-based personalized immunotherapies have achieved promising results in melanoma and lung cancer, but few neoantigen-based models perform well in IDH wild-type GBM. Unlike neoantigen load and occurrence, which are well studied and often found to be of little use, the association between neoantigen intrinsic features and prognosis remains unclear in IDH wild-type GBM. Results We present a novel neoantigen intrinsic feature-based deep learning model (neoDL) to stratify IDH wild-type GBMs into subgroups with different survival. We first calculated a total of 2928 intrinsic features for each neoantigen and filtered out those not associated with survival, and then applied neoDL to the TCGA data cohort. Leave-one-out cross-validation (LOOCV) in the TCGA cohort demonstrated that neoDL successfully classified IDH wild-type GBMs into different prognostic subgroups, a result that was further validated in an independent data cohort from an Asian population. Long-term-survival IDH wild-type GBMs identified by neoDL were characterized by 12 protective neoantigen intrinsic features and were enriched in development and cell cycle. Conclusions Our results provide a novel model, neoDL, that can be therapeutically exploited to identify patients with IDH wild-type GBM and a good prognosis who will most likely benefit from neoantigen-based personalized immunotherapy.



Author(s):  
Mohammed Y. Kamil

COVID-19 spread rapidly all over the world at the beginning of this year. Hospital reports have indicated that RT-PCR tests have low sensitivity in the early stage of infection. A rapid and accurate diagnostic technique is therefore needed to detect COVID-19. CT has been demonstrated to be a successful tool for diagnosing the disease. A deep learning framework can be developed to aid in evaluating CT exams and provide a diagnosis, thus saving time for disease control. In this work, a deep learning model was adapted for COVID-19 detection via feature extraction from chest X-ray and CT images. Initially, many transfer-learning models were applied and compared; a VGG-19 model was then tuned to obtain the best results that can be adopted in disease diagnosis. The diagnostic performance of all models was assessed on a dataset of 1000 images. The VGG-19 model achieved the highest accuracy of 99%, with a sensitivity of 97.4% and a specificity of 99.4%. Deep learning and image processing demonstrated high performance in early COVID-19 detection. The approach appears suitable as an auxiliary detection method for clinicians and may thus contribute to the control of the pandemic.



2021 ◽  
Vol 7 (15) ◽  
pp. eabd7416
Author(s):  
Zhenze Yang ◽  
Chi-Hua Yu ◽  
Markus J. Buehler

Materials-by-design is a paradigm to develop previously unknown high-performance materials. However, finding materials with superior properties is often computationally or experimentally intractable because of the astronomical number of combinations in design space. Here we report an AI-based approach, implemented in a game theory–based conditional generative adversarial neural network (cGAN), to bridge the gap between a material’s microstructure (the design space) and its physical performance. Our end-to-end deep learning model predicts physical fields like stress or strain directly from the material microstructure geometry and reaches an astonishing accuracy not only for predicted field data but also for derivative material property predictions. Furthermore, the proposed approach offers extensibility by predicting complex materials behavior regardless of component shapes, boundary conditions, and geometrical hierarchy, providing perspectives for performing physical modeling and simulations. The method vastly improves the efficiency of evaluating physical properties of hierarchical materials directly from the geometry of their structural makeup.



Author(s):  
Ling Zhu ◽  
Zhenbo Li ◽  
Chen Li ◽  
Jing Wu ◽  
...  


2021 ◽  
Vol 11 ◽  
Author(s):  
Mehdi Astaraki ◽  
Guang Yang ◽  
Yousuf Zakko ◽  
Iuliana Toma-Dasu ◽  
Örjan Smedby ◽  
...  

Objectives Both radiomics and deep learning methods have shown great promise in predicting lesion malignancy in various image-based oncology studies. However, it is still unclear which method to choose for a specific clinical problem given access to the same amount of training data. In this study, we compare the performance of a series of carefully selected conventional radiomics methods, end-to-end deep learning models, and deep-feature based radiomics pipelines for pulmonary nodule malignancy prediction on an open database that consists of 1297 manually delineated lung nodules. Methods Conventional radiomics analysis was conducted by extracting standard handcrafted features from target nodule images. Several end-to-end deep classifier networks, including VGG, ResNet, DenseNet, and EfficientNet, were also employed to identify lung nodule malignancy. In addition to the baseline implementations, we investigated the importance of feature selection and class balancing, as well as of separating the features learned in the nodule target region from those learned in the background/context region. By pooling the radiomics and deep features together in a hybrid feature set, we investigated the compatibility of these two sets with respect to malignancy prediction. Results The best baseline conventional radiomics model, deep learning model, and deep-feature based radiomics model achieved AUROC values (mean ± standard deviation) of 0.792 ± 0.025, 0.801 ± 0.018, and 0.817 ± 0.032, respectively, in 5-fold cross-validation analyses. However, after several optimization techniques were applied, such as feature selection and data balancing, as well as adding context features, the corresponding best radiomics, end-to-end deep learning, and deep-feature based models achieved AUROC values of 0.921 ± 0.010, 0.824 ± 0.021, and 0.936 ± 0.011, respectively. We achieved the best prediction accuracy with the hybrid feature set (AUROC: 0.938 ± 0.010). Conclusion The end-to-end deep learning model outperforms conventional radiomics out of the box without much fine-tuning. On the other hand, fine-tuning the models leads to significant improvements in prediction performance, with the conventional and deep-feature based radiomics models achieving comparable results. The hybrid radiomics method seems to be the most promising model for lung nodule malignancy prediction in this comparative study.
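The "mean ± standard deviation" AUROC values reported from 5-fold cross-validation can be illustrated with a small Python sketch. It computes AUROC as the probability that a positive case scores higher than a negative one (ties count one half) and then summarizes per-fold values. The function names are hypothetical, the numbers are toy values rather than the study's data, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aurocs):
    """Mean and sample standard deviation across folds (estimator assumed)."""
    return statistics.mean(fold_aurocs), statistics.stdev(fold_aurocs)


# Toy example: one fold's scores, then five invented per-fold AUROC values.
single_fold = auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
mean_auc, std_auc = summarize_folds([0.80, 0.82, 0.78, 0.81, 0.79])
print(f"fold AUROC = {single_fold:.2f}; CV AUROC = {mean_auc:.3f} ± {std_auc:.3f}")
```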



2019 ◽  
Author(s):  
Amarin Jettakul ◽  
Duangdao Wichadakul ◽  
Peerapon Vateekul

Abstract The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) task that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations have used feature-based models; others have presented deep-learning-based models such as convolutional and recurrent neural networks used with the shortest dependency paths (SDPs). Although SDPs contain valuable and concise information, sections of significant information necessary to define bacterial location relationships are often neglected. In addition, the traditional word embedding used in previous studies may suffer from word ambiguity across linguistic contexts. Here, we present a deep learning model for biomedical RE. The model incorporates feature combinations of SDPs and full sentences with various attention mechanisms. We also used pre-trained contextual representations based on domain-specific vocabularies. To assess the model’s robustness, we introduced a mean F1 score on many models using different random seeds. The experiments were conducted on the standard BB corpus in BioNLP-ST’16. Our experimental results revealed that the model performed better (in terms of both maximum and average F1 scores; 60.77% and 57.63%, respectively) compared with other existing models. We demonstrated that our proposed contributions to this task can be used to extract rich lexical, syntactic, and semantic features that effectively boost the model’s performance. Moreover, we analyzed the trade-off between precision and recall in order to choose the proper cut-off to use in real-world applications.



2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Amarin Jettakul ◽  
Duangdao Wichadakul ◽  
Peerapon Vateekul

Abstract Background The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) task that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations applied feature-based models; others have presented deep-learning-based models such as convolutional and recurrent neural networks used with the shortest dependency paths (SDPs). Although SDPs contain valuable and concise information, some crucial information required to define bacterial location relationships is often neglected. Moreover, the traditional word embedding used in previous studies may suffer from word ambiguity across linguistic contexts. Results Here, we present a deep learning model for biomedical RE. The model incorporates feature combinations of SDPs and full sentences with various attention mechanisms. We also used pre-trained contextual representations based on domain-specific vocabularies. To assess the model’s robustness, we introduced a mean F1 score on many models using different random seeds. The experiments were conducted on the standard BB corpus in BioNLP-ST’16. Our experimental results revealed that the model performed better (in terms of both maximum and average F1 scores; 60.77% and 57.63%, respectively) compared with other existing models. Conclusions We demonstrated that our proposed contributions to this task can be used to extract rich lexical, syntactic, and semantic features that effectively boost the model’s performance. Moreover, we analyzed the trade-off between precision and recall to choose the proper cut-off to use in real-world applications.
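The idea of reporting both a maximum and a mean F1 score over runs with different random seeds can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `f1_score`, the dictionary `runs_by_seed`, and the per-seed true-positive, false-positive, and false-negative counts are all invented for the example.

```python
import statistics


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall; 0.0 if undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (TP, FP, FN) counts from three runs with different seeds.
runs_by_seed = {0: (60, 40, 40), 1: (55, 45, 45), 2: (58, 42, 42)}

f1_by_seed = {seed: f1_score(*counts) for seed, counts in runs_by_seed.items()}
max_f1 = max(f1_by_seed.values())
mean_f1 = statistics.mean(f1_by_seed.values())
print(f"max F1 = {max_f1:.4f}, mean F1 = {mean_f1:.4f}")
```

Reporting the mean alongside the maximum makes it harder for a single favorable seed to dominate the comparison with other models.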


