NetGenes: A database of essential genes predicted using features from interaction networks

2020 ◽  
Author(s):  
Vimaladhasan Senthamizhan ◽  
Balaraman Ravindran ◽  
Karthik Raman

Essential gene prediction models built so far are heavily reliant on sequence-based features, and the scope of network-based features has been narrow. Previous work from our group demonstrated the importance of using network-based features for predicting essential genes with high accuracy. Here, we applied our approach for the prediction of essential genes to organisms from the STRING database and hosted the results on a standalone website. Our database, NetGenes, contains essential gene predictions for 2,700+ bacteria, made using features derived from STRING protein–protein functional association networks. Housing a total of 3.5M+ genes, NetGenes offers various features, such as essentiality scores, annotations, and feature vectors for each gene. NetGenes is available at https://rbc-dsai-iitm.github.io/NetGenes/

2021 ◽  
Vol 12 ◽  
Author(s):  
Vimaladhasan Senthamizhan ◽  
Balaraman Ravindran ◽  
Karthik Raman

Essential gene prediction models built so far are heavily reliant on sequence-based features, and the scope of network-based features has been narrow. Previous work from our group demonstrated the importance of using network-based features for predicting essential genes with high accuracy. Here, we apply our approach for the prediction of essential genes to organisms from the STRING database and host the results on a standalone website. Our database, NetGenes, contains essential gene predictions for 2,700+ bacteria, made using features derived from STRING protein–protein functional association networks. Housing a total of over 2.1 million genes, NetGenes offers various features, such as essentiality scores, annotations, and feature vectors for each gene. The NetGenes database is available at https://rbc-dsai-iitm.github.io/NetGenes/.
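
NetGenes derives its gene features from STRING functional association networks. The Python sketch below gives a rough idea of what "network-based features" can look like. It computes three generic node-level features from a scored edge list: degree, weighted degree (strength), and mean neighbour degree. The edge-list format, the function name `network_features`, and the `min_score` threshold are assumptions made for illustration. This is not the NetGenes feature set.

```python
from collections import defaultdict


def network_features(edges, min_score=0.0):
    """Compute simple node features from (protein_a, protein_b, score) tuples.

    Hypothetical helper: an edge listed in both directions is merged
    (keeping the highest score), and edges scoring below `min_score`
    are dropped before any feature is computed.
    """
    best = {}
    for a, b, score in edges:
        if a == b or score < min_score:
            continue  # skip self-loops and low-confidence associations
        key = (a, b) if a < b else (b, a)
        if key not in best or score > best[key]:
            best[key] = score

    neighbours = defaultdict(set)
    strength = defaultdict(float)
    for (a, b), score in best.items():
        neighbours[a].add(b)
        neighbours[b].add(a)
        strength[a] += score
        strength[b] += score

    features = {}
    for node, nbrs in neighbours.items():
        degree = len(nbrs)
        features[node] = {
            "degree": degree,
            "strength": strength[node],
            # Average number of partners that this node's partners have.
            "mean_neighbour_degree": sum(len(neighbours[n]) for n in nbrs) / degree,
        }
    return features


# Toy edge list (invented values); the C-D edge falls below the threshold.
edges = [("A", "B", 0.9), ("B", "A", 0.9), ("B", "C", 0.5), ("C", "D", 0.2)]
print(network_features(edges, min_score=0.4))
```

Per-node dictionaries like these could then be stacked into a feature matrix for a classifier.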


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242943
Author(s):  
Sutanu Nandi ◽  
Piyali Ganguli ◽  
Ram Rup Sarkar

Essential gene prediction helps to find the minimal set of genes indispensable for the survival of an organism. Machine learning (ML) algorithms have been useful for predicting gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective of this work is to develop a new ML pipeline that helps annotate the essential genes of less explored disease-causing organisms for which minimal experimental data are available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm, the Laplacian Support Vector Machine (LapSVM), to predict essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, the Semi-Supervised Model Selection Score, equivalent to the area under the ROC curve (auROC), has been proposed for selecting the best model when supervised performance metrics are difficult to calculate due to lack of data. Unsupervised feature selection followed by dimension reduction revealed a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle, classifying and predicting essential genes with high accuracy (auROC > 0.85) even when only 1% of the data was labeled for model training. The ML pipeline was successfully validated on both eukaryotes and prokaryotes and showed high accuracy even with very limited labeled data. The strategy was then used to predict the essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that is universally applicable to both prokaryotes and eukaryotes with limited labeled data.
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
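
LapSVM adds a manifold (Laplacian) regularisation term to the usual SVM objective, and this term uses both labeled and unlabeled samples. The term is f^T L f. Here f holds the classifier outputs and L = D - W is the Laplacian of a similarity graph over all samples. It equals 0.5 * sum over i, j of W_ij (f_i - f_j)^2, so predictions that change between strongly connected genes are penalised. The Python sketch below computes only this term, on a toy k-nearest-neighbour graph with binary weights. The graph construction, the toy points, and all function names are assumptions rather than the authors' pipeline, and the hinge-loss part of LapSVM is omitted.

```python
import math


def knn_adjacency(points, k):
    """Symmetric k-nearest-neighbour graph with binary edge weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance (ties broken by index).
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            W[i][j] = W[j][i] = 1.0
    return W


def graph_laplacian(W):
    """Unnormalised Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def smoothness_penalty(f, L):
    """Manifold regulariser f^T L f; small when f varies little along graph edges."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


# Two well-separated toy clusters in a 2-D embedding (invented coordinates).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
L = graph_laplacian(knn_adjacency(points, k=1))
print(smoothness_penalty([1, 1, 1, -1, -1], L))  # labels consistent with the clusters
print(smoothness_penalty([1, 0, 0, 0, 0], L))    # labels that cut two graph edges
```

In the full method this penalty is minimised together with the hinge loss on the few labeled genes. That is how unlabeled genes can shape the decision boundary.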


Genes ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 31 ◽  
Author(s):  
Fengyu Zhang ◽  
Wei Peng ◽  
Yunfei Yang ◽  
Wei Dai ◽  
Junrong Song

Essential genes play an indispensable role in supporting the life of an organism. Identifying essential genes helps us understand the underlying mechanisms of cellular life. The essential genes of bacteria are potential drug targets for some diseases. Recently, several computational methods have been proposed to detect essential genes based on static protein–protein interaction (PPI) networks. However, these methods ignore the fact that essential genes play essential roles under specific conditions. In this work, a novel method, called FDP, was proposed for identifying essential proteins by fusing the dynamic PPI networks of different time points. First, the active PPI network of each time point was constructed, and these networks were then fused into a final network according to their similarities. Finally, a novel centrality method was designed to assign each gene in the final network a ranking score, taking into account its orthologous properties and its global and local topological properties in the network. This model was applied to two different yeast data sets. The results showed that FDP performed better in essential gene prediction than other existing methods based on either static or dynamic PPI networks.
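
The abstract does not give FDP's formulas. The Python sketch below is a hedged illustration of the general idea: fuse time-point networks according to their similarity, then score genes topologically. Each time point's active network is weighted by its mean Jaccard similarity to the other time points. Genes are then ranked by weighted degree in the fused network. The weighting scheme, the toy networks, and the function names are all assumptions. FDP's actual centrality also uses orthology and global topology, which are left out here.

```python
def edge(u, v):
    """Undirected edge key with a fixed node order."""
    return (u, v) if u < v else (v, u)


def jaccard(a, b):
    """Jaccard similarity of two edge sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse_networks(networks):
    """Fuse per-time-point edge sets into one weighted network.

    Hypothetical weighting: each time point's network counts in proportion
    to its mean Jaccard similarity with the other time points.
    """
    weights = []
    for i, net in enumerate(networks):
        sims = [jaccard(net, other) for j, other in enumerate(networks) if j != i]
        weights.append(sum(sims) / len(sims) if sims else 1.0)
    fused = {}
    for w, net in zip(weights, networks):
        for e in net:
            fused[e] = fused.get(e, 0.0) + w
    return fused


def weighted_degree(fused):
    """Local topological score: sum of fused edge weights at each node."""
    score = {}
    for (u, v), w in fused.items():
        score[u] = score.get(u, 0.0) + w
        score[v] = score.get(v, 0.0) + w
    return score


# Toy active networks at three time points (invented edges).
t1 = {edge("A", "B"), edge("B", "C")}
t2 = {edge("A", "B"), edge("B", "C"), edge("C", "D")}
t3 = {edge("A", "B")}
scores = weighted_degree(fuse_networks([t1, t2, t3]))
print(sorted(scores, key=scores.get, reverse=True))
```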


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Shuo Liu ◽  
Shu-Xuan Wang ◽  
Wei Liu ◽  
Chen Wang ◽  
Fa-Zhan Zhang ◽  
...  

Essential genes are key elements that organisms need to stay alive. Databases that store essential genes as homologous clusters, rather than as singletons, can provide more insightful information, such as the general essentiality of homologous genes across multiple organisms. In 2013, CEG (Clusters of Essential Genes), the first database to store prokaryotic essential genes in clusters, was constructed. Since its last revision, the amount of available essential gene data has more than tripled. Herein, we updated CEG to version 2. It includes more prokaryotic essential genes (29 gene datasets, up from 16) and newly added eukaryotic essential genes (nine species), notably the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand–protein interactions, virulence factors, and matched drugs, is also provided. Finally, we provide an essential gene prediction service for both prokaryotes and eukaryotes. We hope the updated database will benefit more researchers working on drug targets and evolutionary genomics. Database URL: http://cefg.uestc.cn/ceg
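
The abstract motivates clustering by the "general essentiality of homologous genes in multiple organisms". The Python sketch below shows one simple way to summarise that idea: for each homologous cluster, it computes the fraction of organisms in which the cluster is essential. The record format, the rule that any essential member makes the cluster essential in that organism, and the function name are assumptions. CEG itself may define this differently, and the records are placeholder toy values.

```python
from collections import defaultdict


def general_essentiality(records):
    """Fraction of organisms in which each homologous cluster is essential.

    `records` holds (cluster_id, organism, is_essential) tuples. Assumption:
    a cluster counts as essential in an organism if any of its members there is.
    """
    status = defaultdict(dict)  # cluster_id -> {organism: essential in any member?}
    for cluster_id, organism, essential in records:
        status[cluster_id][organism] = status[cluster_id].get(organism, False) or essential
    return {cid: sum(orgs.values()) / len(orgs) for cid, orgs in status.items()}


# Placeholder records, not real essentiality data.
records = [
    ("c1", "org1", True),
    ("c1", "org2", True),
    ("c1", "org3", False),
    ("c2", "org1", False),
    ("c2", "org1", True),  # a second paralog in the same organism
]
print(general_essentiality(records))
```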


Author(s):  
Xue Zhang ◽  
Wangxin Xiao ◽  
Weijia Xiao

Motivation: Accurately predicting essential genes with computational methods can greatly reduce the time and resources needed to find them through wet-lab experiments, and can further accelerate drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, either via centrality measures or via machine learning based methods. However, methods for predicting human essential genes remain limited, and their performance still needs improvement. In addition, most machine learning based essential gene prediction methods lack mechanisms to handle the imbalanced learning issue inherent in the essential gene prediction problem, which may be one factor limiting their performance.

Results: We proposed a deep learning based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and the protein–protein interaction (PPI) network. A deep learning based network embedding method was used to automatically learn features from the PPI network. In addition, 89 sequence features were derived from the DNA and protein sequences of each gene. These two types of features were integrated to train a multilayer neural network, and a cost-sensitive technique was used to address the imbalanced learning problem during training. In experiments on predicting human essential genes, DeepHE achieved an average AUC above 94%, an area under the precision–recall curve (AP) above 90%, and an accuracy above 90%. We also compared DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, AdaBoost), and DeepHE greatly outperformed them.

Conclusions: We demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.

Availability and Implementation: The Python code will be freely available upon acceptance of this manuscript at https://github.com/xzhang2016/
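
The abstract says only that "a cost-sensitive technique" was used against class imbalance. One common form of cost-sensitive learning weights each class's loss inversely to its frequency. The rare essential genes then contribute as much total loss as the abundant non-essential ones. The pure-Python sketch below illustrates that idea with a class-weighted binary cross-entropy. It is an assumed stand-in, not DeepHE's code, and the function names are hypothetical. Deep learning frameworks expose the same idea, for example through the `class_weight` argument of Keras `Model.fit`.

```python
import math


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency (labels are 0/1)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)  # (weight for class 0, weight for class 1)


def weighted_bce(labels, probs, w_neg, w_pos, eps=1e-12):
    """Mean class-weighted binary cross-entropy."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(labels)


labels = [1, 0, 0, 0]  # toy labels: essential genes are the minority class
w_neg, w_pos = balanced_class_weights(labels)
# A confident miss on an essential gene now costs more than an equally
# confident false alarm on a non-essential gene.
print(weighted_bce([1], [0.1], w_neg, w_pos), weighted_bce([0], [0.9], w_neg, w_pos))
```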


Author(s):  
Olufemi Aromolaran ◽  
Damilare Aromolaran ◽  
Itunuoluwa Isewon ◽  
Jelili Oyelade

Essential genes are critical for the growth and survival of any organism. Machine learning approaches complement experimental methods by minimizing the resources required for essentiality assays. Previous studies revealed several needs for better prediction:

- discovering relevant features that significantly classify essential genes;
- improving the generalizability of prediction models across organisms;
- constructing a robust gold standard as the class label for the training data.

Findings also show that a significant limitation of the machine learning approach is the prediction of conditionally essential genes, since the essentiality status of a gene can change under a specific condition of the organism. This review examines the various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology, and gene ontology-based features, were generated for Caenorhabditis elegans to compare their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes’ biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional essentiality is the unavailability of labeled data for the conditions of interest with which to train a classifier. Therefore, cooperative machine learning could be further exploited to build models that perform well in conditional essentiality prediction.

Short abstract: Identifying essential genes is imperative because it provides an understanding of core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review presents the standard procedure and resources available for predicting essential genes in organisms. It also highlights the factors responsible for the current limitations of machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor in predicting essential genes effectively.
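
The review reports that network-topology features had the highest discriminatory power, but the abstract does not say how that was measured. One simple, standard univariate measure is the Fisher score, (mean_pos - mean_neg)^2 / (var_pos + var_neg). It is large when a feature separates the two classes well relative to its spread within each class. The Python sketch below ranks features by this score. The choice of the Fisher score, the toy feature values, and the function names are assumptions for illustration, not the review's procedure.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """(mu_pos - mu_neg)^2 / (var_pos + var_neg) for one feature; labels are 0/1."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # No within-class spread: perfectly separated unless the means coincide.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def rank_features(features, labels):
    """Return feature names sorted by decreasing Fisher score, plus the scores."""
    scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


labels = [1, 1, 0, 0]  # toy labels: 1 = essential
features = {
    "degree": [10, 12, 2, 3],            # invented values, illustration only
    "gc_content": [0.5, 0.4, 0.5, 0.4],  # invented values, illustration only
}
order, scores = rank_features(features, labels)
print(order, scores)
```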


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 443
Author(s):  
Chyan-long Jan

Because of financial information asymmetry, stakeholders usually do not know a company’s real financial condition until financial distress occurs. Financial distress not only affects a company’s operational sustainability and damages the rights and interests of its stakeholders, but may also harm the national economy and society. Hence, it is very important to build high-accuracy financial distress prediction models. The purpose of this study is to build accurate and effective financial distress prediction models using two representative deep learning algorithms: deep neural networks (DNN) and convolutional neural networks (CNN). In addition, important variables are selected with the chi-squared automatic interaction detector (CHAID). The data on Taiwan’s listed and OTC sample companies were taken from the Taiwan Economic Journal (TEJ) database for the period 2000 to 2019. They include 86 companies in financial distress and 258 not in financial distress, for a total of 344 companies. According to the empirical results, the CHAID-CNN model, which uses the important variables selected by CHAID and is built with a CNN, has the highest financial distress prediction accuracy (94.23%) and the lowest type I and type II error rates (0.96% and 4.81%, respectively).
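
CHAID selects predictors (and merges their categories) using chi-squared tests of independence between each candidate variable and the target class. The Python sketch below computes only the core quantity: the Pearson chi-squared statistic and its degrees of freedom for one contingency table. The rows are distressed versus non-distressed companies, and the columns are the categories of a binned predictor. Full CHAID also merges categories, uses Bonferroni-adjusted p-values, and grows a tree, and those steps are omitted. The function name and the toy counts are assumptions and do not come from the study.

```python
def chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    Rows: outcome classes (e.g. distressed / not distressed).
    Columns: categories of one candidate predictor variable.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Toy counts (not from the study): a ratio binned into "low" and "high".
informative = [[10, 20], [20, 10]]
uninformative = [[15, 15], [15, 15]]
print(chi_square(informative), chi_square(uninformative))
```

A larger statistic (at equal degrees of freedom) indicates a stronger association with distress, so such a variable would be preferred during selection.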


2009 ◽  
Vol 29 (2) ◽  
pp. 71-75 ◽  
Author(s):  
Wu-Jie Su ◽  
Wei-De Shen ◽  
Bing Li ◽  
Yan Wu ◽  
Guang Gao ◽  
...  

In the present study, we investigated the feasibility of deleting essential genes in insect cells by using a bacmid and purifying the recombinant bacmid in Escherichia coli DH10B cells. To disrupt the orf4 (open reading frame 4) gene of BmNPV [Bm (Bombyx mori) nuclear polyhedrosis virus], a transfer vector was constructed and co-transfected with the BmNPV bacmid into Bm cells. Viruses were passaged three times in Bm cells, followed by one round of purification. Subsequently, bacmid DNA was extracted and transformed into competent DH10B cells. A colony harbouring only orf4-disrupted bacmid DNA was identified by PCR. A mixture of recombinant (white colonies) and non-recombinant (blue colonies) bacmids was also transformed into DH10B cells. PCR with M13 primers showed that the recombinant and non-recombinant bacmids were separated after transformation. The results confirmed that recombinant viruses could be purified simply by transformation and indicated that this method could be used to delete essential genes. Orf4-disrupted bacmid DNA was extracted and transfected into Bm cells. Viable viruses were produced, showing that orf4 is not an essential gene.


2021 ◽  
Vol 39 (3_suppl) ◽  
pp. 112-112
Author(s):  
Satoshi Fujii ◽  
Daisuke Kotani ◽  
Masahiro Hattori ◽  
Nishihara Masato ◽  
Toshihide Shikanai ◽  
...  

Background: Numerous genetic and epigenetic abnormalities may lead to various cancer morphologies. However, exactly which gene abnormality causes which morphology is unknown. The VSQ Project aims to develop a novel algorithm that synergistically fuses deep learning (DL) technology and pathological diagnostics to predict cancer genome abnormalities. This was achieved by elucidating the association between morphological findings and genetic abnormalities. These include BRAF V600E mutations and microsatellite instability (MSI) status, which are directly linked to therapeutic strategies for patients (pts) with advanced colorectal cancer (CRC).

Methods: A clinicopathological-genomic integrated database (DB) derived from SCRUM-Japan GI-SCREEN, a nationwide cancer genome screening project including CRC, was used. A total of 1,657 images of thin sections (one representative image per pt) were investigated. They were cut from formalin-fixed, paraffin-embedded (FFPE) tissue specimens of primary or metastatic tumors with genetic abnormalities confirmed by next-generation sequencing (NGS). Of these, 1,234 and 423 images were used for the training and validation cohorts, respectively. First, we developed image-prediction models based on morphological features precisely annotated by a single central pathologist. We then constructed DL algorithms (gene-prediction models) that predict gene abnormalities using images filtered by the image-prediction models.

Results: We achieved high accuracy (AUC > 0.90) for 12 of the 33 morphological features analyzed. Next, we created several DL algorithms that predict BRAF mutations and MSI. In the training cohort, the prediction reached high accuracy, with AUC = 0.955 for BRAF mutations and AUC = 0.857 for MSI. In the validation cohort, we achieved AUC = 0.831 and 0.883 for BRAF mutations and MSI, respectively.

Conclusions: Our findings suggest that VSQ can appropriately predict BRAF mutation and MSI status in advanced CRC, potentially without performing NGS tests. VSQ may also enable prompt initiation of systemic treatments in CRC patients and help establish an unprecedented next-generation pathology in the near future.
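
The VSQ results, like several abstracts above, are reported as AUC values. AUC equals the probability that a randomly chosen positive case (for example, a BRAF-mutant tumour) receives a higher model score than a randomly chosen negative case, with ties counting one half. The Python sketch below computes AUC directly from that definition. It makes O(n_pos × n_neg) comparisons, which is fine for small cohorts. The function name and the toy scores are assumptions for illustration.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy model scores (invented): 1 = mutation present, 0 = absent.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # every positive outranks every negative
print(auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # one positive is ranked below a negative
```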

