Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning

2021 ◽  
Vol 22 (10) ◽  
pp. 5056
Author(s):  
Tulio L. Campos ◽  
Pasi K. Korhonen ◽  
Neil D. Young

Experimental studies of Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular and cellular processes in metazoans at large. Since the publication of their genomes, functional genomic investigations have identified genes that are essential or non-essential for survival in each species. Recently, a range of features linked to gene essentiality have been inferred using a machine learning (ML)-based approach, allowing essentiality predictions within a species. Nevertheless, predictions between species are still elusive. Here, we undertake a comprehensive study using ML to discover and validate features of essential genes common to both C. elegans and D. melanogaster. We demonstrate that the cross-species prediction of gene essentiality is possible using a subset of features linked to nucleotide/protein sequences, protein orthology and subcellular localisation, single-cell RNA-seq, and histone methylation markers. Complementary analyses showed that essential genes are enriched for transcription and translation functions and are preferentially located away from heterochromatin regions of C. elegans and D. melanogaster chromosomes. The present work should enable the cross-prediction of essential genes between model and non-model metazoans.
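The cross-species evaluation described in this abstract (train on one species, predict on the other) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the nearest-centroid classifier stands in for whatever ML models the authors used, and the two toy features and their values are hypothetical, not data from the study.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Return one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_predict(train_rows, train_labels, test_rows, test_labels):
    """Train on one species, then report accuracy on the other species."""
    centroids = fit_centroids(train_rows, train_labels)
    preds = [predict(centroids, r) for r in test_rows]
    correct = sum(p == t for p, t in zip(preds, test_labels))
    return correct / len(test_labels)


# Hypothetical toy features per gene: [orthology score, expression breadth].
# Label 1 = essential, 0 = non-essential. Values are invented for the sketch.
species_a_x = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
species_a_y = [1, 1, 0, 0]
species_b_x = [[0.85, 0.7], [0.7, 0.95], [0.3, 0.2], [0.15, 0.1]]
species_b_y = [1, 1, 0, 0]

accuracy_a_to_b = cross_predict(species_a_x, species_a_y, species_b_x, species_b_y)
accuracy_b_to_a = cross_predict(species_b_x, species_b_y, species_a_x, species_a_y)
```

The point of the sketch is the evaluation design: the model never sees labels from the target species, so only features that behave alike in both species can support the prediction.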

2017 ◽  
Vol 13 (8) ◽  
pp. 1584-1596 ◽  
Author(s):  
Sutanu Nandi ◽  
Abhishek Subramanian ◽  
Ram Rup Sarkar

We propose an integrated machine learning process to predict gene essentiality in Escherichia coli K-12 MG1655 metabolism that outperforms known methods.


2017 ◽  
Vol 13 (3) ◽  
pp. 577-584 ◽  
Author(s):  
Yongming Yu ◽  
Licai Yang ◽  
Zhiping Liu ◽  
Chuansheng Zhu

Predicting bacterial essential genes using only fractal features.


2020 ◽  
Author(s):  
Xue Zhang ◽  
Weijia Xiao ◽  
Wangxin Xiao

Abstract Essential genes are necessary for the survival or reproduction of a living organism. The prediction and analysis of gene essentiality can advance our understanding of basic life and human diseases, and further boost the development of new drugs. Wet lab methods for identifying essential genes are often costly, time consuming, and laborious. As a complement, computational methods have been proposed to predict essential genes by integrating multiple biological data sources. Most of these methods are evaluated on model organisms. However, prediction methods for human essential genes are still limited, and the relationship between human gene essentiality and different biological information still needs to be explored. In addition, exploring suitable deep learning techniques to overcome the limitations of traditional machine learning methods and improve prediction accuracy is also important. We propose a deep learning based method, DeepSF, to predict human essential genes. DeepSF integrates sequence features derived from DNA and protein sequence data with features extracted or learned from different types of functional data, such as gene ontology, protein complex, protein domain, and protein-protein interaction network. More than 200 features are extracted or learned from these biological data and integrated to train a cost-sensitive deep neural network using multiple deep learning techniques. The results of 10-fold cross validation show that DeepSF can accurately predict human gene essentiality, with an average AUC of 95.17%, an area under the precision-recall curve (auPRC) of 92.21%, an accuracy of 91.59%, and an F1 measure of about 78.71%. In addition, comparison experiments show that DeepSF significantly outperforms several popular traditional machine learning models (SVM, Random Forest, and Adaboost), and performs slightly better than a recent deep learning model (DeepHE).
We have demonstrated that the proposed method, DeepSF, is effective for predicting human essential genes. Deep learning techniques are promising at both feature learning and classification levels for the task of essential gene prediction.
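The metrics quoted in this abstract (accuracy, F1, AUC) can be computed from labels and scores as in the sketch below. It is a from-scratch illustration, not the DeepSF evaluation code; the labels, scores, and the 0.5 threshold are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = essential."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores for six genes.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the genes. F1 also ignores true negatives, so on data with few essential genes it can be much lower than accuracy.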


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242943
Author(s):  
Sutanu Nandi ◽  
Piyali Ganguli ◽  
Ram Rup Sarkar

Essential gene prediction helps to find the minimal set of genes indispensable for the survival of any organism. Machine learning (ML) algorithms have been useful for the prediction of gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is the development of a new ML pipeline to help in the annotation of essential genes of less explored disease-causing organisms for which minimal experimental data is available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm employing the Laplacian Support Vector Machine (LapSVM) for prediction of essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, the Semi-Supervised Model Selection Score, equivalent to area under the ROC curve (auROC), has been proposed for the selection of the best model when supervised performance metrics calculation is difficult due to lack of data. The unsupervised feature selection followed by dimension reduction helped to observe a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle for the classification and prediction of essential genes with high accuracy (auROC > 0.85) even with 1% labeled data for model training. After successful validation of this ML pipeline on both Eukaryotes and Prokaryotes, which showed high accuracy even when the labeled dataset is very limited, this strategy is used for the prediction of essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that shows universality in application to both Prokaryotes and Eukaryotes with limited labeled data.
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.


Author(s):  
Olufemi Aromolaran ◽  
Damilare Aromolaran ◽  
Itunuoluwa Isewon ◽  
Jelili Oyelade

Abstract Essential genes are critical for the growth and survival of any organism. The machine learning approach complements the experimental methods to minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, improve on the generalizability of prediction models across organisms, and construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes. The essentiality status of a gene can change due to a specific condition of the organism. This review examines various methods applied to the essential gene prediction task, their strengths, limitations and the factors responsible for effective computational prediction of essential genes. We discussed categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely, gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed other categories of features, mainly due to its high correlation with the genes’ biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major limiting factor of machine learning in predicting conditional gene essentiality is the unavailability of labeled data for conditions of interest that can train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
Short abstract: Identification of essential genes is imperative because it provides an understanding of the core structure and function, accelerating drug target discovery, among other functions. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors are limiting the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and also highlight the factors responsible for the current limitation in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor to predict essential genes effectively.


The behavior of multilayer glass structures under central and eccentric compression and bending is considered. The choice of research topic is substantiated. The laminated glass used in the investigated structures is described, and its features and characteristics are presented. The results of testing models of columns, beams, and laminated glass samples in compression, compression with bending, and simple bending are analyzed. An overview of the types and nature of failure of the models is presented, diagrams of material behavior are constructed, average values of the resistance of the sample cross-sections are obtained, and a table of failure loads is compiled. The need to develop a set of rules and guidelines for the design of glass structures, including laminated glass, for bearing elements, as well as standards for testing, rules for assessing strength, stiffness, and crack resistance, and methods for determining the strength of control samples, is emphasized. It is established that the strength properties of glass depend on the type of applied load, vary widely, and are significantly lower than the corresponding normative values of the strength of heat-strengthened glass. The effect of the connecting polymeric material and the manufacturing technology of laminated glass on the strength of the structure is also shown. The experimental values of the elastic modulus differ between directions of the cross section: in the direction perpendicular to the glass layers they are half the values measured along the glass layers.


Author(s):  
Turan G. Bali ◽  
Amit Goyal ◽  
Dashan Huang ◽  
Fuwei Jiang ◽  
Quan Wen

Genetics ◽  
1988 ◽  
Vol 120 (4) ◽  
pp. 977-986
Author(s):  
K J Kemphues ◽  
M Kusch ◽  
N Wolf

Abstract We have analyzed a set of linkage group (LG) II maternal-effect lethal mutations in Caenorhabditis elegans isolated by a new screening procedure. Screens of 12,455 F1 progeny from mutagenized adults resulted in the recovery of 54 maternal-effect lethal mutations identifying 29 genes. Of the 54 mutations, 39 are strict maternal-effect mutations defining 17 genes. These 17 genes fall into two classes distinguished by frequency of mutation to strict maternal-effect lethality. The smaller class, comprised of four genes, mutated to strict maternal-effect lethality at a frequency close to 5 × 10⁻⁴, a rate typical of essential genes in C. elegans. Two of these genes are expressed during oogenesis and required exclusively for embryogenesis (pure maternal genes), one appears to be required specifically for meiosis, and the fourth has a more complex pattern of expression. The other 13 genes were represented by only one or two strict maternal alleles each. Two of these are identical genes previously identified by nonmaternal embryonic lethal mutations. We interpret our results to mean that although many C. elegans genes can mutate to strict maternal-effect lethality, most genes mutate to that phenotype rarely. Pure maternal genes, however, are among a smaller class of genes that mutate to maternal-effect lethality at typical rates. If our interpretation is correct, we are near saturation for pure maternal genes in the region of LG II balanced by mnC1. We conclude that the number of pure maternal genes in C. elegans is small, being probably not much higher than 12.


Biology ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 163
Author(s):  
Swapnil Gupta ◽  
Panpan You ◽  
Tanima SenGupta ◽  
Hilde Nilsen ◽  
Kulbhushan Sharma

Genomic integrity is maintained by DNA repair and the DNA damage response (DDR). Defects in certain DNA repair genes give rise to many rare progressive neurodegenerative diseases (NDDs), such as ocular motor ataxia, Huntington disease (HD), and spinocerebellar ataxias (SCA). Dysregulation or dysfunction of DDR is also proposed to contribute to more common NDDs, such as Parkinson’s disease (PD), Alzheimer’s disease (AD), and Amyotrophic Lateral Sclerosis (ALS). Here, we present mechanisms that link DDR with neurodegeneration in rare NDDs caused by defects in the DDR and discuss the relevance for more common age-related neurodegenerative diseases. Moreover, we highlight recent insight into the crosstalk between the DDR and other cellular processes known to be disturbed during NDDs. We compare the strengths and limitations of established model systems to model human NDDs, ranging from C. elegans and mouse models towards advanced stem cell-based 3D models.


2021 ◽  
pp. 002224292199708
Author(s):  
Raji Srinivasan ◽  
Gülen Sarial-Abi

Algorithms increasingly used by brands sometimes fail to perform as expected or, even worse, cause harm, resulting in brand harm crises. Unfortunately, algorithm failures are increasing in frequency. Yet, we know little about consumers’ responses to brands following such brand harm crises. Extending developments in the theory of mind perception, we hypothesize that following a brand harm crisis caused by an algorithm error (vs. human error), consumers will respond less negatively to the brand. We further hypothesize that consumers’ lower mind perception of agency of the algorithm (vs. human) for the error, which lowers their perceptions of the algorithm’s responsibility for the harm caused by the error, will mediate this relationship. We also hypothesize four moderators of this relationship: two algorithm characteristics (anthropomorphized algorithm and machine learning algorithm) and two characteristics of the task where the algorithm is deployed (subjective vs. objective task and interactive vs. non-interactive task). We find support for the hypotheses in eight experimental studies, including two incentive-compatible studies. We examine the effects of two managerial interventions to manage the aftermath of brand harm crises caused by algorithm errors. The research’s findings advance the literature on brand harm crises, algorithm usage, and algorithmic marketing and generate managerial guidelines to address the aftermath of such brand harm crises.

