Machine learning-based approaches for disease gene prediction

2020 ◽  
Vol 19 (5-6) ◽  
pp. 350-363
Author(s):  
Duc-Hau Le

Abstract Disease gene prediction is an essential problem in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods for the problem have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification task, have also been proposed. Here, we first present a roadmap of machine learning-based methods for disease gene prediction. In the beginning, the problem was usually approached as binary classification, where the positive and negative training sets comprise disease genes and non-disease genes, respectively. The disease genes are those known to be associated with diseases, while the non-disease genes are randomly selected from genes not yet known to be associated with any disease. However, the latter set may contain unknown disease genes. To overcome this uncertainty in defining non-disease genes, more realistic formulations have been proposed, such as unary (one-class) and semi-supervised classification. Recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have also been proposed for the problem. Secondly, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were analyzed and discussed.
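The one-class flavor described above (only positives are trusted; "negatives" are merely unlabeled) can be illustrated with a minimal sketch: score each unlabeled gene by its similarity to the centroid of the known disease genes and rank candidates. The gene names and feature vectors below are invented for illustration, not taken from the review.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(positives, unlabeled):
    """One-class ranking: score each unlabeled gene by similarity
    to the centroid of the known disease genes (the positives)."""
    dim = len(next(iter(positives.values())))
    centroid = [sum(vec[i] for vec in positives.values()) / len(positives)
                for i in range(dim)]
    scores = {g: cosine(vec, centroid) for g, vec in unlabeled.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy feature vectors (e.g. functional-annotation profiles; illustrative only).
disease_genes = {"BRCA1": [1.0, 0.9, 0.1], "TP53": [0.9, 1.0, 0.2]}
candidates = {"GENE_A": [0.95, 0.9, 0.15],   # resembles the positives
              "GENE_B": [0.1, 0.2, 1.0]}     # dissimilar to the positives
ranking = rank_candidates(disease_genes, candidates)
```

This avoids committing to any unlabeled gene being a true negative, which is the key motivation for the unary and semi-supervised formulations mentioned above.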

Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1713
Author(s):  
Manuela Petti ◽  
Lorenzo Farina ◽  
Federico Francone ◽  
Stefano Lucidi ◽  
Amalia Macali ◽  
...  

Disease gene prediction is to date one of the main computational challenges of precision medicine. It is still uncertain whether disease genes have unique functional properties that distinguish them from other, non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, is based on the definition of two somewhat opposing sets of genes, both disease-specific from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes far from the disease genes on the interactome and not involved in their biological functions). The application of MOSES to a set of 40 diseases showed that the suggested putative disease genes are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of disease-gene associations and the incompleteness of the interactome.
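One ingredient of the warm/cold-seed idea, selecting genes far from the disease genes on the interactome, can be sketched with a breadth-first search over a toy adjacency list. The graph, seed choice, and distance cutoff below are assumptions for illustration; they are not the actual MOSES criteria, which also use functional annotations.

```python
from collections import deque

def bfs_distances(adj, sources):
    """Shortest-path (hop) distance from the warm-seed set to every node."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def cold_seed_candidates(adj, warm, min_dist=3):
    """Genes at least `min_dist` hops from every warm seed;
    genes unreachable from the warm seeds count as infinitely far."""
    dist = bfs_distances(adj, warm)
    return {g for g in adj if dist.get(g, float("inf")) >= min_dist}

# Toy undirected interactome (adjacency lists; illustrative only).
interactome = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"], "F": [],
}
warm = {"A"}
cold = cold_seed_candidates(interactome, warm, min_dist=3)
```

Here "A" is the warm seed, so "D" (3 hops), "E" (4 hops), and the isolated "F" qualify as cold-seed candidates, while "B" and "C" are too close.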


Metagenomics ◽  
2017 ◽  
Vol 1 (1) ◽  
Author(s):  
Hayssam Soueidan ◽  
Macha Nikolski

Abstract Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of questions encompassing the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques to the field of metagenomics by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems: OTU clustering, binning, taxonomic profiling and assignment, comparative metagenomics, and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used, and put them into perspective with similar methods. We conclude our review by looking further ahead at the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.
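Of the five problems listed, OTU clustering is the simplest to sketch: greedy centroid clustering assigns each read to the first cluster whose representative it matches above an identity threshold (tools commonly use ~97%), else it founds a new cluster. The position-wise identity function and toy reads below are simplifying assumptions; real tools use sequence alignment.

```python
def identity(a, b):
    """Fraction of matching positions (a toy stand-in for alignment identity)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def greedy_otu_cluster(seqs, threshold=0.97):
    """Greedy centroid clustering: each sequence joins the first cluster
    whose representative it matches at >= threshold, else founds a new one."""
    clusters = []  # list of (representative, members)
    for s in seqs:
        for rep, members in clusters:
            if identity(s, rep) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters

# Toy reads (illustrative only): the first two differ at one position.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
otus = greedy_otu_cluster(reads, threshold=0.9)
```

With a 90% threshold the first two reads (9/10 identity) collapse into one OTU and the third founds its own, giving two clusters.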


2012 ◽  
Vol 10 (10) ◽  
pp. 547
Author(s):  
Mei Zhang ◽  
Gregory Johnson ◽  
Jia Wang

A takeover success prediction model aims at predicting the probability that a takeover attempt will succeed, using publicly available information at the time of the announcement. We perform a thorough study using machine learning techniques to predict takeover success. Specifically, we model takeover success prediction as a binary classification problem, which has been widely studied in the machine learning community. Motivated by recent advances in machine learning, we empirically evaluate and analyze many state-of-the-art classifiers, including logistic regression, artificial neural networks, support vector machines with different kernels, decision trees, random forests, and AdaBoost. The experiments validate the effectiveness of applying machine learning to takeover success prediction, and we found that the support vector machine with a linear kernel and AdaBoost with stump weak classifiers perform best for the task. The result is consistent with general observations about these two approaches.
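AdaBoost with stump weak classifiers, one of the two winners above, can be written from scratch in a few dozen lines: each round picks the weighted-error-minimizing single-feature threshold rule, then reweights the training examples toward the ones it got wrong. The toy "takeover" features and labels below are invented for illustration, not from the paper's data.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """A decision stump: a single-feature threshold rule returning +/-1."""
    return polarity if x[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump minimizing weighted classification error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def adaboost(X, y, rounds=5):
    """AdaBoost: reweight examples each round toward the ones misclassified."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, pol = best_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)    # stump weight in the ensemble
        ensemble.append((alpha, f, t, pol))
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]                   # renormalize to a distribution
    return ensemble

def predict(ensemble, x):
    s = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if s >= 0 else -1

# Toy deals: [bid premium, hostile flag]; +1 = takeover succeeded, -1 = failed.
X = [[0.4, 0], [0.3, 0], [0.1, 1], [0.05, 1]]
y = [1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
preds = [predict(model, x) for x in X]
```

This is a sketch of the generic algorithm, not the paper's implementation; a production run would use a library such as scikit-learn's `AdaBoostClassifier`.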


Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 65 ◽  
Author(s):  
Kanadpriya Basu ◽  
Treena Basu ◽  
Ron Buckmire ◽  
Nishu Lal

Every year, academic institutions invest considerable effort and substantial resources to influence, predict and understand the decision-making choices of applicants who have been offered admission. In this study, we applied several supervised machine learning techniques to four years of data on 11,001 students, each with 35 associated features, admitted to a small liberal arts college in California to predict student college commitment decisions. By treating the question of whether a student offered admission will accept it as a binary classification problem, we implemented a number of different classifiers and then evaluated their performance using the metrics of accuracy, precision, recall, F-measure and area under the receiver operating characteristic (ROC) curve. The results from this study indicate that the logistic regression classifier performed best in modeling the student college commitment decision problem, i.e., predicting whether a student will accept an admission offer, with an AUC score of 79.6%. The significance of this research is that it demonstrates that many institutions could use machine learning algorithms to improve the accuracy of their estimates of entering class sizes, thus allowing more optimal allocation of resources and better control over net tuition revenue.
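The headline metric here, AUC, has a simple rank-based reading: the probability that a randomly chosen admitted-and-committed student receives a higher predicted score than a randomly chosen decliner. A minimal sketch, with invented labels and scores standing in for a classifier's probabilities:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the probability that a random positive outscores a random negative,
    with ties counted as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = accepted the admission offer, 0 = declined; scores might be
# logistic-regression probabilities (all values illustrative).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
value = auc(labels, scores)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, giving an AUC of 8/9 ≈ 0.889, in the same spirit as the paper's reported 79.6%.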


2020 ◽  
Author(s):  
Gurcan Comert ◽  
Negash Begashaw ◽  
Ayse Turhan-Comert

Abstract In this paper, we utilized and compared selected machine learning techniques to detect malaria outbreaks using the observed variables of maximum temperature, minimum temperature, humidity, rainfall amount, positive cases, and Plasmodium falciparum rate. Random decision trees, logistic regression, and Gaussian processes are specifically analyzed and adopted for malaria outbreak detection. The problem is a binary classification with outcomes of outbreak or no outbreak. Sample data provided in the literature from Maharashtra, India are used. Performance of the models is compared with the results of similar studies. Based on the sample data used, we were able to detect the malaria outbreak without any false positive or false negative errors on the testing dataset.
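The "no false positives or false negatives" claim is a statement about the off-diagonal cells of the confusion matrix. A minimal sketch, with an invented one-rule classifier and toy weather records (the thresholds and data are assumptions, not the paper's models):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, with 1 = outbreak."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def predict_outbreak(humidity, rainfall):
    """A trivial hand-made rule standing in for a trained classifier."""
    return 1 if humidity > 60 and rainfall > 200 else 0

# Toy (humidity %, rainfall mm) records with ground-truth outbreak labels.
records = [(65, 250), (70, 300), (55, 100), (62, 150)]
y_true = [1, 1, 0, 0]
y_pred = [predict_outbreak(h, r) for h, r in records]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

A perfect test-set result, as reported in the abstract, corresponds to `fp == 0 and fn == 0`.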


2020 ◽  
Vol 9 (1) ◽  
pp. 1667-1670

Autism spectrum disorder (ASD) is a neurodevelopmental condition that begins in early childhood and persists throughout a person's life, affecting behavior, communication with others, social interaction, and learning. Detecting ASD at an early stage is now feasible, and early recognition can improve the overall mental health of the child. In this work, a machine learning methodology is applied to diagnose ASD: we used machine learning techniques and optimization on an ASD dataset, employing the XGBoost algorithm, and obtained efficient results. This can be of great use to doctors, helping them recognize ASD at an early stage.
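XGBoost is a highly optimized gradient-boosting library; the principle it builds on can be sketched with plain gradient boosting over one-feature stumps and squared loss, where each round fits a small tree to the current residuals. The "screening score" feature and labels below are invented for illustration, and this sketch omits XGBoost's regularization and second-order machinery.

```python
def fit_stump(X, residuals, feature=0):
    """Best single split on `feature` minimizing squared error of residuals."""
    best = None
    for t in sorted({x[feature] for x in X}):
        left = [r for x, r in zip(X, residuals) if x[feature] < t]
        right = [r for x, r in zip(X, residuals) if x[feature] >= t]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    return best[1:]  # (threshold, left mean, right mean)

def boost(X, y, rounds=10, lr=0.5):
    """Gradient boosting with stumps: each round fits the residuals."""
    pred = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lmean, rmean = fit_stump(X, residuals)
        stumps.append((t, lmean, rmean))
        pred = [p + lr * (rmean if x[0] >= t else lmean)
                for x, p in zip(X, pred)]
    return stumps

def score(stumps, x, lr=0.5):
    return sum(lr * (r if x[0] >= t else l) for t, l, r in stumps)

# Toy screening scores -> ASD-positive label (illustrative only).
X = [[2.0], [3.0], [8.0], [9.0]]
y = [0, 0, 1, 1]
model = boost(X, y, rounds=10)
preds = [1 if score(model, x) >= 0.5 else 0 for x in X]
```

In practice one would call the actual `xgboost` package rather than hand-roll the boosting loop; the sketch only shows the residual-fitting idea.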


Author(s):  
A. V. Deorankar ◽  
Shiwani S. Thakare

IoT is the network that connects and communicates with billions of devices through the internet, and due to the massive use of IoT devices, data shared between devices or over the network is not confidential, given the increasing growth of cyberattacks. Network traffic through IoT systems is growing rapidly and introducing new cybersecurity challenges, since these IoT devices are connected to sensors that are in turn connected to large-scale cloud servers. In order to reduce these cyberattacks, developers need to devise new techniques for detecting infected IoT devices. In this work, to counter these cyberattacks, a fog layer is introduced to maintain the security of data on the cloud; the working of the fog layer and different anomaly-detection techniques to prevent cyberattacks are also studied. The proposed AD-IoT can significantly detect malicious behavior using machine learning-based anomaly classification before data are distributed to the cloud layer. This work discusses the role of machine learning techniques in identifying the type of cyberattack. Two ML techniques, random forest (RF) and multilayer perceptron (MLP), are evaluated on the UNSW-NB15 dataset. The accuracy and false alarm rate of the techniques are assessed, and the results reveal the superiority of RF over MLP: the measured accuracies are 98% and 53% for RF and MLP, respectively, a large difference that shows RF to be the more efficient algorithm for both binary and multi-class classification.
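The two metrics this comparison rests on, accuracy and false alarm rate, are straightforward to compute from predicted and true labels. A minimal sketch on invented flow labels (not the UNSW-NB15 data itself):

```python
def accuracy(y_true, y_pred):
    """Fraction of flows classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_alarm_rate(y_true, y_pred):
    """False positive rate: fraction of benign flows (label 0)
    wrongly flagged as attacks (label 1)."""
    benign = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    return sum(p == 1 for _, p in benign) / len(benign)

# Toy flow labels: 1 = attack, 0 = benign (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
acc = accuracy(y_true, y_pred)
far = false_alarm_rate(y_true, y_pred)
```

On this toy data the classifier scores 75% accuracy with a 20% false alarm rate; the paper reports both metrics for RF and MLP on UNSW-NB15.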


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang
