Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology

Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Moreover, complex diseases present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, which was higher than that of the previously reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. In addition, predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow implementing the proposed methodology, which allows users to configure and execute it without requiring machine learning and programming skills.
Machine learning is an effective and reliable technique for deciphering ASD mechanisms by identifying novel disease genes, and this study further demonstrated that classifier performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.
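
As a rough illustration of the similarity-matrix step described above, the sketch below builds a gene-by-gene matrix from GO annotation sets. It assumes a plain Jaccard overlap as a simplified stand-in for the semantic similarity measures used in the study (which are not named here), and the gene names and GO identifiers are hypothetical placeholders rather than real annotations.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two GO term sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(annotations: Dict[str, Set[str]]) -> List[List[float]]:
    """Pairwise similarity between all genes, with rows and columns in sorted gene order."""
    genes = sorted(annotations)
    return [[jaccard(annotations[g], annotations[h]) for h in genes] for g in genes]


# Hypothetical toy annotations (assumption: not taken from the study).
annotations = {
    "GENE_A": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "GENE_B": {"GO:0000002", "GO:0000003"},
    "GENE_C": {"GO:0000004"},
}

matrix = similarity_matrix(annotations)
for gene, row in zip(sorted(annotations), matrix):
    print(gene, [round(value, 2) for value in row])
```

Each row of `matrix` could then serve as the feature vector of one gene for a classifier such as a Random Forest; a real pipeline would likely replace `jaccard` with an ontology-aware semantic similarity measure.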

A Pragmatic Comparison of Supervised Machine Learning Classifiers for Disease Diagnosis

10.1109/icirca51532.2021.9544582 ◽ 2021

Author(s): Ifra Altaf ◽ Muheet Ahmed Butt ◽ Majid Zaman

Keyword(s): Machine Learning ◽ Disease Diagnosis ◽ Supervised Machine Learning ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers

Classifying Lensed Gravitational Waves in the Geometrical Optics Limit with Machine Learning

American Journal of Undergraduate Research ◽ 10.33697/ajur.2019.019 ◽ 2019 ◽ Vol 16 (2) ◽ pp. 5-16

Author(s): Amit Singh ◽ Ivan Li ◽ Otto Hannuksela ◽ Tjonnie Li ◽ Kyungmin Kim

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Gravitational Wave ◽ Gravitational Waves ◽ Geometrical Optics ◽ Supervised Machine Learning ◽ Support Vector ◽ Multi Layer Perceptron ◽ Machine Learning Classifiers ◽ Learning Classifiers

Gravitational waves are theorized to be gravitationally lensed when they propagate near massive objects. Such lensing effects cause potentially detectable repeated gravitational wave patterns in ground- and space-based gravitational wave detectors. These effects are difficult to discriminate when the lens is small and the repeated patterns superpose. Traditionally, matched filtering techniques are used to identify gravitational-wave signals, but we instead aim to use machine learning techniques to achieve this. In this work, we implement supervised machine learning classifiers (support vector machine, random forest, multi-layer perceptron) to discriminate such lensing patterns in gravitational wave data. We train classifiers with spectrograms of both lensed and unlensed waves using both point-mass and singular isothermal sphere lens models. As a result, the classifiers return F1 scores ranging from 0.852 to 0.996, with precisions from 0.917 to 0.992 and recalls ranging from 0.796 to 1.000, depending on the type of classifier and lensing model used. This supports the idea that machine learning classifiers are able to correctly identify lensed gravitational wave signals. It also suggests that, in the future, machine learning classifiers may be used as a possible alternative for identifying lensed gravitational wave events, allowing us to study gravitational wave sources and massive astronomical objects through further analysis. KEYWORDS: Gravitational Waves; Gravitational Lensing; Geometrical Optics; Machine Learning; Classification; Support Vector Machine; Random Tree Forest; Multi-layer Perceptron
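
For readers unfamiliar with the metrics quoted above, the following minimal sketch shows how precision, recall, and F1 are derived from confusion counts for a binary lensed/unlensed classifier. The counts are made-up illustrative values, not results from the paper, and the function name is an assumption of this sketch.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only: 90 lensed signals found, 10 false alarms, 30 missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why a classifier with high precision but lower recall can still report a comparatively modest F1.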

Defending Malicious Script Attacks Using Machine Learning Classifiers

Wireless Communications and Mobile Computing ◽ 10.1155/2017/5360472 ◽ 2017 ◽ Vol 2017 ◽ pp. 1-9 ◽ Cited By ~ 6

Author(s): Nayeem Khan ◽ Johari Abdullah ◽ Adnan Shahid Khan

Keyword(s): Machine Learning ◽ Web Application ◽ Malicious Code ◽ Supervised Machine Learning ◽ Feature Subset ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Wrapper Method ◽ Supervised Machine Learning Classifiers ◽ Client Side

Web applications have become a primary target for cyber criminals, who inject malware, especially JavaScript, to perform malicious activities for impersonation. Thus, it is imperative to detect such malicious code in real time, before any malicious activity is performed. This study proposes an efficient method of detecting previously unknown malicious JavaScript using an interceptor at the client side that classifies the key features of the malicious code. A feature subset was obtained using a wrapper method for dimensionality reduction. Supervised machine learning classifiers were applied to the dataset to achieve high accuracy. Experimental results show that our method can efficiently distinguish malicious code from benign code with promising results.

A Systematic Comparison and Evaluation of Supervised Machine Learning Classifiers Using Headache Dataset

Lecture Notes in Computer Science - Advanced Intelligent Computing Theories and Applications ◽ 10.1007/978-3-319-22053-6_12 ◽ 2015 ◽ pp. 101-108 ◽ Cited By ~ 4

Author(s): Ahmed J. Aljaaf ◽ Dhiya Al-Jumeily ◽ Abir J. Hussain ◽ Paul Fergus ◽ Mohammed Al-Jumaily ◽ ...

Keyword(s): Machine Learning ◽ Supervised Machine Learning ◽ Systematic Comparison ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers

Port-Scanning Attack Detection Using Supervised Machine Learning Classifiers

10.1109/esmarta52612.2021.9515743 ◽ 2021

Author(s): Akram Q. M. Algaolahi ◽ Abdullah A. Hasan ◽ Amer Sallam ◽ Abdullah M. Sharaf ◽ Aseel A. Abdu ◽ ...

Keyword(s): Machine Learning ◽ Attack Detection ◽ Supervised Machine Learning ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers

Local Binary Pattern-Based Texture Analysis to Predict IDH Genotypes of Glioma Cancer Using Supervised Machine Learning Classifiers

Advances in Intelligent Systems and Computing - Emerging Technologies in Data Mining and Information Security ◽ 10.1007/978-981-33-4367-2_1 ◽ 2021 ◽ pp. 3-13

Author(s): Sonal Gore ◽ Jayant Jagtap

Keyword(s): Machine Learning ◽ Texture Analysis ◽ Local Binary Pattern ◽ Supervised Machine Learning ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers

Prediction of Orthosteric and Allosteric Regulations on Cannabinoid Receptors Using Supervised Machine Learning Classifiers

Molecular Pharmaceutics ◽ 10.1021/acs.molpharmaceut.9b00182 ◽ 2019 ◽ Vol 16 (6) ◽ pp. 2605-2615 ◽ Cited By ~ 11

Author(s): Yuemin Bian ◽ Yankang Jing ◽ Lirong Wang ◽ Shifan Ma ◽ Jaden Jungho Jun ◽ ...

Keyword(s): Machine Learning ◽ Cannabinoid Receptors ◽ Supervised Machine Learning ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers

A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis

PLoS ONE ◽ 10.1371/journal.pone.0245909 ◽ 2021 ◽ Vol 16 (2) ◽ pp. e0245909

Author(s): Furqan Rustam ◽ Madiha Khalid ◽ Waqar Aslam ◽ Vaibhav Rupapara ◽ Arif Mehmood ◽ ...

Keyword(s): Machine Learning ◽ Sentiment Analysis ◽ Short Term Memory ◽ Performance Comparison ◽ Supervised Machine Learning ◽ Accuracy Score ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Analysis Technique ◽ Realistic Assessment

The spread of Covid-19 has resulted in worldwide health concerns. Social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to utilize resources optimally and appropriately. In this research, we perform sentiment analysis of Covid-19 tweets using a supervised machine learning approach. Identifying Covid-19 sentiments from tweets would allow informed decisions for better handling of the current pandemic situation. The dataset used is extracted from Twitter using IDs provided by IEEE DataPort. Tweets are extracted by an in-house crawler that uses the Tweepy library. The dataset is cleaned using preprocessing techniques, and sentiments are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set. This set is formed by concatenating the bag-of-words and the term frequency-inverse document frequency (TF-IDF) features. Tweets are classified as positive, neutral, or negative. Classifier performance is evaluated in terms of accuracy, precision, recall, and F1 score. For completeness, the dataset is further investigated using the Long Short-Term Memory (LSTM) deep learning architecture. The results show that Extra Trees Classifiers outperform all other models, achieving a 0.93 accuracy score using our proposed concatenated feature set. The LSTM achieves lower accuracy than the machine learning classifiers. To demonstrate the effectiveness of our proposed feature set, the results are compared with the Vader sentiment analysis technique based on the GloVe feature extraction approach.
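
The concatenated feature set described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: whitespace tokenization, an unsmoothed `log(N / df)` inverse document frequency (libraries often apply different smoothing), made-up example tweets, and hypothetical helper names. It is not the authors' implementation.

```python
import math
from collections import Counter
from typing import List


def build_vocab(docs: List[List[str]]) -> List[str]:
    """Sorted list of all distinct tokens across the corpus."""
    return sorted({token for doc in docs for token in doc})


def bow_vector(doc: List[str], vocab: List[str]) -> List[float]:
    """Bag-of-words: raw count of each vocabulary token in the document."""
    counts = Counter(doc)
    return [float(counts[token]) for token in vocab]


def tfidf_vector(doc: List[str], vocab: List[str], docs: List[List[str]]) -> List[float]:
    """TF-IDF with tf = count / doc length and idf = log(N / document frequency)."""
    counts = Counter(doc)
    vector = []
    for token in vocab:
        tf = counts[token] / len(doc) if doc else 0.0
        df = sum(1 for other in docs if token in other)
        idf = math.log(len(docs) / df) if df else 0.0
        vector.append(tf * idf)
    return vector


# Made-up example tweets (assumption: not from the paper's dataset).
tweets = ["stay safe at home", "stay home stay healthy", "vaccine news today"]
docs = [tweet.split() for tweet in tweets]
vocab = build_vocab(docs)

# Concatenate the two representations into one feature vector per tweet.
features = [bow_vector(doc, vocab) + tfidf_vector(doc, vocab, docs) for doc in docs]
print(len(vocab), len(features[0]))
```

Each row of `features` has twice the vocabulary length: the first half holds raw counts and the second half holds TF-IDF weights, so a downstream classifier sees both views of the same tweet.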

Prediction of COVID-19 Patient using Supervised Machine Learning Algorithm

Sains Malaysiana ◽ 10.17576/jsm-2021-5008-28 ◽ 2021 ◽ Vol 50 (8) ◽ pp. 2479-2497

Author(s): Buvana M. ◽ Muthumayil K.

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Nasal Congestion ◽ Supervised Machine Learning ◽ Support Vector ◽ Data Set ◽ Physiological Measurement ◽ Machine Learning Classifiers ◽ Balanced Diet ◽ Learning Classifiers

COVID-19 is one of the most symptomatic diseases. Early and precise prediction based on physiological measurements of breathing can help minimize the risk of COVID-19, alongside precautions such as keeping a reasonable distance from others, wearing a mask, cleanliness, medication, a balanced diet, and staying safe at home when unwell. To evaluate the collected datasets for COVID-19 prediction, five machine learning classifiers were used: Naive Bayes, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), and Decision Tree. COVID-19 datasets from the Repository were combined and re-examined to remove incomplete entries, and a total of 2500 cases were utilized in this study. The features fever, body pain, runny nose, difficulty in breathing, sore throat, and nasal congestion are considered to be the most important differences between patients who have COVID-19 and those who do not. We demonstrate the prediction functionality of the five machine learning classifiers. A publicly available data set was used to train and assess the model. With an overall accuracy of 99.88 percent, the ensemble model performs commendably. Compared to existing methods and studies, the proposed model performs better. As a result, the model presented is trustworthy and can be used to screen COVID-19 patients in a timely and efficient manner.

Performance Evaluation of Supervised Machine Learning Classifiers for Analyzing Agricultural Big Data

Smart Network Inspired Paradigm and Approaches in IoT Applications ◽ 10.1007/978-981-13-8614-5_8 ◽ 2019 ◽ pp. 135-150

Author(s): R. Anusuya ◽ S. Krishnaveni

Keyword(s): Machine Learning ◽ Big Data ◽ Performance Evaluation ◽ Supervised Machine Learning ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Supervised Machine Learning Classifiers
