Graph Embedding Based Novel Gene Discovery Associated With Diabetes Mellitus

Diabetes mellitus is a group of complex metabolic disorders that affects hundreds of millions of patients worldwide. The underlying pathogenesis of the various types of diabetes is still unclear, which hinders the development of more efficient therapies. Although many genes have been found to be associated with diabetes mellitus, more novel genes still need to be discovered to give a complete picture of the underlying mechanism. With the development of complex molecular networks, network-based disease-gene prediction methods have been widely proposed. However, most existing methods are based on the guilt-by-association hypothesis and often handcraft node features based on local topological structures. Advances in graph embedding techniques have enabled automatic extraction of global features from molecular networks. Inspired by the successful applications of cutting-edge graph embedding methods to complex diseases, we proposed a computational framework to investigate novel genes associated with diabetes mellitus. The framework has three main steps: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best workflow for predicting genes associated with diabetes mellitus. The predicted novel genes were further evaluated by functional enrichment analysis based on Human Phenotype Ontology (HPO), KEGG, and GO biological process, and by a publication search.
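
The three steps named in this abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: random-walk visit frequencies stand in for a graph embedding method, a single-layer autoencoder stands in for the stacked autoencoder, and a nearest-centroid rule stands in for the machine learning classifiers. The gene names `G1` to `G6`, the toy network, the labels, and all function names are hypothetical.

```python
import math
import random


def walk_features(adj, walks=50, length=5, seed=0):
    """Step 1 stand-in: visit frequencies of short random walks per start node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    index = {node: i for i, node in enumerate(nodes)}
    feats = {}
    for start in nodes:
        counts = [0.0] * len(nodes)
        for _ in range(walks):
            cur = start
            for _ in range(length):
                cur = rng.choice(adj[cur])  # assumes every node has a neighbour
                counts[index[cur]] += 1.0
        total = sum(counts)
        feats[start] = [c / total for c in counts]
    return feats


def train_encoder(rows, hidden=2, epochs=200, lr=0.1, seed=0):
    """Step 2 stand-in: tanh encoder plus linear decoder trained by SGD."""
    rng = random.Random(seed)
    d = len(rows[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]

    def encode(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w_enc]

    for _ in range(epochs):
        for x in rows:
            h = encode(x)
            # Reconstruction error of the linear decoder.
            err = [sum(w * v for w, v in zip(row, h)) - xi
                   for row, xi in zip(w_dec, x)]
            # Gradient w.r.t. the hidden code, taken before the decoder update.
            grad_h = [sum(err[i] * w_dec[i][j] for i in range(d))
                      for j in range(hidden)]
            for i in range(d):
                for j in range(hidden):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                g = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
                for k in range(d):
                    w_enc[j][k] -= lr * g * x[k]
    return encode


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_candidates(encoded, positives, negatives, candidates):
    """Step 3 stand-in: rank by closeness to known disease genes."""
    pos = centroid([encoded[g] for g in positives])
    neg = centroid([encoded[g] for g in negatives])
    margin = {g: math.dist(encoded[g], neg) - math.dist(encoded[g], pos)
              for g in candidates}
    return sorted(candidates, key=margin.get, reverse=True)


# Hypothetical toy network: two triangles joined by the edge G3-G4.
network = {
    "G1": ["G2", "G3"], "G2": ["G1", "G3"], "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5", "G6"], "G5": ["G4", "G6"], "G6": ["G4", "G5"],
}
features = walk_features(network)
encode = train_encoder(list(features.values()))
encoded = {gene: encode(vec) for gene, vec in features.items()}
ranking = rank_candidates(encoded, ["G1", "G2"], ["G5", "G6"], ["G3", "G4"])
print(ranking)
```

Keeping each step behind its own function mirrors the comparison described in the abstract: a different embedding method or classifier can be swapped in without touching the other two steps.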


Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art

Data Science and Engineering ◽

10.1007/s41019-021-00155-3 ◽

2021 ◽

Author(s):

Yun Peng ◽

Byron Choi ◽

Jianliang Xu

Keyword(s):

Machine Learning ◽

Combinatorial Optimization ◽

Graph Embedding ◽

Partial Solution ◽

Complex Data ◽

Learning Methods ◽

Graph Learning ◽

Second Stage ◽

End To End ◽

Embedding Methods

Abstract Graphs have been widely used to represent complex data in many applications, such as e-commerce, social networks, and bioinformatics. Efficient and effective analysis of graph data is important for graph-based applications. However, most graph analysis tasks are combinatorial optimization (CO) problems, which are NP-hard. Many recent studies have focused on the potential of using machine learning (ML) to solve graph-based CO problems. Most recent methods follow a two-stage framework. The first stage is graph representation learning, which embeds the graphs into low-dimensional vectors. The second stage uses machine learning to solve the CO problems using the embeddings of the graphs learned in the first stage. The works for the first stage can be classified into two categories: graph embedding methods and end-to-end learning methods. For graph embedding methods, the learning of the embeddings of the graphs has its own objective, which may not rely on the CO problems to be solved. The CO problems are solved by independent downstream tasks. For end-to-end learning methods, the learning of the embeddings of the graphs does not have its own objective and is an intermediate step of the learning procedure for solving the CO problems. The works for the second stage can also be classified into two categories: non-autoregressive methods and autoregressive methods. Non-autoregressive methods predict a solution for a CO problem in one shot. A non-autoregressive method predicts a matrix that denotes the probability of each node/edge being part of a solution of the CO problem. The solution can be computed from the matrix using search heuristics such as beam search. Autoregressive methods iteratively extend a partial solution step by step. At each step, an autoregressive method predicts a node/edge conditioned on the current partial solution, which is then used to extend that partial solution.
In this survey, we provide a thorough overview of recent studies of graph learning-based CO methods. The survey ends with several remarks on future research directions.
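
The difference between the two second-stage categories can be sketched on a toy maximum independent set instance. This is only an illustration under stated assumptions: the probabilities in `one_shot_prob` are made-up numbers standing in for a model's one-shot output, `degree_score` is a hypothetical hand-written stand-in for a learned policy, and the one-shot decoder uses a greedy search (beam search with width 1) instead of a full beam search.

```python
def decode_one_shot(adj, prob, threshold=0.5):
    """Non-autoregressive: probabilities are predicted once, then searched."""
    solution = set()
    for node in sorted(prob, key=prob.get, reverse=True):
        # Accept a likely node only if it keeps the set independent.
        if prob[node] >= threshold and not (adj[node] & solution):
            solution.add(node)
    return solution


def decode_step_by_step(adj, score_fn):
    """Autoregressive: candidates are re-scored after every extension."""
    solution = set()
    candidates = set(adj)
    while candidates:
        scores = {n: score_fn(n, solution, candidates, adj) for n in candidates}
        best = max(sorted(scores), key=scores.get)  # sorted() makes ties deterministic
        solution.add(best)
        # Neighbours of the chosen node can no longer join the solution.
        candidates -= adj[best] | {best}
    return solution


def degree_score(node, partial, candidates, adj):
    """Hypothetical policy: prefer nodes that block few remaining candidates."""
    return -len(adj[node] & candidates)


# Toy path graph a-b-c-d-e.
path = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
one_shot_prob = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.3, "e": 0.7}  # made-up values

one_shot_solution = decode_one_shot(path, one_shot_prob)
stepwise_solution = decode_step_by_step(path, degree_score)
print(sorted(one_shot_solution), sorted(stepwise_solution))
```

On this path both decoders select the nodes a, c, and e. The structural difference is that `decode_one_shot` never looks at the scores again after the single prediction, while `decode_step_by_step` calls the scoring function again for every extension of the partial solution.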


A Comparative Study of Classification-Based Machine Learning Methods for Novel Disease Gene Prediction

Advances in Intelligent Systems and Computing - Knowledge and Systems Engineering ◽

10.1007/978-3-319-11680-8_46 ◽

2015 ◽

pp. 577-588 ◽

Cited By ~ 9

Author(s):

Duc-Hau Le ◽

Nguyen Xuan Hoai ◽

Yung-Keun Kwon

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Disease Gene ◽

Gene Prediction ◽

Learning Methods ◽

Disease Gene Prediction ◽

Machine Learning Methods


Evaluating the performance metrics of different machine learning classifiers by combined feature extraction method in Alzheimer's disease detection

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2019/397112019 ◽

2019 ◽

Vol 7 (11) ◽

pp. 652-658

Author(s):

Dinu A.J ◽

Keyword(s):

Machine Learning ◽

Alzheimer’s Disease ◽

Feature Extraction ◽

Extraction Method ◽

Performance Metrics ◽

Disease Detection ◽

Feature Extraction Method ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Combined Feature


Recent advances in network-based methods for disease gene prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbaa303 ◽

2020 ◽

Author(s):

Sezin Kircali Ata ◽

Min Wu ◽

Yuan Fang ◽

Le Ou-Yang ◽

Chee Keong Kwoh ◽

...

Keyword(s):

Empirical Analysis ◽

Genome Wide Association Study ◽

Disease Gene ◽

Gene Prediction ◽

Representation Learning ◽

Graph Representation ◽

Molecular Networks ◽

Learning Methods ◽

Gene Association ◽

Disease Gene Prediction

Abstract Disease–gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease–gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease–gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.


Analysis of cryptosystem recognition scheme based on Euclidean distance feature extraction in three machine learning classifiers

Journal of Physics Conference Series ◽

10.1088/1742-6596/1314/1/012184 ◽

2019 ◽

Vol 1314 ◽

pp. 012184

Author(s):

SiJie Fan ◽

YaQun Zhao

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Euclidean Distance ◽

Machine Learning Classifiers ◽

Learning Classifiers


Users' Emotions Analysis based on Hybrid Feature Extraction Techniques

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206658 ◽

2020 ◽

pp. 291-296

Author(s):

Sulis Sandiwarno

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Learning Systems ◽

Experimental Results ◽

Extraction Techniques ◽

Machine Learning Classifiers ◽

Feature Extraction Algorithm ◽

Extraction Algorithm ◽

E Learning ◽

Hybrid Feature Extraction

To address the neglected importance of individual words and the missing semantic relations between words in the emotional analysis of e-learning systems, a Word2vec model weighted by the TF-IWF algorithm was proposed as the feature extraction algorithm. Moreover, to support this study, we employ Multinomial Naïve Bayes (MNB) to obtain more accurate results. There are three main steps. First, TF-IWF is used to compute the weight of each word. Second, the Word2vec algorithm is adopted to compute the word vectors. Third, we concatenate the results of the first and second steps. Finally, the users' opinion data is trained and classified using several machine learning classifiers, in particular the MNB classifier. The experimental results indicate that the proposed method outperformed previous approaches in terms of precision, recall, F-score, and accuracy.
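
One possible reading of the weighting step is sketched below. The abstract does not give the formula, so the sketch assumes a common formulation of TF-IWF (term frequency within the document times the log of total corpus word count over the word's corpus count) and combines the weights with word vectors by a weighted average. The two-dimensional vectors in `toy_vectors` are made-up stand-ins for trained Word2vec output, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_iwf(docs):
    """Per-document TF-IWF weights (assumed formulation, non-empty documents)."""
    corpus_counts = Counter(word for doc in docs for word in doc)
    total_words = sum(corpus_counts.values())
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(total_words / corpus_counts[word])
            for word, count in counts.items()
        })
    return weights


def weighted_doc_vector(word_weights, vectors):
    """Average the word vectors of one document, weighted by TF-IWF."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    norm = 0.0
    for word, weight in word_weights.items():
        if word in vectors:  # skip out-of-vocabulary words
            acc = [a + weight * v for a, v in zip(acc, vectors[word])]
            norm += weight
    return [a / norm for a in acc] if norm else acc


docs = [["course", "is", "great"], ["course", "is", "boring"]]
toy_vectors = {  # made-up stand-ins for Word2vec embeddings
    "course": [0.1, 0.0], "is": [0.0, 0.1],
    "great": [0.9, 0.8], "boring": [-0.8, -0.9],
}

weights = tf_iwf(docs)
doc_vectors = [weighted_doc_vector(w, toy_vectors) for w in weights]
print(doc_vectors)
```

Words that are rare across the corpus, such as "great" and "boring" here, receive larger weights than words shared by every document, so they dominate the document vector passed to the classifier.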


Texture Feature Extraction: Impact of Variants on Performance of Machine Learning Classifiers: Study on Chest X-Ray – Pneumonia Images

Big Data Analytics - Lecture Notes in Computer Science ◽

10.1007/978-3-030-66665-1_11 ◽

2020 ◽

pp. 151-163

Author(s):

Anamika Gupta ◽

Anshuman Gupta ◽

Vaishnavi Verma ◽

Aayush Khattar ◽

Devansh Sharma

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Texture Feature ◽

Texture Feature Extraction ◽

X Ray ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Chest X Ray


Sentiment Analysis in Portuguese Texts from Online Health Community Forums: Data, Model and Evaluation

10.5753/stil.2021.17785 ◽

2021 ◽

Author(s):

Yohan Bonescki Gumiel ◽

Isabela Lee ◽

Tayane Arantes Soares ◽

Thiago Castro Ferreira ◽

Adriana Pagano

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Data Model ◽

Negative Evaluation ◽

Support Vector ◽

Machine Learning Classifiers ◽

Online Health Community ◽

Health Community

This study introduces novel data and models for the task of Sentiment Analysis in Portuguese texts about Diabetes Mellitus. The corpus contains 1290 posts retrieved from online health community forums in Portuguese and annotated by two annotators according to 3 sentiment categories (i.e., Positive, Neutral and Negative). Evaluation of traditional (Support Vector Machine, Decision Tree, Random Forest and Logistic Regression classifiers) and state-of-the-art (BERT-based models) machine learning classifiers for the task showed, as expected, that the latter models perform better. Data and models are available to the community upon request.


Performance Evaluation of Feature Extraction and Dimensionality Reduction Techniques on Various machine learning classifiers

2019 IEEE 9th International Conference on Advanced Computing (IACC) ◽

10.1109/iacc48062.2019.8971466 ◽

2019 ◽

Author(s):

Md. Golam Sarowar ◽

Arthy Anjum Jamal ◽

Anik Saha ◽

Abir Saha

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Evaluation ◽

Dimensionality Reduction ◽

Machine Learning Classifiers ◽

Reduction Techniques ◽

Learning Classifiers ◽

Dimensionality Reduction Techniques


Tweets Classification on the Base of Sentiments for US Airline Companies

Entropy ◽

10.3390/e21111078 ◽

2019 ◽

Vol 21 (11) ◽

pp. 1078 ◽

Cited By ~ 7

Author(s):

Furqan Rustam ◽

Imran Ashraf ◽

Arif Mehmood ◽

Saleem Ullah ◽

Gyu Choi

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Sentiment Analysis ◽

Stochastic Gradient Descent ◽

Ensemble Classifiers ◽

Term Frequency ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Lower Accuracy ◽

The Impact

The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers. The VC achieves an accuracy of 0.789 and 0.791 with TF and TF-IDF feature extraction, respectively. The results demonstrate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. Experiments further showed that the performance of machine learning classifiers is better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves a lower accuracy than the machine learning classifiers.
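
The soft voting mechanism mentioned in this abstract can be sketched in a few lines: each base classifier outputs class probabilities, the probabilities are averaged per class, and the class with the highest average wins. The probability rows below are made-up numbers standing in for the outputs of the LR and SGDC models on a single tweet, and `soft_vote` is a hypothetical helper, not the paper's code.

```python
def soft_vote(prob_rows, labels):
    """Average class probabilities across classifiers and pick the top class."""
    n_models = len(prob_rows)
    averaged = [sum(column) / n_models for column in zip(*prob_rows)]
    best = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best], averaged


labels = ["negative", "neutral", "positive"]
lr_probs = [0.50, 0.30, 0.20]    # made-up LR output for one tweet
sgdc_probs = [0.20, 0.25, 0.55]  # made-up SGDC output for the same tweet

label, averaged = soft_vote([lr_probs, sgdc_probs], labels)
print(label, averaged)
```

In this example the two models disagree on their top class, so a majority (hard) vote would tie, while averaging the probabilities lets the more confident model decide in favour of "positive".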
