Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art

Author(s):  
Yun Peng
Byron Choi
Jianliang Xu

Abstract: Graphs have been widely used to represent complex data in many applications, such as e-commerce, social networks, and bioinformatics. Efficient and effective analysis of graph data is important for graph-based applications. However, most graph analysis tasks are combinatorial optimization (CO) problems, which are NP-hard. Recent studies have focused on the potential of using machine learning (ML) to solve graph-based CO problems. Most recent methods follow a two-stage framework. The first stage is graph representation learning, which embeds the graphs into low-dimensional vectors. The second stage uses ML to solve the CO problems using the embeddings of the graphs learned in the first stage. Methods for the first stage can be classified into two categories: graph embedding methods and end-to-end learning methods. For graph embedding methods, the learning of the embeddings of the graphs has its own objective, which may not depend on the CO problems to be solved; the CO problems are then solved by independent downstream tasks. For end-to-end learning methods, the learning of the embeddings does not have its own objective and is an intermediate step of the learning procedure for solving the CO problems. Methods for the second stage can also be classified into two categories: non-autoregressive methods and autoregressive methods. Non-autoregressive methods predict a solution for a CO problem in one shot: the method predicts a matrix denoting the probability of each node/edge being part of a solution, and the solution can then be computed from the matrix using search heuristics such as beam search. Autoregressive methods extend a partial solution step by step: at each step, the method predicts a node/edge conditioned on the current partial solution, which is then used to extend it. In this survey, we provide a thorough overview of recent studies of graph learning-based CO methods. The survey ends with several remarks on future research directions.
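
The two-stage framework described above can be illustrated with a minimal sketch in Python (NumPy only). The embedding step below is a crude stand-in for a learned GNN, and the two decoders show the non-autoregressive (one-shot) and autoregressive (step-by-step) styles on a toy graph; all function names and the fake probabilities are illustrative assumptions, not methods from the survey.

import numpy as np

def embed_graph(adj, dim=16, hops=2):
    """Stage 1 (toy): propagate random node features over the adjacency
    matrix a few times, a crude stand-in for a learned GNN embedding."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=(adj.shape[0], dim))
    for _ in range(hops):
        x = adj @ x + x                      # simple neighborhood aggregation
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def non_autoregressive_decode(node_probs, k):
    """One-shot decoding: pick the k nodes with the highest predicted
    probability of belonging to the solution."""
    return set(np.argsort(-node_probs)[:k])

def autoregressive_decode(adj, node_scores, k):
    """Step-by-step decoding: repeatedly add the best-scoring node adjacent
    to the current partial solution."""
    solution = {int(np.argmax(node_scores))}
    while len(solution) < k:
        frontier = {int(j) for i in solution
                    for j in np.flatnonzero(adj[i]) if j not in solution}
        if not frontier:
            break
        solution.add(max(frontier, key=lambda j: node_scores[j]))
    return solution

# Toy usage on a 5-node path graph; the "probabilities" are random stand-ins
# for the output of a trained model.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
emb = embed_graph(adj)
probs = 1 / (1 + np.exp(-emb[:, 0]))         # fake per-node probabilities
print(non_autoregressive_decode(probs, k=3))
print(autoregressive_decode(adj, probs, k=3))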

2021
Author(s):
Jill M Westcott
Francine Hughes
Wenke Liu
Mark Grivainis
Iffath Hoskins
...  

BACKGROUND: Postpartum hemorrhage remains one of the largest causes of maternal morbidity and mortality in the United States.
OBJECTIVE: To utilize machine learning techniques to identify patients at risk for postpartum hemorrhage at obstetric delivery.
METHODS: Women aged 18 to 55 delivering at a major academic center from July 2013 to October 2018 were included for analysis (n = 30,867). A total of 497 variables were collected from the electronic medical record, including demographic information; obstetric, medical, surgical, and family history; vital signs; laboratory results; labor medication exposures; and delivery outcomes. Postpartum hemorrhage was defined as a blood loss of ≥ 1000 mL at the time of delivery, regardless of delivery method, with 2179 positive cases observed (7.06%). Supervised learning with regression-, tree-, and kernel-based machine learning methods was used to create classification models based upon training (n = 21,606) and validation (n = 4,630) cohorts. Models were tuned using feature selection algorithms and domain knowledge. An independent test cohort (n = 4,631) determined final performance by assessing accuracy, area under the receiver operating characteristic curve (AUC), and sensitivity for proper classification of postpartum hemorrhage. Separate models were created using all collected data versus only data available prior to the second stage of labor/at the time of the decision to proceed with cesarean delivery. Additional models examined patients by mode of delivery.
RESULTS: Gradient boosted decision trees achieved the best discrimination in the overall model. The model including all data mildly outperformed the second-stage model (AUC 0.979, 95% CI 0.971-0.986 vs. AUC 0.955, 95% CI 0.939-0.970). Optimal model accuracy was 98.1% with a sensitivity of 0.763 for positive prediction of postpartum hemorrhage. The second-stage model achieved an accuracy of 98.0% with a sensitivity of 0.737. Other selected algorithms returned models that performed with decreased discrimination. Models stratified by mode of delivery achieved good to excellent discrimination but lacked the sensitivity necessary for clinical applicability.
CONCLUSIONS: Machine learning methods can be used to identify women at risk for postpartum hemorrhage who may benefit from individualized preventative measures. Models limited to data available prior to delivery perform nearly as well as those with more complete datasets, supporting their potential utility in the clinical setting. Further work is necessary to create successful models based upon mode of delivery. An unbiased approach to hemorrhage risk prediction may be superior to human risk assessment and represents an area for future research.
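
As a rough illustration of the modeling setup described in the abstract, the following sketch trains a gradient boosted tree classifier on synthetic, imbalanced data and reports AUC and sensitivity on a held-out test set; the data, split sizes, and default hyperparameters are placeholders, not the study's EHR cohort or tuned models.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for EHR-derived predictors with a rare positive class (~7%).
X, y = make_classification(n_samples=20000, n_features=100, n_informative=20,
                           weights=[0.93, 0.07], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
preds = (probs >= 0.5).astype(int)
print("AUC:        ", roc_auc_score(y_test, probs))
print("Sensitivity:", recall_score(y_test, preds))   # recall of the positive class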


2006
Vol 2
pp. 117693510600200
Author(s):
Joseph A. Cruz
David S. Wishart

Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic, and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns in large, noisy, or complex data sets. This capability is particularly well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated, and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence, and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.
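
Since the review highlights a lack of appropriate validation in many published studies, the following minimal sketch shows one standard remedy, stratified k-fold cross-validation of a simple classifier on a public breast cancer dataset; the model and dataset are illustrative choices, not drawn from the reviewed studies.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Public dataset used purely as a stand-in for a cancer-outcome prediction task.
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified 5-fold cross-validation gives a less optimistic performance estimate
# than evaluating on the training data itself.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("Per-fold AUC:", scores.round(3))
print("Mean AUC:    ", scores.mean().round(3))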


Author(s):  
Jiankai Sun
Jie Zhao
Huan Sun
Srinivasan Parthasarathy

Routing newly posted questions (a.k.a. cold questions) to potential answerers with suitable expertise in Community Question Answering sites (CQAs) is an important and challenging task. Existing methods either focus only on embedding the graph structural information, and are thus less effective for newly posted questions, or adopt manually engineered feature vectors that are not as representative as graph embeddings. We therefore propose to address the challenge of leveraging heterogeneous graph and textual information for cold question routing by designing an end-to-end framework that jointly learns CQA node embeddings and finds the best answerers for cold questions. We conducted extensive experiments to confirm the usefulness of incorporating the textual information from question tags and to demonstrate that an end-to-end framework can achieve promising performance on routing newly posted questions asked by both existing users and newly registered users.
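
A minimal PyTorch sketch of the general idea, jointly learning answerer embeddings and a tag-based question representation so that cold questions can be routed by their tags, is given below; the architecture, loss, and all names (ColdQuestionRouter, the tag/user IDs) are simplified assumptions and not the paper's model.

import torch
import torch.nn as nn

class ColdQuestionRouter(nn.Module):
    def __init__(self, n_users, n_tags, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)   # learned CQA node embeddings
        self.tag_emb = nn.Embedding(n_tags, dim)     # textual (tag) embeddings

    def forward(self, tag_ids, user_ids):
        # Represent a question by the mean of its tag embeddings,
        # then score each candidate answerer by a dot product.
        q = self.tag_emb(tag_ids).mean(dim=0)        # (dim,)
        u = self.user_emb(user_ids)                  # (n_candidates, dim)
        return u @ q                                 # (n_candidates,)

model = ColdQuestionRouter(n_users=1000, n_tags=200)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# One toy training step: the observed answerer should outscore a random negative.
tags = torch.tensor([3, 17, 42])                     # tags of a posted question
pos, neg = torch.tensor([7]), torch.tensor([511])
loss = -torch.log(torch.sigmoid(model(tags, pos) - model(tags, neg))).mean()
opt.zero_grad()
loss.backward()
opt.step()

# Routing a cold question: rank all candidate answerers by predicted score.
with torch.no_grad():
    scores = model(tags, torch.arange(1000))
print(scores.topk(5).indices)                        # top-5 suggested answerers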


2018
Vol 7 (1.7)
pp. 201
Author(s):
K Jayanthi
C Mahesh

Machine learning enables computers to help humans analyse knowledge from large, complex data sets. Genetic and genomic data are among such complex data, and many of their analysis tasks need to be performed automatically by computers. It is hoped that machine learning methods can make these data more useful for further applications such as gene prediction, gene expression analysis, gene ontology, gene finding, and gene editing. The purpose of this study is to explore machine learning applications and algorithms for genetic and genomic data. The study concludes with the following topics: the classification of machine learning problems into supervised, unsupervised, and semi-supervised learning; which type of method is suitable for various problems in genomics; applications of machine learning; and future views of machine learning in genomics.
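
To make the problem classes mentioned above concrete, the following sketch contrasts a supervised classifier and an unsupervised clustering on a synthetic gene-expression matrix; the data, labels, and model choices are placeholders for illustration only.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
expression = rng.normal(size=(200, 500))                     # 200 samples, 500 genes
labels = (expression[:, :10].mean(axis=1) > 0).astype(int)   # toy phenotype labels

# Supervised: predict the phenotype label from expression profiles.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, expression, labels, cv=5).mean())

# Unsupervised: group samples without labels, e.g. to look for expression subtypes.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(expression)
print("Cluster sizes:", np.bincount(clusters))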


2020
Vol 59 (10)
pp. 1671-1689
Author(s):
Trey McNeely
Ann B. Lee
Kimberly M. Wood
Dorit Hammerling

Abstract: Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, as the processes that drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is critical for monitoring these storms. However, these complex data can be challenging to analyze in real time, and off-the-shelf machine-learning algorithms have limited applicability on this front because of their “black box” structure. This study presents analytic tools that quantify convective structure patterns in infrared satellite imagery for over-ocean TCs, yielding lower-dimensional but rich representations that support analysis and visualization of how these patterns evolve during rapid intensity change. The proposed feature suite targets the global organization, radial structure, and bulk morphology (ORB) of TCs. By combining ORB and empirical orthogonal functions, we arrive at an interpretable and rich representation of convective structure patterns that serves as input to machine-learning methods. This study uses the logistic lasso, a penalized generalized linear model, to relate the predictors to rapid intensity change. Using ORB alone, binary classifiers identifying the presence (vs. absence) of such intensity-change events can achieve accuracy comparable to classifiers using environmental predictors alone, with a combined predictor set improving classification accuracy in some settings. More complex nonlinear machine-learning methods did not perform better than the linear logistic lasso model on current data.
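
The final modeling step, a logistic lasso relating structural predictors to rapid intensity change, might look roughly like the following sketch; the random features stand in for the ORB/EOF summaries, and the regularization strength is an arbitrary assumption.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))                     # e.g. 30 ORB/EOF-style predictors
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=2000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
lasso_logit = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5))  # L1 = "lasso" penalty
lasso_logit.fit(X_tr, y_tr)

probs = lasso_logit.predict_proba(X_te)[:, 1]
coef = lasso_logit.named_steps["logisticregression"].coef_.ravel()
print("AUC:", roc_auc_score(y_te, probs))
print("Nonzero coefficients:", np.flatnonzero(coef))  # predictors kept by the penalty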


2021
Vol 12
Author(s):
Jianzong Du
Dongdong Lin
Ruan Yuan
Xiaopei Chen
Xiaoli Liu
...  

Diabetes mellitus is a group of complex metabolic disorders that affects hundreds of millions of patients worldwide. The underlying pathogenesis of the various types of diabetes is still unclear, which hinders the development of more efficient therapies. Although many genes have been found to be associated with diabetes mellitus, more genes still need to be discovered to form a complete picture of the underlying mechanism. With the development of complex molecular networks, network-based disease-gene prediction methods have been widely proposed. However, most existing methods are based on the guilt-by-association hypothesis and often handcraft node features based on local topological structures. Advances in graph embedding techniques have enabled automatic global feature extraction from molecular networks. Inspired by the successful applications of cutting-edge graph embedding methods to complex diseases, we proposed a computational framework to investigate novel genes associated with diabetes mellitus. There are three main steps in the framework: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best-performing workflow for predicting genes associated with diabetes mellitus. Functional enrichment analysis based on the Human Phenotype Ontology (HPO), KEGG, and GO biological process terms, together with a publication search, further evaluated the predicted novel genes.
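
The three-step framework can be sketched roughly as follows, with a spectral embedding and an MLP-based reconstructor standing in for the paper's specific graph-embedding and stacked-autoencoder choices, and a synthetic network and random labels in place of real molecular and disease-gene data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import SpectralEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_genes = 300
adj = (rng.random((n_genes, n_genes)) < 0.05).astype(float)
adj = np.maximum(adj, adj.T)                 # symmetric network
np.fill_diagonal(adj, 0)                     # no self-loops

# Step 1: embed each gene (node) into a low-dimensional vector.
emb = SpectralEmbedding(n_components=16, affinity="precomputed").fit_transform(adj)

# Step 2: denoise/regenerate features by training a reconstructor on the embeddings.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ae.fit(emb, emb)
features = ae.predict(emb)

# Step 3: classify genes as disease-associated or not (toy labels here).
labels = rng.integers(0, 2, size=n_genes)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV AUC:", cross_val_score(clf, features, labels, cv=5, scoring="roc_auc").mean())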

