Search Efficient Binary Network Embedding

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this article, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations using a stochastic gradient descent-based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of the BinaryNE algorithm is available at https://github.com/daokunzhang/BinaryNE.
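
The speed advantage of binary codes described above comes from replacing floating-point distance computations with bit-wise operations. The following is a minimal sketch of Hamming-distance search over binary node codes, not the BinaryNE implementation; the codes, node names, and the helper names `hamming_distance` and `nearest_nodes` are hypothetical and chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def nearest_nodes(query: int, codes: dict, k: int = 3) -> list:
    """Return the k node ids whose codes are closest to `query`.

    Ties in Hamming distance are broken by node id so the result is stable.
    """
    ranked = sorted(
        codes,
        key=lambda node: (hamming_distance(query, codes[node]), node),
    )
    return ranked[:k]


# Hypothetical 8-bit codes for four nodes (assumed values, not learned ones).
codes = {
    "a": 0b10110010,
    "b": 0b10110011,
    "c": 0b01001101,
    "d": 0b10100010,
}

result = nearest_nodes(codes["a"], codes, k=3)
print(result)
```

An XOR followed by a bit count touches each code once, which is why this kind of comparison can be much cheaper than a Euclidean distance over continuous vectors of comparable length.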

A Novel Machine Learning Assisted Upscaling Workflow for Simulating the Waterflooding Process

10.2118/205595-ms ◽

2021 ◽

Author(s):

Yanji Wang ◽

Hangyu Li ◽

Jianchun Xu ◽

Ling Fan ◽

Xiaopu Wang ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Algorithm ◽

Learning Algorithms ◽

Flow Simulation ◽

Machine Learning Algorithms ◽

Scale Model ◽

Two Phase ◽

Flow Problems ◽

Similar Accuracy

Abstract: Conventional flow-based two-phase upscaling for simulating the waterflooding process requires calculating upscaled two-phase parameters for each coarse interface or block. The whole procedure can be very time-consuming, especially for large-scale reservoir models. To address this problem, flow-based two-phase upscaling techniques are combined with machine learning algorithms: flow-based two-phase upscaling is needed only for a small fraction of the coarse interfaces (or blocks), while the upscaled two-phase parameters for the remaining coarse interfaces (or blocks) are provided directly by the machine learning algorithms instead of performing the upscaling computation on each coarse interface (or block). The new two-phase upscaling workflow was tested for generic (left to right) flow problems using a 2D large-scale model. The machine learning assisted workflow achieved accuracy similar to that of full flow-based upscaling, with a significant speedup (nearly 70). The workflow developed in this work is one of the pioneering efforts to combine machine learning algorithms with the time-consuming flow-based two-phase upscaling method. It is a valuable addition to the existing multiscale techniques for subsurface flow simulation.

Hoax News Classification using Machine Learning Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3753.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3938-3944

Keyword(s):

Machine Learning ◽

Social Media ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Training Data ◽

Stochastic Gradient Descent ◽

Support Vector ◽

The Impact ◽

F Measure

Hoax news on social media has had a dramatic effect on our society in recent years. Its impact is felt by many people in the form of anxiety, financial loss, and loss of reputation. We therefore need a detection system that can help reduce hoax news on social media. Hoax news classification is one of the stages in building a hoax news detection system; it draws on an unsupervised learning algorithm for creating hoax news datasets, machine learning tools for data processing, and text processing for detecting data. The system then classifies the input text as hoax or not hoax. Hoax news classification in this study uses the following algorithms: Support Vector Machine, Naïve Bayes, Decision Tree, Logistic Regression, Stochastic Gradient Descent, and Neural Network (MLP). These algorithms are compared to find the best one for detecting hoax news, as judged by accuracy, F-measure, precision, and recall. Testing of the classification algorithms shows that the NN-MLP algorithm averages 93% for accuracy, F-measure, and precision, the highest among the algorithms compared, while the highest recall, 94%, is obtained by the SVM algorithm. The results of this experiment show different effects for different classifiers, and suggest that the more hoax data is used as training data, the more accurate the system becomes.
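
The four evaluation measures named in the abstract above can be computed from a list of true and predicted labels. The sketch below is a generic illustration, not the study's evaluation code; the labels `"hoax"` and `"real"`, the sample data, and the function name `classification_metrics` are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive="hoax"):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical ground truth and predictions for eight news items.
y_true = ["hoax", "hoax", "hoax", "real", "real", "real", "hoax", "real"]
y_pred = ["hoax", "hoax", "real", "real", "hoax", "real", "hoax", "real"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting recall separately matters here because a classifier can lead on accuracy and precision while another one finds a larger share of the actual hoaxes.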

Large-Scale Local Online Similarity/Distance Learning Framework Based on Passive/Aggressive

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421510174 ◽

2021 ◽

Author(s):

Baida Hamdan ◽

Davood Zabihzadeh

Keyword(s):

Distance Learning ◽

Input Data ◽

Large Scale ◽

Learning Algorithm ◽

Metric Learning ◽

Distance Measures ◽

Discrimination Power ◽

Input Space ◽

Learning Framework ◽

Similarity Distance

Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the metric learning field. Many metric learning algorithms learn a global distance function from data that satisfies the constraints of the problem. However, in many real-world datasets, where the discrimination power of features varies across different regions of the input space, a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed, which learn multiple metrics across the different regions of the input space. The advantages of these methods include high flexibility and the ability to learn a nonlinear mapping, but these typically come at the expense of higher time requirements and overfitting problems. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduces the problem of overfitting. The proposed framework is also scalable with both sample size and the dimension of input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on Passive/Aggressive (PA). In addition, for scalability with the dimension of input data, Dual Random Projection (DRP) is extended for local online learning in the present work. It enables our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA. Experimental results on some challenging datasets from the machine vision community confirm that the extended methods considerably enhance the performance of the related global ones without increasing the time complexity.
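
To make the Passive/Aggressive idea concrete, the sketch below shows one PA update for a single global bilinear similarity learned from a triplet. It is an assumption-laden illustration of the kind of global PA learner the abstract says can be extended, not the proposed local framework: the bilinear form, the unit margin, the aggressiveness parameter `C`, and the names `similarity` and `pa_update` are all choices made for this example.

```python
def similarity(W, p, q):
    """Bilinear similarity p^T W q for list-based vectors and matrix."""
    return sum(
        p[i] * W[i][j] * q[j]
        for i in range(len(p))
        for j in range(len(q))
    )


def triplet_loss(W, p, pos, neg):
    """Hinge loss: pos should be more similar to p than neg, by a margin of 1."""
    return max(0.0, 1.0 - similarity(W, p, pos) + similarity(W, p, neg))


def pa_update(W, p, pos, neg, C=1.0):
    """One Passive/Aggressive step.

    Passive: leave W unchanged when the triplet already satisfies the margin.
    Aggressive: otherwise move W just far enough to fix it, capped by C.
    """
    loss = triplet_loss(W, p, pos, neg)
    if loss == 0.0:
        return W
    diff = [a - b for a, b in zip(pos, neg)]
    V = [[pi * dj for dj in diff] for pi in p]  # gradient direction
    norm_sq = sum(v * v for row in V for v in row)
    if norm_sq == 0.0:
        return W
    tau = min(C, loss / norm_sq)
    return [
        [W[i][j] + tau * V[i][j] for j in range(len(W[i]))]
        for i in range(len(W))
    ]


# Hypothetical 2-D triplet, starting from the identity matrix.
W0 = [[1.0, 0.0], [0.0, 1.0]]
p, pos, neg = [1.0, 0.0], [0.5, 0.5], [1.0, 0.0]

W1 = pa_update(W0, p, pos, neg, C=10.0)
print(triplet_loss(W0, p, pos, neg), triplet_loss(W1, p, pos, neg))
```

Each step costs time proportional to the size of `W` and needs only the current triplet, which is what makes this family of learners suitable for online, large-scale settings.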

Short-Term Power Prediction of Building Integrated Photovoltaic (BIPV) System Based on Machine Learning Algorithms

International Journal of Photoenergy ◽

10.1155/2021/5582418 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

R. Kabilan ◽

V. Chandran ◽

J. Yogapriya ◽

Alagar Karthick ◽

Priyesh P. Gandhi ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithm ◽

Accuracy Assessment ◽

Machine Learning Algorithms ◽

Photovoltaic System ◽

Power Prediction ◽

Building Integrated Photovoltaic ◽

Scale Integration

One of the biggest challenges is ensuring the large-scale integration of photovoltaic systems into buildings. This work presents a power prediction for a building integrated photovoltaic system across the building's various orientations, based on machine learning and data science tools. The proposed prediction methodology comprises a data quality stage, a machine learning algorithm, a weather clustering assessment, and an accuracy assessment. The results showed that applying linear regression coefficients to the forecast outputs of the developed photovoltaic power generation neural network improved the PV power generation forecast. The final model produced accurate forecasts, exhibiting a root mean square error of 4.42% in NN, 16.86% in QSVM, and 8.76% in TREE. The results are presented for building facade and roof applications such as flat roof, south façade, east façade, and west façade.
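
The step of applying linear regression coefficients to forecast outputs can be sketched as fitting a slope and intercept that map raw forecasts onto measured values, then comparing the root mean square error before and after. This is a minimal illustration under assumptions: the data are invented, the error is left in raw units rather than as a percentage (the abstract does not state the normalization), and `fit_linear_correction` is a hypothetical helper name.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def fit_linear_correction(forecast, actual):
    """Ordinary least squares fit of actual ~ slope * forecast + intercept."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_a = sum(actual) / n
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual))
    slope = cov / var_f
    intercept = mean_a - slope * mean_f
    return slope, intercept


# Hypothetical raw network forecasts and measured PV power (arbitrary units).
forecast = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [1.4, 2.3, 3.7, 4.6, 5.9]

slope, intercept = fit_linear_correction(forecast, actual)
corrected = [slope * f + intercept for f in forecast]

print(rmse(actual, forecast), rmse(actual, corrected))
```

On the data used for the fit, the corrected error can never exceed the raw error; whether the gain carries over to unseen days has to be checked on held-out data.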

A Deep Learning Algorithm to Predict Hazardous Drinkers and the Severity of Alcohol-Related Problems Using K-NHANES

Frontiers in Psychiatry ◽

10.3389/fpsyt.2021.684406 ◽

2021 ◽

Vol 12 ◽

Author(s):

Suk-Young Kim ◽

Taesung Park ◽

Kwonyoung Kim ◽

Jihoon Oh ◽

Yoonjae Park ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Deep Learning Algorithm ◽

Conventional Machine ◽

Large Scale Survey ◽

Alcohol Related Problems

Purpose: The number of patients with alcohol-related problems is steadily increasing. A large-scale survey of alcohol-related problems has been conducted. However, studies that predict hazardous drinkers and identify which factors contribute to the prediction are limited. Thus, the purpose of this study was to predict hazardous drinkers and the severity of alcohol-related problems of patients using a deep learning algorithm based on large-scale survey data. Materials and Methods: Datasets of the National Health and Nutrition Examination Survey of South Korea (K-NHANES), a nationally representative survey for the entire South Korean population, were used to train deep learning and conventional machine learning algorithms. Datasets from 69,187 and 45,672 participants were used to predict hazardous drinkers and the severity of alcohol-related problems, respectively. Based on the degree of contribution of each variable to deep learning, it was possible to determine which variable contributed significantly to the prediction of hazardous drinkers. Results: Deep learning showed higher performance than conventional machine learning algorithms. It predicted hazardous drinkers with an AUC (area under the receiver operating characteristic curve) of 0.870 (Logistic regression: 0.858, Linear SVM: 0.849, Random forest classifier: 0.810, K-nearest neighbors: 0.740). Among 325 variables for predicting hazardous drinkers, energy intake was the factor showing the greatest contribution to the prediction, followed by carbohydrate intake. Participants were classified into Zone I, Zone II, Zone III, and Zone IV based on the degree of alcohol-related problems, showing AUCs of 0.881, 0.774, 0.853, and 0.879, respectively. Conclusion: Hazardous drinking groups could be effectively predicted and individuals could be classified according to the degree of alcohol-related problems using a deep learning algorithm. This algorithm could be used to screen people who need treatment for alcohol-related problems among the general population or hospital visitors.
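
The AUC values reported above have a simple interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. It is illustrative only; the labels and scores are invented, and the quadratic pairwise loop is an assumption made for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six participants (1 = hazardous drinker).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(auc)
```

Because AUC depends only on how cases are ranked, it allows models with differently scaled outputs, such as a neural network and a random forest, to be compared on the same footing.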

Exploratory Data Analysis and Machine Learning Algorithms to Classifying Stroke Disease

IJCONSIST JOURNALS ◽

10.33005/ijconsist.v2i02.49 ◽

2021 ◽

Vol 2 (02) ◽

pp. 77-82

Author(s):

Prismahardi Aji Riyantoko ◽

Tresna Maulana Fahrudin ◽

Kartika Maulida Hindrayani ◽

Mohammad Idhom

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Gradient Descent ◽

Exploratory Data Analysis ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Machine Learning Algorithm ◽

Exploratory Data

This paper presents an analysis of stroke disease data that combines exploratory data analysis (EDA) and machine learning algorithms. Using exploratory data analysis, we can find patterns and anomalies and form assumptions using statistical and graphical methods. Machine learning algorithms, in turn, can classify the dataset using a model, and many models can be compared. EDA showed that the patients affected by stroke were between 25 and 62 years old. Among the machine learning algorithms, Logistic Regression and Stochastic Gradient Descent gave the highest result, around 94.61%. Overall, the machine learning models can provide the best performance and accuracy.

Intelligent system of English composition scoring model based on improved machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189235 ◽

2020 ◽

pp. 1-11

Author(s):

Jie Liu ◽

Lin Lin ◽

Xiufang Liang

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Intelligent System ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Assessment System ◽

English Composition ◽

Region Extraction ◽

Constraint Model

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is scoring the English composition with an intelligent model. To improve the intelligence of English composition scoring, this study combines intelligent image recognition technology with machine learning algorithms, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the proposed algorithm has a certain practical effect and can be applied to English assessment systems and online homework evaluation systems.

Efficient Image Retrieval approach for Large-scale Chest X Ray data using Hand-Crafted Features and Machine Learning Algorithms

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.890896 ◽

2018 ◽

Vol 6 (11) ◽

pp. 890-896

Author(s):

Irene Getzi S ◽

D. Christopher Durairaj ◽

V Joseph Raj

Keyword(s):

Machine Learning ◽

Image Retrieval ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Chest X Ray

A Survey of Network Embedding for Drug Analysis and Prediction

Current Protein and Peptide Science ◽

10.2174/1389203721666200702145701 ◽

2020 ◽

Vol 21 ◽

Author(s):

Zhixian Liu ◽

Qingfeng Chen ◽

Wei Lan ◽

Jiahai Liang ◽

Yiping Pheobe Chen ◽

...

Keyword(s):

Deep Learning ◽

Protein Function ◽

Dimensional Space ◽

Auxiliary Information ◽

Matrix Decomposition ◽

Drug Analysis ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Network Embedding ◽

Similarity Estimation

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time consuming and lack universality, and it is difficult for them to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way of alleviating these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information. This facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied in homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, the graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.

Share Market Data Prediction Strategies using Deep Learning Algorithm

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191209093139 ◽

2019 ◽

Vol 13 ◽

Author(s):

A John. ◽

D. Praveen Dominic ◽

M. Adimoolam ◽

N. M. Balamurugan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Stock Market ◽

Predictive Analytics ◽

Learning Algorithm ◽

Market Price ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Mining Machine ◽

Gradient Descent Algorithm

Background: Predictive analytics encompasses a multiplicity of statistical schemes from predictive modelling, data mining, and machine learning. It scrutinizes current and historical data to make predictions about future or otherwise unknown events. Most predictive models are used in business analytics to reduce losses and increase profits. Predictive analytics is used to exploit the patterns in historical data. Objective: People follow certain strategies for predicting stock values in order to invest in the more profitable stocks, and these strategies for searching stock market prices are incorporated in some intelligent methods and tools. Such strategies increase investors' profits and also minimize their risks. Prediction therefore plays a vital role in stock market gains, and is also a very intricate and challenging process. Method: The proposed optimized strategy is a Deep Neural Network with Stochastic Gradient for stock prediction. The neural network is trained using the back-propagation algorithm and the stochastic gradient descent algorithm as optimal strategies. Results: The experiment is conducted for stock market price prediction using the Python language with the visual package. In this experiment, the RELIANCE.NS, TATAMOTORS.NS, and TATAGLOBAL.NS datasets are taken as input and are downloaded from the National Stock Exchange site. The artificial neural network component, including the Deep Learning model, is most effective when more than 100,000 data points are available to train the model. The proposed model is developed on daily stock market prices to understand how to build a model with better performance than the existing national exchange method.
