Metabolite-disease association prediction algorithm combining DeepWalk and random forest

2022 ◽  
Vol 27 (1) ◽  
pp. 58-67
Author(s):  
Jiaojiao Tie ◽  
Xiujuan Lei ◽  
Yi Pan
2020 ◽  
Vol 9 (1) ◽  
pp. 34
Author(s):  
Sima Dehnavi ◽  
Madjid Emamipour ◽  
Amin Golabpour

Introduction: Heart disease is one of the leading causes of death in today's society. No definitive method has yet been found to predict it, and several factors contribute to contracting the disease. The aim of this study was therefore to provide a data mining model for predicting heart disease. Material and Methods: This study used standard data from UCI, comprising four databases: Cleveland, Hungarian, Switzerland, and Long Beach VA. The data include 13 independent variables and one dependent variable. The data contain missing values, which were handled with the EM algorithm. Finally, a proposed algorithm that combines random forest and an artificial neural network was applied to the data. Results: The data were divided into two sets, and the 10-fold method was used. Three indicators (sensitivity, specificity, and accuracy) were used to evaluate the algorithms. The accuracy of the prediction algorithm on the Cleveland, Hungarian, Switzerland, and Long Beach VA data reached 87.65%, 94.37%, 93.45%, and 85%, respectively. The proposed algorithm was then compared with similar articles in this field and was found to be more accurate than similar methods. Conclusion: The results of this study showed that combining random forest and an artificial neural network can provide a suitable model for predicting heart attacks.
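The three evaluation indicators named in the abstract follow directly from the counts of a binary confusion matrix. The sketch below is a minimal illustration, not the authors' code: the function name `classification_metrics` and the toy labels are assumptions made up for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec, acc = classification_metrics(y_true, y_pred)
print(sens, spec, acc)
```

In a 10-fold setting, these metrics would typically be computed on each held-out fold and then averaged; the abstract does not say how the authors aggregated them.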


2018 ◽  
Vol 13 ◽  
pp. 568-579 ◽  
Author(s):  
Xing Chen ◽  
Chun-Chun Wang ◽  
Jun Yin ◽  
Zhu-Hong You

Energies ◽  
2018 ◽  
Vol 11 (11) ◽  
pp. 3207 ◽  
Author(s):  
Yiqi Lu ◽  
Yongpan Li ◽  
Da Xie ◽  
Enwei Wei ◽  
Xianlu Bao ◽  
...  

To cope with the increasing charging demand of electric vehicles (EVs), this paper presents a method for forecasting EV charging load based on the random forest (RF) algorithm and the load data of a single charging station. The method uses the classification and regression tree (CART) algorithm to produce short-term forecasts for the station. A prediction algorithm for the daily charging capacity of charging stations of different scales and locations is also proposed. Combining the regression and classification algorithms enables effective learning from a large amount of historical charging data. The feature data are partitioned along different aspects to build the RF and to predict fluctuating charging load effectively. The algorithm is put into practice by analyzing the data of each charging station in Shenzhen in terms of time and space. The form in which current data are applied in the algorithm is determined, and the accuracy of the prediction algorithm is verified, showing it to be reliable and practical. By predicting charging load, it can provide a reference for both power suppliers and users.
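The abstract does not give implementation details, but the core step of a CART regression tree, which random forests repeat many times, is choosing the split threshold that minimizes the summed squared error of the two resulting groups. The sketch below assumes a single numeric feature; the function name `best_split` and the hour and load values are hypothetical and are not taken from the paper.

```python
def best_split(xs, ys):
    """Return (threshold, cost) of the best one-feature split by squared error."""

    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical hour-of-day feature and charging load values, for illustration.
hours = [1, 2, 3, 10, 11, 12]
load = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
threshold, cost = best_split(hours, load)
print(threshold, cost)
```

A full regression tree applies this search recursively to each side of the split, and a random forest averages many such trees grown on bootstrap samples.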


2021 ◽  
Vol 25 (4) ◽  
pp. 973-991
Author(s):  
Yanben Wang ◽  
Jurong Bai

In microblog networks, users’ forwarding behavior is widespread, and its propagation range is difficult to predict quantitatively. To solve this problem, machine learning algorithms are used to quantitatively predict the propagation breadth and depth of microblog users’ forwarding behavior. The dataset is preprocessed, and the extracted features are divided into three types: user features, microblog features, and social features. The dataset is then analyzed in detail, machine learning algorithms are used to predict the propagation breadth and depth of users’ forwarding behavior, and the influence of the three types of features on prediction precision is studied. The experimental results show that the prediction precision of the improved random forest algorithm fluctuates less and is not sensitive to changes in the various features. The improved random forest algorithm has higher precision and better generalization ability than the other algorithms, which shows that its prediction results have high reference value. Social features have the greatest influence on prediction precision for every prediction algorithm. User features have a similar influence on prediction precision to microblog features.


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Zhenhong Xiao ◽  
Jianbang Shi ◽  
Rui Tan ◽  
Junyi Shen

This paper uses random forest to study the competitiveness of listed companies in the high-end equipment manufacturing industry. Random forest is a supervised machine learning algorithm that can be used for both regression and classification. It makes its decisions based on sets of samples: it takes a majority vote for classification and an average for regression. For the empirical analysis, 88 listed companies are selected. The analysis finds great differences in comprehensive competitiveness among industries. Enterprise scale accounts for a high proportion of comprehensive competitiveness, and its score often affects overall strength; the gap between companies in the same industry is also obvious. The empirical evaluation results provide three lessons for enterprises seeking to improve their comprehensive competitiveness, such as seizing the strategic opportunity to expand the market, expanding enterprise scale, improving asset management, and narrowing the industry gap.
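The aggregation rule described in this abstract (majority vote for classification, averaging for regression) can be sketched in a few lines. This is a minimal illustration of the general random forest idea rather than the paper's implementation; the function names and the per-tree outputs are invented for the example.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Majority vote over the class labels predicted by individual trees."""
    # most_common(1) returns the label with the highest count; on a tie,
    # the label that was counted first wins.
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Average of the numeric predictions of individual trees."""
    return mean(tree_outputs)


# Hypothetical per-tree outputs for one company, for illustration only.
votes = ["high", "low", "high", "high", "low"]
scores = [0.5, 0.75, 1.0, 0.75]
label = aggregate_classification(votes)
avg = aggregate_regression(scores)
print(label, avg)
```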


2020 ◽  
Vol 9 (1) ◽  
pp. 1355-1360

Data mining is becoming increasingly popular and essential in the field of medicine. The large amounts of data produced every day by the medical industry are too complex and voluminous to be processed and analyzed by traditional means. In such cases data mining comes into play. Although several prediction algorithms exist, their efficiency is questionable because of high error rates. It is therefore necessary to choose a prediction algorithm that gives higher accuracy with fewer errors. The aim of this paper is to create a system for efficient and accurate prediction of cardiovascular disease. The datasets are taken from the UCI machine learning repository and are tested for accuracy using the ANOVA technique. The algorithms are investigated using the WEKA tool. The best features for prediction are obtained from feature selection algorithms. Various classification algorithms are applied to the datasets to identify the most efficient one. We observe that random forest gives consistently better accuracy than the other algorithms. The random forest algorithm is then tuned to further improve the accuracy of the prediction system.


2021 ◽  
Vol 16 ◽  
Author(s):  
Jian He ◽  
Rongao Yuan ◽  
Lei Xu ◽  
Yanzhi Guo ◽  
Menglong Li

Background: The number of human genetic variants deposited in publicly available databases has been increasing exponentially. Among these variants, non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), have been shown to correlate strongly with phenotypic variations of traits and diseases. Objective: However, the detailed mechanisms governing the disease association of SAPs remain unclear. Further investigation of new attributes and improvement of prediction are therefore increasingly urgent, since a large number of unknown disease-related SAPs remain to be investigated. Method: Based on the principle of random forest (RF), we first constructed a new, effective prediction model for SAPs associated with a particular disease from protein sequences. Four common sequence signature extraction methods were applied separately to select the optimal features. SAP peptide lengths from 12 to 202 were then also optimized. Results: The optimal models achieve accuracy higher than 90% and an area under the curve (AUC) of over 0.9 on all 11 external testing datasets. Finally, the good performance on an independent test set, with an accuracy higher than 95%, demonstrates the superiority of our method. Conclusion: In this paper, based on random forest (RF), we constructed 11 disease-association prediction models for SAPs at the protein sequence level. All models yield prediction accuracy higher than 90% and an AUC of more than 0.9. Because our method uses only protein sequence information, it is more universal than methods that depend on additional information or predictions about the proteins.
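The AUC reported alongside accuracy has a simple pairwise interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way. The function name `roc_auc`, the labels, and the scores are illustrative assumptions, not values from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical disease-association labels and model scores, for illustration.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

This quadratic-time version is fine for small test sets; rank-based formulas give the same value more efficiently on large ones.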


2018 ◽  
Vol 5 (1) ◽  
pp. 47-55
Author(s):  
Florensia Unggul Damayanti

Data mining helps industries make intelligent decisions on complex problems. Data mining algorithms can be applied to data to forecast, identify patterns, make rules and recommendations, analyze sequences in complex data sets, and retrieve fresh insights. Advances in technology and the growing availability of data mining techniques and data give industries the opportunity to explore and gain valuable information from their data and to use that information to support business decision making. This paper implements classification data mining to retrieve knowledge from customer databases in support of the marketing department when planning a strategy for predicting plan premium. The dataset is decomposed through conceptual analysis to identify data characteristics that can be used as input parameters of the data mining model. Business decisions and applications are characterized by processing step, processing characteristic, and processing outcome (Seng, J.L., Chen T.C. 2010). This paper sets up a data mining experiment based on the J48 and Random Forest classifiers and sheds light on the performance of J48 and Random Forest on a dataset from the insurance industry. The experimental results cover the classification accuracy and efficiency of J48 and Random Forest, and also identify the attributes most useful for predicting plan premium in the context of strategic planning to support business strategy.


2019 ◽  
Vol 139 (8) ◽  
pp. 850-857
Author(s):  
Hiromu Imaji ◽  
Takuya Kinoshita ◽  
Toru Yamamoto ◽  
Keisuke Ito ◽  
Masahiro Yoshida ◽  
...  
