Application of machine learning algorithms in MBR simulation under big data platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot solve the problem of effective parallelization, so a parallelization support vector machine based on Spark big data platform is proposed. Firstly, the big data platform is designed with Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, when merging two support vector machines, the “special points” other than support vectors are considered, that is, the points where the nonsupport vectors in one subset violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
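The merging step described above can be illustrated with a minimal sketch. It assumes linear models given as hypothetical (w, b) pairs and reads "violate" as a margin violation, y * f(x) < 1. Neither assumption comes from the abstract, and the Spark parallelization itself is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[Sequence[float], int]       # (feature vector, label in {-1, +1})
Model = Tuple[Sequence[float], float]     # hypothetical linear model (w, b)


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision value f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violators(points: List[Point], w: Sequence[float], b: float) -> List[Point]:
    """Points that fall inside or on the wrong side of the given model's margin."""
    return [(x, y) for x, y in points if y * decision(w, b, x) < 1.0]


def merge_candidates(sv_a: List[Point], rest_a: List[Point], model_a: Model,
                     sv_b: List[Point], rest_b: List[Point], model_b: Model) -> List[Point]:
    """Training set for the merged SVM: both sets of support vectors plus the
    non-support vectors of each subset that violate the other subset's model."""
    merged = list(sv_a) + list(sv_b)
    merged += violators(rest_a, *model_b)   # subset A checked against model B
    merged += violators(rest_b, *model_a)   # subset B checked against model A
    return merged


# Toy data (hypothetical): two subsets, each with its own trained model.
model_a: Model = ((1.0, 0.0), 0.0)
model_b: Model = ((1.0, 0.0), -1.0)
sv_a = [((1.0, 0.0), 1), ((-1.0, 0.0), -1)]
rest_a = [((3.0, 0.0), 1), ((1.5, 2.0), 1)]
sv_b = [((2.0, 0.0), 1), ((0.0, 0.0), -1)]
rest_b = [((-3.0, 1.0), -1), ((0.5, 1.0), -1)]

merged = merge_candidates(sv_a, rest_a, model_a, sv_b, rest_b, model_b)
```

The merged set would then be used to retrain a single SVM, so that points far from both decision boundaries are dropped before the more expensive training step.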

Prediction of novel mouse TLR9 agonists using a random forest approach

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0241-0 ◽

2019 ◽

Vol 20 (S2) ◽

Author(s):

Varun Khanna ◽

Lei Li ◽

Johnson Fung ◽

Shoba Ranganathan ◽

Nikolai Petrovsky

Keyword(s):

Machine Learning ◽

Random Forest ◽

Correlation Coefficient ◽

Matthews Correlation Coefficient ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Innate Immune ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm

Background: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening is challenging. In the current study, we present a machine-learning-based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including the count and position of motifs, the distance between motifs, and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house, experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling.
Results: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed the other algorithms on our dataset for TLR9 activity prediction. We therefore developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier on test samples were 0.61 and 80.0%, respectively, with a maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked, and the top 100 ODNs were synthesized and experimentally tested for activity in an mTLR9 reporter cell assay; 91 of the 100 selected ODNs showed high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
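As a rough illustration of the evaluation terms used in this abstract, the sketch below shows one way to draw a balanced subset by random down-sampling and to compute the Matthews correlation coefficient and balanced accuracy from confusion-matrix counts. The function names and the toy counts are assumptions for illustration, not the authors' code or data.

```python
import math
import random
from typing import List, Sequence


def downsample(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Indices of a balanced subset: every minority-class sample plus an
    equally sized random draw (without replacement) from the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced label vector: 3 actives, 9 inactives.
toy_labels = [1] * 3 + [0] * 9
balanced_idx = downsample(toy_labels, random.Random(0))
```

Repeating `downsample` with different seeds and training one model per balanced subset gives the kind of ensemble the abstract refers to; each model sees all minority samples but a different slice of the majority class.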

The Application and Research of the GA-BP Neural Network Algorithm in the MBR Membrane Fouling

Abstract and Applied Analysis ◽

10.1155/2014/673156 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Chunqing Li ◽

Zixiang Yang ◽

Hongying Yan ◽

Tao Wang

Keyword(s):

Neural Network ◽

Network Model ◽

Bp Neural Network ◽

Membrane Fouling ◽

Sewage Treatment ◽

Principal Component ◽

Operating Pressure ◽

Bp Network ◽

Factors Affecting ◽

Membrane Flux

Predicting MBR membrane flux as an indicator of membrane fouling is one of the important issues in sewage treatment today. This paper first used principal component analysis to reduce the dimensionality and correlation of the input variables and identified the three factors that most affect membrane fouling: MLSS, total resistance, and operating pressure. It then used a BP neural network to build the MBR intelligent simulation model, which relates these three parameters to membrane flux, the quantity used to characterize the degree of membrane fouling. Because the BP neural network trains slowly, is sensitive to the initial weights and thresholds, and easily falls into local minima, the paper used a genetic algorithm to optimize the initial weights and thresholds of the BP neural network and established a membrane fouling prediction model based on the GA-BP network. The results show that, under the same conditions, the GA-optimized BP network predicts membrane flux better than the unoptimized one. This demonstrates that the GA-BP network model is better suited than the plain BP network to simulating the MBR membrane fouling process.
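The idea of using a genetic algorithm to pick starting weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the "network" is a single linear neuron, the four training rows are invented normalized values, and the GA operators (elitism, truncation selection, uniform crossover, Gaussian mutation) are one common choice among many.

```python
import random
from typing import Callable, List

Weights = List[float]

# Hypothetical normalized samples: (MLSS, total resistance, pressure) -> flux.
data = [((0.2, 0.1, 0.3), 0.9), ((0.5, 0.4, 0.5), 0.6),
        ((0.8, 0.7, 0.6), 0.3), ((0.9, 0.9, 0.8), 0.1)]


def loss(w: Weights) -> float:
    """Mean squared error of a single linear neuron with bias w[3]."""
    errors = (w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] - y for x, y in data)
    return sum(e * e for e in errors) / len(data)


def evolve(fitness: Callable[[Weights], float], n_weights: int, rng: random.Random,
           pop_size: int = 20, generations: int = 30,
           mutation_scale: float = 0.1) -> Weights:
    """Minimize `fitness` over weight vectors with a basic genetic algorithm."""
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]      # truncation selection
        children = [population[0][:]]              # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]              # uniform crossover
            child = [w + rng.gauss(0.0, mutation_scale) for w in child]   # mutation
            children.append(child)
        population = children
    return min(population, key=fitness)


initial_weights = evolve(loss, 4, random.Random(0))
```

In a GA-BP setup, `initial_weights` would be handed to ordinary backpropagation as its starting point, so gradient descent begins from a region the GA has already found to be promising.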

Research on Forecasting of China's Monetary Policy Based on Random Forest Algorithm.pdf

10.36227/techrxiv.12083805.v1 ◽

2020 ◽

Author(s):

chuanxin qiu

Keyword(s):

Neural Network ◽

Monetary Policy ◽

Random Forest ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

Bank Of China ◽

Macroeconomic Indicators ◽

Neural Network Algorithm

This paper uses a random forest model to quantify and predict the monetary policy of the People's Bank of China, taking 16 macroeconomic indicators as input. The model is compared with three other machine learning algorithms (CART decision tree, support vector machine and neural network), a discrete selection model and a combined prediction model. The results show that the random forest algorithm achieves better accuracy in predicting the direction of the central bank's monetary policy.
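To make the random forest idea concrete, here is a deliberately small sketch: bootstrap samples, a random feature subset per tree, and a majority vote. It uses one-split decision stumps instead of full CART trees, and the three "indicators" and the labels are invented toy values, so it should be read as an illustration of the mechanism rather than the paper's model.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, str, str]   # (feature, threshold, label if <=, label if >)


def fit_stump(X: List[Row], y: List[str], features: Sequence[int]) -> Stump:
    """Best single-threshold split over the allowed features (fewest errors)."""
    best, best_err = None, None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best_err is None or err < best_err:
                best, best_err = (j, t, l_lab, r_lab), err
    return best


def fit_forest(X: List[Row], y: List[str], n_trees: int, n_features: int,
               rng: random.Random) -> List[Stump]:
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]            # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: List[Stump], row: Row) -> str:
    """Majority vote over all stumps."""
    votes = Counter(l if row[j] <= t else r for j, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy, hypothetical normalized indicators and policy directions.
X = [(0.1, 0.2, 0.3), (0.2, 0.1, 0.2), (0.3, 0.3, 0.1), (0.0, 0.4, 0.4), (0.4, 0.0, 0.0),
     (0.7, 0.8, 0.9), (0.9, 0.6, 0.7), (0.6, 0.9, 0.8), (0.8, 0.7, 0.6), (1.0, 1.0, 1.0)]
y = ["ease"] * 5 + ["tighten"] * 5
forest = fit_forest(X, y, n_trees=15, n_features=2, rng=random.Random(0))
```

Calling `predict(forest, row)` on a new row of indicator values returns the direction that most stumps vote for.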

Research on Forecasting of China's Monetary Policy Based on Random Forest Algorithm.pdf

10.36227/techrxiv.12083805 ◽

2020 ◽

Author(s):

chuanxin qiu

Keyword(s):

Neural Network ◽

Monetary Policy ◽

Random Forest ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

Bank Of China ◽

Macroeconomic Indicators ◽

Neural Network Algorithm

This paper uses a random forest model to quantify and predict the monetary policy of the People's Bank of China, taking 16 macroeconomic indicators as input. The model is compared with three other machine learning algorithms (CART decision tree, support vector machine and neural network), a discrete selection model and a combined prediction model. The results show that the random forest algorithm achieves better accuracy in predicting the direction of the central bank's monetary policy.

Research on Forecasting of China's Monetary Policy Based on Random Forest Algorithm.pdf

10.36227/techrxiv.12083805.v2 ◽

2020 ◽

Author(s):

chuanxin qiu

Keyword(s):

Neural Network ◽

Monetary Policy ◽

Random Forest ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

Bank Of China ◽

Macroeconomic Indicators ◽

Neural Network Algorithm

This paper uses a random forest model to quantify and predict the monetary policy of the People's Bank of China, taking 16 macroeconomic indicators as input. The model is compared with three other machine learning algorithms (CART decision tree, support vector machine and neural network), a discrete selection model and a combined prediction model. The results show that the random forest algorithm achieves better accuracy in predicting the direction of the central bank's monetary policy.

Classification Breast Cancer Revisited with Machine Learning

International Journal on Data Science ◽

10.18517/ijods.1.1.42-50.2020 ◽

2020 ◽

Vol 1 (1) ◽

pp. 42-50

Author(s):

Hanna Arini Parhusip ◽

Bambang Susanto ◽

Lilik Linawati ◽

Suryasatriya Trihandaru ◽

Yohanes Sardjono ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

K Nearest Neighbor ◽

Cancer Data

This article studies several machine learning algorithms applied to breast cancer data with 33 features from 569 samples. The purpose of the research is to find the best algorithm for classifying breast cancer. Because the features may differ widely in scale and range, the data are transformed before classification. The classification methods used are logistic regression, k-nearest neighbor, naive Bayes, support vector machine, decision tree and random forest. Both the original and the transformed data are classified with a test-set fraction of 0.3. SVM and naive Bayes show no improvement in accuracy, while random forest gives the best accuracy of all. The test-set fraction is therefore reduced to 0.25, which improves all algorithms on the transformed data. Random forest still gives the best accuracy.
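The two preprocessing choices mentioned here, rescaling the features and varying the test fraction, can be sketched as below. The abstract does not say which transformation was used, so min-max scaling is an assumption, as is the rounding rule for the test-set size.

```python
import random
from typing import List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def train_test_split(n_samples: int, test_size: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a fraction `test_size` for testing."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = round(n_samples * test_size)   # assumed rounding convention
    return idx[n_test:], idx[:n_test]


# With 569 samples, the two test fractions in the abstract give these splits.
train_30, test_30 = train_test_split(569, 0.30, random.Random(0))
train_25, test_25 = train_test_split(569, 0.25, random.Random(0))
scaled = min_max_scale([[1.0, 100.0], [3.0, 300.0], [2.0, 200.0]])
```

Under these assumptions, lowering the test fraction from 0.3 to 0.25 moves 29 samples from the test set to the training set, which is one plausible reason the reported accuracies shift.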

Machine Learning Algorithms in Fraud Detection: Case Study on Retail Consumer Financing Company

Asia Pacific Fraud Journal ◽

10.21532/apfjournal.v6i2.216 ◽

2021 ◽

Vol 6 (2) ◽

pp. 213

Author(s):

Nadya Intan Mustika ◽

Bagus Nenda ◽

Dona Ramadhan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Historical Data ◽

Learning Algorithm ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Random Forest Algorithm ◽

Data Set

This study aims to implement a machine learning algorithm in detecting fraud based on historical data set in a retail consumer financing company. The outcome of machine learning is used as samples for the fraud detection team. Data analysis is performed through data processing, feature selection, hold-on methods, and accuracy testing. There are five machine learning methods applied in this study: Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Historical data are divided into two groups: training data and test data. The results show that the Random Forest algorithm has the highest accuracy with a training score of 0.994999 and a test score of 0.745437. This means that the Random Forest algorithm is the most accurate method for detecting fraud. Further research is suggested to add more predictor variables to increase the accuracy value and apply this method to different financial institutions and different industries.

Implementing machine learning in bipolar diagnosis in China

Translational Psychiatry ◽

10.1038/s41398-019-0638-8 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Yantao Ma ◽

Jun Ji ◽

Yun Huang ◽

Huimin Gao ◽

Zhiying Li ◽

...

Keyword(s):

Machine Learning ◽

Bipolar Disorder ◽

Random Forest ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

Linear Discriminant ◽

Cohort Data ◽

Selection Operator

Abstract: Bipolar disorder (BPD) is often confused with major depression, and current diagnostic questionnaires are subjective and time intensive. The aim of this study was to develop a new Bipolar Diagnosis Checklist in Chinese (BDCC) by using machine learning to shorten the Affective Disorder Evaluation scale (ADE) based on an analysis of registered Chinese multisite cohort data. In order to evaluate the importance of each item of the ADE, a case-control study of 360 bipolar disorder (BPD) patients, 255 major depressive disorder (MDD) patients and 228 healthy (no psychiatric diagnosis) controls (HCs) was conducted, spanning 9 Chinese health facilities participating in the Comprehensive Assessment and Follow-up Descriptive Study on Bipolar Disorder (CAFÉ-BD). The BDCC was formed by selected items from the ADE according to their importance as calculated by a random forest machine learning algorithm. Five classical machine learning algorithms, namely, a random forest algorithm, support vector regression (SVR), the least absolute shrinkage and selection operator (LASSO), linear discriminant analysis (LDA) and logistic regression, were used to retrospectively analyze the aforementioned cohort data to shorten the ADE. Regarding the area under the receiver operating characteristic (ROC) curve (AUC), the BDCC had high AUCs of 0.948, 0.921, and 0.923 for the diagnosis of MDD, BPD, and HC, respectively, despite containing only 15% (17/113) of the items from the ADE. Traditional scales can be shortened using machine learning analysis. By shortening the ADE using a random forest algorithm, we generated the BDCC, which can be more easily applied in clinical practice to effectively enhance both BPD and MDD diagnosis.
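For readers unfamiliar with per-class AUC, the sketch below computes it in the rank-based way: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, evaluated one class against the rest. The scores and labels are invented; the study's own data and scoring models are not reproduced here.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half.
    Assumes at least one positive and one negative label."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(scores_by_class: Dict[str, List[float]],
                    y_true: Sequence[str]) -> Dict[str, float]:
    """AUC of each class against all other classes combined."""
    return {c: auc(s, [1 if y == c else 0 for y in y_true])
            for c, s in scores_by_class.items()}


# Hypothetical per-class scores for four subjects.
y_true = ["BPD", "MDD", "HC", "BPD"]
scores_by_class = {
    "BPD": [0.8, 0.3, 0.1, 0.6],
    "MDD": [0.1, 0.6, 0.2, 0.3],
    "HC":  [0.1, 0.1, 0.7, 0.1],
}
per_class = one_vs_rest_auc(scores_by_class, y_true)

# Share of ADE items retained in the shortened checklist (17 of 113).
retained_pct = round(17 / 113 * 100)
```

The pairwise loop is quadratic in the number of subjects, which is fine for a cohort of a few hundred; a sort-based version would be preferable for much larger data.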

Random Forest Algorithm for Enhanced Prediction of Drug Target Interactions

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1722.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2008-2012

Keyword(s):

Random Forest ◽

Drug Target ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

K Nearest Neighbor ◽

In Vivo Experiments

Identification of drug-target interactions (DTI) is an important challenge for research and development in the pharmaceutical industry. Biomedical researchers have moved from in vitro and in vivo experiments to in silico methods for faster results. In the recent past, machine learning algorithms have become very popular for DTI prediction. This paper presents an ensemble approach, the random forest algorithm, for DTI prediction. The performance of the proposed approach is evaluated against matrix factorization, a genetic algorithm, support vector machines, k-nearest neighbor, decision trees and logistic regression on 4 benchmark datasets with diverse properties. The algorithms are compared on accuracy and average ranking. The results establish that the random forest algorithm is more suitable for DTI prediction than the other algorithms.
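"Average ranking" is commonly computed by ranking the methods on each dataset and averaging those ranks. The sketch below shows that calculation with tied methods sharing the mean of their positions. The dataset names and accuracy values are placeholders, not results from the paper, and the tie rule is an assumption.

```python
from typing import Dict


def average_ranks(accuracy: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """accuracy maps dataset -> {method: accuracy}. Rank 1 is the best method
    on a dataset; tied methods share the mean of the positions they occupy.
    Assumes every method is evaluated on every dataset."""
    totals: Dict[str, float] = {}
    for scores in accuracy.values():
        ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while j < len(ordered) and ordered[j][1] == ordered[i][1]:
                j += 1                      # extend over the group of ties
            rank = (i + 1 + j) / 2          # mean of positions i+1 .. j
            for name, _ in ordered[i:j]:
                totals[name] = totals.get(name, 0.0) + rank
            i = j
    return {name: total / len(accuracy) for name, total in totals.items()}


# Placeholder accuracies on two hypothetical datasets.
toy_accuracy = {
    "D1": {"RF": 0.90, "SVM": 0.80, "KNN": 0.70},
    "D2": {"RF": 0.85, "SVM": 0.85, "KNN": 0.60},
}
ranks = average_ranks(toy_accuracy)
```

A lower average rank means a method is consistently near the top across datasets, which is less sensitive to one unusually easy or hard dataset than a plain mean of accuracies.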
