Differentiating Thrombotic Microangiopathies Based on Laboratory Tests Other Than ADAMTS13 Using Machine Learning Technology

Abstract Introduction: ADAMTS13 activity level is crucial for differentiating thrombotic microangiopathies (TMAs). However, ADAMTS13 testing is not readily available on site in many parts of the world. Hence, we developed an innovative algorithm that allows differentiation of thrombotic thrombocytopenic purpura (TTP) from other TMAs based on laboratory results other than ADAMTS13 using machine learning. Methods: Two hundred eight adult patients with either TTP (N=64) or TMA other than TTP (N=144) (ADAMTS13 cutoff level of 10%) were classified using three machine learning techniques (decision tree, random forest, and neural network) and a set of 19 easily measured clinical variables such as fever, Hb, and ALT. No single clinical variable was strongly correlated with TTP (the absolute values of the correlation coefficients were all lower than 0.5), so we applied machine learning algorithms. First, we divided the patient data into three parts: training, test, and validation sets. We then applied the three machine learning techniques. Principal component analysis was also performed. Results: As single variables, platelet count, BUN, and total bilirubin were the three most important predictors for differentiating TTP from other TMAs, with an accuracy of 82%. The random forest method increased accuracy to 85%, with precision and recall of 0.828 and 0.832, respectively. Without optimization, the neural network did not perform better than the random forest method. Conclusion: Machine learning technology seems promising for differentiating TTP from other TMAs when the ADAMTS13 value is not available. These algorithms could support the physician in tailoring the management of TMA. Figures: correlation coefficients in our study; scheme of the random forest method used in our study. Disclosures: Lee: Samsung SDS: Employment. Yun: Samsung SDS: Employment.
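
The three-way division of the patient data described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the split ratios, so the 60/20/20 fractions, the seed, and the function name `three_way_split` are all assumptions.

```python
import random

def three_way_split(n_samples, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into train, test and validation parts.

    The fractions are assumed values; the abstract does not report them.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(n_samples * train_frac)
    n_test = int(n_samples * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation

# 208 patients, as in the abstract (64 TTP + 144 other TMA).
train_idx, test_idx, val_idx = three_way_split(208)
```

With classes as uneven as 64 versus 144, a stratified split that preserves the class ratio in each part would probably be preferable; the plain shuffle above is kept only for brevity.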

Sport analytics for cricket game results using machine learning: An experimental study

Applied Computing and Informatics ◽

10.1016/j.aci.2019.11.006 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Kumash Kapadia ◽

Hussein Abdel-Jaber ◽

Fadi Thabtah ◽

Wael Hadi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Learning Technology ◽

Home Team ◽

Feature Sets ◽

Learning Techniques

The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are increasing each season, its viewership has increased markedly, and the betting market for the IPL is growing significantly every year. Because cricket is a very dynamic game that changes ball by ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology to deal with the problem of predicting cricket match results based on historical match data of the IPL. Influential features of the dataset have been identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) have been adopted to generate predictive models from distinctive feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision. The selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall when compared to probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms performed well in producing accurate predictive models.
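
Information Gain, one of the filter methods named above, scores a feature by how much it reduces the entropy of the class label. A minimal sketch for categorical features is given below; the toy match records and the names `entropy` and `information_gain` are hypothetical and are not taken from the paper's dataset or code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: which side won each of four matches.
winner = ["home", "home", "away", "away"]
venue = ["home_ground", "home_ground", "neutral", "neutral"]  # separates the outcomes perfectly
toss = ["won", "lost", "won", "lost"]                         # uninformative in this toy sample

ig_venue = information_gain(venue, winner)  # 1 bit: the feature removes all uncertainty
ig_toss = information_gain(toss, winner)    # 0 bits: the feature removes none
```

The toy values are chosen only to show the two extremes of the score; they say nothing about how informative venue or toss are in real IPL data.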

Comparative Analysis of Machine Learning Techniques to Identify Churn for Telecom Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.34.19210 ◽

2018 ◽

Vol 7 (3.34) ◽

pp. 291

Author(s):

M Malleswari ◽

R.J Manira ◽

Praveen Kumar ◽

Murugan .

Keyword(s):

Machine Learning ◽

Big Data ◽

Random Forest ◽

Decision Tree ◽

Apache Spark ◽

Machine Learning Techniques ◽

Churn Prediction ◽

Learning Techniques ◽

Boosted Tree ◽

Customer Attrition

Big data analytics has been the focus for large-scale data processing. Machine learning and big data have a great future in prediction. Churn prediction is one of the subdomains of big data. The advantage of churn prediction is that it helps prevent customer attrition, especially in telecom. Churn prediction is a day-to-day affair involving millions, so a solution that prevents customer attrition can save a lot. This paper proposes a comparison of three machine learning techniques (the Decision Tree, Random Forest and Gradient Boosted Tree algorithms) using Apache Spark. Apache Spark is a data processing engine used in big data that provides in-memory processing, which makes processing faster. The analysis is made by extracting the features of the data set and training the model. Scala is a powerful programming language that combines object-oriented and functional programming. The analysis is implemented using Apache Spark, and modelling is done using Scala ML. The accuracy of the Decision Tree model came out as 86%, that of the Random Forest model as 87% and that of the Gradient Boosted Tree model as 85%.

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply multi-class supervised machine learning techniques to classify agricultural farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of the machinery, recorded using accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but similar means. Additionally, vibration-based and tilt-based classifications of farm machinery each show good accuracy when used alone (with vibration showing slightly better numbers than tilt). However, the accuracies improve further when both tilt and vibration are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, but random forest was the best performing. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.

Network Intrusion Detection System Using Random Forest and Decision Tree Machine Learning Techniques

First International Conference on Sustainable Technologies for Computational Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-0029-9_50 ◽

2019 ◽

pp. 637-643

Author(s):

T. Tulasi Bhavani ◽

M. Kameswara Rao ◽

A. Manohar Reddy

Keyword(s):

Machine Learning ◽

Random Forest ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Learning Techniques

Machine Learning Based Indoor Localisation Using Wi-Fi And Smartphone

Journal of Independent Studies and Research - Computing ◽

10.31645/06 ◽

2020 ◽

Author(s):

Zulqarnain Khokhar ◽

Murtaza Ahmed Siddiqi ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Indoor Localization ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Devices ◽

Gradient Boosting ◽

Learning Techniques ◽

Indoor Localisation

Wi-Fi based indoor positioning with the help of access points and smart devices has become an integral part of finding the location of a device or a person. Wi-Fi based indoor localization technology has been among the most attractive fields for researchers for a number of years. In this paper, we present Wi-Fi based indoor localization using three different machine learning techniques. The three machine learning algorithms implemented and compared are the Decision Tree, Random Forest and Gradient Boosting classifiers. After making a fingerprint of the floor based on Wi-Fi signals, the mentioned algorithms were used to identify the device location at thirty different positions on the floor. The Random Forest and Gradient Boosting classifiers were able to identify the location of the device with accuracy higher than 90%, while the Decision Tree identified the location with accuracy a bit higher than 80%.
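
The fingerprinting idea can be illustrated with a very small sketch. Note that it swaps in a plain nearest-neighbour match rather than the tree-based classifiers the abstract reports, purely to keep the example short. The position labels, the signal-strength values and the function name `locate` are hypothetical.

```python
from math import dist

# Hypothetical fingerprint database: position label -> received signal
# strength (dBm) from three access points, recorded during a site survey.
fingerprints = {
    "P01": (-40.0, -70.0, -85.0),
    "P02": (-65.0, -45.0, -80.0),
    "P03": (-85.0, -75.0, -42.0),
}

def locate(rssi_reading, database):
    """Return the position whose stored fingerprint is closest (Euclidean) to the reading."""
    return min(database, key=lambda position: dist(database[position], rssi_reading))

# A new reading taken by the device; it lies closest to the P02 fingerprint.
estimated = locate((-63.0, -48.0, -79.0), fingerprints)
```

A classifier such as a random forest would be trained on many labelled readings per position instead of a single stored vector, which is what makes it more robust to signal noise than this lookup.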

Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition

10.1101/599704 ◽

2019 ◽

Author(s):

Jaron Thompson ◽

Renee Johansen ◽

John Dunbar ◽

Brian Munsky

Keyword(s):

Neural Network ◽

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Organic Carbon ◽

Machine Learning Techniques ◽

Indicator Species Analysis ◽

Learning Techniques ◽

Species Analysis ◽

Microbiome Data

Abstract: Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating the advances in DNA sequencing technology with computational approaches like machine learning. Although machine learning techniques have been applied to microbiome data, use of these techniques remains rare, and user-friendly platforms to implement such techniques are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yielded Pearson’s correlation coefficients of 0.636 and 0.676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of 1709 bacterial taxa, indicator species analysis identified 285 taxa as significant determinants of DOC concentration. Of the top 285 ranked features determined by machine learning methods, a subset of 86 taxa is common to all feature selection techniques. Using this subset of features, prediction results for random permutations of the data set are at least as accurate as predictions determined using the entire feature set. Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.
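
The Pearson correlation between predicted and actual values, the score reported for the held-out test set, can be computed as in the sketch below. The numbers are made-up toy values, not the study's DOC measurements, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy values only: every prediction is 0.5 too high, yet r is still 1.
actual_doc = [1.0, 2.0, 3.0, 4.0]
predicted_doc = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(actual_doc, predicted_doc)
```

As the toy case shows, a constant offset leaves r unchanged, so the coefficient measures how well predictions track the trend rather than how close they are in absolute terms.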

Prediction of Air-Conditioning Energy Consumption in R&D Building Using Multiple Machine Learning Techniques

Energies ◽

10.3390/en13071847 ◽

2020 ◽

Vol 13 (7) ◽

pp. 1847

Author(s):

Jun-Mao Liao ◽

Ming-Jui Chang ◽

Luh-Maan Chang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Energy Consumption ◽

Random Forest ◽

Energy Conservation ◽

Air Conditioning ◽

Machine Learning Techniques ◽

Support Vector ◽

Generic Model ◽

Learning Techniques

With the global increase in demand for energy, energy conservation of research and development buildings has become of primary importance for building owners. Knowledge based on the patterns in energy consumption of previous years could be used to predict the near-future energy usage of buildings, to optimize and facilitate more effective energy consumption. Hence, this research aimed to develop a generic model for predicting energy consumption. Air-conditioning was used to exemplify the generic model for electricity consumption, as it is the process that often consumes the most energy in a public building. The purpose of this paper is to present this model and the related findings. After causative factors were determined, the methods of linear regression and various machine learning techniques (including the earlier machine learning techniques of support vector machine, random forest, and multilayer perceptron, and the later machine learning techniques of deep neural network, recurrent neural network, long short-term memory, and gated recurrent unit) were applied for prediction. Among them, the prediction of random forest resulted in an R² of 88% ahead of the first month and 81% ahead of the third month. These experimental results demonstrate that the prediction model is reliable and significantly accurate. Building owners could further enrich the model for energy conservation and management.
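
The coefficient of determination quoted above compares the squared prediction error with the variance of the observed values. A minimal sketch follows; the consumption figures are invented for illustration and `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented monthly consumption values (kWh), not data from the study.
actual_kwh = [100.0, 120.0, 140.0, 160.0]
predicted_kwh = [110.0, 110.0, 150.0, 150.0]
score = r_squared(actual_kwh, predicted_kwh)  # 1 - 400 / 2000 = 0.8
```

A score of 0.8 here means the predictions account for 80% of the variance in the observed values, which is how a figure such as 88% would be read.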

Crop Quality Prediction Using ML And Neural Networks

International Journal on Cybernetics & Informatics ◽

10.5121/ijci.2021.100202 ◽

2021 ◽

Vol 10 (02) ◽

pp. 07-11

Author(s):

Kanakaveti Narasimha Dheeraj ◽

Goutham. R. J ◽

Arthi. L

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Random Forest ◽

Machine Learning Techniques ◽

Quality Prediction ◽

Random Forest Regression ◽

Power Efficient ◽

Learning Techniques ◽

Better Than

Agriculture is said to be the backbone of the economy. Farmers toil hard with different kinds of crops to produce good and healthy food for the country. Several systems already exist, but they use outdated machine learning techniques based on RNNs (recurrent neural networks), which makes the process slower and more time-consuming. Here we propose a new CNN (convolutional neural network) based system which is fast and gives accurate results within seconds. CNN is power-efficient and more suitable for real-time implementation. In this project, we use CNN algorithms, which perform much better than the RNN algorithms used in the existing system. More parameters will be taken into consideration for prediction in the proposed system. And we use Random Forest Regression, Multiple Linear Regression

Detection and Severity Evaluation of Combined Rail Defects Using Deep Learning

Vibration ◽

10.3390/vibration4020022 ◽

2021 ◽

Vol 4 (2) ◽

pp. 341-356

Author(s):

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Mean Absolute Error ◽

Absolute Error ◽

Machine Learning Techniques ◽

Rolling Stock ◽

Raw Data ◽

Learning Techniques ◽

Combined Defects

Various techniques have been developed to detect railway defects. One of the popular techniques is machine learning. This unprecedented study applies deep learning, which is a branch of machine learning techniques, to detect and evaluate the severity of rail combined defects. The combined defects in the study are settlement and dipped joint. Features used to detect and evaluate the severity of combined defects are axle box accelerations simulated using a verified rolling stock dynamic behavior simulation called D-Track. A total of 1650 simulations are run to generate numerical data. Deep learning techniques used in the study are deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Simulated data are used in two ways: simplified data and raw data. Simplified data are used to develop the DNN model, while raw data are used to develop the CNN and RNN model. For simplified data, features are extracted from raw data, which are the weight of rolling stock, the speed of rolling stock, and three peak and bottom accelerations from two wheels of rolling stock. In total, there are 14 features used as simplified data for developing the DNN model. For raw data, time-domain accelerations are used directly to develop the CNN and RNN models without processing and data extraction. Hyperparameter tuning is performed to ensure that the performance of each model is optimized. Grid search is used for performing hyperparameter tuning. To detect the combined defects, the study proposes two approaches. The first approach uses one model to detect settlement and dipped joint, and the second approach uses two models to detect settlement and dipped joint separately. The results show that the CNN models of both approaches provide the same accuracy of 99%, so one model is good enough to detect settlement and dipped joint. To evaluate the severity of the combined defects, the study applies classification and regression concepts. 
Classification is used to evaluate the severity by categorizing defects into light, medium, and severe classes, and regression is used to estimate the size of defects. From the study, the CNN model is suitable for evaluating dipped joint severity with an accuracy of 84% and mean absolute error (MAE) of 1.25 mm, and the RNN model is suitable for evaluating settlement severity with an accuracy of 99% and mean absolute error (MAE) of 1.58 mm.
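
The two ways of evaluating severity described above, a regression error in millimetres and a three-class label, can be sketched together. The class thresholds and the defect sizes below are hypothetical, since the abstract gives no class boundaries, and both function names are illustrative.

```python
def mean_absolute_error(actual_mm, predicted_mm):
    """Average absolute difference between actual and predicted defect sizes (mm)."""
    return sum(abs(a - p) for a, p in zip(actual_mm, predicted_mm)) / len(actual_mm)

def severity_class(size_mm, light_max=5.0, medium_max=10.0):
    """Map a defect size to a class. The two thresholds are assumed values."""
    if size_mm <= light_max:
        return "light"
    if size_mm <= medium_max:
        return "medium"
    return "severe"

# Invented defect sizes, not simulation output from the study.
actual_mm = [2.0, 6.0, 12.0]
predicted_mm = [3.0, 5.0, 14.0]

mae = mean_absolute_error(actual_mm, predicted_mm)        # (1 + 1 + 2) / 3
predicted_classes = [severity_class(s) for s in predicted_mm]
```

The regression view keeps the size estimate and its error, while the classification view discards size within a class, so a small MAE near a threshold can still flip the predicted class.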

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’s Performance

Students’ academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing or even dropping out of a course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task, however, is challenging, as the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm (random forest, rf) and boosting algorithms (adaptive boosting, adaboost; stochastic gradient boosting, gbm; extreme gradient boosting, xgbTree), in an attempt to develop a model for predicting the performance of students at a private university in Meghalaya using three categories of data: demographic, prior academic record and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with a single classifier, C4.5. The best result is provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75% and accuracy of 96.94%.
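
Why plain accuracy is avoided on imbalanced data can be seen from a small confusion-matrix calculation. The counts below are hypothetical: they were picked only because they happen to reproduce the three percentages quoted above, and they are not the study's actual confusion matrix. The function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy and F-score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity: share of the minority class found
    specificity = tn / (tn + fp)     # share of the majority class correctly rejected
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "f_score": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 6 at-risk students out of 98, only 3 of them detected.
metrics = binary_metrics(tp=3, fp=0, fn=3, tn=92)
```

With these counts, accuracy is about 96.94% even though half of the minority class is missed, whereas balanced accuracy (75%) and F-score (about 66.67%) expose the shortfall.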
