Cardiotocography Data Analysis to Predict Fetal Health Risks with Tree-Based Ensemble Learning

Author(s):  
Pankaj Bhowmik ◽  
Pulak Chandra Bhowmik ◽  
U. A. Md. Ehsan Ali ◽  
Md. Sohrawordi

A sizeable number of women face difficulties during pregnancy, which can eventually lead to serious health problems for the fetus. Early detection of these risks can save the lives of both infants and mothers. Cardiotocography (CTG), which monitors the fetal heart rate signal, provides detailed information that is used to predict potential risks to fetal wellbeing and to support clinical decisions. This paper analyzes antepartum CTG data (available from the UCI Machine Learning Repository) and develops an efficient tree-based ensemble learning (EL) classifier model to predict fetal health status. The EL model uses the Stacking approach, of which a concise overview is given. The study also applies several individual machine learning algorithms to the CTG dataset and evaluates their performance. The Stacking EL technique uses four tree-based machine learning algorithms as base learners: the Random Forest, Decision Tree, Extra Trees, and Deep Forest classifiers. The CTG dataset contains 21 features, of which the 10 most important are selected with the Chi-square method and then normalized with Min-Max scaling. Grid Search is then applied to tune the hyperparameters of the base algorithms, and 10-fold cross-validation is performed to select the meta learner of the EL classifier model. A comparative assessment of the individual base learners and the EL classifier model shows the EL classifier's superiority in predicting fetal health risks, with an accuracy of about 96.05%. The study concludes that the Stacking EL approach can be a valuable paradigm in machine learning studies for improving model accuracy and reducing the error rate.
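
The Min-Max scaling step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition, x' = (x - min) / (max - min) applied per feature column; the function name `min_max_scale` and the toy values are hypothetical and are not taken from the paper or from the CTG dataset.

```python
def min_max_scale(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero.
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical toy values (three samples, three features); not real CTG data.
toy_rows = [[120.0, 0.0, 5.0], [140.0, 2.0, 5.0], [160.0, 4.0, 5.0]]
scaled_rows = min_max_scale(toy_rows)
print(scaled_rows)
```

In a real pipeline the minimum and maximum would normally be computed on the training folds only and then reused for the held-out fold, so that cross-validation scores are not influenced by the test data.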

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Utkarsh Saxena ◽  
Soumen Moulik ◽  
Soumya Ranjan Nayak ◽  
Thomas Hanne ◽  
Diptendu Sinha Roy

We attempt to predict the accidental fall of human beings due to sudden abnormal changes in their health parameters such as blood pressure, heart rate, and sugar level. In medical terminology, this problem is known as Syncope. The primary motivation is to prevent such falls by predicting abnormal changes in these health parameters that might trigger a sudden fall. We apply various machine learning algorithms such as logistic regression, a decision tree classifier, a random forest classifier, K-Nearest Neighbours (KNN), a support vector machine, and a naive Bayes classifier on a relevant dataset and verify our results with the cross-validation method. We observe that the KNN algorithm provides the best accuracy in predicting such a fall. However, the accuracy results of some other algorithms are also very close. Thus, we move one step further and propose an ensemble model, Majority Voting, which aggregates the prediction results of multiple machine learning algorithms and finally indicates the probability of a fall that corresponds to a particular human being. The proposed ensemble algorithm yields 87.42% accuracy, which is greater than the accuracy provided by the KNN algorithm.
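
The majority-voting idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the labels `"fall"` and `"no_fall"`, the function name `majority_vote`, and the six example votes are hypothetical, and the paper's own aggregation may differ in detail.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label and the share of models that voted for it."""
    counts = Counter(predictions)
    # most_common keeps first-seen order among ties, so a tie goes to the
    # label that appears first in the list (an arbitrary choice in this sketch).
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical outputs of six base classifiers for one person.
model_outputs = ["fall", "no_fall", "fall", "fall", "no_fall", "fall"]
label, share = majority_vote(model_outputs)
print(label, round(share, 3))
```

The vote share is only a rough stand-in for a probability; averaging the class probabilities produced by each model (soft voting) is a common alternative.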


2020 ◽  
Vol 9 (1) ◽  
pp. 1894-1899

The number of internet users has increased exponentially over the years, and intrusive activities have increased significantly with it. Detecting an intrusion attack in a system connected to a network is one of the most challenging tasks today. Many techniques based on machine learning have been developed to detect these attacks, but they are not yet good enough to detect all kinds of attacks. In this paper, different machine learning algorithms are analyzed on the NSL-KDD dataset, with pre-processing steps such as one-hot encoding, feature selection, and random sampling, to find the best-performing model for detecting these attacks. The attacks in the dataset are classified into four types (Probe, DoS, U2R, and R2L), while the non-attack class is Normal. The dataset comes in two parts, KDD-Train and KDD-Test. The models are trained and tested to measure accuracy and to compare the performance of the different machine learning algorithms. The algorithms used are the Naive Bayes, Decision Tree, Random Forest, KNeighbours, Logistic Regression, SVM, and Voting classifiers. These techniques are compared according to their ability to detect the attacks. The comparison helps identify the algorithm that works best for detecting different kinds of intrusion attacks.
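
One-hot encoding, one of the pre-processing steps named above, can be sketched as follows. The helper `one_hot_encode` is hypothetical, and the sample values stand in for a categorical column such as the protocol type in NSL-KDD; the actual pre-processing code used in the paper is not shown in the abstract.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Illustrative values for one categorical feature.
protocols = ["tcp", "udp", "icmp", "tcp"]
categories, encoded = one_hot_encode(protocols)
print(categories)
print(encoded)
```

The category list should be built from the training split and reused for the test split, so that both parts end up with the same columns.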


Author(s):  
Sheikh Shehzad Ahmed

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most common types of cyber-attack nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear which features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four machine learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.


Author(s):  
Hyontai Sug

For classification tasks in machine learning, independence between conditional attributes is a precondition for successful data mining. Decision trees are among the most widely used machine learning algorithms because they are easy to understand. Because dependency between conditional attributes can produce more complex trees, it is important to supply conditional attributes that are independent of each other; the requirement for decision trees, as for other machine learning algorithms, is that conditional attributes are independent of each other and dependent on decisional attributes only. The statistical method for checking independence between attributes is the Chi-square test, but the test is effective for categorical attributes only. Its applicability is therefore limited, because most datasets for data mining have a mix of categorical and numerical attributes. To overcome this problem and test dependency between conditional attributes, a novel method is suggested that is based on functional dependency derived from data and can be applied to any dataset irrespective of the data type of its attributes. After highly dependent conditional attributes are removed, better decision trees can be generated. Experiments showed that the method is effective and gave very good results.
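
For reference, the Chi-square test of independence mentioned above compares observed counts O with the counts E expected under independence, using the statistic sum((O - E)^2 / E) over all cells of a contingency table. The sketch below computes that statistic and its degrees of freedom for two categorical attributes; the function name and the two tables are hypothetical, and this is the classical test, not the functional-dependency method the paper proposes.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are values of attribute A, columns values of attribute B.
dependent_table = [[10, 20], [20, 10]]
independent_table = [[10, 20], [20, 40]]
stat_dep, dof_dep = chi_square_statistic(dependent_table)
stat_ind, dof_ind = chi_square_statistic(independent_table)
print(round(stat_dep, 3), dof_dep)
print(round(stat_ind, 3), dof_ind)
```

With one degree of freedom the 5% critical value is about 3.841, so the first table would lead to rejecting independence while the second would not.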


2021 ◽  
Vol 23 (4) ◽  
pp. 1-21
Author(s):  
Nureni Ayofe AZEEZ ◽  
Sanjay Misra ◽  
Omotola Ifeoluwa LAWAL ◽  
Jonathan Oluranti

The use of social media platforms such as Facebook, Twitter, Instagram, and WhatsApp has enabled many people to communicate effectively and frequently with each other, and it has also allowed cyberbullying to occur more frequently on these networks. Cyberbullying is known to cause serious health issues among social media users, so a way to identify and detect it is of significant importance. This paper examines unique features obtained from a Facebook dataset and develops a model that identifies and detects cyberbullying posts by applying machine learning algorithms (Naïve Bayes and K-Nearest Neighbor). The project also uses a feature selection algorithm, the χ² test (Chi-square test), to select important features that can improve the performance of the classifiers and decrease classification time. The results suggest that the approach detects cyberbullying on Facebook with a high degree of accuracy and also improves the performance of the machine learning classifiers.


Fraud in financial payment services is the most prevalent form of cybercrime. The rapid growth of e-commerce and mobile payments in recent years is behind the rising incidence of fraud in financial payment services. According to McKinsey, "fraud losses throughout the world could be close to $44 billion by 2025." Every year, fraudulent card transactions cause billions of US dollars in losses. To reduce these losses, it is essential to design effective fraud detection algorithms, which depend on sophisticated machine learning methods to assist fraud investigators. Fraud detection systems have therefore gained great significance for banks and financial institutions. Although fraudulent transactions are far fewer than genuine ones, care must be taken to predict them so that financial institutions can maintain customer integrity. Because fraud is rare compared to normal operations, we face the class imbalance problem. We applied the Synthetic Minority Oversampling Technique (SMOTE) and an ensemble of sampling methods (Balanced Random Forest Classifier, Balanced Bagging Classifier, Easy Ensemble Classifier, RUSBoost) to ensemble machine learning algorithms. Performance is assessed using sensitivity, specificity, precision, and ROC area. The purpose of this article is to analyze different predictive models to see how precisely they detect whether a transaction is a standard payment or a fraud. Rather than misclassifying a genuine transaction as fraud, this model seeks to improve the detection of fraud. We noted that ensemble learning with maximum voting detects fraud better than the other classifiers. The Decision Tree Classifier, Logistic Regression, and Balanced Bagging Classifier are combined in the proposed algorithm, called the OptimizedEnsembleFD algorithm. The sample size is then increased and deep learning is applied. It is found that the proposed system, a SMOTE Regularised Deep Autoencoders (SRD Autoencoders) neural network, performs better, with good recall and accuracy, on this large dataset.
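
The evaluation measures named above (sensitivity, specificity, precision) can be computed from a confusion matrix, as in the following sketch. The labels (1 for fraud, 0 for genuine), the helper names, and the ten toy transactions are hypothetical and only illustrate why accuracy alone can mislead on imbalanced data.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return the measures; assumes none of the denominators is zero."""
    return {
        "sensitivity": tp / (tp + fn),  # share of frauds that were caught (recall)
        "specificity": tn / (tn + fp),  # share of genuine payments left alone
        "precision": tp / (tp + fp),    # share of fraud alerts that were correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical labels: 2 frauds among 10 transactions.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = rates(*confusion_counts(actual, predicted))
print(metrics)
```

Here accuracy is 0.8 even though only half of the frauds are detected, which is why sensitivity and precision are reported alongside it.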


Author(s):  
Komal Bhaskar Thube

A programming language is a computer language that developers use to write software programs, scripts, or other sets of instructions for computers to execute. It is difficult to determine which programming language is most widely used. In this work, I analyzed and compared the classification results of various machine learning models to find out which programming language is most widely used by developers. I used Support Vector Machine (SVM), K-Neighbors Classifier (KNN), and Decision Tree Classifier (CART) for the comparative study. The task is to analyze different data and classify them, evaluating each algorithm in terms of accuracy, precision, recall, and F1 score. The best accuracy, 94.29%, was obtained with SVM. These techniques are coded in Python and executed in Jupyter Notebook, the Scientific Python Development Environment. The experiments show that SVM is the best for predictive analysis and is a well-suited algorithm for predicting the most widely used programming language.


2022 ◽  
Vol 14 (2) ◽  
pp. 271
Author(s):  
Yinghui Zhao ◽  
Ye Ma ◽  
Lindi Quackenbush ◽  
Zhen Zhen

Individual-tree aboveground biomass (AGB) estimation can highlight the spatial distribution of AGB and is vital for precision forestry. Accurately estimating individual tree AGB is a requisite for accurate forest carbon stock assessment of natural secondary forests (NSFs). In this study, we investigated the performance of three machine learning and three ensemble learning algorithms in tree species classification based on airborne laser scanning (ALS) and WorldView-3 imagery, inverted the diameter at breast height (DBH) using an optimal tree height curve model, and mapped individual tree AGB for a site in northeast China using additive biomass equations, tree species, and inverted DBH. The results showed that the combination of ALS and WorldView-3 performed better than either single data source in tree species classification, and ensemble learning algorithms outperformed machine learning algorithms (except CNN). Seven tree species had satisfactory accuracy of individual tree AGB estimation, with R² values ranging from 0.68 to 0.85 and RMSE ranging from 7.47 kg to 36.83 kg. The average individual tree AGB was 125.32 kg and the forest AGB was 113.58 Mg/ha in the Maoershan study site in Heilongjiang Province, China. This study provides a way to classify tree species and estimate individual tree AGB of NSFs based on ALS data and WorldView-3 imagery.
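
The two accuracy measures reported above can be computed as in the sketch below. It assumes R² is the usual coefficient of determination, 1 - SS_res / SS_tot, which the abstract does not state explicitly; the function names and the AGB values (in kg) are hypothetical and are not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical field-measured and estimated AGB values for four trees, in kg.
observed_agb = [100.0, 120.0, 140.0, 160.0]
estimated_agb = [110.0, 110.0, 150.0, 150.0]
print(rmse(observed_agb, estimated_agb), r_squared(observed_agb, estimated_agb))
```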


Author(s):  
Angela More

Abstract: Data analytics plays a vital role in diagnosis and treatment in the health care sector. To support practitioner decision-making, huge volumes of data should be processed with machine learning techniques to produce tools for prediction and classification. Breast cancer accounts for 1 million reported cases per year. We propose a prediction model specifically designed for the prediction of breast cancer using the machine learning algorithms Decision Tree classifier, Naïve Bayes, SVM, and K-Nearest Neighbour. The model predicts the type of tumour, which can be benign (noncancerous) or malignant (cancerous). The model uses supervised learning, a machine learning concept in which both dependent and independent columns are provided to the machine, and a classification technique to predict the type of tumour. Keywords: Cancer, Machine learning, Prediction, Data Visualization, SVM, Naïve Bayes, Classification.

