Economic Granularity Interval in Decision Tree Algorithm Standardization from an Open Innovation Perspective: Towards a Platform for Sustainable Matching

2020 · Vol 6 (4) · pp. 149
Author(s): Tao Li, Lei Ma, Zheng Liu, Kaitong Liang

In the context of applying artificial intelligence to an intellectual property trading platform, the number of demanders and suppliers exchanging scarce resources is growing continuously, and improvements in computational power significantly raise matching efficiency. To run machine learning in edge-computing terminals and microprocessors (smart phones, wearable devices, automobiles, IoT devices, etc.) and to reduce the resource burden on data centers, energy consumption must be greatly reduced. Machine learning algorithms produced in open communities lack standardization in practice and therefore require open-innovation participation to reduce computing cost, shorten algorithm running time, and improve human-machine collaborative competitiveness. The purpose of this study was to find an economic range for the evaluation granularity in a decision tree, a popular machine learning algorithm. The research questions are what the economic tree depth interval is and what the corresponding time cost is as granularity increases for a given number of matches. The study also aimed to balance efficiency and cost via simulation. The results show that the benefit of a shallower tree search brought by increased evaluation granularity is not linear: for a given number of candidate matches, the granularity has a definite and relatively economical range. Choosing an evaluation granularity within this range yields a smaller tree depth while avoiding the inefficiency of an excessive increase in time cost. Hence, standardization of an AI algorithm is applicable to edge-computing scenarios such as an intellectual property trading platform. The economic granularity interval saves not only computing resource costs but also AI decision-making time, and it avoids the time cost of human decision-makers.
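To make the depth-versus-cost trade-off concrete, the sketch below models it under stated assumptions rather than reproducing the paper's simulation: a balanced tree whose branching factor equals the evaluation granularity separates N candidate matches, the time-cost proxy is depth times granularity, and the "economic interval" is taken as the granularities whose cost stays within 10% of the minimum. The candidate count, cost model, and tolerance are all illustrative.

```python
# A minimal sketch (not the authors' simulation) of how increasing the
# evaluation granularity g shrinks the search depth for N candidate matches
# while per-level evaluation cost grows. The cost model (depth * g) is assumed.
import math

def depth_needed(n_candidates: int, granularity: int) -> int:
    """Depth of a balanced tree with branching factor = granularity
    that can separate n_candidates distinct matches."""
    return max(1, math.ceil(math.log(n_candidates, granularity)))

def sweep(n_candidates: int = 10_000, granularities=range(2, 65)):
    rows = []
    for g in granularities:
        d = depth_needed(n_candidates, g)
        cost = d * g  # assumed time-cost proxy: evaluations per lookup
        rows.append((g, d, cost))
    return rows

if __name__ == "__main__":
    rows = sweep()
    min_cost = min(c for _, _, c in rows)
    # "Economic interval": granularities whose cost is within 10% of the minimum.
    economic = [g for g, _, c in rows if c <= 1.10 * min_cost]
    for g, d, c in rows[:10]:
        print(f"granularity={g:2d}  depth={d}  cost={c}")
    print("economic granularity interval (assumed 10% tolerance):",
          economic[0], "-", economic[-1])
```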

2021
Author(s): Marian Popescu, Rebecca Head, Tim Ferriday, Kate Evans, Jose Montero, ...

This paper presents advancements in machine learning and cloud deployment that enable rapid and accurate automated lithology interpretation. A supervised machine learning technique is described that enables rapid, consistent, and accurate lithology prediction, alongside quantitative uncertainty, from large wireline or logging-while-drilling (LWD) datasets. To leverage supervised machine learning, a team of geoscientists and petrophysicists made detailed lithology interpretations of wells to generate a comprehensive training dataset. Lithology interpretations were based on deterministic cross-plotting, utilizing and combining various raw logs. This training dataset was used to develop a model and test a machine learning pipeline. The pipeline was then applied to a dataset previously unseen by the algorithm to predict lithology, and a petrophysicist quality-checked the new predictions against human interpretations. Confidence in the interpretations was assessed in two ways: the prior probability, a measure of confidence that the model recognizes the input data, and the posterior probability, which quantifies the likelihood that a specified depth interval comprises a given lithology. The supervised machine learning algorithm ensured that the wells were interpreted consistently by removing interpreter biases and inconsistencies. The scalability of cloud computing enabled a large log dataset to be interpreted rapidly; more than 100 wells were interpreted consistently in five minutes, yielding a lithological match of more than 70% to the human petrophysical interpretation. Supervised machine learning methods have strong potential for classifying lithology from log data because: 1) they can automatically define complex, non-parametric, multi-variate relationships across several input logs; and 2) they allow the confidence of classifications to be quantified. Furthermore, this approach captured the knowledge and nuances of an interpreter's decisions by training the algorithm with human-interpreted labels. In the hydrocarbon industry, the quantity of generated data is predicted to increase by more than 300% between 2018 and 2023 (IDC, Worldwide Global DataSphere Forecast, 2019–2023), and the industry also holds vast legacy data. This supervised machine learning approach can unlock the potential of some of these datasets by providing consistent lithology interpretations rapidly, allowing resources to be used more effectively.
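A minimal sketch of the prediction-with-confidence step, not the authors' cloud pipeline: a supervised classifier is trained on human-interpreted wells and returns a lithology plus its posterior probability for each depth sample. The log curve names (GR, RHOB, NPHI, DT) and the choice of a random forest are assumptions; the prior-probability check on the inputs is not shown.

```python
# Hedged sketch: supervised lithology prediction with per-sample posterior
# probability. Curve names and the random forest are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["GR", "RHOB", "NPHI", "DT"]  # assumed raw log curves

def train(log_matrix: np.ndarray, lithology_labels: np.ndarray):
    """log_matrix: (n_samples, n_features) from human-interpreted training wells."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(log_matrix, lithology_labels)
    return model

def predict_with_confidence(model, unseen_logs: np.ndarray):
    """Return predicted lithology and its posterior probability per depth sample."""
    posteriors = model.predict_proba(unseen_logs)  # (n_samples, n_classes)
    best = posteriors.argmax(axis=1)
    return model.classes_[best], posteriors[np.arange(len(best)), best]
```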


2020 · Vol 2020 · pp. 1-12
Author(s): Peter Appiahene, Yaw Marfo Missah, Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues concerning the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders need to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as good measures of banks' efficiency and performance, while machine learning algorithms are regarded as good tools for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA, and the prediction accuracies of the three machine learning models were compared. The results suggest that the decision tree (DT) with its C5.0 algorithm provided the best predictive model: it achieved 100% accuracy on the 134-branch holdout dataset (30% of the banks) with a p value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a p value of 0.00, and finally the neural network, with 86.6% accuracy and a p value of 0.66. The study concludes that banks in Ghana can use this result to predict their respective efficiencies. All experiments were performed within a simulation environment in RStudio using R code.
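The authors worked in R (C5.0, random forest, neural network); the sketch below is a Python rendering of the two-stage idea only: DEA efficiency scores for each branch (DMU) become labels, a decision tree is trained on branch features, and accuracy is measured on a 30% holdout. The efficiency threshold and the use of scikit-learn's decision tree in place of C5.0 are assumptions.

```python
# Hedged sketch of the DEA-then-classifier idea; not the authors' R code.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def efficiency_class(dea_scores: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Label DMUs as efficient (1) if their DEA score reaches the frontier."""
    return (dea_scores >= threshold).astype(int)

def evaluate(branch_features: np.ndarray, dea_scores: np.ndarray) -> float:
    y = efficiency_class(dea_scores)
    X_tr, X_te, y_tr, y_te = train_test_split(
        branch_features, y, test_size=0.30, random_state=0)  # 30% holdout
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))
```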


Author(s): E. Seyedkazemi Ardebili, S. Eken, K. Küçük

After a brief look at the smart home, we conclude that a smart home requires an intelligent management center. In this article, we try to enable the smart home management center to detect abnormal states in the behavior of a person who lives in the house. In the proposed method, a daily algorithm examines the rate of change in a person's behavior and produces a number, henceforth called the NNC (Number of Normal Changes). The NNC is obtained with a machine learning algorithm combined with a series of simple statistical and mathematical calculations. It reflects changes in a resident's behavior in a smart home: the number is small for a person with a regular routine and constant planning, and large for a person with no fixed principles or regularity in daily life. To improve the accuracy of the NNC, we review common machine learning algorithms and, after testing, choose the decision tree for its higher accuracy and speed; the NNC is then obtained by combining the decision tree algorithm with statistical and mathematical methods. The algorithm is given a set of states and sensor readings, together with the activities performed by the occupant over a period of several days, and it generates the main NNC for those days for each person living in the smart home. To generate this main NNC, we calculate each person's daily NNC (based on his or her behavior on that day); the main NNC is the average of these daily NNCs. We chose the ARAS dataset (Human Activity Datasets in Multiple Homes with Multiple Residents) to implement our method. After tests and replications on the ARAS dataset, anomalies in a person's behavior on a given day are found by comparing the main (average) NNC with that person's daily NNC for that day. If the NNC changes by more than 30%, there is a possibility of an abnormality; if it changes by more than 60%, we conclude that an abnormal state or an uncommon event happened that day, and a declaration of an abnormal state is issued to the resident of the house.
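A minimal sketch of the NNC logic under a simplifying assumption: here the daily NNC is approximated by counting a resident's activity changes in a day (the paper derives it with a decision tree plus statistical calculations), the main NNC is the average of the daily values, and the 30% and 60% deviation thresholds follow the abstract.

```python
# Hedged sketch: daily NNC as an activity-change count, main NNC as the average,
# and the 30% / 60% deviation thresholds from the abstract.
from statistics import mean

def daily_nnc(activity_sequence: list[str]) -> int:
    """Count behavioural changes in one day's sequence of activity labels."""
    return sum(1 for a, b in zip(activity_sequence, activity_sequence[1:]) if a != b)

def main_nnc(days: list[list[str]]) -> float:
    """Average of the daily NNCs over the observation period."""
    return mean(daily_nnc(day) for day in days)

def assess_day(day: list[str], reference_nnc: float) -> str:
    change = abs(daily_nnc(day) - reference_nnc) / reference_nnc
    if change > 0.60:
        return "abnormal state"        # declaration issued to the resident
    if change > 0.30:
        return "possible abnormality"
    return "normal"
```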


2021 · Vol 24 (68) · pp. 104-122
Author(s): Rupinder Kaur, Anurag Sharma

Several studies have reported the use of machine learning algorithms to detect tuberculosis, but studies that address both types of TB, i.e., Pulmonary (PTB) and Extra-Pulmonary Tuberculosis (EPTB), using machine learning are lacking. Therefore, this paper proposes an integrated system based on machine learning models to assist doctors and radiologists in interpreting patients' data to detect PTB and EPTB. Three basic machine learning algorithms, Decision Tree, Naïve Bayes, and SVM, were used for prediction, and their performance was compared. Clinical data and image data are used as input to the models; these datasets were collected from various hospitals in Jalandhar, Punjab, India. The training dataset comprises 200 patients, of whom 90 have PTB, 67 have EPTB, and 43 have no TB. The validation dataset contains 49 patients, on which the Decision Tree achieved the best accuracy of 95% for classifying PTB and EPTB.
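A minimal sketch of the model comparison described above, with the clinical/image feature preparation omitted: the same training data is fed to a Decision Tree, Naïve Bayes, and an SVM, and validation accuracy is reported for each. Hyperparameters are scikit-learn defaults, not the authors' settings.

```python
# Hedged sketch: compare Decision Tree, Naive Bayes and SVM on the same data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def compare_models(X_train, y_train, X_valid, y_valid) -> dict:
    """y labels: 'PTB', 'EPTB' or 'NO_TB'."""
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "naive_bayes": GaussianNB(),
        "svm": SVC(kernel="rbf"),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_valid, model.predict(X_valid))
    return scores
```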


2021 · Vol 9 (1)
Author(s): Maad M. Mijwil, Rana A. Abttan

A decision tree (DT) is one of the most popular machine learning algorithms; it divides data repeatedly to form groups or classes. It is a supervised learning algorithm that can be used on discrete or continuous data for classification or regression. The most traditional classifier of this kind is the C4.5 decision tree, which is the focus of this research. This classifier has the advantage of building a tree from a vast dataset and does not stop until it reaches the desired goal, but it tends to produce unnecessary nodes and branches, leading to overfitting that can negatively affect classification. In this context, the authors suggest using a genetic algorithm to prune the tree and reduce the effect of overfitting. The study uses four datasets, IRIS, Car Evaluation, GLASS, and WINE, collected from the UC Irvine (UCI) machine learning repository. The experimental results confirm the effectiveness of the genetic algorithm in pruning away the effect of overfitting on the four datasets and in optimizing the confidence factor (CF) of the C4.5 decision tree. The proposed method reaches an accuracy of about 92% in this work.
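A minimal sketch of the pruning idea, not the paper's implementation: a tiny genetic algorithm searches a pruning parameter by cross-validated accuracy, run here on the two of the paper's four UCI datasets that ship with scikit-learn (IRIS and WINE). Since scikit-learn has no C4.5, the cost-complexity parameter ccp_alpha stands in for the C4.5 confidence factor, and the population size, mutation scale, and generation count are arbitrary choices.

```python
# Hedged sketch: genetic search over a pruning parameter to curb overfitting.
import random
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def fitness(ccp_alpha: float, X, y) -> float:
    """Cross-validated accuracy of a tree pruned with the given ccp_alpha."""
    tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
    return cross_val_score(tree, X, y, cv=5).mean()

def evolve(X, y, pop_size=10, generations=15, mutation=0.005) -> float:
    population = [random.uniform(0.0, 0.05) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda a: fitness(a, X, y), reverse=True)
        parents = ranked[: pop_size // 2]                       # selection
        children = [max(0.0, p + random.gauss(0, mutation)) for p in parents]
        population = parents + children                         # next generation
    return max(population, key=lambda a: fitness(a, X, y))

if __name__ == "__main__":
    for name, loader in {"IRIS": load_iris, "WINE": load_wine}.items():
        X, y = loader(return_X_y=True)
        best = evolve(X, y)
        print(name, "best ccp_alpha:", round(best, 4),
              "CV accuracy:", round(fitness(best, X, y), 3))
```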


2020 · pp. 1-11
Author(s): Jie Liu, Lin Lin, Xiufang Liang

An online English teaching system places specific requirements on its intelligent scoring system, and the most difficult stage of intelligent scoring in an English test is scoring the English composition with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements of the composition text, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments, and the basic conditions for composition scoring are input into the model as constraints. The results show that the proposed algorithm has a practical effect and can be applied to English assessment systems and online homework evaluation systems.
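A minimal sketch of the candidate-extraction step only, not the improved algorithm proposed in the paper: OpenCV's stock MSER detector proposes character candidate boxes in a grayscale composition image, and the CNN that filters pseudo-character regions is represented by a caller-supplied predicate.

```python
# Hedged sketch: stock MSER candidate extraction; the CNN filter is a placeholder.
import cv2

def candidate_regions(image_path: str):
    """Return the grayscale image and MSER candidate boxes (x, y, w, h)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)
    return gray, bboxes

def filter_candidates(gray, bboxes, is_character) -> list:
    """is_character: a trained CNN's decision function (assumed, not shown here)."""
    return [box for box in bboxes
            if is_character(gray[box[1]:box[1] + box[3], box[0]:box[0] + box[2]])]
```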


2021 · pp. 1-17
Author(s): Ahmed Al-Tarawneh, Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets with machine-learning techniques. Previous approaches for detecting infected tweets rely on human effort or plain text analysis, so they are limited in capturing the hidden meaning between tweet lines. The main aim of this research is to enhance the efficiency of hacker detection on the Twitter platform by combining complex network techniques with adapted machine learning algorithms. The methodology collects a list of users, together with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that are terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes, and the output of this process is used in a machine learning stage that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, giving about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
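A minimal sketch of the network stage (the keyword-based user collection and the tweet classifier are not shown): follower relations are loaded into a directed graph with networkx, degree, closeness, and betweenness centrality are computed, and the highest-scoring accounts are kept as the influential-user dataset. The equal-weight sum of the three centralities and the top-N cutoff are assumptions.

```python
# Hedged sketch: rank users in the follower graph by combined centrality.
import networkx as nx

def influential_users(follower_edges, top_n: int = 100) -> list:
    """follower_edges: iterable of (follower, followed) username pairs."""
    graph = nx.DiGraph(follower_edges)
    degree = nx.degree_centrality(graph)
    closeness = nx.closeness_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)
    combined = {u: degree[u] + closeness[u] + betweenness[u] for u in graph}
    return sorted(combined, key=combined.get, reverse=True)[:top_n]
```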


Sensors · 2021 · Vol 21 (2) · pp. 656
Author(s): Xavier Larriva-Novo, Víctor A. Villagrá, Mario Vega-Barbas, Diego Rivera, Mario Sanz Rodrigo

Security in IoT networks is currently mandatory due to the high volume of data that must be handled. These systems are vulnerable to numerous cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques, as accurate as possible for these scenarios, have to be developed. Intrusion detection systems based on machine learning algorithms have already shown high accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, plus one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with respect to scaling and normalization functions, and all preprocessing models were applied to different sets of characteristics based on a categorization of four feature groups: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection direction-based traffic characteristics. The objective is to evaluate this categorization by applying various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic together with several preprocessing techniques, accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
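A minimal sketch of the group-wise preprocessing idea: each of the four feature groups gets its own scaling or normalization function before the neural network is trained. The column indices and the scaler assigned to each group are placeholders, not the actual KDD99/UGR16/UNSW-NB15 layouts or the combinations evaluated in the paper.

```python
# Hedged sketch: per-feature-group scaling feeding a neural-network classifier.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

FEATURE_GROUPS = {                      # assumed column positions per group
    "basic_connection": [0, 1, 2, 3],
    "content": [4, 5, 6],
    "statistical": [7, 8, 9, 10],
    "traffic_and_direction": [11, 12, 13],
}

def build_pipeline() -> Pipeline:
    preprocess = ColumnTransformer([
        ("basic", StandardScaler(), FEATURE_GROUPS["basic_connection"]),
        ("content", MinMaxScaler(), FEATURE_GROUPS["content"]),
        ("stats", StandardScaler(), FEATURE_GROUPS["statistical"]),
        ("traffic", MinMaxScaler(), FEATURE_GROUPS["traffic_and_direction"]),
    ])
    return Pipeline([("prep", preprocess),
                     ("nn", MLPClassifier(hidden_layer_sizes=(64, 32),
                                          max_iter=300, random_state=0))])
```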


Sensors · 2021 · Vol 21 (2) · pp. 617
Author(s): Umer Saeed, Young-Doo Lee, Sana Ullah Jan, Insoo Koo

Sensors are a key component of Cyber-Physical Systems, which makes them susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy, memory, and computation resources of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm called Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extremely low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme demonstrates the ability to detect and diagnose low-intensity sensor faults accurately and in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.
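A minimal sketch of the diagnostic stage (the context-aware feature extraction and the fault injection used to build the labelled data are not shown): an Extra-Trees ensemble is trained to classify sensor observations into the normal class and the six fault types listed above, and macro F1 is reported.

```python
# Hedged sketch: Extra-Trees classification of sensor faults.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score

FAULT_CLASSES = ["normal", "drift", "hard-over/bias", "spike",
                 "erratic", "stuck", "data-loss"]

def train_cafd(X_train, y_train) -> ExtraTreesClassifier:
    """X: features from humidity/temperature observations; y: labels above."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    return model

def diagnose(model, X_test, y_test):
    """Return macro F1 and the per-observation fault predictions."""
    predictions = model.predict(X_test)
    return f1_score(y_test, predictions, average="macro"), predictions
```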


Metabolites · 2021 · Vol 11 (6) · pp. 363
Author(s): Louise Cottle, Ian Gilroy, Kylie Deng, Thomas Loudovaris, Helen E. Thomas, ...

Pancreatic β cells secrete the hormone insulin into the bloodstream and are critical in the control of blood glucose concentrations. β cells are clustered in the micro-organs of the islets of Langerhans, which have a rich capillary network. Recent work has highlighted the intimate spatial connections between β cells and these capillaries, which lead to the targeting of insulin secretion to the region where the β cells contact the capillary basement membrane. In addition, β cells orientate with respect to the capillary contact point and many proteins are differentially distributed at the capillary interface compared with the rest of the cell. Here, we set out to develop an automated image analysis approach to identify individual β cells within intact islets and to determine if the distribution of insulin across the cells was polarised. Our results show that a U-Net machine learning algorithm correctly identified β cells and their orientation with respect to the capillaries. Using this information, we then quantified insulin distribution across the β cells to show enrichment at the capillary interface. We conclude that machine learning is a useful analytical tool to interrogate large image datasets and analyse sub-cellular organisation.
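A minimal sketch of the quantification step only, assuming the U-Net has already produced a labelled β-cell mask and a capillary mask: for each cell, mean insulin intensity in a band next to the capillary is compared with the rest of the cell to give an enrichment ratio. The band width in pixels is an illustrative choice.

```python
# Hedged sketch: insulin enrichment at the capillary interface per segmented cell.
import numpy as np
from scipy import ndimage

def insulin_enrichment(insulin, cell_labels, capillary_mask,
                       band_px: int = 10) -> dict:
    """insulin: 2D intensity image; cell_labels: 0=background, 1..N per cell;
    capillary_mask: boolean mask of capillary pixels."""
    dist_to_capillary = ndimage.distance_transform_edt(~capillary_mask)
    interface_band = dist_to_capillary <= band_px
    ratios = {}
    for cell_id in range(1, int(cell_labels.max()) + 1):
        cell = cell_labels == cell_id
        near = cell & interface_band
        far = cell & ~interface_band
        if near.any() and far.any():
            ratios[cell_id] = insulin[near].mean() / insulin[far].mean()
    return ratios
```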

