On Explaining Random Forests with SAT

Author(s):  
Yacine Izza ◽  
Joao Marques-Silva

Random Forests (RFs) are among the most widely used Machine Learning (ML) classifiers. Even though RFs are not interpretable, there are no dedicated non-heuristic approaches for computing explanations of RFs. Moreover, there is recent work on polynomial algorithms for explaining ML models, including naive Bayes classifiers. Hence, one question is whether finding explanations of RFs can be solved in polynomial time. This paper answers this question negatively, by proving that computing one PI-explanation of an RF is D^P-hard. Furthermore, the paper proposes a propositional encoding for computing explanations of RFs, thus enabling finding PI-explanations with a SAT solver. This contrasts with earlier work on explaining boosted trees (BTs) and neural networks (NNs), which requires encodings based on SMT/MILP. Experimental results, obtained on a wide range of publicly available datasets, demonstrate that the proposed SAT-based approach scales to RFs of sizes common in practical applications. Perhaps more importantly, the experimental results demonstrate that, for the vast majority of examples considered, the SAT-based approach proposed in this paper significantly outperforms existing heuristic approaches.
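
To make the notion of a PI-explanation concrete, the following is a minimal Python sketch under stated assumptions, not the paper's SAT encoding. It assumes a hypothetical toy forest of three hand-written trees over three Boolean features with majority voting, and it swaps the SAT oracle for brute-force enumeration of the unfixed features, which is only feasible for a handful of features. The names `TREES`, `forest_predict`, `is_sufficient`, and `pi_explanation` are illustrative and do not come from the paper.

```python
from itertools import product

# Hypothetical toy forest (an assumption for illustration):
# each "tree" maps a tuple of 3 Boolean features to a class in {0, 1}.
TREES = [
    lambda x: 1 if x[0] and x[1] else 0,
    lambda x: 1 if x[0] else 0,
    lambda x: 1 if x[1] or x[2] else 0,
]


def forest_predict(x):
    """Return the majority-vote class of the toy forest."""
    votes = sum(tree(x) for tree in TREES)
    return 1 if 2 * votes > len(TREES) else 0


def is_sufficient(instance, fixed):
    """Check whether fixing the features in `fixed` to their values in
    `instance` forces the same prediction for every value of the others.
    Brute-force enumeration stands in for the SAT oracle here."""
    target = forest_predict(instance)
    free = [i for i in range(len(instance)) if i not in fixed]
    for values in product([False, True], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if forest_predict(tuple(x)) != target:
            return False
    return True


def pi_explanation(instance):
    """Deletion-based minimization: start from all features and drop each
    one whose removal still leaves a sufficient set. The result is a
    subset-minimal set of feature indices."""
    fixed = set(range(len(instance)))
    for i in range(len(instance)):
        if is_sufficient(instance, fixed - {i}):
            fixed.discard(i)
    return sorted(fixed)
```

For this toy forest, `pi_explanation((True, True, False))` keeps features 0 and 1, since the third feature cannot change the vote once the first two are fixed. In a realistic setting each `is_sufficient` call would be one query to a SAT solver over an encoding of the forest.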

2020 ◽  
Vol 36 (2) ◽  
pp. 265-310 ◽  
Author(s):  
Morteza Asghari ◽  
Amir Dashti ◽  
Mashallah Rezakazemi ◽  
Ebrahim Jokar ◽  
Hadi Halakoei

Abstract Artificial neural networks (ANNs), a powerful technique for solving complicated problems in membrane separation processes, have been employed in a wide range of chemical engineering applications. ANNs can be used in the modeling of different processes more easily than other modeling methods. In addition, the computing time in the design of a membrane separation plant is shorter than with many mass transfer models. The membrane separation field requires an alternative model that can work alone or in parallel with theoretical or numerical models, and that can be quicker and, in many cases, much more reliable. ANNs are helpful in cases where scientists do not thoroughly know the physical and chemical rules that govern systems. In ANN modeling, there is no requirement for a deep knowledge of the processes and mathematical equations that govern them. Neural networks are commonly used for the estimation of membrane performance characteristics such as the permeate flux and rejection over the entire range of the process variables, such as pressure, solute concentration, temperature, superficial flow velocity, etc. This review investigates the important aspects of ANNs such as methods of development and training, and modeling strategies in correlation with different types of applications [microfiltration (MF), ultrafiltration (UF), nanofiltration (NF), reverse osmosis (RO), electrodialysis (ED), etc.]. It also deals with particular types of ANNs that have been confirmed to be effective in practical applications and points out the advantages and disadvantages of using them. The combination of an ANN with accurate model predictions and a mechanistic model with less accurate predictions that renders physical and chemical laws can provide a thorough understanding of a process.


Author(s):  
Dr. C. Arunabala ◽  
P. Jwalitha ◽  
Soniya Nuthalapati

The traditional text sentiment analysis method is mainly based on machine learning. However, its dependence on emotion dictionary construction and on manual feature design and extraction limits its generalization ability. In contrast, deep models have more expressive power and can better learn complex mapping functions from data to affective semantics. In this paper, a Convolutional Neural Network (CNN) model combined with SVM for text sentiment analysis is proposed. The experimental results show that the proposed method effectively improves the accuracy of text sentiment classification compared with a traditional CNN, and confirm the effectiveness of sentiment analysis based on CNNs and SVM.


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-29
Author(s):  
Alessandro Canossa ◽  
Dmitry Salimov ◽  
Ahmad Azadvar ◽  
Casper Harteveld ◽  
Georgios Yannakakis

Is it possible to detect toxicity in games just by observing in-game behavior? If so, what are the behavioral factors that will help machine learning to discover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing in-game behavior for players that have been labeled as toxic (i.e. players that have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay with a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are defined by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings, based on supervised learning with random forests, suggest that it is not only possible to behaviorally distinguish sanctioned from unsanctioned players based on selected features of gameplay; it is also possible to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and type, with an accuracy of at least 82%, on average, on unseen players. This research shows that observing in-game behavior can support the work of community managers in moderating and possibly containing the burden of toxic behavior.


2019 ◽  
Vol 24 (12) ◽  
pp. 9243-9256
Author(s):  
Jordan J. Bird ◽  
Anikó Ekárt ◽  
Diego R. Faria

Abstract In this work, we argue that the implications of pseudorandom and quantum-random number generators (PRNG and QRNG) inexplicably affect the performances and behaviours of various machine learning models that require a random input. These implications had not been explored in soft computing before this work. We use a CPU and a QPU to generate random numbers for multiple machine learning techniques. Random numbers are employed in the random initial weight distributions of dense and convolutional neural networks, in which results show a profound difference in learning patterns for the two. In 50 dense neural networks (25 PRNG/25 QRNG), QRNG exceeds PRNG for accent classification by +0.1%, and QRNG exceeds PRNG for mental state EEG classification by +2.82%. In 50 convolutional neural networks (25 PRNG/25 QRNG), the MNIST and CIFAR-10 problems are benchmarked; in MNIST the QRNG experiences a higher starting accuracy than the PRNG but ultimately only exceeds it by 0.02%. In CIFAR-10, the QRNG outperforms PRNG by +0.92%. The n-random split of a Random Tree is enhanced towards a new Quantum Random Tree (QRT) model, which has differing classification abilities to its classical counterpart; 200 trees are trained and compared (100 PRNG/100 QRNG). Using the accent and EEG classification data sets, a QRT seemed inferior to an RT, as it performed on average worse by 0.12%. This pattern is also seen in the EEG classification problem, where a QRT performs worse than an RT by 0.28%. Finally, the QRT is ensembled into a Quantum Random Forest (QRF), which also has a noticeable effect when compared to the standard Random Forest (RF). Ten to 100 ensembles of trees are benchmarked for the accent and EEG classification problems. In accent classification, the best RF (100 RT) outperforms the best QRF (100 QRT) by 0.14% accuracy. In EEG classification, the best RF (100 RT) outperforms the best QRF (100 QRT) by 0.08% but is considerably more complex, requiring twice the number of trees in committee. All differences are observed to be situationally positive or negative and thus are likely data dependent in their observed functional behaviour.


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3144 ◽  
Author(s):  
Sherif Said ◽  
Ilyes Boulkaibet ◽  
Murtaza Sheikh ◽  
Abdullah S. Karar ◽  
Samer Alkork ◽  
...  

In this paper, a customizable wearable 3D-printed bionic arm is designed, fabricated, and optimized for a right arm amputee. An experimental test has been conducted for the user, where control of the artificial bionic hand is accomplished successfully using surface electromyography (sEMG) signals acquired by a multi-channel wearable armband. The 3D-printed bionic arm was designed for the low cost of 295 USD, and was lightweight at 428 g. To facilitate a generic control of the bionic arm, sEMG data were collected for a set of gestures (fist, spread fingers, wave-in, wave-out) from a wide range of participants. The collected data were processed and features related to the gestures were extracted for the purpose of training a classifier. In this study, several classifiers based on neural networks, support vector machine, and decision trees were constructed, trained, and statistically compared. The support vector machine classifier was found to exhibit an 89.93% success rate. Real-time testing of the bionic arm with the optimum classifier is demonstrated.


Author(s):  
P. N. Botsaris ◽  
D. Bechrakis ◽  
P. D. Sparis

Intelligent control, whether fuzzy or based on artificial neural networks, relies on either expert knowledge or experimental data and therefore possesses intrinsic qualities such as robustness and ease of implementation. Lately, many researchers have presented studies aiming to show that this kind of control can be used in practical applications such as the idle speed control problem in the automotive industry. In this study, an estimation of an automobile three-way catalyst performance with artificial neural networks is presented. It may be an alternative approach for an on-board diagnostic (OBD) system to predict the catalyst performance. This method was tested using data sets from two kinds of catalysts, a brand new one and an old one, on a laboratory bench at idle speed. The catalyst operation during the “steady state” phase (the phase in which the catalyst has reached its operating conditions and works normally) is examined. Further experiments are needed for different catalyst types before the method is proposed generally. The data set consists of 855 elements of catalyst inlet-outlet temperature difference (DT), hydrocarbons (HC), carbon monoxide (CO), and carbon dioxide (CO2) emissions. The simulation estimates the values of HC, CO, and CO2 using DT as the input to the neural network. Results showed serious indications that artificial neural networks (or fuzzy logic control laws) could estimate the catalyst performance adequately, depending on their training process, if certain information about the catalyst system and the inputs and outputs of such a system is known. In this study the “steady state” period experimental results are presented.


Complexity ◽  
2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Yiming Jiang ◽  
Chenguang Yang ◽  
Jing Na ◽  
Guang Li ◽  
Yanan Li ◽  
...  

As an imitation of biological nervous systems, neural networks (NNs), which have been characterized as powerful learning tools, are employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to provide a brief review of state-of-the-art NNs for complex nonlinear systems by summarizing recent progress of NNs in both theory and practical applications. Specifically, this survey also reviews a number of NN-based robot control algorithms, including NN-based manipulator control, NN-based human-robot interaction, and NN-based cognitive control.


2020 ◽  
Vol 11 (1) ◽  
pp. 28
Author(s):  
Witness MAAKE ◽  
Terence VAN ZYL

The research aims to investigate the role of hidden orders in the structure of the average market impact curves in the five BRICS financial markets. The concept of market impact is central to the implementation of cost-effective trading strategies during financial order executions. The literature is replicated using the data of visible orders from the five BRICS financial markets. We repeat the implementation of the literature to investigate the effect of hidden orders. We subsequently study the dynamics of hidden orders. The research applies machine learning to estimate the sizes of hidden orders. We revisit the methodology of the literature to compare the average market impact curves in which true hidden orders are added to visible orders with the average market impact curves in which hidden order sizes are estimated via machine learning. The study discovers that: (1) hidden order sizes could be uncovered via machine learning techniques such as Generalized Linear Models (GLM), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Random Forests (RF); and (2) there exists no set of market features that is consistently predictive of the sizes of hidden orders across different stocks. Artificial Neural Networks produce large R2 and small Mean Squared Error on the prediction of hidden orders of individual stocks across the five studied markets. Random Forests produce the most appropriate average price impact curves of visible and estimated hidden orders, that is, those closest to the average market impact curves of visible and true hidden orders. In some markets, hidden orders produce a convex power-law far-right tail, in contrast to visible orders, which produce a concave power-law far-right tail. Hidden orders may affect the average price impact curves for orders of size less than the average order size; meanwhile, hidden orders may not affect the structure of the average price impact curves in other markets. The research suggests ANN and RF as the recommended tools to uncover hidden orders.


2020 ◽  
Vol 8 (6) ◽  
pp. 1623-1630

As huge amounts of data accumulate, extracting the required data from the available information has become a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases. This in turn has created the need for machine learning models that use patients' datasets. Analysis of datasets from different sources indicates that cancer is among the most hazardous diseases and may cause the death of the patient. The outcome of the conducted surveys states that cancer can be nearly cured in the initial stages but may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer. Its treatment depends heavily on past data and requires detection in the early stages. The proposed work is based on a machine learning algorithm that groups individual details into categories to predict, at an early stage, whether a person is at risk of cancer. A random forest algorithm is implemented; it achieves a higher efficiency of 97% compared with KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but uses it directly for classification. Naive Bayes yields inaccurate predictions. The proposed system predicts the chances of lung cancer by displaying three levels, namely low, medium, and high. Thus, mortality rates can be reduced significantly.


10.2196/23938 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23938
Author(s):  
Ruairi O'Driscoll ◽  
Jake Turicchi ◽  
Mark Hopkins ◽  
Cristiana Duarte ◽  
Graham W Horgan ◽  
...  

Background: Accurate solutions for the estimation of physical activity and energy expenditure at scale are needed for a range of medical and health research fields. Machine learning techniques show promise in research-grade accelerometers, and some evidence indicates that these techniques can be applied to more scalable commercial devices. Objective: This study aims to test the validity and out-of-sample generalizability of algorithms for the prediction of energy expenditure in several wearables (ie, Fitbit Charge 2, ActiGraph GT3-x, SenseWear Armband Mini, and Polar H7) using two laboratory data sets comprising different activities. Methods: Two laboratory studies (study 1: n=59, age 44.4 years, weight 75.7 kg; study 2: n=30, age 31.9 years, weight 70.6 kg), in which adult participants performed a sequential lab-based activity protocol consisting of resting, household, ambulatory, and nonambulatory tasks, were combined in this study. In both studies, accelerometer and physiological data were collected from the wearables alongside energy expenditure using indirect calorimetry. Three regression algorithms were used to predict metabolic equivalents (METs; ie, random forest, gradient boosting, and neural networks), and five classification algorithms (ie, k-nearest neighbor, support vector machine, random forest, gradient boosting, and neural networks) were used for physical activity intensity classification as sedentary, light, or moderate to vigorous. Algorithms were evaluated using leave-one-subject-out cross-validations and out-of-sample validations. Results: The root mean square error (RMSE) was lowest for gradient boosting applied to SenseWear and Polar H7 data (0.91 METs), and in the classification task, gradient boost applied to SenseWear and Polar H7 was the most accurate (85.5%). Fitbit models achieved an RMSE of 1.36 METs and 78.2% accuracy for classification. Errors tended to increase in out-of-sample validations with the SenseWear neural network achieving RMSE values of 1.22 METs in the regression tasks and the SenseWear gradient boost and random forest achieving an accuracy of 80% in classification tasks. Conclusions: Algorithms trained on combined data sets demonstrated high predictive accuracy, with a tendency for superior performance of random forests and gradient boosting for most but not all wearable devices. Predictions were poorer in the between-study validations, which creates uncertainty regarding the generalizability of the tested algorithms.
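
The evaluation scheme named in this abstract, leave-one-subject-out cross-validation scored by RMSE, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's pipeline: the records are made-up numbers, the model is a hypothetical mean-of-training-targets baseline standing in for the regressors the study used, and the names `rmse`, `leave_one_subject_out`, `fit_mean`, `predict_mean`, and `RECORDS` are invented for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def leave_one_subject_out(records, fit, predict):
    """Hold out all rows of one subject at a time, train on the rest,
    and return a dict mapping each subject to its held-out RMSE.
    `records` is a list of (subject_id, feature, target) tuples."""
    subjects = sorted({s for s, _, _ in records})
    scores = {}
    for held_out in subjects:
        train = [(x, y) for s, x, y in records if s != held_out]
        test = [(x, y) for s, x, y in records if s == held_out]
        model = fit(train)
        preds = [predict(model, x) for x, _ in test]
        scores[held_out] = rmse([y for _, y in test], preds)
    return scores


def fit_mean(train):
    """Hypothetical baseline: remember the mean training target."""
    return sum(y for _, y in train) / len(train)


def predict_mean(model, x):
    """Predict the stored mean regardless of the feature value."""
    return model


# Made-up illustrative data: (subject, feature, target in METs).
RECORDS = [
    ("s1", 0.0, 1.0), ("s1", 1.0, 3.0),
    ("s2", 0.0, 2.0), ("s2", 1.0, 2.0),
    ("s3", 0.0, 4.0), ("s3", 1.0, 6.0),
]

scores = leave_one_subject_out(RECORDS, fit_mean, predict_mean)
```

Splitting by subject rather than by row keeps every sample of a held-out participant out of training, which is why this scheme is a stricter test of generalization to new people than a random row-level split.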

