scholarly journals A Random Forest Based Classifier for Error Prediction of Highly Individualized Products

Author(s):  
Gerd Gröner
Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2160
Author(s):  
Jean-Marie Lepioufle ◽  
Leif Marsteen ◽  
Mona Johnsrud

Instead of a flag valid/non-valid usually proposed in the quality control (QC) processes of air quality (AQ), we proposed a method that predicts the p-value of each observation as a value between 0 and 1. We based our error predictions on three approaches: the one proposed by the Working Group on Guidance for the Demonstration of Equivalence (European Commission (2010)), the one proposed by Wager (Journal of MachineLearningResearch, 15, 1625–1651 (2014)) and the one proposed by Lu (Journal of MachineLearningResearch, 22, 1–41 (2021)). Total Error framework enables to differentiate the different errors: input, output, structural modeling and remnant. We thus theoretically described a one-site AQ prediction based on a multi-site network using Random Forest for regression in a Total Error framework. We demonstrated the methodology with a dataset of hourly nitrogen dioxide measured by a network of monitoring stations located in Oslo, Norway and implemented the error predictions for the three approaches. The results indicate that a simple one-site AQ prediction based on a multi-site network using Random Forest for regression provides moderate metrics for fixed stations. According to the diagnostic based on predictive qq-plot and among the three approaches used in this study, the approach proposed by Lu provides better error predictions. Furthermore, ensuring a high precision of the error prediction requires efforts on getting accurate input, output and prediction model and limiting our lack of knowledge about the “true” AQ phenomena. We put effort in quantifying each type of error involved in the error prediction to assess the error prediction model and further improving it in terms of performance and precision.


2018 ◽  
Vol 5 (1) ◽  
pp. 47-55
Author(s):  
Florensia Unggul Damayanti

Data mining help industries create intelligent decision on complex problems. Data mining algorithm can be applied to the data in order to forecasting, identity pattern, make rules and recommendations, analyze the sequence in complex data sets and retrieve fresh insights. Yet, increasing of technology and various techniques among data mining availability data give opportunity to industries to explore and gain valuable information from their data and use the information to support business decision making. This paper implement classification data mining in order to retrieve knowledge in customer databases to support marketing department while planning strategy for predict plan premium. The dataset decompose into conceptual analytic to identify characteristic data that can be used as input parameter of data mining model. Business decision and application is characterized by processing step, processing characteristic and processing outcome (Seng, J.L., Chen T.C. 2010). This paper set up experimental of data mining based on J48 and Random Forest classifiers and put a light on performance evaluation between J48 and random forest in the context of dataset in insurance industries. The experiment result are about classification accuracy and efficiency of J48 and Random Forest , also find out the most attribute that can be used to predict plan premium in context of strategic planning to support business strategy.


2019 ◽  
Vol 139 (8) ◽  
pp. 850-857
Author(s):  
Hiromu Imaji ◽  
Takuya Kinoshita ◽  
Toru Yamamoto ◽  
Keisuke Ito ◽  
Masahiro Yoshida ◽  
...  

Author(s):  
Eesha Goel ◽  
◽  
Er. Abhilasha ◽  
Keyword(s):  

Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relevant importance for each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance for each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept and, the resultant RF models were tested on CASF-2013.5 In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificial designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which share the same peak positions with GARF but have fixed peak heights. The results of accuracy comparison from RF models based on the scrambled, uniform, and original GARF potential clearly showed that the peak positions in the GARF potential are important while the well depths are not. <br>


Sign in / Sign up

Export Citation Format

Share Document