cart algorithm
Recently Published Documents


TOTAL DOCUMENTS: 139 (FIVE YEARS: 84)

H-INDEX: 9 (FIVE YEARS: 3)

2021 ◽  
Vol 6 (2) ◽  
pp. 127-136
Author(s):  
Pungkas Subarkah ◽  
Ali Nur Ikhsan

As the number of internet users grows and technology develops, threats to internet security are becoming increasingly diverse. One of the most important of these is phishing, in which an attacker lures a target into unwittingly handing over information. The large number of phishing crimes can cause several kinds of loss, including the loss of privacy of a person or company. This study aims to identify phishing websites. The Classification and Regression Trees (CART) algorithm is a classification algorithm, and the dataset used in this research was taken from the UCI Machine Learning Repository and was obtained from the University of Huddersfield. The method used in this research comprises problem identification, data collection, pre-processing, application of the CART algorithm, validation and evaluation, and drawing conclusions. The test results show an accuracy of 95.28%, which places the CART classifier in the very good classification category.


2021 ◽  
Vol 34 (2) ◽  
pp. 42-63
Author(s):  
Cristiano Mauro Assis Gomes ◽  
Gina C Lemos ◽  
Enio G. Jelihovschi

Any quantitative method is shaped by certain rules or assumptions which constitute its own rationale. It is not by chance that these assumptions determine the conditions and constraints which permit the evidence to be constructed. In this article, we argue that the rationale of the Regression Tree Method is more suitable than that of the General Linear Model for analyzing complex educational datasets. Furthermore, we apply the CART algorithm of the Regression Tree Method and Multiple Linear Regression in a model with 53 predictors, taking as the outcome the students’ reading scores in the 2011 edition of the National Exam of Upper Secondary Education (ENEM; N = 3,670,089), which is a complex educational dataset. This empirical comparison illustrates how the Regression Tree Method is better suited than the General Linear Model to furnishing evidence about non-linear relationships, as well as to dealing with nominal variables with many categories and with ordinal variables. We conclude that the Regression Tree Method constructs better evidence about the relationships between the predictors and the outcome in complex datasets.
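
As an illustration of why a regression tree can capture a non-linear (step-like) relationship, the following minimal Python sketch searches one numeric predictor for the threshold that minimizes the summed squared error of the two resulting groups, which is the usual CART regression criterion. The function names and the toy data are hypothetical and are not taken from the article.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_regression_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split x <= threshold.

    The threshold is None if no split improves on the unsplit data.
    """
    pairs = sorted(zip(xs, ys))
    best_t, best_sse = None, sse(ys)
    for i in range(1, len(pairs)):
        # Only consider thresholds between two distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        total = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if total < best_sse:
            best_t, best_sse = t, total
    return best_t, best_sse


# Toy step-shaped data: the outcome jumps between x = 2 and x = 3.
threshold, error = best_regression_split([1, 2, 3, 4], [10, 10, 20, 20])
```

A full tree would apply the same search recursively to each resulting group and across all predictors; this sketch shows a single split only.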


Energies ◽  
2021 ◽  
Vol 14 (24) ◽  
pp. 8534
Author(s):  
Małgorzata Grzelak ◽  
Magdalena Rykała

One of the main threats to ecological safety is the increased emission of greenhouse gases. Promoting the purchase of electric vehicles and increasing their share among all cars in a given country can be considered activities that reduce CO2 emissions into the atmosphere. According to the Environmental Performance Index, in 2021 Poland ranked 37th among the most climate-friendly countries in the world and 30th among similar countries in Europe. The aim of the article was to model the prices of electric vehicles as one element of promoting climate security in Poland. For the purposes of the study, data from electric vehicle sales advertisements on one of the Polish automotive services were analyzed. On this basis, the most important factors influencing the price of a vehicle were identified. For this purpose, forecasting models were built based on neural networks and selected models of decision trees based on the CART algorithm, boosted trees, and random forest. We assessed the developed models and compared their prognostic abilities.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jia-Cheng Shi ◽  
Xiao-Huan Chen ◽  
Qiong Yang ◽  
Cai-Mei Wang ◽  
Qian Huang ◽  
...  

Abstract. Currently, the most widely used screening methods for hyperuricemia (HUA) involve invasive laboratory tests, which are lacking in many rural hospitals in China. This study explored the use of non-invasive physical examinations to construct a simple prediction model for HUA, in order to reduce the economic burden and the need for invasive procedures such as blood sampling, and to support health management for people in poor areas with limited medical resources. Data on 9252 adults examined from April to June 2017 at the Affiliated Hospital of Guilin Medical College were collected and divided randomly into a training set (n = 6364) and a validation set (n = 2888) at a ratio of 7:3. In the training set, the non-invasive physical examination indicators of age, gender, body mass index (BMI) and prevalence of hypertension were included in a logistic regression analysis, and a nomogram model was established. The classification and regression tree (CART) algorithm of the decision tree model was used to build a classification tree model. Receiver operating characteristic (ROC) curve, calibration curve and decision curve analyses (DCA) were used to test the discrimination, accuracy and clinical applicability of the two models. The results showed that age, gender, BMI and prevalence of hypertension were all related to the occurrence of HUA. The area under the ROC curve (AUC) of the nomogram model was 0.806 and 0.791 in the training set and validation set, respectively. The AUC of the classification tree model was 0.802 and 0.794 in the two sets, respectively; the AUCs of the two models were not statistically different. The calibration curves and DCAs of the two models showed good accuracy and clinical practicality, which suggests that these models may be suitable for predicting HUA in rural settings.
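
The AUC values reported above summarize how well a model ranks cases above non-cases. A minimal Python sketch of that quantity, computed as the proportion of case/non-case pairs ranked correctly (ties counted as half), is given below. The function name, scores and labels are hypothetical examples, not data from the study, and the study's own software is not specified here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk for each person (higher means more likely a case).
    labels: 1 for a case, 0 for a non-case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # Count pairs where the case is scored higher; a tie counts as half.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for four people, two of them cases.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

This pairwise form is quadratic in the sample size, which is fine for illustration but slow for large cohorts.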


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 8011
Author(s):  
Yan Ding ◽  
Zhe Ji ◽  
Peng Liu ◽  
Zhiqiang Wu ◽  
Gang Li ◽  
...  

With the need to reduce carbon emissions and air pollution, it has become much more important to monitor the quality of the oil used in heavy-duty vehicles, which account for more than two-thirds of transportation emissions. Some gas stations may provide substandard fuel, resulting in uncontrolled emissions, which is a major challenge for environmental protection. To address this, a gas station recognition method is proposed in this paper. By combining the CART algorithm with the DBSCAN clustering algorithm, the locations of gas stations were detected and recognized. The oil quality of these gas stations could then be evaluated effectively in terms of oil stability and vehicle emissions. Large-scale real-world data from vehicles operating in Tangshan, China, collected from the Heavy-duty Vehicle Remote Emission Service and Management Platform, were used to verify the accuracy and robustness of the proposed model. The results show that the proposed model can not only accurately detect both the time and location of refueling behavior but can also locate gas stations and evaluate oil quality. It can effectively help environmental protection departments monitor and investigate abnormal gas stations based on the oil quality analysis results. In addition, the method requires relatively little computation, which makes it implementable in many different application scenarios.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Shuhui Yi ◽  
Hongxia Zhu ◽  
Junjie Liu ◽  
Junnan Li

Nonintrusive industrial load identification can accurately acquire the operating data of each load in a plant, which benefits intelligent power management. Existing industrial load identification methods are complicated and difficult to implement because transient data for modeling are hard to collect and high-precision measuring equipment is required. To address this, the article proposes a nonintrusive industrial load identification method that uses a random forest algorithm and steady-state waveforms. First, the power state of the industrial load is monitored; when the load changes and then becomes stable, the steady-state waveform is extracted. Because industrial loads have different electrical characteristics, their current waveforms differ to some extent, so characteristic data can be constructed for each industrial load from its own steady-state current waveform. Then, using the high-dimensional steady-state waveform data as the sample data, the bootstrap sampling method and the CART algorithm within the random forest algorithm are used to generate multiple decision trees. Finally, the industrial load types are identified by a vote among the decision trees. Actual operating load data from a factory are used as the sample data in the simulation, and the effectiveness and speed of the proposed identification algorithm are verified by a simulation comparison using the combined load method. The simulation results show that the accuracy of the proposed identification algorithm is more than 99%, which is much higher than that of other methods, and that the identification time is 3.36 s, which is less than that of other methods. Therefore, the proposed identification algorithm can effectively realize nonintrusive industrial load identification.
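
The two random forest ingredients named above, bootstrap sampling and voting among trees, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the load labels and the sample rows are hypothetical, and the per-tree CART training step is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical training rows: (waveform feature vector, load type).
rows = [([0.1, 0.9], "motor"), ([0.8, 0.2], "furnace"), ([0.2, 0.7], "motor")]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Each tree in the forest would be trained on its own bootstrap replicate.
replicates = [bootstrap_sample(rows, rng) for _ in range(5)]

# Hypothetical per-tree predictions for one new waveform.
identified = majority_vote(["motor", "furnace", "motor", "motor", "furnace"])
```

Each replicate has the same size as the original sample but typically repeats some rows and leaves others out, which is what makes the trees differ from one another.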


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3055
Author(s):  
Aleksey I. Shinkevich ◽  
Irina G. Ershova ◽  
Farida F. Galimulina ◽  
Alla A. Yarlychenko

At the global level, the methodology for assessing sustainable development follows the Sustainable Society Index (SSI) format, but at the level of meso- and microsystems it remains undeveloped. The aim of the study is to build a typology of innovative mesosystems in Russian industry in the context of sustainable development based on the CART algorithm, and to develop an algorithm for identifying priority areas of sustainable development. The research methods applied included formalization, a systematic approach, and the CART algorithm (calculation of the Gini index, segmentation of the training sample, the use of a recursive function, and regression assessment). As a result of the study, an algorithm is proposed for the differentiated identification of priority directions for the sustainable development of innovative mesosystems in industry, based on the authors’ own methodology (ISDI). The study reveals a predominance of mesosystems with a weak level of sustainable development, which require state support for their restructuring. The novelty of the research lies in the development of new science-based solutions to ensure an accelerated transition of industry to the path of sustainable development. The authors’ approach differs from those already known in that it includes environmental innovations in the mechanism for managing the sustainable development of innovative mesosystems and accounts for them in the mathematical processing of the data, which determines the uniqueness of the constructed decision trees.
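
The Gini index calculation and training sample segmentation mentioned above can be illustrated with a minimal Python sketch. It computes the Gini impurity of a set of class labels and finds the threshold on one numeric indicator that minimizes the weighted impurity of the two segments. The function names, the indicator values and the class labels are hypothetical and do not come from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split value <= threshold.

    The threshold is None if no split lowers the impurity of the whole sample.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)
    for i in range(1, n):
        # Only consider thresholds between two distinct indicator values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical indicator values with "weak" / "strong" development classes.
split = best_split([1, 2, 3, 4], ["weak", "weak", "strong", "strong"])
```

Applying `best_split` again to each resulting segment is the recursive step that grows the tree.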


Author(s):  
Tzu-Pin Lu ◽  
Chien-Hui Wu ◽  
Chia-Chen Chang ◽  
Han-Ching Chan ◽  
Amrita Chattopadhyay ◽  
...  

Abstract. Purpose. Pancreatic cancer is one of the most malignant cancers, with poor survival. The latest edition of the American Joint Committee on Cancer (AJCC) staging system classifies the majority of operable pancreatic cancer patients as stage-III, while dramatic heterogeneity is observed among these patients. Therefore, subgrouping is required to accurately predict their prognosis and define a treatment plan. This cohort study uses clinical variables to provide a more precise classification system for stage-III pancreatic cancer patients. Methods. We analyzed survival using log-rank tests, univariate Cox-regression models, and Kaplan-Meier survival curves for stage-III pancreatic ductal adenocarcinoma (PDAC) patients from the Taiwan Cancer Registry (TCR). Patients were further divided into subgroups using the classification and regression tree (CART) algorithm. All results were validated using the SEER database. Results. Among stage-III PDAC patients, lymph node involvement and tumor grade showed a significant association with survival. Patients with N2 stage had higher mortality risks (hazard ratio [HR] = 2.30, 95% confidence interval [CI] 1.71–3.08, p < 0.0001) than N0 patients. Patients with grade 3 also had a higher risk of mortality (HR = 3.80, 95% CI 2.25–6.39, p < 0.0001) than grade 1 patients. The CART algorithm stratified stage-III patients into four subgroups with significantly different survival rates. The median survival of the four subgroups was 23.5, 18.4, 14.5, and 9.0 months, respectively (p < 0.0001). Similar results were observed with SEER data. Conclusions. Lymph node involvement and tumor grade are predictive factors for survival in stage-III PDAC patients. This new, more precise classification system can be used to guide treatment planning in advanced-stage pancreatic cancer.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Mina Jahangiri ◽  
Fakher Rahim ◽  
Najmaldin Saki ◽  
Amal Saki Malehi

Objective. Several techniques have been proposed to discriminate between β-thalassemia trait (βTT) and iron deficiency anemia (IDA). These discrimination techniques are clinically essential, but they are challenging to apply. This study is the first application of a Bayesian tree-based method to the differential diagnosis of βTT from IDA. Method. This cross-sectional study included 907 patients over 18 years old, with a mean (±SD) age of 25 ± 16.1, with either βTT or IDA. Hematological parameters were measured using a Sysmex KX-21 automated hematology analyzer. Bayesian Logit Treed (BLTREED) and Classification and Regression Trees (CART) were implemented to discriminate βTT from IDA based on the hematological parameters. Results. This study proposes an automatic detection model for beta-thalassemia carriers based on a Bayesian tree-based method. The BLTREED model and CART showed that mean corpuscular volume (MCV) was the main predictor in diagnostic discrimination. On the test dataset, CART showed higher sensitivity and negative predictive value than BLTREED for the differential diagnosis of βTT from IDA. However, the CART algorithm had a high false-positive rate. Overall, the BLTREED model performed better in terms of the area under the curve (AUC). Conclusions. The BLTREED model showed excellent diagnostic accuracy for differentiating βTT from IDA. In addition, tree-based methods are easy to understand and do not require statistical expertise, so they can help physicians make the right clinical decision. The proposed model could therefore support medical decisions in the differential diagnosis of βTT from IDA and help avoid much more expensive, time-consuming laboratory tests, especially in countries with limited resources or poor health services.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Emily Mena ◽  
Gabriele Bolte ◽  
Christine Holmberg ◽  
Philipp Jaehn ◽  
Sibille Merz ◽  
...  

Abstract. Background. Daily vegetable intake is considered an important behavioural health resource associated with improved immune function and lower incidence of non-communicable disease. Analyses of population-based data show that being female and having a high educational status are most strongly associated with increased vegetable intake. In contrast, men and individuals with a low educational status seem to be most affected by non-daily vegetable intake (non-DVI). From an intersectionality perspective, health inequalities are seen as a consequence of an unequal balance of power such as persisting gender inequality. Unravelling intersections of socially driven aspects underlying inequalities might be achieved by not relying exclusively on the male/female binary, but by considering different facets of gender roles as well. This study aims to analyse possible interactions of sex/gender or sex/gender related aspects with a variety of different socio-cultural, socio-demographic and socio-economic variables with regard to non-DVI as the health-related outcome. Method. Comparative classification tree analyses with classification and regression tree (CART) and conditional inference tree (CIT) as quantitative, non-parametric, exploratory methods for the detection of subgroups with high prevalence of non-DVI were performed. Complete-case analyses (n = 19,512) were based on cross-sectional data from a National Health Telephone Interview Survey conducted in Germany. Results. The CART algorithm constructed overall smaller trees when compared to CIT, but the subgroups detected by CART were also detected by CIT. The most strongly differentiating factor for non-DVI, when not considering any further sex/gender related aspects, was the male/female binary, with a non-DVI prevalence of 61.7% in men and 42.7% in women. However, the inclusion of further sex/gender related aspects revealed a more heterogeneous distribution of non-DVI across the sample, bringing gendered differences in main earner status and being a blue-collar worker to the foreground. In blue-collar workers who do not live with a partner on whom they can rely financially, the non-DVI prevalence was 69.6% in men and 57.4% in women, respectively. Conclusions. Public health monitoring and reporting with an intersectionality-informed and gender-equitable perspective might benefit from an integration of further sex/gender related aspects into quantitative analyses in order to detect population subgroups most affected by non-DVI.

