Corporate Distress Prediction Using Random Forest and TreeNet for India

2020, Vol. 10 (1), pp. 1-11
Author(s): Arvind Shrivastava, Nitin Kumar, Kuldeep Kumar, Sanjeev Gupta

The paper applies Random Forest, a popular machine learning classification algorithm, to predict bankruptcy (distress) for Indian firms. Random Forest ranks firms by their propensity to default, i.e., their likelihood of becoming distressed, and is also useful for explaining the association between a firm's features and its tendency to fail. The results are compared with those of TreeNet, a state-of-the-art data mining tool known to deliver satisfactory estimation results; both in-sample and out-of-sample estimations were performed for the comparison. An exhaustive data set comprising companies from varied sectors has been included in the analysis. TreeNet is found to provide consistently better classification and predictive performance than Random Forest, and may therefore be used by industry analysts and researchers alike for predictive purposes.
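TreeNet is a commercial implementation of stochastic gradient boosting, so an open-source boosted-tree classifier can stand in for it. Below is a minimal Python sketch of the comparison described above, assuming a hypothetical indian_firms.csv file and illustrative feature names (leverage, liquidity, profitability, firm_size); none of these names come from the paper itself.

```python
# Sketch only: hypothetical data file and feature names, and
# GradientBoostingClassifier as an open-source TreeNet analogue.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

firms = pd.read_csv("indian_firms.csv")  # hypothetical file
features = ["leverage", "liquidity", "profitability", "firm_size"]  # assumed
X_train, X_test, y_train, y_test = train_test_split(
    firms[features], firms["distressed"], test_size=0.3,
    random_state=0, stratify=firms["distressed"])

models = [
    ("Random Forest", RandomForestClassifier(n_estimators=500, random_state=0)),
    ("Boosted trees (TreeNet analogue)", GradientBoostingClassifier(random_state=0)),
]
for name, model in models:
    model.fit(X_train, y_train)
    # Rank firms by predicted propensity to default, as in the paper.
    p_in = model.predict_proba(X_train)[:, 1]
    p_out = model.predict_proba(X_test)[:, 1]
    print(name,
          "in-sample AUC:", round(roc_auc_score(y_train, p_in), 3),
          "out-of-sample AUC:", round(roc_auc_score(y_test, p_out), 3))
```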

2021, Vol. 8 (3), pp. 209-221
Author(s): Li-Li Wei, Yue-Shuai Pan, Yan Zhang, Kai Chen, Hao-Yu Wang, et al.

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the discarded indicators on GDM prediction, the F3 data set was established using the 3265 samples of F1 with the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, AUC was 0.58, and the predictive accuracy for positive cases was 15.85%. Conclusions In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.
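A minimal sketch of the evaluation pipeline described above, using placeholder data in place of the study's clinical records; since the reported metrics (overall accuracy, AUC, accuracy on positive cases) are classification metrics, the sketch uses a random forest classifier.

```python
# Placeholder data stands in for the 3265 samples with 38 indicators;
# the metrics mirror those reported for the F1/F2/F3 models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3265, 38))    # placeholder for the 38 indicators
y = rng.integers(0, 2, size=3265)  # placeholder GDM labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]
print("overall accuracy:", accuracy_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, proba))
print("accuracy on GDM-positive cases (recall):", recall_score(y_te, pred))
```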


2021, Vol. 21 (9), pp. 2773-2789
Author(s): Jacob Hirschberg, Alexandre Badoux, Brian W. McArdell, Elena Leonarduzzi, Peter Molnar

Abstract. The prediction of debris flows is relevant because this type of natural hazard can pose a threat to humans and infrastructure. Debris-flow (and landslide) early warning systems often rely on rainfall intensity–duration (ID) thresholds. Multiple competing methods exist for the determination of such ID thresholds but have not been objectively and thoroughly compared at multiple scales, and a validation and uncertainty assessment is often missing in their formulation. As a consequence, updating, interpreting, generalizing and comparing rainfall thresholds is challenging. Using a 17-year record of rainfall and 67 debris flows in a Swiss Alpine catchment (Illgraben), we determined ID thresholds and associated uncertainties as a function of record duration. Furthermore, we compared two methods for ID-threshold determination based on linear regression and/or true-skill-statistic maximization. The main difference between these approaches and the well-known frequentist method is that non-triggering rainfall events were also considered for obtaining ID-threshold parameters. Depending on the method applied, the ID-threshold parameters and their uncertainties differed significantly. We found that 25 debris flows are sufficient to constrain uncertainties in ID-threshold parameters to ±30 % for our study site. We further demonstrated the change in predictive performance of the two methods if a regional landslide data set with a regional rainfall product was used instead of a local one with local rainfall measurements. Hence, an important finding is that the ideal method for ID-threshold determination depends on the available landslide and rainfall data sets. Furthermore, for the local data set we tested whether the ID-threshold performance can be increased by considering other rainfall properties (e.g. antecedent rainfall, maximum intensity) in a multivariate statistical learning algorithm based on decision trees (random forest). The highest predictive power was reached when the peak 30 min rainfall intensity was added to the ID variables, while no improvement was achieved by considering antecedent rainfall for debris-flow predictions in Illgraben. Although the increase in predictive performance with the random forest model over the classical ID threshold was small, such a framework could be valuable for future studies if more predictors are available from measured or modelled data.
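A worked sketch of the two threshold-determination approaches on synthetic rainfall events: a log–log linear regression fitted to triggering events only, and a true-skill-statistic (TSS) maximization that also scores non-triggering events. The synthetic data, the fixed slope, and the candidate grid are illustrative assumptions, not the Illgraben record.

```python
# ID threshold of the form I = alpha * D^beta, fitted two ways.
import numpy as np

rng = np.random.default_rng(1)
D = rng.uniform(1, 48, 400)                      # event durations (h)
I = 5 * D ** -0.6 * rng.lognormal(0.0, 0.6, 400)  # mean intensities (mm/h)
triggered = I > 4 * D ** -0.65                   # synthetic labels

# 1) Linear regression in log-log space on triggering events only:
# fits log10(I) = log10(alpha) + beta * log10(D).
logD, logI = np.log10(D[triggered]), np.log10(I[triggered])
beta, log_alpha = np.polyfit(logD, logI, 1)
print(f"regression threshold: I = {10**log_alpha:.2f} * D^{beta:.2f}")

# 2) True-skill-statistic maximization: keep the regression slope but
# slide the intercept, scoring each candidate against triggering AND
# non-triggering events (TSS = hit rate - false-alarm rate).
def tss(log_a):
    above = np.log10(I) > log_a + beta * np.log10(D)
    hit = (above & triggered).sum() / triggered.sum()
    false_alarm = (above & ~triggered).sum() / (~triggered).sum()
    return hit - false_alarm

candidates = np.linspace(log_alpha - 1, log_alpha + 1, 200)
best = candidates[np.argmax([tss(a) for a in candidates])]
print(f"TSS-maximizing threshold: I = {10**best:.2f} * D^{beta:.2f}")
```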


2021
Author(s): Omar Alfarisi, Zeyar Aung, Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, this paper identifies the best-performing algorithm among several candidates. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest to be the best algorithm among those tested.
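A minimal sketch of the described experiment; the abstract does not name all five algorithms, so a representative set of five supervised learners is assumed here, scored by cross-validation on a synthetic data set.

```python
# Five assumed candidate algorithms compared on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```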


2022
Author(s): Omar Alfarisi, Zeyar Aung, Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, this paper identifies the best-performing algorithm among several candidates. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneous rock fabric, we identified Random Forest to be the appropriate algorithm among those tested.


Large volumes of data stored across various fields are collectively referred to as big data. In healthcare, big data comprises enormous clinical data sets of patient records maintained as Electronic Health Records (EHRs). More than 80% of clinical data is in unstructured format and resides in hundreds of forms. The challenge for data storage and analysis is to handle such large data sets with efficiency and scalability. The Hadoop MapReduce framework stores and processes any kind of big data rapidly; it is not solely a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction over these data sets is handled by a machine learning algorithm. This work focuses on the Extreme Learning Machine (ELM), combined with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM), as an optimized way of predicting disease risk. The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm performs the computation effectively and achieves good results in both veracity and efficiency.
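A minimal numpy sketch of an Extreme Learning Machine: the hidden-layer weights are random and fixed, and only the output weights are solved, in closed form by least squares. The CS-SVM hybrid and the Hadoop deployment described above are beyond the scope of this sketch, and the toy data is a placeholder.

```python
# ELM: random hidden layer, least-squares output weights.
import numpy as np

class ELM:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)       # hidden activations
        # Output weights: least-squares solution via pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage on synthetic "risk" data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(float)
model = ELM(n_hidden=200).fit(X[:400], y[:400])
pred = (model.predict(X[400:]) > 0.5).astype(float)
print("holdout accuracy:", (pred == y[400:]).mean())
```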


2018, pp. 1587-1599
Author(s): Hiroaki Koma, Taku Harada, Akira Yoshizawa, Hirotoshi Iwasaki

Detecting distracted states can be applied to various problems, such as danger prevention while driving a car. Cognitive distraction is one example of a distracted state, and it is known that eye movements express cognitive distraction. Eye movements can be classified into several types. In this paper, the authors detect cognitive distraction from classified eye movement types using the Random Forest machine learning algorithm, which is based on decision trees. They show the effectiveness of considering eye movement types when applying Random Forest to detect cognitive distraction. The authors use visual experiments with still images for the detection.
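A minimal sketch of the classification step, assuming per-window counts of eye-movement types (fixations, saccades, smooth pursuits, blinks) as features; these feature names and the placeholder data are illustrative, not the authors' actual encoding.

```python
# Random forest over assumed eye-movement-type counts per time window.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600
features = np.column_stack([
    rng.poisson(12, n),  # fixation count per window (assumed feature)
    rng.poisson(8, n),   # saccade count
    rng.poisson(3, n),   # smooth-pursuit count
    rng.poisson(1, n),   # blink count
])
distracted = rng.integers(0, 2, n)  # placeholder labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, features, distracted, cv=5).mean())
```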


Risks, 2018, Vol. 6 (4), pp. 113
Author(s): Arvind Shrivastava, Kuldeep Kumar, Nitin Kumar

The objective of the study is to perform corporate distress prediction for an emerging economy such as India, where bankruptcy details of firms are not available. An exhaustive panel data set extracted from Capital IQ has been employed for the purpose. Foremost, the study contributes by devising a novel framework to capture incipient signs of distress for Indian firms using a combination of firm-specific parameters. The strategy not only enlarges the sample of distressed firms but also yields robust results. The analysis applies both standard Logistic and Bayesian modeling to predict distressed firms in the Indian corporate sector, and the predictive ability of the two approaches is compared. Both in-sample and out-of-sample evaluations reveal consistently better predictive capability for the Bayesian methodology. The study provides a useful structure for indicating early signals of failure in the Indian corporate sector, which is otherwise limited in the literature.
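A minimal sketch of the logistic-versus-Bayesian comparison on placeholder data: a ridge-penalized logistic regression serves as the MAP estimate under a Gaussian prior, a simple stand-in for the full Bayesian estimation the study performs; the feature matrix and labels are synthetic.

```python
# Standard logistic regression vs. a Gaussian-prior MAP point estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))  # placeholder firm-specific parameters
y = (X @ rng.normal(size=12) + rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    # Very large C makes the fit effectively unpenalized (max likelihood).
    "standard logistic": LogisticRegression(C=1e6, max_iter=1000),
    # C=1.0 corresponds to MAP estimation under a Gaussian prior.
    "Bayesian (Gaussian-prior MAP)": LogisticRegression(C=1.0, max_iter=1000),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name,
          "in-sample AUC:", round(roc_auc_score(y_tr, m.predict_proba(X_tr)[:, 1]), 3),
          "out-of-sample AUC:", round(roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]), 3))
```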

