Prediction of Earnings Manipulation on Malaysian Listed Firms: A Comparison between Linear and Tree-based Machine Learning

Author(s):  
Rahayu Abdul Rahman ◽  
Suraya Masrom ◽  
Nor Balkish Zakaria ◽  
Enny Nurdin

Predicting earnings manipulation is an integral part of financial-economic analysis, helping shareholders, investors, creditors and other outsiders obtain high-quality financial information about a firm. The aim of this paper is to compare earnings manipulation prediction models developed with two categories of machine learning algorithms: linear and tree-based. The linear algorithms are Logistic Regression and the Generalized Linear Model, while the tree-based algorithms are Decision Tree and Random Forest. All of the algorithms were tested on a dataset of earnings manipulation comprising 1874 firm-year observations of firms listed on Bursa Malaysia. The results indicate that the performance of the two categories does not differ greatly, except for the Decision Tree. Furthermore, the best-performing algorithm came from the linear category, producing the highest accuracy in the shortest total completion time. All the models, and the tree-based ones in particular, were better at detecting the false cases of earnings manipulation than the true cases. Keywords: Earnings Manipulation, Earnings Management, Machine Learning, Malaysia

Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4368 ◽  
Author(s):  
Chun-Wei Chen ◽  
Chun-Chang Li ◽  
Chen-Yu Lin

An energy baseline is an important tool for measuring the energy-saving benefits of a chiller system, and the benefits can be calculated by comparing model predictions with actual results. Currently, machine learning is often adopted as a prediction model for energy baselines. Common models include regression, ensemble learning, and deep learning models. In this study, we first reviewed several machine learning algorithms, which were used to establish prediction models. Then, clustering was adopted to preprocess the chiller data. Data mining, K-means clustering, and the gap statistic were used to identify the critical variables for clustering chiller modes. Applying these key variables enhanced the quality of the chiller data, and combining the clustering results with the machine learning model improved both the prediction accuracy of the model and the reliability of the energy baselines.
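The clustering step mentioned in this abstract can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) on scalar readings. The load values and starting centers below are invented for illustration, and the gap statistic the authors used to choose the number of clusters is not implemented here.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm on scalar values, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical chiller load readings (percent of rated capacity).
loads = [20, 22, 21, 60, 62, 61, 95, 97]
centers, clusters = kmeans_1d(loads, centers=[0, 50, 100])
print(centers)
print(clusters)
```

Each resulting cluster could then be treated as one operating mode and given its own baseline model, which is one plausible reading of how the clustering and prediction steps combine.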


Author(s):  
Marco Bisogno

Purpose: The aim of the paper is to investigate earnings management practices related to goodwill accounting, focusing on its first recognition as well as its write-offs, due to the impairment test. Design/methodology/approach: The study refers to a sample of Italian listed firms and the analysis covers three years, with a total of 591 firm-year observations. The modified Jones’ regression model has been used in estimating discretionary accruals, as a proxy of earnings management practices. Findings: A positive relationship between discretionary accruals and yearly changes in goodwill has been proved. Findings also show an incidence of leverage and performance. Research limitations/implications: The study focuses on a single context (Italy) and it is essentially based on financial-economic variables. Practical implications: Findings of the study could be relevant for standard-setters in future revisions of goodwill accounting. Social implication: The study could support investors in evaluating the incidence of first recognition as well as goodwill impairment on the quality of earnings.
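For orientation, the modified Jones model is commonly written in the form below. This is the general textbook specification; the abstract does not state which variant, scaling, or estimation sample the author used, so the symbols here are assumptions rather than the paper's own notation.

```latex
\frac{TA_{it}}{A_{i,t-1}}
  = \alpha_1 \frac{1}{A_{i,t-1}}
  + \alpha_2 \frac{\Delta REV_{it} - \Delta REC_{it}}{A_{i,t-1}}
  + \alpha_3 \frac{PPE_{it}}{A_{i,t-1}}
  + \varepsilon_{it}
```

Here $TA_{it}$ is total accruals of firm $i$ in year $t$, $A_{i,t-1}$ is lagged total assets, $\Delta REV_{it}$ and $\Delta REC_{it}$ are the changes in revenues and receivables, and $PPE_{it}$ is gross property, plant and equipment. The residual $\varepsilon_{it}$ is taken as the estimate of discretionary accruals. Some implementations estimate the coefficients without the $\Delta REC_{it}$ adjustment and apply it only when computing the fitted non-discretionary accruals.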


Author(s):  
Phung Anh Thu ◽  
Nguyen Vinh Khuong

This study contributes empirical evidence on the association between going concern and the financial reporting quality of firms listed on the Vietnamese stock market. The quantitative analysis is based on data from 279 companies listed on the HNX and HOSE exchanges in Vietnam for the period 2009-2015. The results indicate a relationship between going concern and the financial reporting quality of listed firms. The findings are relevant to investors and regulators concerned with the transparency of financial reporting information. Keywords: Going concern, financial reporting quality, listed firms


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
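The AUC reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way by direct pairwise comparison; the scores and labels are made up and are not the study's data.

```python
def auc_by_ranking(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counting half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores with true labels (1 = positive response).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_by_ranking(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for an illustration; library implementations typically sort the scores instead.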


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
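The two descriptions in this abstract, a decision tree as a flow chart and a random forest as aggregated trees, can be sketched in a few lines. The trees below are hand-written rather than trained, and the feature names and thresholds (`age`, `gait_speed`, `prior_falls`) are hypothetical; they are not taken from the LIFE study.

```python
from collections import Counter


def tree_a(person):
    # A decision tree reads like a flow chart: one question per node.
    if person["age"] >= 80:
        return "high risk"
    if person["gait_speed"] < 0.8:
        return "high risk"
    return "low risk"


def tree_b(person):
    if person["prior_falls"] >= 1:
        return "high risk"
    if person["gait_speed"] < 0.6:
        return "high risk"
    return "low risk"


def tree_c(person):
    if person["age"] >= 75 and person["gait_speed"] < 1.1:
        return "high risk"
    return "low risk"


def forest_predict(trees, person):
    """Aggregate the trees by majority vote, as a random forest classifier does."""
    votes = Counter(tree(person) for tree in trees)
    return votes.most_common(1)[0][0]


forest = [tree_a, tree_b, tree_c]
older_adult = {"age": 82, "gait_speed": 1.0, "prior_falls": 0}
print(forest_predict(forest, older_adult))
```

In a real random forest each tree is grown on a bootstrap sample with a random subset of features considered at each split; only the aggregation step is shown here.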


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at a medical center in northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which performed slightly better than the others, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) in predicting smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine in the treatment of smoking cessation.
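The sensitivity, specificity, and accuracy figures quoted here all derive from the four cells of a confusion matrix. As a minimal sketch, with counts that are invented and not the study's data:

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 patients who quit and 100 who did not.
sens, spec, acc = classification_metrics(tp=70, fn=30, tn=57, fp=43)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, which is why it always falls between the two.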


2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, this area is still evolving. This paper presents a novel and robust classifier based on decision tree and tabu search algorithms. To improve performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier, built on the scikit-learn library in Python. An extensive performance comparison with conventional supervised machine learning algorithms is presented using Big O and statistical analysis. Moreover, a performance comparison with optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms, and the area under the receiver operating characteristic (AUROC) of 0.95 for the proposed method indicate that the proposed classifier is suitable for large datasets.
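The entropy that this abstract refers to is, in standard decision tree learning, the Shannon entropy of the class labels at a node. The sketch below shows that standard quantity and the information gain of a split; it does not reproduce the authors' tabu search procedure, whose details the abstract does not give.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy reduction achieved by splitting `parent` into `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with two positive and two negative cases.
node = [1, 1, 0, 0]
print(entropy(node))
print(information_gain(node, [[1, 1], [0, 0]]))
```

A node holding a single class has entropy 0, and a split that separates the classes perfectly recovers the full entropy of the parent as gain.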


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 126-127
Author(s):  
Lucas S Lopes ◽  
Christine F Baes ◽  
Dan Tulpan ◽  
Luis Artur Loyola Chardulo ◽  
Otavio Machado Neto ◽  
...  

Abstract The aim of this project is to compare some of the state-of-the-art machine learning algorithms on the classification of steers finished in feedlots based on performance, carcass and meat quality traits. The precise classification of animals allows for fast, real-time decision making in the animal food industry, such as culling or retention of herd animals. Beef production presents high variability in its numerous carcass and beef quality traits. Machine learning algorithms and software provide an opportunity to evaluate the interactions between traits to better classify animals. Four different treatment levels of wet distiller’s grain were applied to 97 Angus-Nellore animals and used as features for the classification problem. The C4.5 decision tree, Naïve Bayes (NB), Random Forest (RF) and Multilayer Perceptron (MLP) Artificial Neural Network algorithms were used to predict and classify the animals based on recorded trait measurements, which include initial and final weights, shear force and meat color. The top performing classifier was the C4.5 decision tree algorithm with a classification accuracy of 96.90%, while the RF, MLP and NB classifiers had accuracies of 55.67%, 39.17% and 29.89%, respectively. We observed that the final decision tree model constructed with C4.5 selected only the dry matter intake (DMI) feature as a differentiator. When DMI was removed, no other feature or combination of features was sufficiently strong to provide good prediction accuracies for any of the classifiers. In a follow-up study with a significantly larger sample, we plan to investigate why DMI is a more relevant parameter than the other measurements.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Jung Eun Huh ◽  
Seunghee Han ◽  
Taeseon Yoon

Abstract Objective In this study we compare the amino acid and codon sequences of SARS-CoV-2, SARS-CoV and MERS-CoV using different statistics programs to understand their characteristics. Specifically, we are interested in how differences in the amino acid and codon sequences can lead to different incubation periods and outbreak periods. Our initial question was how SARS-CoV-2 compares to other viruses in the coronavirus family, which we examined using the NCBI BLAST program and machine learning algorithms. Results The results of experiments using BLAST, Apriori and Decision Tree showed that SARS-CoV-2 had high similarity with SARS-CoV while having comparably low similarity with MERS-CoV. We decided to compare the codons of SARS-CoV-2 and MERS-CoV to see the difference. Though the viruses are very alike according to the BLAST and Apriori experiments, SVM showed that they can be effectively classified using non-linear kernels. The Decision Tree experiment revealed several notable properties of the SARS-CoV-2 amino acid sequence that cannot be found in the MERS-CoV amino acid sequence. The ultimate purpose of this paper is to minimize the damage to humanity from SARS-CoV-2. Hence, further studies can focus on comparing SARS-CoV-2 with other viruses that can also be transmitted during latent periods.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results We have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

