Tax Avoidance Detection Based on Machine Learning of Malaysian Government-Linked Companies

Machine learning is widely used for prediction and classification problems, and it is also useful for detecting tax avoidance. This study applies a machine learning classification approach to detect tax avoidance among Malaysian government-linked companies (GLCs). Nine machine learning algorithms were applied to a real dataset collected from Datastream and company annual reports. The performance of these algorithms was observed under different training approaches and different feature selections. The findings reveal that the accuracy of each algorithm varied with the training approach and the feature selection.

2019, Vol 37 (15_suppl), pp. e15649-e15649
Author(s): Wei Zhou, Huan Chen, Wenbo Han, Ji He, Henghui Zhang

e15649 Background: The outcome of hepatocellular carcinoma (HCC) is conventionally predicted by evaluating tissue samples obtained during surgical removal of the primary tumor, focusing on their clinical and pathologic features. Accumulating evidence suggests that cancer development is comprehensively modulated by the host’s immune system, underscoring the importance of immunological biomarkers for predicting HCC prognosis. However, an integrated predictive algorithm incorporating clinical characteristics and immune features remains to be established. Methods: We obtained resectable stage II HCC specimens, along with adjacent para-tumor tissues, from 221 patients who underwent surgical resection at Eastern Hepatobiliary Surgery Hospital (Shanghai, China) from 2015 through April 2018. Characteristics such as CD8+, CD163+, and tumor-infiltrating lymphocytes (TILs) were obtained for model construction to predict the status of three survival indexes: Overall Survival (OS, ≤ 24 or > 24 months), Progression-Free Survival (PFS, ≤ 6 or > 6 months), and Recurrence/Death (RD). After data cleaning and standardization, mutual information and the coefficient between each feature and the survival indexes were tested to remove low-scoring features. Recursive feature selection was then performed to obtain the optimal feature combination. Finally, supervised learning techniques using either a boosting or a bagging strategy were used to fit the model and make predictions, with a grid-search method optimizing the parameters. A cross-validation procedure with a randomly drawn test cohort of 0.2 proportion was repeated 10 times to evaluate the model. Results: Using the 221-patient cohort, we confirmed 15 biomarkers from the 46 candidates as features for survival status prediction. The top 10 most important biomarkers included both clinical and immune attributes. The AUC of our model for the survival indexes (OS, PFS, RD) ranged from 0.76 (RD) to 0.8 (PFS), and the accuracy was above 0.85. Conclusions: We describe an integrative analysis of the clinical and immune features that collectively contribute to the survival index of HCC. Machine learning techniques, such as gradient boosting and random forest classifiers, hold great promise for HCC survival prediction.
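The mutual-information filtering step mentioned in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes features have already been discretized, and the feature names (`cd8_high`, `noise`), the toy values, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature
    py = Counter(ys)            # marginal counts of the label
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, threshold):
    """Keep features whose mutual information with the label is >= threshold."""
    scores = {name: mutual_information(vals, labels)
              for name, vals in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return kept, scores


# Hypothetical toy data: a binary survival label and two binarized features.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "cd8_high": [0, 0, 1, 1, 0, 0, 1, 1],  # tracks the label exactly
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],  # statistically independent of it
}
kept, scores = select_features(columns, labels, threshold=0.1)
print(kept, scores)
```

In this toy setup the low-scoring `noise` column is dropped before any model is fitted; a real pipeline would typically use an estimator suited to continuous features instead of this count-based version.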


Author(s): Rahayu Abdul Rahman, Suraya Masrom, Normah Omar, Maheran Zakaria

Corporate tax avoidance reduces government revenues, which could limit a country's development plans. Thus, the main objective of this study is to establish a rigorous and effective model for detecting corporate tax avoidance, to assist the government in preventing such practice. This paper presents the fundamental knowledge on the design and implementation of a machine learning model based on five selected algorithms, tested on a real dataset of 3,365 Malaysian companies listed on Bursa Malaysia from 2005 to 2015. The performance of each machine learning algorithm on the dataset was observed under two training approaches. The accuracy score of each algorithm was better with the cross-validation training approach. Additionally, with the cross-validation training approach, the performance of each algorithm was tested on different groups of features, namely industry, governance, year, and firm characteristics. The findings indicate that the machine learning models are more reliable with the industry, governance, and firm characteristics features than with the single year determinant, mainly with the Random Forest and Logistic Regression algorithms.
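The difference between training approaches can be illustrated with a small sketch that scores the same classifier with a single hold-out split and with k-fold cross-validation. The abstract does not name its second training approach, so the hold-out split here is an assumption. Everything in the sketch is also an assumption made for illustration: a nearest-centroid classifier stands in for the Random Forest and Logistic Regression models, the data are synthetic, and the split fraction and fold count are arbitrary.

```python
import random


def fit_centroids(X, y):
    """Nearest-centroid 'training': compute the mean vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)


def accuracy(X, y, train_idx, test_idx):
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def holdout_accuracy(X, y, test_fraction=0.2, seed=0):
    """Single random train/test split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * test_fraction)
    return accuracy(X, y, idx[cut:], idx[:cut])


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds; every sample is tested exactly once."""
    folds = kfold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(accuracy(X, y, train_idx, test_idx))
    return sum(scores) / k


# Synthetic two-class data (hypothetical, not the Bursa Malaysia dataset).
rng = random.Random(42)
X = [[rng.gauss(label * 2.0, 1.0), rng.gauss(0.0, 1.0)]
     for label in (0, 1) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

print("hold-out accuracy:", holdout_accuracy(X, y))
print("5-fold CV accuracy:", cross_val_accuracy(X, y, k=5))
```

The cross-validated score averages over several train/test partitions, so it depends less on one lucky or unlucky split than the single hold-out estimate does.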


Author(s): Abdullah Sani Abd Rahman, Suraya Masrom, Rahayu Abdul Rahman, Roslina Ibrahim

Researchers have acknowledged that machine learning is useful in many different domains of complex real-life problems. However, implementing a complete machine learning model involves technical hurdles such as the steep learning curve, the range of programming skills required, the complexity of hyper-parameters, and the lack of a user-friendly platform for implementation. This paper provides insight into a rapid software framework for implementing machine learning. It also presents empirical results of machine learning classification models built with the rapid software framework. Additionally, it compares results between two platforms: the proposed software and a Python program. The machine learning models on the two platforms were tested on breast cancer and tax avoidance datasets with the Decision Tree algorithm. The results indicate that although the software framework is easier to use than the programming platform for implementing the machine learning model, its results were still highly accurate and reliable. Keywords: software framework, rapid, implementation, machine learning
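To give a sense of what the Python side of such a comparison involves, the core step of a decision tree (choosing the threshold on one feature that minimizes weighted Gini impurity) can be sketched in a few lines. This is an illustrative depth-one split only, not the paper's program; the feature values and labels below are made-up toy data, not taken from the breast cancer or tax avoidance datasets.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive labels
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    The threshold is None when no split improves on the unsplit impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a threshold between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)  # midpoint between neighbours
    return best


# Hypothetical toy feature: small values belong to class 0, large to class 1.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
print(best_split(values, labels))
```

A full decision tree repeats this search over every feature and recurses into each side of the chosen split, which is the part a rapid software framework would hide behind its interface.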

