A new local causal learning algorithm applied in learning analytics

2020 ◽  
Vol 38 (1) ◽  
pp. 103-115
Author(s):  
Walisson Ferreira de Carvalho ◽  
Luis Enrique Zárate

Purpose The paper aims to present a new two-stage local causal learning algorithm, HEISA. In the first stage, the algorithm discovers the subset of features that best explains a target variable. In the second stage, it computes the causal effect, using partial correlation, of each feature in the selected subset. Using this new algorithm, the study aims to identify the actions that lead a student to succeed or fail in a course. Design/methodology/approach The paper presents a brief review of the main concepts used in this study: causal learning and causal effects. The paper also discusses the results of applying the algorithm to an education data set. The data used in this study were extracted from the log of actions of a learning management system, Moodle. These actions represent the behavior of 229 engineering students who took an Algorithm and Data Structure course offered in a blended model. Findings The algorithm proposed in the paper identifies that features with weak relevance to a target may become relevant when computing the direct effect. Research limitations/implications The algorithm needs to be improved to automatically discard attributes that fall below a specific threshold of direct effect. Researchers are also encouraged to test the proposed propositions further. Practical implications The algorithm presented in this paper can be used to identify the most relevant features for a given classification task. Originality/value This paper computes the direct effect of a selected subset of features on a target variable to evaluate whether a variable in this subset is really a cause of the target or merely a spurious correlation.
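
The abstract above says the second stage measures each feature's effect with partial correlation. As a rough illustration of that one idea only (not the HEISA algorithm, whose details are not given here), the sketch below computes the partial correlation between two variables after linearly removing a set of control variables from both. The function name `partial_correlation` and its arguments are hypothetical, chosen for this example.

```python
import numpy as np

def partial_correlation(x, y, controls):
    """Correlation between x and y after linearly removing `controls` from both.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.asarray(controls, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    # Add an intercept column so the residuals are mean-centered.
    Z = np.column_stack([np.ones(len(x)), Z])
    # Residuals of x and y after least-squares regression on the controls.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Pearson correlation of the two residual vectors.
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

If a third variable drives both `x` and `y`, their plain correlation can be sizeable while the partial correlation given that variable is close to zero, which is the kind of spurious association the abstract refers to.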

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sandeepkumar Hegde ◽  
Monica R. Mundada

Purpose According to the World Health Organization, by 2025, the contribution of chronic disease is expected to rise by 73% compared to all deaths and it is considered a global burden of disease with a rate of 60%. These diseases persist for a long time, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases whose risk increases among adults as they get older. CKD is considered a major disease among these chronic diseases. Overall, 10% of the world's population is affected by CKD and it is likely to double in the year 2030. The paper aims to propose a novel feature selection approach in combination with a machine-learning algorithm which can predict chronic disease early with utmost accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with the hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease. Design/methodology/approach A novel feature selection APDFS algorithm is proposed which explicitly handles the features associated with the class label by relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required for the experiments was obtained from several medical labs and hospitals in India. The HLRM is used as a machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieved competitive results compared to existing work in most cases.
Findings The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. The predictive performance of the proposed framework is analyzed by passing it data sets belonging to various chronic diseases such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its results with those of existing algorithms. The experimental figures illustrate that the proposed framework performed exceptionally well in the prior prediction of CKD, with an accuracy of 91.6. Originality/value The capability of machine learning algorithms depends on feature selection (FS) algorithms identifying the relevant traits in the data set, which affects the predictive result. FS is the process of choosing the relevant features from the data set by removing redundant and irrelevant features. Although many approaches have already been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting the features. In this paper, a novel feature selection APDFS algorithm is proposed which explicitly handles the features associated with the class label by relevance and redundancy analysis. The proposed algorithm handles the process of feature selection in two separate indices. Hence, the computational complexity of the algorithm is reduced to O(nk+1). The algorithm applies statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required for the experiments was obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as a machine learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieved competitive results compared to existing work in most cases.


2019 ◽  
Vol 4 (2) ◽  
pp. 181-201 ◽  
Author(s):  
Mark Lokanan ◽  
Vincent Tran ◽  
Nam Hoai Vuong

Purpose The purpose of this paper is to evaluate the possibility of rating the credit worthiness of a firm’s quarterly financial report using a dynamic anomaly detection method. Design/methodology/approach The study uses a data set containing financial statements from Quarter 1 – 2001 to Quarter 4 – 2016 of 937 Vietnamese listed firms. In sum, 24 fundamental financial indices are chosen as control variables. The study employs the Mahalanobis distance to measure the proximity of each data point from the centroid of the distribution to point out the extent of the anomaly. Findings The finding shows that the model is capable of ranking quarterly financial reports in terms of credit worthiness. The execution of the model on all observations also revealed that most financial statements of Vietnamese listed firms are trustworthy, while almost a quarter of them are highly anomalous and questionable. Research limitations/implications The study faces several limitations, including the availability of genuine accounting data from stock exchanges, the strong assumptions of a simple statistical distribution, the restricted timeframe of financial data and the sensitivity of the thresholds for anomaly levels. Practical implications The study opens an avenue for ordinary users of financial information to process the data and question the validity of the numbers presented by listed firms. Furthermore, if fraud information is available, similar research can be conducted to examine the tendency for companies with anomalous financial reports to commit fraud. Originality/value This is the first paper of its kind that attempts to build an anomaly detection model for Vietnamese listed companies.
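
The abstract above scores each observation by its Mahalanobis distance from the centroid of the distribution. A minimal sketch of that calculation follows, assuming the observations are the rows of a numeric matrix and that the sample mean and sample covariance stand in for the centroid and spread. The function name is hypothetical, and the study's dynamic thresholds and choice of financial indices are not reproduced here.

```python
import numpy as np

def mahalanobis_distances(X):
    """Mahalanobis distance of every row of X from the sample centroid.

    Illustrative helper (hypothetical name); uses the sample mean and covariance.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse keeps the sketch usable if the covariance is singular.
    cov_inv = np.linalg.pinv(cov)
    # Quadratic form (x - mu)^T S^{-1} (x - mu), evaluated row by row.
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0))
```

Rows with the largest distances are the candidates for anomalous reports; where to place the cut-off is a separate modelling decision, which the abstract lists among its limitations.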


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiali Zheng ◽  
Han Qiao ◽  
Xiumei Zhu ◽  
Shouyang Wang

Purpose This study aims to explore the role of equity investment in knowledge-driven business model innovation (BMI) in the context of open modes, according to evidence from China’s primary market. Design/methodology/approach Based on a database of China’s private market and a data set of news clouds, a statistical approach is applied to explore and explain whether equity investment promotes knowledge-driven BMI. A machine learning method is also used to prove and predict the performance of such open innovation. Findings The results of logistic regression show that the explanatory variables are significant, providing evidence that knowledge management (KM) promotes BMI through equity investment. By further using a back propagation neural network, the classification learning algorithm estimates the possibility of BMI, which can be regarded as a score to quantify the performance of knowledge-driven BMI. Research limitations/implications The quality of secondhand big data is not ideal, and future empirical studies should use first-hand survey data. Practical implications This study provides new insights into the link between KM and BMI by highlighting the important roles of external investments in open modes. Social implications From the perspective of investment, the findings of this study suggest the importance for stakeholders to share knowledge and strategies for entrepreneurs to manage innovation. Originality/value The concepts and indicators related to business models are currently difficult to quantify, while this study provides feasible and practical methods to estimate knowledge-driven BMI with secondhand data from the primary market. The mechanism of knowledge and innovation bridged by the experience of investors is introduced and analyzed.


2016 ◽  
Vol 43 (10) ◽  
pp. 1031-1048 ◽  
Author(s):  
Roberto Zotti ◽  
Nino Speziale ◽  
Cristian Barra

Purpose The purpose of this paper is to investigate the effect of religious involvement on subjective well-being (SWB), specifically taking into account the implication of selection effects explaining religious influence, using the British Household Panel Survey data set. Design/methodology/approach In order to measure the level of religious involvement, the authors construct different indices on the basis of individual religious belonging, participation and beliefs, applying a propensity score matching estimator. Findings The results show that active religious participation plays a relevant role among the different aspects of religiosity; moreover, having a strong religious identity (that is, at the same time belonging to any religion, attending religious services once a week or more and believing that religion makes a great difference in life) has a high causal impact on SWB. The authors’ findings are robust to different aspects of life satisfaction. Originality/value The authors offer an econometric account of the causal impact of different aspects of religiosity, finding evidence that the causal effect of religious involvement on SWB is better captured than through typical regression methodologies focussing on the mean effects of the explanatory variables.


2019 ◽  
Vol 5 (2) ◽  
pp. 108-119
Author(s):  
Yeslam Al-Saggaf ◽  
Amanda Davies

Purpose The purpose of this paper is to discuss the design, application and findings of a case study in which a machine learning algorithm is used to identify grievances on Twitter in an Arabian context. Design/methodology/approach To understand the characteristics of the Twitter users who expressed the identified grievances, data mining techniques and social network analysis were utilised. The study extracted a total of 23,363 tweets and these were stored as a data set. The machine learning algorithm was applied to this data set, followed by a data mining process to explore the characteristics of the Twitter feed users. The network of the users was mapped and the individual level of interactivity and network density were calculated. Findings The machine learning algorithm revealed 12 themes, all of which were underpinned by the coalition of Arab countries' blockade of Qatar. The data mining analysis revealed that the tweets could be grouped into three clusters; the main cluster included users with a large number of followers and friends but who did not mention other users in their tweets. The social network analysis revealed that whilst a large proportion of users engaged in direct messages with others, the network ties between them were not registered as strong. Practical implications Borum (2011) notes that invoking grievances is the first step in the radicalisation process. It is hoped that by understanding these grievances, the study will shed light on what radical groups could invoke to win the sympathy of aggrieved people. Originality/value In combination, the machine learning algorithm offered insights into the grievances expressed within the tweets in an Arabian context. The data mining and the social network analyses revealed the characteristics of the Twitter users highlighting identifying and managing early intervention of radicalisation.
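
The abstract above mentions calculating network density for the mapped user network. As a small sketch of what that quantity is, the function below applies the standard definition (observed ties divided by possible ties) to a list of user pairs. The function name and the edge-list representation are assumptions made for illustration, and the study's actual tooling is not stated in the abstract.

```python
def network_density(edges, directed=True):
    """Share of possible ties that are present in a network given as (source, target) pairs.

    Illustrative helper (hypothetical name); self-loops and repeated ties are ignored.
    """
    nodes = {user for edge in edges for user in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    if directed:
        ties = {(a, b) for a, b in edges if a != b}
        possible = n * (n - 1)
    else:
        ties = {frozenset((a, b)) for a, b in edges if a != b}
        possible = n * (n - 1) / 2
    return len(ties) / possible
```

For example, three users who have all mentioned each other in one direction only give a directed density of 0.5, while the same ties read as undirected give a density of 1.0.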


2018 ◽  
Vol 119 (9/10) ◽  
pp. 529-544 ◽  
Author(s):  
Ihab Zaqout ◽  
Mones Al-Hanjori

Purpose The face recognition problem has a long history and significant practical relevance; it is one of the practical applications of the theory of pattern recognition, the aim being to automatically localize the face in an image and, if necessary, identify the person from the face. Interest in the procedures underlying localization and recognition of individuals is considerable because of the variety of their practical applications in areas such as security systems, verification, forensic expertise, teleconferences, computer games, etc. This paper aims to recognize facial images efficiently. An averaged-feature based technique is proposed to reduce the dimensions of the multi-expression facial features. The classifier model is generated using a supervised learning algorithm called a back-propagation neural network (BPNN), implemented in MATLAB R2017. The recognition rate and accuracy of the proposed methodology are comparable with other methods such as principal component analysis and linear discriminant analysis on the same data set. In total, 150 face subjects are selected from the Olivetti Research Laboratory (ORL) data set, resulting in 95.6 and 85 per cent recognition rate and accuracy, respectively, and 165 face subjects from the Yale data set, resulting in 95.5 and 84.4 per cent recognition rate and accuracy, respectively. Design/methodology/approach Averaged-feature based approach (dimension reduction) and BPNN (generate supervised classifier). Findings The recognition rate is 95.6 per cent and recognition accuracy is 85 per cent for the ORL data set, whereas the recognition rate is 95.5 per cent and recognition accuracy is 84.4 per cent for the Yale data set. Originality/value Averaged-feature based method.


2019 ◽  
Vol 11 (4) ◽  
pp. 828-843
Author(s):  
Najib Mozahem

Purpose The purpose of this paper is to investigate the course withdrawal behavior of business and engineering students in a private university. While previous research has studied such behavior, the literature remains sparse and dated. Design/methodology/approach This study uses a negative binomial model in order to model the total number of course withdrawals for 760 students. The data set includes all courses taken by the students, with a total of 25,160 course outcomes. Findings Among the findings of the study are that males withdraw from courses more than females, engineering courses have the highest withdrawal rates, and male engineering students withdraw more than any other group. Originality/value While dropping out of college has received cross-national interest, the same cannot be said of course withdrawal. Most research to date has been conducted in a community college setting or has used a subset of the courses taken by students at universities in the USA. Thus, this is one of the first studies to investigate course withdrawal in another country.
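
The abstract above models withdrawal counts with a negative binomial model, which is typically chosen over a Poisson model when the counts are overdispersed (variance larger than the mean). The sketch below is a quick method-of-moments check of that overdispersion, assuming the common NB2 form Var(Y) = mu + alpha * mu^2. The parameterization, the function name and the approach are assumptions for illustration; the abstract does not say how the model was specified or fitted.

```python
import numpy as np

def nb2_dispersion_moments(counts):
    """Method-of-moments estimate of alpha in Var(Y) = mu + alpha * mu**2.

    Illustrative helper (hypothetical name). Returns 0.0 when the counts show no
    overdispersion, in which case a Poisson model would be the simpler choice.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        return 0.0
    var = counts.var(ddof=1)
    # Solve var = mean + alpha * mean**2 for alpha, truncating at zero.
    return max((var - mean) / mean**2, 0.0)
```

An estimate well above zero suggests the extra dispersion parameter is doing real work; a full analysis would fit the regression model by maximum likelihood instead of relying on this moment estimate.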


2017 ◽  
Vol 21 (1) ◽  
pp. 86-100 ◽  
Author(s):  
Elena Shakina ◽  
Angel Barajas ◽  
Mariya Molodchik

Purpose The paper aims to explore factors behind the low competitiveness of Russian companies, assuming that the gap in the endowment of intangible resources is responsible for the gap in competitiveness. Design/methodology/approach The framework of the resource-based view is used to examine causality between the resources used and competitiveness measured by economic value added (EVA). Controlling for the most relevant factors, the authors place an emphasis on those intangible resources that are considered in the literature as being the most critical for Russian companies when contending for global competitiveness: productivity, strategic long-term orientation of companies, quality of human capital, innovative behavior of companies, foreign investments and corporate networks. A data set of more than 1,000 Russian companies, benchmarked against a data set of more than 1,600 European companies over a period of 10 years (2004-2013), is analyzed to test the hypothesis put forward. Findings A causal effect of the gap in intangible endowment on the competitiveness of Russian companies compared with European rivals is revealed. According to our analysis, gaps in productivity, strategy implementation, qualifications of the board of directors and company location play critical roles in the global competitiveness of Russian companies. Meanwhile, underinvestment in structural resources, such as enterprise resource planning (ERP) systems and other intangible assets, is considered a positive factor that reduces gaps in EVA. Originality/value The paper introduces an original approach for studying the gap in performance caused by the gap in used resources.


2018 ◽  
Vol 60 (4) ◽  
pp. 998-1008 ◽  
Author(s):  
Andi Kusumawati ◽  
Syamsuddin Syamsuddin

Purpose The purpose of this paper is to investigate the relationship between auditor quality and professional skepticism, and the relationships of auditor quality and professional skepticism to audit quality. Design/methodology/approach The analysis tests the causal effect of auditor quality on professional skepticism and audit quality. The respondents in this research are auditors at the Audit Board of the Republic of Indonesia in South Sulawesi province, surveyed by questionnaire. The analysis tool used in this research is partial least squares. Findings Auditor quality has a direct effect on professional skepticism. Professional skepticism has a direct effect on audit quality. Auditor quality has no direct effect on audit quality, but it has an indirect effect on audit quality through the mediation of professional skepticism. Originality/value This paper reports research on professional skepticism in the public (governmental) sector and its role in producing audit quality, especially in South Sulawesi province in Indonesia. This research retests the research results from Aranya and Amernic (1981); Carcello et al. (1992); Behn et al. (1997); Copley (1998); Brown and Raghunandan (1995); Beasley et al. (2001); Chiu (2003); Suraida (2005); Lewinsohn et al. (1997); Novianti (2008); Varelius (2009). The researchers use a sequential explanatory mixed-method design based on the evidentiary sequence from Creswell (2009), with quantitative research conducted by questionnaire.


2017 ◽  
Vol 44 (4) ◽  
pp. 491-504
Author(s):  
Jan-Jan Soon

Purpose Even though Europe has recently undergone a difficult time and is recovering from the aftermath of prevalent unemployment, immigrants are still flocking towards Europe and taking up citizenship of their host countries through naturalisation. The purpose of this paper is to look at how naturalised immigrants fare in terms of income and employment chances, compared to immigrants. Design/methodology/approach Using a fuzzy regression discontinuity design and the 2008 European Values Study integrated data set with a final sample of 4,460 observations, this paper isolates the causal effect of naturalisation on the income and employment chances of immigrants by exploiting exogenous variations generated by the eligibility rules for naturalisation in 41 European countries. Findings The main findings show that the probability of being naturalised increases for eligible immigrants, income and employment chances increase for eligible immigrants, and income and employment chances increase for naturalised immigrants. Research limitations/implications This study has a data limitation: in using the discontinuity design, there is an unbalanced number of observations to the left and right of the design’s threshold value. Originality/value There are limited studies using causal models or potential outcome frameworks to examine the effect of immigrant naturalisation on labour market outcomes in Europe. This study fills this gap.
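
The abstract above relies on a fuzzy regression discontinuity design, where crossing an eligibility threshold raises the probability of treatment (here, naturalisation) without determining it. The sketch below shows the simplest form of the resulting estimator, assuming a plain difference in means within a fixed bandwidth on each side of the cutoff: the jump in the outcome divided by the jump in the treatment rate. The function name, the bandwidth and the local-constant comparison are assumptions for illustration; applied work usually fits local linear regressions on each side, and the paper's exact specification is not given in the abstract.

```python
import numpy as np

def fuzzy_rd_wald(running, treated, outcome, cutoff=0.0, bandwidth=0.5):
    """Wald-type fuzzy RD estimate: outcome jump divided by treatment-rate jump.

    Illustrative helper (hypothetical name) using simple means within the bandwidth.
    """
    running = np.asarray(running, dtype=float)
    treated = np.asarray(treated, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    # Observations just below and just above the eligibility cutoff.
    left = (running >= cutoff - bandwidth) & (running < cutoff)
    right = (running >= cutoff) & (running <= cutoff + bandwidth)
    jump_outcome = outcome[right].mean() - outcome[left].mean()
    jump_treated = treated[right].mean() - treated[left].mean()
    return jump_outcome / jump_treated
```

The three findings in the abstract map onto these pieces: the jump in the treatment rate (eligibility raises naturalisation), the jump in the outcome (eligibility raises income and employment chances), and their ratio (the effect of naturalisation itself). A small denominator makes the ratio unstable, and very unequal sample sizes on the two sides, as noted in the limitations, make the two means unequally precise.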

