An anomaly detection method to improve the intelligent level of smart articles based on multiple group correlation probability models

Purpose The purpose of this paper is to detect abnormal data from sensor-equipped, complex and sophisticated industrial equipment quickly and accurately. With the rapid development of the Internet of Things, more and more equipment is fitted with sensors, and complex and sophisticated industrial equipment in particular carries a large number of them. Large volumes of monitoring data are collected rapidly to track equipment operation, and detecting abnormal data quickly and accurately has become a challenge. Design/methodology/approach In this paper, the authors propose an approach called Multiple Group Correlation-based Anomaly Detection (MGCAD), which can detect equipment anomalies quickly and accurately. The single-point anomaly degree of the equipment and the correlation of each kind of data sequence are modeled using a multi-group correlation probability model (a probability distribution model that supports equipment anomaly detection), and equipment anomaly detection is performed on that basis. Findings Experiments on simulated data sets based on real data show that MGCAD performs better than existing methods in processing multiple monitoring data sequences. Originality/value The MGCAD method can detect abnormal data quickly and accurately, raise the intelligence level of smart articles and ultimately help to project the real world into cyber space in CrowdIntell Network.

Detecting anomalies in financial statements using machine learning algorithm

Asian Journal of Accounting Research ◽

10.1108/ajar-09-2018-0032 ◽

2019 ◽

Vol 4 (2) ◽

pp. 181-201 ◽

Cited By ~ 3

Author(s):

Mark Lokanan ◽

Vincent Tran ◽

Nam Hoai Vuong

Keyword(s):

Anomaly Detection ◽

Learning Algorithm ◽

Financial Statements ◽

Data Set ◽

Financial Reports ◽

Accounting Data ◽

Content Type ◽

Listed Firms ◽

Detection Model ◽

Credit Worthiness

Purpose The purpose of this paper is to evaluate the possibility of rating the credit worthiness of a firm’s quarterly financial report using a dynamic anomaly detection method. Design/methodology/approach The study uses a data set containing financial statements from Quarter 1 – 2001 to Quarter 4 – 2016 of 937 Vietnamese listed firms. In sum, 24 fundamental financial indices are chosen as control variables. The study employs the Mahalanobis distance to measure the proximity of each data point from the centroid of the distribution to point out the extent of the anomaly. Findings The finding shows that the model is capable of ranking quarterly financial reports in terms of credit worthiness. The execution of the model on all observations also revealed that most financial statements of Vietnamese listed firms are trustworthy, while almost a quarter of them are highly anomalous and questionable. Research limitations/implications The study faces several limitations, including the availability of genuine accounting data from stock exchanges, the strong assumptions of a simple statistical distribution, the restricted timeframe of financial data and the sensitivity of the thresholds for anomaly levels. Practical implications The study opens an avenue for ordinary users of financial information to process the data and question the validity of the numbers presented by listed firms. Furthermore, if fraud information is available, similar research can be conducted to examine the tendency for companies with anomalous financial reports to commit fraud. Originality/value This is the first paper of its kind that attempts to build an anomaly detection model for Vietnamese listed companies.
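
The Mahalanobis-distance scoring described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the restriction to two indicators, the function name `mahalanobis_2d` and the sample values are assumptions made for illustration, whereas the study itself uses 24 financial indices.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical (indicator_a, indicator_b) pairs, one per quarterly report.
reports = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.05, 2.0), (0.95, 2.05), (3.0, 0.5)]
scores = mahalanobis_2d(reports)
most_anomalous = scores.index(max(scores))
```

Ranking reports by `scores` gives an ordering from least to most anomalous. Where to place a threshold on that ordering is a separate choice, which the abstract lists among the study's limitations.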

Is small still beautiful? A comparative study of rice farm size and productivity in China and India

China Agricultural Economic Review ◽

10.1108/caer-01-2015-0005 ◽

2015 ◽

Vol 7 (3) ◽

pp. 484-509 ◽

Cited By ~ 19

Author(s):

Jianying Wang ◽

Kevin Z. Chen ◽

Sunipa Das Gupta ◽

Zuhui Huang

Keyword(s):

Measurement Error ◽

Rapid Development ◽

Rapid Change ◽

Farm Size ◽

Data Set ◽

Content Type ◽

China And India ◽

Land Rental ◽

Rental Markets ◽

Land Rental Markets

Purpose – The farm size-productivity relationship has long been the subject of debate among development economists. Few studies address this issue for China, and those that do rely on outdated data sets that poorly represent the current situation after the past decade of rapid change, which includes the rapid development of land rental markets, village labor out-migration and the use of farm machines. Meanwhile, many studies have researched this relationship for India, which is undergoing similar changes except for the development of active land rental markets. The purpose of this paper is to measure the farm size-productivity relationship under rapid transformation in China and India. Design/methodology/approach – The data cover 325 Jiangxi and 400 Allahabad rice farmers surveyed in 2011; the survey covered multiple plots of each household in one or multiple growing seasons. The authors use the production function approach and the yield approach, and control for farmland quality, imperfect factor markets, and farm size measurement error, to identify the farm size-productivity relationship. Findings – The regressions show that land yields increase with plot size both by season and over the year in China. This may be one of the reasons that farm sizes are growing in some areas. In India, however, the inverse farm size-productivity relationship is observed by the study, despite recent changes. Moreover, land yields increase with farm machine use in both China and India. This result contributes to the debate over whether mechanization improves yields or just expands the land frontier. Originality/value – The paper empirically estimates the farm size-productivity relationship under rapid agrarian transformation in both China and India based on a unique data set collected by the authors in a detailed primary survey. The paper considers measurement error in the analysis, which adds value to this type of analysis.

On verifying the authenticity of e-commercial crawling data by a semi-crosschecking method

International Journal of Web Information Systems ◽

10.1108/ijwis-10-2018-0075 ◽

2019 ◽

Vol 15 (4) ◽

pp. 454-473 ◽

Cited By ~ 3

Author(s):

Tran Khanh Dang ◽

Duc Minh Chau Pham ◽

Duc Dan Ho

Keyword(s):

Anomaly Detection ◽

Design Methodology ◽

Market Research ◽

Rapid Development ◽

Selection Model ◽

Content Type ◽

Detection Techniques ◽

Data Authentication ◽

Novel Approach ◽

Real World Datasets

Purpose Data crawling in e-commerce for market research often comes with the risk of poor authenticity due to modification attacks. The purpose of this paper is to propose a novel data authentication model for such systems. Design/methodology/approach The data modification problem requires careful examination, in which the data are re-collected and their reliability is verified by overlapping the two datasets. The approach uses different anomaly detection techniques to determine which data are potentially fraudulent and should be re-collected. The paper also proposes a data selection model that uses weights of importance in addition to anomaly detection. The aim is to significantly reduce the amount of data in need of verification while still guaranteeing high authenticity. Empirical experiments are conducted with real-world datasets to evaluate the efficiency of the proposed scheme. Findings The authors examine several techniques for detecting anomalies in user and product data, which give an accuracy of approximately 80 per cent. The integration with the weight selection model is also shown to detect more than 80 per cent of the existing fraudulent items while being careful not to accidentally include legitimate ones, especially when the proportion of frauds is high. Originality/value With the rapid development of e-commerce, fraud detection on e-commerce data and in Web crawling systems is a new and necessary research topic. This paper contributes a novel approach to the data authentication problem in crawling systems, which has not been studied much.

Withdrawal of overseas subsidiaries from Asia: the case of the Japanese food industry

British Food Journal ◽

10.1108/bfj-08-2016-0357 ◽

2017 ◽

Vol 119 (6) ◽

pp. 1394-1408

Author(s):

Daisuke Takahashi ◽

Tsaiyu Chang

Keyword(s):

Food Industry ◽

Probability Model ◽

Production Network ◽

Food Markets ◽

Data Set ◽

Content Type ◽

International Production ◽

Level Data ◽

Linear Probability ◽

Overseas Subsidiaries

Purpose The purpose of this paper is to analyze the factors that have influenced the withdrawal of Japanese overseas food industry subsidiaries from Asia. Design/methodology/approach The data refer to Asian subsidiaries of Japanese companies engaged in food production activities. The data set covers 545 overseas subsidiaries from 2003 to 2014, and the total number of observations is 3,513. A linear probability model examines the factors influencing the probability of a subsidiary withdrawing. Findings The results show that strong relationships between parent companies and overseas subsidiaries, specifically in terms of personnel and capital, reduce the likelihood of withdrawal. The number of years in business has a positive effect on withdrawal. Additionally, investment aims, such as the establishment of an international production network and acquisition of host country markets, affect the probability of withdrawal. The results are similar for subsidiaries in China and other countries, but there are differences regarding subsidiaries’ histories and investment aims. Originality/value There is limited research on food companies withdrawing from overseas markets. This study bridges the gap in the literature by compiling micro-level data and conducting a quantitative analysis of such withdrawals from overseas markets. The originality of this study is that it shows the effects of investment aims on subsidiary withdrawal, representing various aspects of overseas subsidiaries, and compares the estimation results with the recent trends in food markets in Asia.
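
A linear probability model is ordinary least squares applied to a 0/1 outcome, so each coefficient reads as a change in the probability of the event. The sketch below shows the idea with a single regressor. The variable names and the eight data points are hypothetical and are not taken from the study, which uses several explanatory factors and 3,513 observations.

```python
def fit_lpm(x, y):
    """Fit y = intercept + slope * x by least squares, where y is 0 or 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: parent company's capital share in each subsidiary,
# and whether that subsidiary withdrew (1) or stayed (0).
parent_share = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
withdrew = [1, 1, 0, 1, 0, 0, 0, 0]

intercept, slope = fit_lpm(parent_share, withdrew)

def predicted_probability(share):
    # Fitted values are not constrained to [0, 1], a known limitation
    # of the linear probability model.
    return intercept + slope * share
```

In this made-up sample the slope is negative, meaning a larger parent share goes with a lower fitted probability of withdrawal. The fitted value at a share of 0.9 falls below zero, which shows why fitted values from this model need care when read as probabilities.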

An improved correlation-based anomaly detection approach for condition monitoring data of industrial equipment

2016 IEEE International Conference on Prognostics and Health Management (ICPHM) ◽

10.1109/icphm.2016.7542850 ◽

2016 ◽

Cited By ~ 3

Author(s):

Shisheng Zhong ◽

Hui Luo ◽

Lin Lin ◽

Xuyun Fu

Keyword(s):

Anomaly Detection ◽

Condition Monitoring ◽

Monitoring Data ◽

Industrial Equipment ◽

Detection Approach

MATVIZ: a semantic query and visualization approach for metallic materials data

International Journal of Web Information Systems ◽

10.1108/ijwis-11-2016-0065 ◽

2017 ◽

Vol 13 (3) ◽

pp. 260-280 ◽

Cited By ~ 3

Author(s):

Xiaoming Zhang ◽

Huilin Chen ◽

Yanqin Ruan ◽

Dongyu Pan ◽

Chongchong Zhao

Keyword(s):

Semantic Web ◽

Materials Science ◽

Rapid Development ◽

Meaningful Work ◽

Implicit Knowledge ◽

Metallic Materials ◽

Materials Informatics ◽

Data Set ◽

Content Type ◽

Semantic Query

Purpose With the rapid development of materials informatics and the Semantic Web, semantic-driven solutions have emerged to improve traditional query technology, which struggles to discover implicit knowledge from materials data. However, constructing a semantic query is nontrivial for materials scientists, and the query results are usually presented in RDF/XML format, which is not convenient for users to understand. This paper aims to propose an approach to construct semantic queries and visualize the query results for the metallic materials domain. Design/methodology/approach The authors design a query builder to generate SPARQL query statements automatically based on domain ontology and query conditions input by users. Moreover, a semantic visualization model is defined based on the materials science tetrahedron to support the visualization of query results in an intuitive, dynamic and interactive way. Findings Based on Semantic Web technology, the authors design an automatic semantic query builder to help domain experts write normative semantic query statements quickly and simply, and develop a prototype (named MatViz) to visually show query results, which could help experts discover implicit knowledge from materials data. Moreover, the experiments demonstrate that the proposed system can rapidly and effectively return visualized query results over the metallic materials data set. Originality/value This paper mainly discusses an approach to support semantic query and visualization of metallic materials data. The implementation of MatViz will be meaningful work for research on metallic materials data integration.

Who sells knowledge online? An exploratory study of knowledge celebrities in China

Internet Research ◽

10.1108/intr-07-2020-0378 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Xiaoyu Chen ◽

Alton Y.K. Chua ◽

L.G. Pee

Keyword(s):

Latent Dirichlet Allocation ◽

Sales Performance ◽

Online Identity ◽

Data Set ◽

Content Type ◽

Multiple Group ◽

Identity Signaling ◽

Important Trend ◽

Knowledge Platform ◽

Online Identities

Purpose This study explores identity signaling used by an emerging class of knowledge celebrities in China – Knowledge Wanghong – who sell knowledge products on online platforms. Because identity signaling may involve constructing unique online identities and controlling product-related and seller-related characteristics, the purpose of this study is two-fold: (1) to uncover different online identities of knowledge celebrities; and (2) to examine the extent to which the online identity type is associated with their product-related characteristics, seller-related characteristics and sales performance. Design/methodology/approach A unique data set was collected from a leading Chinese pay-for-knowledge platform – Zhihu – which featured the online profiles of tens of thousands of knowledge celebrities. Online identity types were derived from their self-edited content using Latent Dirichlet Allocation (LDA) topic modeling. Thereafter, their product-related characteristics, seller-related characteristics and respective sales performance were analyzed across different identity types using analysis of variance (ANOVA) and multiple-group linear regression. Findings Knowledge celebrities are clustered into four distinctive online identities: Mentor, Broker, Storyteller and Geek. Product-related characteristics, seller-related characteristics and sales performance varied across the four identities. Additionally, the online identity type moderated the relationships among their product-related characteristics, seller-related characteristics and sales performance. Originality/value As emerging-phenomenon-based research, this study extends related literature by using the notion of identity signaling to analyze a peculiar group of online celebrities who are setting an important trend in the pay-for-knowledge model in China.

Bayesian Updates for an Extreme Value Distribution Model of Bridge Traffic Load Effect Based on SHM Data

Sustainability ◽

10.3390/su13158631 ◽

2021 ◽

Vol 13 (15) ◽

pp. 8631

Author(s):

Xin Gao ◽

Gengxin Duan ◽

Chunguang Lan

Keyword(s):

Probability Model ◽

Extreme Value Distribution ◽

Value Theory ◽

Extreme Value ◽

Value Distribution ◽

Traffic Load ◽

Monitoring Data ◽

Distribution Model ◽

Load Effect ◽

Tail Distribution

As the distribution function of traffic load effect on bridge structures has always been unknown or very complicated, traditional extreme value theory has not yet yielded a fully satisfactory probability model of extreme traffic load effect during service periods. Here, we focus on this problem and introduce a novel method based on bridge structural health monitoring data. The method is based on the fact that the tails of the probability distribution govern the behavior of extreme values. The generalized Pareto distribution was applied to model the tail distribution of traffic load effect using the peak-over-threshold method, while the filtered Poisson process was used to model the traffic load effect stochastic process. The parameters of the extreme value distribution of traffic load effect during a service period could be determined by theoretical derivation once the parameters of the tail distribution were estimated. Moreover, Bayes’ theorem was applied to update the distribution model to reduce the statistical uncertainty. Finally, the proposed method was applied to the monitoring data of concrete-filled steel tube arch bridge suspenders to verify its rationality. The results showed that the approach was convenient and that the extreme value distribution type III might be more suitable as the traffic load effect probability model.
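
The peak-over-threshold step can be sketched as follows: keep only the amounts by which observations exceed a chosen threshold, then fit a generalized Pareto distribution to those excesses. The abstract does not say how the tail parameters were estimated, so the method-of-moments estimator used here is an assumption, as are the function name `fit_gpd_moments`, the synthetic exponential data and the threshold of 2.0. The filtered Poisson process and the Bayesian update are not covered.

```python
import random

def fit_gpd_moments(values, threshold):
    """Method-of-moments fit of a generalized Pareto distribution to the
    excesses over `threshold`. Returns (shape, scale, number_of_excesses).
    The moment estimator assumes a shape parameter below 0.5."""
    excesses = [v - threshold for v in values if v > threshold]
    n = len(excesses)
    if n < 2:
        raise ValueError("need at least two exceedances")
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    # From GPD mean = scale / (1 - shape) and
    # variance = scale^2 / ((1 - shape)^2 * (1 - 2 * shape)).
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    return shape, scale, n

# Synthetic stand-in for monitored load effects (not real bridge data).
random.seed(0)
load_effects = [random.expovariate(1.0) for _ in range(5000)]
shape, scale, n_exceed = fit_gpd_moments(load_effects, threshold=2.0)
```

Exponential data are a convenient check because their excesses over any threshold are again exponential, which corresponds to a shape near 0 and a scale near 1. With real monitoring data the threshold choice matters and is usually examined with diagnostic plots before fitting.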

Financial distress determinants among SMEs: empirical evidence from Sweden

Journal of Economic Studies ◽

10.1108/jes-01-2019-0030 ◽

2020 ◽

Vol 47 (3) ◽

pp. 547-560 ◽

Cited By ~ 1

Author(s):

Darush Yazdanfar ◽

Peter Öhman

Keyword(s):

Financial Crisis ◽

Financial Distress ◽

Large Scale ◽

Global Financial Crisis ◽

Binary Logistic Regression ◽

Data Availability ◽

Cross Sectional ◽

Data Set ◽

Content Type ◽

The Global Financial Crisis

Purpose The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating other firms operating in other industries and other countries. Originality/value This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.

Light touch, heavy hand: principals and data-use PLCs

Journal of Educational Administration ◽

10.1108/jea-09-2016-0101 ◽

2017 ◽

Vol 55 (4) ◽

pp. 376-389 ◽

Cited By ~ 10

Author(s):

Alice Huguet ◽

Caitlin C. Farrell ◽

Julie A. Marsh

Keyword(s):

Professional Learning ◽

Low Income ◽

Equal Opportunity ◽

Data Use ◽

Opportunity To Learn ◽

School Level ◽

Comparative Case Study ◽

Data Set ◽

Content Type ◽

Principal's Role

Purpose The use of data for instructional improvement is prevalent in today’s educational landscape, yet policies calling for data use may result in significant variation at the school level. The purpose of this paper is to focus on tools and routines as mechanisms of principal influence on data-use professional learning communities (PLCs). Design/methodology/approach Data were collected through a comparative case study of two low-income, low-performing schools in one district. The data set included interview and focus group transcripts, observation field notes and documents, and was iteratively coded. Findings The two principals in the study employed tools and routines differently to influence ways that teachers interacted with data in their PLCs. Teachers who were given leeway to co-construct data-use tools found them to be more beneficial to their work. Findings also suggest that teachers’ data use may benefit from more flexibility in their day-to-day PLC routines. Research limitations/implications Closer examination of how tools are designed and time is spent in data-use PLCs may help the authors further understand the influence of the principal’s role. Originality/value Previous research has demonstrated that data use can improve teacher instruction, yet the varied implementation of data-use PLCs in this district illustrates that not all students have an equal opportunity to learn from teachers who meaningfully engage with data.
