Outlier Detection in Big Data

Author(s):  
Victoria J. Hodge

Outlier detection (or anomaly detection) is a fundamental task in data mining. Outliers are data that deviate from the norm, and outlier detection is often compared to “finding a needle in a haystack.” However, outliers can generate high value if they are found: cost savings, improved efficiency, compute time savings, fraud reduction and failure prevention. Detection can identify faults before they escalate with potentially catastrophic consequences. Big Data refers to large, dynamic collections of data. These vast and complex data appear problematic for traditional outlier detection methods to process, but Big Data also provides considerable opportunity to uncover new outliers and data relationships. This chapter highlights some of the research issues for outlier detection in Big Data and covers the solutions used and research directions taken, along with an analysis of some current outlier detection approaches for Big Data applications.

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Zhongbo Bai ◽  
Xiaomei Bai

With the rapid growth of information technology and sports, analyzing sports information has become an increasingly challenging issue. Sports big data come from the Internet and show a rapid growth trend. Sports big data contain rich information about athletes, coaches, and sports such as athletics and swimming. Nowadays, various sports data can be easily accessed, and powerful data analysis technologies have been developed, which enable us to further explore the value behind these data. In this paper, we first introduce the background of sports big data. Secondly, we review sports big data management, such as sports big data acquisition, sports big data labeling, and improvement of existing data. Thirdly, we present sports data analysis methods, including statistical analysis, sports social network analysis, and sports big data analysis service platforms. Furthermore, we describe sports big data applications such as evaluation and prediction. Finally, we investigate representative research issues in sports big data, including predicting athletes’ performance in the knowledge graph, finding a rising star of sports, a unified sports big data platform, open sports big data, and privacy protection. This paper should help researchers obtain a broader understanding of sports big data and provides some potential research directions.


Author(s):  
Fabrizio Angiulli

Data mining techniques can be grouped into four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects are also referred to as outliers. Most of the early methods for outlier identification have been developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set has a distribution model. Outliers are those points that satisfy a discordancy test, that is, that are significantly far from what would be their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and thus they are removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, it must be said that searching for outliers through techniques specifically designed for tasks other than outlier detection may not be advantageous.
As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, besides returning a solution of higher quality, dedicated outlier detection algorithms can be vastly more efficient than algorithms not designed for the task. While in many contexts outliers are considered as noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom and credit card fraud detection to identify atypical usage of telecom services or credit cards, in intrusion detection to detect unauthorized accesses, in medical analysis to test for abnormal reactions to new medical therapies, in marketing and customer segmentation to identify customers spending much more or much less than the average customer, in surveillance systems, in data cleaning, and in many other fields.
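
The statistical approach described above can be illustrated with a minimal sketch of a discordancy test. It assumes the data are roughly normally distributed and flags points whose z-score exceeds a cut-off; the function name `zscore_outliers`, the sample values and the threshold are illustrative assumptions, not taken from the chapter.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Assumes an approximately normal distribution (hypothetical choice).
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the norm.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one obviously deviating value.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
print(zscore_outliers(data, threshold=2.0))  # [25.0]
```

The mean and standard deviation are themselves pulled towards extreme values, so in small samples a strict cut-off such as 3.0 may never be exceeded; this is one reason the lower threshold is used in the example.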


Author(s):  
Md Mahbubur Rahim ◽  
Maryam Jabberzadeh ◽  
Nergiz Ilhan

E-procurement systems that have been in place for over a decade have begun incorporating digital tools like big data, cloud computing, internet of things, and data mining. Hence, there exists a rich literature on both earlier e-procurement systems and advanced digitally-enabled e-procurement systems. Existing literature on these systems addresses many research issues (e.g., adoption) associated with e-procurement. However, one critical issue that has so far received no rigorous attention is the “unit of analysis,” a methodological concern of importance in the e-procurement research context. Hence, the aim of this chapter is twofold: 1) to discuss how the notion of “unit of analysis” has been conceptualised in the e-procurement literature and 2) to discuss how its use has been justified by e-procurement scholars to address the research issues under investigation. Finally, the chapter provides several interesting findings and outlines future research directions.


2022 ◽  
pp. 1477-1503
Author(s):  
Ali Al Mazari

HIV/AIDS big data analytics evolved as a potential initiative enabling the connection between three major scientific disciplines: (1) the emergence and evolution of HIV biology; (2) the complex clinical and medical problems and practices associated with the infections and diseases; and (3) the computational methods for the mining of HIV/AIDS biological, medical, and clinical big data. This chapter provides a review of the computational and data mining perspectives on HIV/AIDS in the big data era. The chapter focuses on the research opportunities in this domain, identifies the challenges facing the development of big data analytics in the HIV/AIDS domain, and then highlights the future research directions of big data in the healthcare sector.


2021 ◽  
pp. 59-89
Author(s):  
Chandrakanta Mahanty ◽  
Devpriya Panda ◽  
Brojo Kishore Mishra

Author(s):  
Venkat Gudivada ◽  
Amy Apon ◽  
Dhana L. Rao

Special needs of Big Data applications have ushered in several new classes of systems for data storage and retrieval. Each class targets the needs of a category of Big Data application. These systems differ greatly in their data models and system architecture, approaches used for high availability and scalability, query languages and client interfaces provided. This chapter begins with a description of the emergence of Big Data and data management requirements of Big Data applications. Several new classes of database management systems have emerged recently to address the needs of Big Data applications. NoSQL is an umbrella term used to refer to these systems. Next, a taxonomy for NoSQL systems is developed and several NoSQL systems are classified under this taxonomy. Characteristics of representative systems in each class are also discussed. The chapter concludes by indicating the emerging trends of NoSQL systems and research issues.


Author(s):  
Dr. Mohd Zuber

The huge volumes of data generated by the Internet of Things (IoT) are considered to be of high business value, and data mining algorithms can be applied to IoT to extract hidden information from these data. In this paper, we give a systematic review of data mining from the knowledge, technique and application views, covering classification, clustering, association analysis, time series analysis and outlier analysis. The latest application cases are also surveyed. As more and more devices are connected to IoT, huge volumes of data must be analyzed, and the latest algorithms must be adapted to apply to big data. We review these algorithms and discuss challenges and open research issues. Finally, a big data mining system is proposed.


2020 ◽  
pp. 70-93
Author(s):  
Nayem Rahman

Data mining techniques are widely used to uncover hidden knowledge that cannot be extracted using conventional information retrieval and data analytics tools or using any manual techniques. Different data mining techniques have evolved over the last two decades and solve a wide variety of business problems. Practitioners and researchers in both industry and academia continuously develop and experiment with a variety of data mining techniques. This article provides a consolidated list of problems being solved by different data mining techniques. The author presents up to three techniques that can be used to address a particular type of problem. The objective is to give practitioners and researchers a holistic view of data mining techniques and the problems being solved by them. This article also provides an overview of data mining problems solved in the healthcare industry and highlights how big data technologies are leveraged in handling and processing huge amounts of complex data from a data mining perspective.


Author(s):  
Anand Kumar Pandey ◽  
Rashmi Pandey ◽  
Ashish Tripathi

Big data and data mining are closely related, and both emphasize extracting and analysing useful data from large databases. Big data analytics plays a very significant role in several fields, such as data mining, education and training, cloud computing, e-commerce, healthcare and life science, banking and agriculture. Big data analytics is a technique for examining large data sets to expose hidden patterns. A large amount of data is generated every day by modern information systems and technologies. This paper therefore provides a platform to investigate applications of big data at various stages. In future, an analytical assessment of new developments in big data technology will be required. In addition, the paper explores a new and suitable outlook for researchers to extend solutions, based on the literature survey, challenges, new ideas and open research issues.

