Outlier Detection in Big Data

With the rapid growth of information technology and sports, analyzing sports information has become an increasingly challenging issue. Sports big data come from the Internet and are growing rapidly. They contain rich information about athletes, coaches, and sports such as athletics and swimming. Today, diverse sports data are easily accessible, and powerful data analysis technologies have been developed, enabling us to further explore the value behind these data. In this paper, we first introduce the background of sports big data. Second, we review sports big data management, including data acquisition, data labeling, and the improvement of existing data. Third, we present sports data analysis methods, including statistical analysis, sports social network analysis, and sports big data analysis service platforms. Furthermore, we describe applications of sports big data such as evaluation and prediction. Finally, we investigate representative research issues in the sports big data area, including predicting athletes’ performance in knowledge graphs, finding rising stars in sports, unified sports big data platforms, open sports big data, and privacy protection. This paper should help researchers obtain a broader understanding of sports big data and suggest some potential research directions.


Outlier Detection Techniques for Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition | DOI: 10.4018/978-1-60566-010-3.ch228 | 2011 | pp. 1483-1488

Author(s): Fabrizio Angiulli

Keyword(s): Data Mining, Outlier Detection, Credit Card, Detection Methods, Distribution Model, Main Task, Data Set, Homogeneous Groups, Definition Of, Dependency Detection

Data mining techniques can be grouped into four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets that exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects are also referred to as outliers. Most of the early methods for outlier identification were developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of an outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set follows a distribution model. Outliers are the points that satisfy a discordancy test, that is, points that lie significantly far from their expected position under the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and are removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, searching for outliers with techniques designed for tasks other than outlier detection may not be advantageous.
As an example, clusters can be distorted by outliers, so the quality of the outliers returned is affected by the very outliers being sought. Moreover, besides returning a solution of higher quality, dedicated outlier detection algorithms can be vastly more efficient than algorithms not designed for the task. In many contexts outliers are considered noise that must be eliminated. However, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in many fields:

- in telecom and credit card fraud detection, to detect atypical usage of telecom services or credit cards;
- in intrusion detection, to detect unauthorized accesses;
- in medical analysis, to test for abnormal reactions to new medical therapies;
- in marketing and customer segmentation, to identify customers who spend much more or much less than the average customer;
- in surveillance systems and in data cleaning.
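The statistical approach described above can be sketched in a few lines of Python: hypothesize a distribution model, then flag the points that fail a discordancy test. This is a minimal illustration under stated assumptions, not the method of any work listed here. It assumes a univariate normal model. Location and scale are estimated robustly by the median and the scaled median absolute deviation (MAD), so that the outliers themselves do not inflate the fitted spread. The cut-off of 3.5 is a commonly cited rule of thumb. The function name `discordant_points`, the threshold, and the sample readings are all illustrative.

```python
import statistics
from typing import List, Sequence

# Factor that turns the MAD into a consistent estimate of the standard
# deviation when the data really are normally distributed.
MAD_TO_SIGMA = 1.4826


def discordant_points(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of values that fail a simple discordancy test.

    Assumed model (illustrative): the data come from a normal distribution
    whose location and scale are estimated robustly by the median and the
    scaled MAD, so that outliers do not distort the fitted model.
    """
    if len(values) < 3:
        raise ValueError("need at least three values")
    center = statistics.median(values)  # expected position under the model
    mad = statistics.median(abs(v - center) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero, so the robust scale is undefined")
    sigma = MAD_TO_SIGMA * mad
    # A point is discordant if it lies more than `threshold` robust standard
    # deviations away from its expected position.
    return [i for i, v in enumerate(values) if abs(v - center) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical sensor readings with one injected extreme value.
    readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 25.0]
    print(discordant_points(readings))  # [10]
```

The robust estimates are a deliberate choice. With a plain mean and standard deviation, a single extreme value inflates the estimated spread and can mask itself. This is a small instance of the point above that outliers distort the very model used to find them.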


Unit of Analysis in Digitally-Enabled Electronic Procurement Research

Digital Innovations for Customer Engagement, Management, and Organizational Improvement - Advances in Business Strategy and Competitive Advantage | DOI: 10.4018/978-1-7998-5171-4.ch005 | 2020 | pp. 83-103

Author(s): Md Mahbubur Rahim, Maryam Jabberzadeh, Nergiz Ilhan

Keyword(s): Data Mining, Cloud Computing, Critical Issue, Future Research, Research Directions, Research Issues, Unit Of Analysis, Electronic Procurement, Future Research Directions, Methodological Concern

E-procurement systems, which have been in place for over a decade, have begun incorporating digital tools such as big data, cloud computing, the Internet of Things, and data mining. There is a rich literature on both earlier e-procurement systems and advanced digitally-enabled e-procurement systems. This literature addresses many research issues associated with e-procurement (e.g., adoption). However, one critical issue has so far received no rigorous attention: the “unit of analysis”, an important methodological concern in the e-procurement research context. Hence, the aim of this chapter is twofold:

1) to discuss how the notion of “unit of analysis” has been conceptualised in the e-procurement literature;
2) to discuss how e-procurement scholars have justified its use in addressing the research issues under investigation.

Finally, the chapter presents several interesting findings and outlines future research directions.


Computational and Data Mining Perspectives on HIV/AIDS in Big Data Era

DOI: 10.4018/978-1-6684-3662-2.ch072 | 2022 | pp. 1477-1503

Author(s): Ali Al Mazari

Keyword(s): Data Mining, Big Data, Data Analytics, Big Data Analytics, Healthcare Sector, Future Research, Research Directions, Scientific Disciplines, Future Research Directions, HIV/AIDS

HIV/AIDS big data analytics has evolved as a potential initiative connecting three major scientific disciplines:

(1) HIV biology, including its emergence and evolution;
(2) the complex clinical and medical problems and practices associated with the infections and diseases;
(3) computational methods for mining HIV/AIDS biological, medical, and clinical big data.

This chapter reviews the computational and data mining perspectives on HIV/AIDS in the big data era. It focuses on the research opportunities in this domain and identifies the challenges facing the development of big data analytics in the HIV/AIDS domain. It then highlights future research directions for big data in the healthcare sector.


A Review of Different Data Mining Techniques Used in Big Data Applications

Handbook of Research for Big Data | DOI: 10.1201/9781003144526-3 | 2021 | pp. 59-89

Author(s): Chandrakanta Mahanty, Devpriya Panda, Brojo Kishore Mishra

Keyword(s): Data Mining, Big Data, Data Mining Techniques, Big Data Applications


Database Systems for Big Data Storage and Retrieval

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques | DOI: 10.4018/978-1-5225-3142-5.ch003 | 2018 | pp. 76-100 | Cited by ~2

Author(s): Venkat Gudivada, Amy Apon, Dhana L. Rao

Keyword(s): Big Data, Data Storage, Database Systems, Query Languages, Research Issues, Storage And Retrieval, Emerging Trends, Big Data Applications, Data Application, Big Data Application

The special needs of Big Data applications have ushered in several new classes of systems for data storage and retrieval. Each class targets the needs of a particular category of Big Data application. These systems differ greatly in:

- their data models and system architectures;
- the approaches they use for high availability and scalability;
- the query languages and client interfaces they provide.

This chapter begins with a description of the emergence of Big Data and the data management requirements of Big Data applications. NoSQL is an umbrella term for the new classes of database management systems that have recently emerged to address these needs. Next, a taxonomy for NoSQL systems is developed, and several NoSQL systems are classified under it. Characteristics of representative systems in each class are also discussed. The chapter concludes by indicating emerging trends in NoSQL systems and research issues.


Data Mining for the Internet of Things: Literature Review and Challenges

International Journal for Research in Applied Science and Engineering Technology | DOI: 10.22214/ijraset.2021.37122 | 2021 | Vol 9 (VII) | pp. 3351-3362

Author(s): Dr. Mohd Zuber

Keyword(s): Data Mining, Big Data, Internet Of Things, The Internet, Mining System, Research Issues, Data Mining Algorithms, Huge Data, Mining Algorithms, The Internet Of Things

The huge volumes of data generated by the Internet of Things (IoT) are considered to be of high business value, and data mining algorithms can be applied to the IoT to extract hidden information from these data. In this paper, we systematically review data mining from the knowledge, technique, and application views. The review covers classification, clustering, association analysis, time series analysis, and outlier analysis. The latest application cases are also surveyed. As more and more devices connect to the IoT, huge volumes of data must be analyzed, and the latest algorithms must be adapted to big data. We review these algorithms and discuss challenges and open research issues. Finally, a big data mining system is proposed.


Data Mining Problems Classification and Techniques

Cognitive Analytics | DOI: 10.4018/978-1-7998-2460-2.ch006 | 2020 | pp. 70-93

Author(s): Nayem Rahman

Keyword(s): Data Mining, Big Data, Information Retrieval, Data Analytics, Complex Data, Healthcare Industry, Holistic View, Data Mining Techniques, Big Data Technologies, Hidden Knowledge

Data mining techniques are widely used to uncover hidden knowledge that cannot be extracted with conventional information retrieval and data analytics tools or with manual techniques. Many data mining techniques have been proposed over the last two decades to solve a wide variety of business problems. Practitioners and researchers in both industry and academia continuously develop and experiment with a variety of data mining techniques. This article provides a consolidated list of problems solved by different data mining techniques. For each type of problem, the author presents up to three techniques that can address it. The objective is to give practitioners and researchers a holistic view of data mining techniques and the problems they solve. The article also provides an overview of data mining problems solved in the healthcare industry. It also highlights how big data technologies are leveraged to handle and process huge amounts of complex data from a data mining perspective.


Data Mining Problems Classification and Techniques

International Journal of Big Data and Analytics in Healthcare | DOI: 10.4018/ijbdah.2018010104 | 2018 | Vol 3 (1) | pp. 38-57 | Cited by ~1

Author(s): Nayem Rahman

Keyword(s): Data Mining, Big Data, Information Retrieval, Data Analytics, Complex Data, Healthcare Industry, Holistic View, Data Mining Techniques, Big Data Technologies, Hidden Knowledge

Data mining techniques are widely used to uncover hidden knowledge that cannot be extracted with conventional information retrieval and data analytics tools or with manual techniques. Many data mining techniques have been proposed over the last two decades to solve a wide variety of business problems. Practitioners and researchers in both industry and academia continuously develop and experiment with a variety of data mining techniques. This article provides a consolidated list of problems solved by different data mining techniques. For each type of problem, the author presents up to three techniques that can address it. The objective is to give practitioners and researchers a holistic view of data mining techniques and the problems they solve. The article also provides an overview of data mining problems solved in the healthcare industry. It also highlights how big data technologies are leveraged to handle and process huge amounts of complex data from a data mining perspective.


Underpinnings of Big Data Analytics and Its Applications

International Journal of Scientific Research in Computer Science Engineering and Information Technology | DOI: 10.32628/cseit206222 | 2020 | pp. 75-81

Author(s): Anand Kumar Pandey, Rashmi Pandey, Ashish Tripathi

Keyword(s): Data Mining, Big Data, Data Analytics, Big Data Analytics, Research Issues, New Developments, New Ideas, Analytical Assessment, Data Analytic, Big Data Technology

Big data and data mining are closely related; both emphasize extracting and analyzing useful data from large databases. Big data analytics plays a very significant role in several fields, such as data mining, education and training, cloud computing, e-commerce, healthcare and life sciences, banking, and agriculture. Big data analytics is a technique for examining large data sets to expose hidden patterns. Large amounts of data are generated continuously every day by modern information systems and technologies. Accordingly, this paper provides a platform for investigating applications of big data at various stages. In the future, new developments in big data technology will call for analytical assessment. The paper also offers researchers a new and suitable outlook for extending existing solutions. This outlook is based on the literature survey, challenges, new ideas, and open research issues.
