Mining Big Data and Streams

Author(s):  
Hoda Ahmed Abdelhafez

Mining big data is attracting considerable attention because businesses need more complex information to increase their revenue and gain competitive advantage. Mining huge volumes of data, as well as real-time data, therefore calls for new data mining techniques and approaches. This chapter discusses big data volume, variety, and velocity; data mining techniques; and open source tools for handling very large datasets. It also focuses on two industrial areas, telecommunications and healthcare, and the lessons learned from them.



2014 ◽  
Vol 1 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Hoda Ahmed Abdelhafez

The internet era creates new types of large and real-time data; much of this data is non-standard, such as streaming and sensor-generated data. Advanced big data technologies enable organizations to extract insights from sophisticated data. Volume, variety and velocity represent big data challenges, which cause difficulties in capture, storage, search, sharing, analysis and visualization. Therefore, technologies like NoSQL, Hadoop and cloud computing are used to extract value from large volumes and a wide variety of data to discover business needs. This article focuses on the challenges of big data and on how recent technologies can be used to address them, illustrated through real-world case studies. The article also presents the lessons learned from these case studies.


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated from different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases process. Applying such a process in the domain of medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts who face similar medical data mining projects.


Author(s):  
Nayem Rahman

Data mining has been gaining attention in complex business environments, given the rapid increase in data volume and the ubiquitous nature of data in this age of the internet and social media. Organizations are interested in making informed decisions with a complete set of data, including structured and unstructured data that originate both internally and externally. Different data mining techniques have evolved over the last two decades to solve a wide variety of business problems. Practitioners and researchers in industry and academia continuously develop and experiment with a variety of data mining techniques. This article provides an overview of data mining techniques that are widely used in different fields to discover knowledge and solve business problems. It provides an update on data mining techniques based on extant literature as of 2018, which might help practitioners and researchers gain a holistic view of data mining techniques.


2020 ◽  
Vol 17 (11) ◽  
pp. 5162-5166
Author(s):  
Puninder Kaur ◽  
Amandeep Kaur ◽  
Rajwinder Kaur

In the IT world, predicting the academic performance of the huge student population poses a big challenge. Educational data mining techniques contribute significantly to providing a solution to this problem. Several prediction methods are available for data classification and clustering, to extract information and provide accurate results. This paper highlights different prediction methodologies for real-time analysis of the dynamic academic behavior of students. The main focus is to provide a brief overview of the data mining techniques and to highlight the differences among the various methods in order to provide the best results for students.


Author(s):  
Kalyani Kadam ◽  
Pooja Vinayak Kamat ◽  
Amita P. Malav

Cardiovascular diseases (CVDs) have become some of the most life-threatening diseases in recent times. The key to managing them effectively is to analyze huge amounts of data and mine them to predict and further prevent heart-related diseases. The primary objective of this chapter is to understand and survey various information mining strategies for efficiently determining the occurrence of CVDs, and to propose a big data architecture for the same. The authors make use of Apache Spark for the implementation.


Author(s):  
Arun Thotapalli Sundararaman

The study of data quality for data mining applications has always been a complex topic; in recent years, this topic has gained further complexity with the advent of big data as the source for data mining and business intelligence (BI) applications. In a big data environment, data is consumed in various states and various forms serving as input for data mining, and this is the main source of added complexity. These new complexities and challenges arise from the underlying dimensions of big data (volume, variety, velocity, and value) together with the ability to consume data at various stages of transition from raw data to standardized datasets. These have created a need to expand the traditional data quality (DQ) factors into big data quality (BDQ) factors, as well as a need for new BDQ assessment and measurement frameworks for data mining and BI applications. However, very limited advancement has been made in research and industry on the topic of BDQ and its relevance and criticality for data mining and BI applications. Data quality in data mining refers to the quality of the patterns or results of the models built using mining algorithms. DQ for data mining in business intelligence applications should be aligned with the objectives of the BI application. Objective measures, training/modeling approaches, and subjective measures are the three major approaches that exist to measure DQ for data mining. However, there is no agreement yet on definitions, measurements, or interpretations of DQ for data mining. Defining the factors of DQ for data mining and their measurement for a BI system has been one of the major challenges for researchers as well as practitioners. This chapter provides an overview of existing research in the area of BDQ definitions and measurement for data mining for BI, analyzes the gaps therein, and provides a direction for future research and practice in this area.


Author(s):  
Sunny Sharma ◽  
Manisha Malhotra

Web usage mining is the use of data mining techniques to analyze user behavior in order to better serve the needs of the user. This process of personalization uses a set of techniques and methods for discovering the linking structure of information on the web. The goal of web personalization is to improve the user experience by mining meaningful information and presenting the retrieved information in the way the user intends. The arrival of big data has raised novel issues for the personalization community. This chapter provides an overview of personalization and big data, and identifies challenges related to web personalization with respect to big data. It also presents some approaches and models to fill the gap between big data and web personalization. Further, this research identifies additional opportunities for web personalization from the perspective of big data.


2021 ◽  
pp. 59-89
Author(s):  
Chandrakanta Mahanty ◽  
Devpriya Panda ◽  
Brojo Kishore Mishra
