On using MapReduce to scale algorithms for Big Data analytics: a case study

2019, Vol 6 (1)
Author(s): Phongphun Kijsanayothin, Gantaphon Chalumporn, Rattikorn Hewett

Abstract

Introduction: Many data analytics algorithms are originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. Many advances in Big Data analytics algorithms are driven by MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.

Case description: This paper investigates a case study of the scaling problem of "Big algorithms" for a popular association rule-mining algorithm, specifically the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with a 7% average performance increase over transactions ranging from 10,000 to 120,000.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
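To make the point about dependent iterations concrete, the sketch below simulates one map/shuffle/reduce pass of frequent-itemset counting in plain Python. This is a minimal illustration, not the authors' algorithm: the toy transactions, the MIN_SUPPORT threshold, and the MAX_ITEMSET_SIZE bound are all assumptions, and a real job would run the mapper and reducer as distributed tasks on a cluster. What it shows is the emission of candidates of every size in a single pass, instead of chaining one MapReduce job per itemset size as a naive port of sequential Apriori would.

```python
from collections import defaultdict
from itertools import combinations

# Toy data; real inputs would be partitioned across a cluster (assumption).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
MIN_SUPPORT = 3       # absolute support threshold (illustrative)
MAX_ITEMSET_SIZE = 2  # emit all candidates up to this size in one pass

def mapper(transaction):
    """Emit every candidate itemset of a transaction at once, so one
    map/reduce pass replaces the size-by-size dependent iterations of
    the sequential Apriori algorithm."""
    for k in range(1, MAX_ITEMSET_SIZE + 1):
        for itemset in combinations(sorted(transaction), k):
            yield itemset, 1

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(itemset, counts):
    """Keep only itemsets meeting the minimum support."""
    support = sum(counts)
    if support >= MIN_SUPPORT:
        yield itemset, support

# Drive the single simulated pass end to end.
mapped = (pair for t in TRANSACTIONS for pair in mapper(t))
for itemset, counts in shuffle(mapped).items():
    for frequent, support in reducer(itemset, counts):
        print(frequent, support)
```

Bounding MAX_ITEMSET_SIZE keeps the candidate explosion in check; practical single-pass variants also prune candidates locally within each partition before the shuffle.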

2021
Author(s): R. Salter, Quyen Dong, Cody Coleman, Maria Seale, Alicia Ruvinsky, ...

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


2017, Vol 37 (1), pp. 56-74
Author(s): Thomas Kude, Hartmut Hoehle, Tracy Ann Sykes

Purpose: Big Data Analytics provides a multitude of opportunities for organizations to improve service operations, but it also increases the threat of external parties gaining unauthorized access to sensitive customer data. With data breaches now a common occurrence, it is becoming increasingly clear that while modern organizations need measures in place to try to prevent breaches, they must also have processes for dealing with a breach once it occurs. Prior research on information technology security and service failures suggests that customer compensation can potentially restore customer sentiment after such data breaches. The paper aims to discuss these issues.

Design/methodology/approach: In this study, the authors draw on the literature on personality traits and social influence to better understand the antecedents of perceived compensation and the effectiveness of compensation strategies. The authors studied the propositions using data collected in the context of Target's large-scale data breach, which occurred in December 2013 and affected the personal data of more than 70 million customers. In total, the authors collected data from 212 breached customers.

Findings: The results show that customers' personality traits and their social environment significantly influence their perceptions of compensation. The authors also found that perceived compensation positively influences service recovery and customer experience.

Originality/value: The results add to the emerging literature on Big Data Analytics and will help organizations manage compensation strategies in large-scale data breaches more effectively.


Author(s): Sadaf Afrashteh, Ida Someh, Michael Davern

Big data analytics uses algorithms for decision-making and targeting of customers. These algorithms process large-scale data sets and create efficiencies in the decision-making process for organizations but are often incomprehensible to customers and inherently opaque in nature. Recent European Union regulations require that organizations communicate meaningful information to customers on the use of algorithms and the reasons behind decisions made about them. In this paper, we explore the use of explanations in big data analytics services. We rely on discourse ethics to argue that explanations can facilitate a balanced communication between organizations and customers, leading to transparency and trust for customers as well as customer engagement and reduced reputation risks for organizations. We conclude the paper by proposing future empirical research directions.


2021, pp. 1-7
Author(s): Emmanuel Jesse Amadosi

With rapid development in technology, the built industry's capacity to generate large-scale data is not in doubt. This trend of data upsurge, labelled "Big Data", is currently being used to seek intelligent solutions in many industries, including construction. As a result, the appeal to embrace Big Data Analytics has gained wide advocacy globally. However, the general knowledge of Nigeria's built environment professionals about Big Data Analytics is still limited, and this gap continues to account for the slow pace of adoption of digital technologies like Big Data Analytics and the value they offer. This study set out to assess the level of awareness and knowledge of professionals within the Nigerian built environment, with a view to promoting the adoption of Big Data Analytics for improved productivity. To achieve this aim, a structured questionnaire survey was carried out among 283 professionals drawn from nine disciplines within the built environment in the Federal Capital Territory, Abuja. The findings revealed that: a) a low level of knowledge of Big Data exists among professionals; b) professionals' knowledge and the level of Big Data Analytics application are strongly related; and c) professionals are interested in knowing more about the Big Data concept and how Big Data Analytics can be leveraged. The study therefore recommends an urgent paradigm shift towards digitisation to fully embrace and adopt Big Data Analytics, and enjoins stakeholders to promote collaborative schemes between practice-based professionals and academia in seeking intelligent and smart solutions to construction-related problems.


2020, Vol 98, pp. 68-78
Author(s): Aseem Kinra, Samaneh Beheshti-Kashi, Rasmus Buch, Thomas Alexander Sick Nielsen, Francisco Pereira

Author(s): Amine Belhadi, Sachin S. Kamble, Angappa Gunasekaran, Karim Zkik, Dileep Kumar M., ...

Author(s): Marcus Tanque, Harry J Foxwell

Big data and cloud computing are transforming information technology. These complementary technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas involves the use of cloud computing services and infrastructure to support large-scale data analytics research, providing relevant solutions and future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data as they relate to supply chain solutions. The chapter focuses on areas of significant technological and scientific advancement that are likely to enhance supply chain systems. The evaluation emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics as they pertain to supply chain management.

