On using MapReduce to scale algorithms for Big Data analytics: a case study

2019, Vol 6 (1)
Author(s): Phongphun Kijsanayothin, Gantaphon Chalumporn, Rattikorn Hewett

Abstract

Introduction: Many data analytics algorithms are originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. Many advances in Big Data analytics algorithms are driven by MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.

Case description: This paper investigates a case study of the scaling problem of "Big algorithms" for a popular association rule-mining algorithm, specifically the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared to the state-of-the-art performer, with a 7% average performance increase over transactions ranging from 10,000 to 120,000.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
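To make the point about dependent iterations concrete, the sketch below simulates one map/shuffle/reduce pass of frequent-itemset counting in plain Python. This is a minimal illustration, not the authors' algorithm: the toy transactions, the MIN_SUPPORT threshold, and the MAX_ITEMSET_SIZE bound are all assumptions, and a real job would run the mapper and reducer as distributed tasks on a cluster. What it shows is the emission of candidates of every size in a single pass, instead of chaining one MapReduce job per itemset size as a naive port of sequential Apriori would.

```python
from collections import defaultdict
from itertools import combinations

# Toy data; real inputs would be partitioned across a cluster (assumption).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
MIN_SUPPORT = 3       # absolute support threshold (illustrative)
MAX_ITEMSET_SIZE = 2  # emit all candidates up to this size in one pass

def mapper(transaction):
    """Emit every candidate itemset of a transaction at once, so one
    map/reduce pass replaces the size-by-size dependent iterations of
    the sequential Apriori algorithm."""
    for k in range(1, MAX_ITEMSET_SIZE + 1):
        for itemset in combinations(sorted(transaction), k):
            yield itemset, 1

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(itemset, counts):
    """Keep only itemsets meeting the minimum support."""
    support = sum(counts)
    if support >= MIN_SUPPORT:
        yield itemset, support

# Drive the single simulated pass end to end.
mapped = (pair for t in TRANSACTIONS for pair in mapper(t))
for itemset, counts in shuffle(mapped).items():
    for frequent, support in reducer(itemset, counts):
        print(frequent, support)
```

Bounding MAX_ITEMSET_SIZE keeps the candidate explosion in check; practical single-pass variants also prune candidates locally within each partition before the shuffle.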

2021
Author(s): R. Salter, Quyen Dong, Cody Coleman, Maria Seale, Alicia Ruvinsky, ...

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets, with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the resulting Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


2017, Vol 37 (1), pp. 56-74
Author(s): Thomas Kude, Hartmut Hoehle, Tracy Ann Sykes

Purpose: Big Data Analytics provides a multitude of opportunities for organizations to improve service operations, but it also increases the threat of external parties gaining unauthorized access to sensitive customer data. With data breaches now a common occurrence, it is becoming increasingly clear that while modern organizations need measures in place to try to prevent breaches, they must also have processes for dealing with a breach once it occurs. Prior research on information technology security and service failures suggests that customer compensation can potentially restore customer sentiment after such data breaches. The paper aims to discuss these issues.

Design/methodology/approach: In this study, the authors draw on the literature on personality traits and social influence to better understand the antecedents of perceived compensation and the effectiveness of compensation strategies. The authors studied the propositions using data collected in the context of Target's large-scale data breach, which occurred in December 2013 and affected the personal data of more than 70 million customers. In total, the authors collected data from 212 breached customers.

Findings: The results show that customers' personality traits and their social environment significantly influence their perceptions of compensation. The authors also found that perceived compensation positively influences service recovery and customer experience.

Originality/value: The results add to the emerging literature on Big Data Analytics and will help organizations manage compensation strategies in large-scale data breaches more effectively.


Author(s): Sadaf Afrashteh, Ida Someh, Michael Davern

Big data analytics uses algorithms for decision-making and targeting of customers. These algorithms process large-scale data sets and create efficiencies in the decision-making process for organizations but are often incomprehensible to customers and inherently opaque in nature. Recent European Union regulations require that organizations communicate meaningful information to customers on the use of algorithms and the reasons behind decisions made about them. In this paper, we explore the use of explanations in big data analytics services. We rely on discourse ethics to argue that explanations can facilitate a balanced communication between organizations and customers, leading to transparency and trust for customers as well as customer engagement and reduced reputation risks for organizations. We conclude the paper by proposing future empirical research directions.


2021, pp. 1-7
Author(s): Emmanuel Jesse Amadosi

With rapid development in technology, the built industry's capacity to generate large-scale data is not in doubt. This trend of data upsurge, labelled "Big Data", is currently being used to seek intelligent solutions in many industries, including construction. As a result, the appeal to embrace Big Data Analytics has gained wide advocacy globally. However, the general knowledge of Nigeria's built environment professionals about Big Data Analytics is still limited, and this gap continues to account for the slow pace of adoption of digital technologies like Big Data Analytics and the value they offer. This study set out to assess the level of awareness and knowledge of professionals within the Nigerian built environment, with a view to promoting the adoption of Big Data Analytics for improved productivity. To achieve this aim, a structured questionnaire survey was carried out among 283 professionals drawn from nine disciplines within the built environment in the Federal Capital Territory, Abuja. The findings revealed that: a) a low level of knowledge of Big Data exists among professionals; b) professionals' knowledge and the level of Big Data Analytics application are strongly related; and c) professionals are interested in knowing more about the Big Data concept and how Big Data Analytics can be leveraged. The study therefore recommends an urgent paradigm shift towards digitisation to fully embrace and adopt Big Data Analytics, and enjoins stakeholders to promote collaborative schemes between practice-based professionals and academia in seeking intelligent and smart solutions to construction-related problems.


2020, Vol 98, pp. 68-78
Author(s): Aseem Kinra, Samaneh Beheshti-Kashi, Rasmus Buch, Thomas Alexander Sick Nielsen, Francisco Pereira

Author(s): Amine Belhadi, Sachin S. Kamble, Angappa Gunasekaran, Karim Zkik, Dileep Kumar M., ...

Author(s): Marcus Tanque, Harry J Foxwell

Big data and cloud computing are transforming information technology. These complementary technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas involves the use of cloud computing services and infrastructure to support large-scale data analytics research, providing relevant solutions and future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data as they relate to supply chain solutions. The chapter focuses on areas of significant technological and scientific advancement that are likely to enhance supply chain systems. The evaluation emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics as they pertain to supply chain management.

