Big Data: A Strategic Perspective

2014 ◽  
Vol 08 (03) ◽  
pp. 319-333
Author(s):  
David Alfred Ostrowski

Big Data has become ubiquitous across all areas of research, enabling applications that were not previously possible. Unlike software development that relies on traditional data sources, Big Data applications present unique challenges in harnessing the Apache Hadoop architecture. In this paper, we introduce the fundamental concepts of Hadoop and explore its usage and future direction. We also present our strategy for exploring the Hadoop architecture, including issues of scalability, customization of code, and the use of programming techniques.

Author(s):  
Atis Verdenhofs ◽  
Ineta Geipele ◽  
Tatjana Tambovceva

Technological advancement has led to a tremendous increase in data. Many industries use big data to become more efficient or even to create new products or services. Applications of big data in the construction industry have been extensively researched in Asia, which can be explained by the huge construction volumes in the region. This study aims to identify big data applications in the construction industry from 2016 onward. The research object is the construction industry; the research subject is big data applications. The research methods are a systematic literature review and meta-analysis. The novelty of the research is a classification of big data applications based on the systematic literature review. The authors conclude that the existing categorization (Bilal et al., 2016b) can be applied to research on big data applications in the construction industry published in 2016 and later. However, potential for new applications is identified in the category of emerging trends triggered by big data, and the authors propose a cross-industry analysis to identify solutions that can be adapted to the construction industry.


Author(s):  
Nazia Tazeen ◽  
Sandhya Rani K.

Big Data is a broad area that deals with enormous data sets. The term refers to data sets of huge volume and increasingly diverse structure that originate from diverse sources and grow rapidly. Large amounts of data are generated by fast data transmission between devices in sectors such as healthcare, science, media, business, entertainment and engineering. Data collection capacity and storage are major concerns. Apache Hadoop is a collection of open-source software for storing big data and performing analytics and other operations on it. Many organizations base their decisions on knowledge extracted from huge and complex data; because of this central role in decision making, Big Data has to be accurately classified and analyzed. To overcome the complex challenges posed by Big Data, various Big Data tools and technologies have been developed. This paper briefly discusses Big Data applications and the tools and technologies used to handle them.


2021 ◽  
Vol 348 ◽  
pp. 01003
Author(s):  
Abdullayev Vugar Hacimahmud ◽  
Ragimova Nazila Ali ◽  
Khalilov Matlab Etibar

The volume of information in the 21st century is growing at a rapid pace. Big data technologies are used to process modern information. This article discusses the use of big data technologies to monitor social processes. It describes the characteristics and principles of big data and discusses big data applications in several areas. Particular attention is paid to the interaction between big data and sociology, for which digital sociology and the computational social sciences are considered. Social processes are one of the main objects of study in sociology. The article presents the types of social processes and their monitoring. As an example, monitoring of social processes at a university is implemented. The following technologies can be used to monitor social processes: 1010data products (1010edge, 1010connect, 1010reveal, 1010equities), Apache Software Foundation products (Apache Hive, Apache Chukwa, Apache Hadoop, Apache Pig), the MapReduce framework, the R language, the Pandas library, NoSQL, etc. Of these, this article examines the use of the MapReduce model for monitoring social processes at the university.


2021 ◽  
Author(s):  
Ehsan Ataie ◽  
Athanasia Evangelinou ◽  
Eugenio Gianniti ◽  
Danilo Ardagna

Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since these frameworks are in many cases adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end users. In this work, we propose and validate a hybrid approach for predicting the performance of big data applications running on clouds. The approach exploits both analytical modeling and machine learning (ML) techniques and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that the proposed approach improves accuracy, the number of experiments to be run on the operational system, and cost compared with applying ML techniques without any support from analytical models. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, whereas it fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yoseph Mamo ◽  
Yiran Su ◽  
Damon P.S. Andrew

Purpose: As big data (BD) has increasingly become an important tool for managers and researchers to transform sport management practices, the purpose of this research is to highlight diverse data sources and modern analytical techniques that will leverage BD as a means to advance scholarship in sport management. Design/methodology/approach: A comprehensive review of existing BD literature in sport management outlines new perspectives on BD research methods and the application of BD in sport management. Findings: First, through a thorough review of the literature, a domain-specific conceptualization that incorporates the field's mission and priorities was developed. Second, potential data sources and different types of analytical opportunities were identified, highlighting strategies for developing methodological approaches that lead to novel research questions. BD analytics can allow for more flexibility in improving methodological capability to analyze data and, thus, provide more granular and predictive insights. Finally, this paper concludes with a discussion of BD's impact on three domains of sport management, whereby organizations reach data-driven decisions. Originality/value: BD has the potential to transform sport management operations and bridge the research-practice gap. BD research in sport management is instrumental for accumulating new knowledge and/or testing existing theories, either in a deductive fashion or by taking an inductive approach, as the field seeks to advance scholarship.


Author(s):  
José Moura ◽  
Fernando Batista ◽  
Elsa Cardoso ◽  
Luís Nunes

This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from large, heterogeneous, and very often unstructured data sources; the improvement of the performance of processing and networking (cloud) infrastructures, which are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies that support the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated.


Web Services ◽  
2019 ◽  
pp. 1991-2016
Author(s):  
José Moura ◽  
Fernando Batista ◽  
Elsa Cardoso ◽  
Luís Nunes

This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from large, heterogeneous, and very often unstructured data sources; the improvement of the performance of processing and networking (cloud) infrastructures, which are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies that support the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated.


2020 ◽  
Author(s):  
Bankole Olatosi ◽  
Jiajia Zhang ◽  
Sharon Weissman ◽  
Zhenlong Li ◽  
Jianjun Hu ◽  
...  

BACKGROUND Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), remains a serious global pandemic. Currently, all age groups are at risk of infection, but the elderly and persons with underlying health conditions are at higher risk of severe complications. In the United States (US), the pandemic curve is rapidly changing, with over 6,786,352 cases and 199,024 deaths reported. South Carolina (SC) had reported 138,624 cases and 3,212 deaths across the state as of 9/21/2020. OBJECTIVE The growing availability of COVID-19 data provides a basis for deploying Big Data science to leverage multitudinal and multimodal data sources for incremental learning. Doing this requires the acquisition and collation of multiple data sources at the individual and county level. METHODS The population for the comprehensive database comes from statewide COVID-19 testing surveillance data (March 2020 to present) for all SC COVID-19 patients (N≈140,000). This project will 1) connect multiple partner data sources for prediction and intelligence gathering, and 2) build a REDCap database that links de-identified multitudinal and multimodal data sources useful for machine learning and deep learning algorithms to enable further studies. Additional data will include hospital-based COVID-19 patient registries, Health Sciences South Carolina (HSSC) data, data from the office of Revenue and Fiscal Affairs (RFA), and Area Health Resource Files (AHRF). RESULTS The project was funded as of June 2020 by the National Institutes of Health. CONCLUSIONS The development of such a linked and integrated database will allow for the identification of important predictors of short- and long-term clinical outcomes for SC COVID-19 patients using data science.

