A Specification Framework for Big Data Initiatives

Web Services ◽
2019 ◽
pp. 639-656
Author(s):  
Anh D. Ta ◽  
Marcus Tanque ◽  
Montressa Washington

Given the emergence of big data technology and its rising popularity, it is important to ensure that the use of this avant-garde technology directly addresses the enterprise goals required to maximize return on investment (ROI). This chapter presents a specification framework for the process of transforming enterprise data into wisdom, or actionable information, through the use of big data technology. The framework is based on proven methodologies and consists of three components: Specify, Design, and Refine. It provides a systematic, top-down process for deriving big data requirements from high-level technical and enterprise goals, as well as a process for managing the quality of, and the relationships between, raw data sources and big data products.
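
As a rough illustration of such a top-down flow, the following minimal Python sketch traces requirements from enterprise goals down to candidate data sources. It is not the chapter's actual method; every class, field, and value here is an invented assumption.

```python
# Minimal sketch of a Specify -> Design -> Refine flow.
# All names and fields are illustrative assumptions, not the chapter's notation.
from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str
    metric: str            # how the ROI contribution would be measured

@dataclass
class Requirement:
    goal: Goal             # trace each requirement back to a goal
    statement: str
    data_sources: list = field(default_factory=list)

def specify(goals):
    """Derive candidate big data requirements from enterprise goals."""
    return [Requirement(g, f"Collect data supporting: {g.name}") for g in goals]

def design(requirements, catalog):
    """Map each requirement to the raw data sources that can satisfy it."""
    for r in requirements:
        r.data_sources = [s for s in catalog if s["supports"] == r.goal.name]
    return requirements

def refine(requirements):
    """Keep requirements with a usable source; flag the rest for review."""
    kept = [r for r in requirements if r.data_sources]
    dropped = [r for r in requirements if not r.data_sources]
    return kept, dropped

goals = [Goal("reduce churn", "retention rate")]
catalog = [{"name": "crm_events", "supports": "reduce churn"}]
kept, dropped = refine(design(specify(goals), catalog))
print(len(kept), "traceable requirement(s),", len(dropped), "dropped")
```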


Author(s):  
Miguel Figueres Esteban

New technology brings ever more data to support decision-making for intelligent transport systems. Big Data is no longer a futuristic challenge; it is happening right now: modern railway systems have countless sources of data providing a massive quantity of diverse information on every aspect of operations, such as train position and speed, brake applications, passenger numbers, the status of the signalling system, or reported incidents.

Traditional approaches to safety management on the railways have relied on static data sources to populate traditional safety tools such as bow-tie models and fault trees. The Big Data Risk Analysis (BDRA) program for railways at the University of Huddersfield is investigating how the many Big Data sources from the railway can be combined in a meaningful way to provide a better understanding of the GB railway system and the environment within which it operates.

Moving to BDRA is not simply a matter of scaling up existing analysis techniques. BDRA has to coordinate and combine a wide range of sources with different types of data and accuracy, and that is not straightforward. BDRA is structured around three components: data, ontology and visualisation. Each of these components is critical to supporting the overall framework. This paper describes how these three components are used to extract safety knowledge from two text-based data sources by means of ontologies. This is part of ongoing BDRA research into integrating many large and varied data sources to support railway safety and decision-makers.

DOI: http://dx.doi.org/10.4995/CIT2016.2016.1825
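
As a loose illustration of the ontology component, the sketch below tags free-text incident reports with classes from a toy safety ontology and aggregates the results, in the spirit of the data, ontology, and visualisation pipeline. The ontology entries, report text, and function names are all invented for the example; they are not BDRA's actual model.

```python
# Illustrative only: a toy ontology lookup that tags safety-related terms
# in free-text incident reports and counts the resulting classes.
# The ontology content and reports are invented for the example.
import re
from collections import Counter

ontology = {
    "SPAD": "signalling_incident",   # signal passed at danger
    "brake": "rolling_stock",
    "trespass": "external_hazard",
}

reports = [
    "Driver reported a SPAD at junction; emergency brake applied.",
    "Trespass incident near platform 2, no injuries.",
]

def tag(report):
    """Return ontology classes whose trigger terms occur in the report."""
    return [cls for term, cls in ontology.items()
            if re.search(rf"\b{term}\b", report, re.IGNORECASE)]

counts = Counter(cls for r in reports for cls in tag(r))
print(counts)   # aggregated classes could feed a visualisation layer
```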


The previous chapter overviewed big data, including its types, sources, analytic techniques, and applications. This chapter briefly discusses the architecture components for dealing with huge volumes of data. The complexity of big data types defines a logical architecture, with layers and high-level components, for building a big data solution that relates data sources to atomic patterns. The dimensions of the approach are volume, variety, velocity, veracity, and governance. The layers of the architecture are the big data sources, the data massaging and store layer, the analysis layer, and the consumption layer. Big data sources are the data collected from various sources, on which data scientists perform analytics; data can come from internal and external sources. Internal sources comprise transactional data, device sensors, business documents, internal files, etc.; external sources include social network profiles, geographical data, data stores, etc. Data massaging is the process of preprocessing the extracted data, for example by removing missing values, reducing dimensionality, and removing noise, to attain a useful format for storage. The analysis layer provides insight using the preferred analytics techniques and tools; the analytics methods, issues to be considered, requirements, and tools are covered at length. The consumption layer delivers the resulting business insight to consumers such as retail marketing, the public sector, financial bodies, and the media. Finally, a case study of architectural drivers is applied to a retail industry application, and its challenges and use cases are discussed.
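
As a minimal sketch of the data massaging step described above, the following Python/pandas fragment removes missing values, filters obvious noise, and drops an irrelevant column. The column names, thresholds, and sample values are assumptions for illustration only.

```python
# A toy "data massaging" pass: drop missing values, remove obvious noise,
# and reduce dimensionality by keeping only the columns the analysis
# layer needs. All data and thresholds are invented for illustration.
import pandas as pd

raw = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "amount": [25.0, None, 13.5, 9_999_999.0],  # None = missing, huge = noise
    "store": ["A", "B", "B", "C"],
    "debug_blob": ["x"] * 4,                    # irrelevant column
})

clean = (
    raw.dropna(subset=["amount"])               # remove missing values
       .query("amount < 100000")                # crude noise filter
       .drop(columns=["debug_blob"])            # dimensionality reduction
)
print(clean)
```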


2020 ◽  
Vol 15 (2) ◽  
pp. 21-20
Author(s):  
Aldha Shafrielda Sihab ◽  
Anugerah Pagiyan Nurfajar

Big data is one of the newest trends embraced by the worlds of technology and business. It refers to data collections so large and complex that they can no longer be managed with traditional software tools. One of the world's leading companies in big data technology is Google Inc. The company maintains and distributes data for various purposes, so its presence is urgently needed. The growth of data at Google has been a crucial part of the digital age and has resulted in several breakthroughs. First, Google's data stores can hold files effectively while remaining easily accessible to people. Second, big data can easily be recalled for specific purposes through Google's machine learning. The study was conducted using qualitative descriptive methods with secondary data sources gathered through library studies.


2021 ◽  
Vol 14 (2) ◽  
pp. 13-20
Author(s):  
Yaroslav Ivano ◽  
Petr Asalhanov ◽  
Nadezhda Bendik

The article considers the use of Big Data technology for planning food production under uncertainty. The use of a large amount of diverse information makes it possible to solve various classes of problems in forecasting and planning the production and sale of food products. A conceptual scheme for the use of Big Data technology by agricultural producers is given using the example of the Irkutsk region, and the groups of extremum (optimization) problems to be solved are considered, with examples. Data sources and users are described, and current Big Data platforms are presented.
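
As a toy instance of the kind of extremum problem mentioned above, the following Python sketch chooses sown areas for two crops to maximize expected revenue under yield uncertainty, modelled crudely by averaging two yield scenarios. All crops, coefficients, and constraints are invented for illustration; the article's actual problem classes and data are not reproduced here.

```python
# Toy production-planning LP under yield uncertainty.
# All coefficients are invented for the example.
from scipy.optimize import linprog

# expected yield (t/ha) = mean of a "dry" and a "wet" scenario
yields = [(2.0 + 3.0) / 2, (1.5 + 2.5) / 2]   # wheat, barley
prices = [180.0, 150.0]                       # revenue per tonne

# maximize revenue  =>  minimize negative revenue
c = [-prices[0] * yields[0], -prices[1] * yields[1]]

A_ub = [[1, 1]]        # total land constraint
b_ub = [1000]          # 1000 ha available

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # sown areas (ha) and expected revenue
```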


Web Services ◽
2019 ◽
pp. 1991-2016
Author(s):  
José Moura ◽  
Fernando Batista ◽  
Elsa Cardoso ◽  
Luís Nunes

This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous, and very often unstructured, large data sources; the enhancement of the performance of the processing and networking (cloud) infrastructures that are the most important foundational pillars of Big Data applications and services; and novel ways to efficiently manage network infrastructures with high-level composed policies, supporting the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution for routing data traffic with diverse requirements in a wide-area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated.
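
A minimal sketch of how such a high-level composed policy might separate video from non-video flows is given below. The flow fields, predicates, and path names are assumptions for illustration, not the chapter's actual solution.

```python
# Toy policy engine: route video and non-video flows over different paths.
# Field names and path labels are invented for the example.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    kind: str      # "video" or "non-video"
    mbps: float

POLICIES = [
    # (predicate, chosen path) -- first match wins
    (lambda f: f.kind == "video" and f.mbps > 5, "low-latency-path"),
    (lambda f: f.kind == "video", "default-video-path"),
    (lambda f: True, "best-effort-path"),        # catch-all
]

def route(flow):
    """Return the first path whose policy predicate matches the flow."""
    for predicate, path in POLICIES:
        if predicate(flow):
            return path

print(route(Flow("AS1", "AS2", "video", 12.0)))     # low-latency-path
print(route(Flow("AS1", "AS3", "non-video", 2.0)))  # best-effort-path
```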


Author(s):  
Yu Zhang ◽  
Yan-Ge Wang ◽  
Yan-Ping Bai ◽  
Yong-Zhen Li ◽  
Zhao-Yong Lv ◽  
...  

Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems sometimes go unidentified when input data is published in a cloud environment. Data privacy preservation in Hadoop involves hiding and publishing the input dataset to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and time, among other factors; at present, many cloud applications face the same kinds of problems with big data anonymization. To overcome these problems, a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS) is introduced and implemented in Hadoop. For the anonymization, 45,222 records of adult information with 15 attributes were taken as the input big data. Using multidimensional anonymization in the MapReduce framework, the proposed Two-Phase Top-Down Specialization algorithm was implemented in Hadoop, increasing the efficiency of the big data processing system. Experiments with the algorithm on Hadoop, in both one-dimensional and multidimensional MapReduce frameworks, showed better results for multidimensional anonymization of the input adult dataset: the dataset is generalized in a top-down manner, and the multidimensional MapReduce framework produced the better IGPL values. The anonymization was performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter, and decreases the execution time of big data privacy preservation, compared to the existing algorithm. These experimental results should lead to broad application in distributed environments.
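
As a much-simplified, single-attribute illustration of top-down specialization, the sketch below starts from the most general value of a toy taxonomy and specializes only while every resulting group keeps at least k records. The taxonomy, records, and selection rule are invented; the paper's actual TPTDS runs two phases over Hadoop/MapReduce and scores candidate specializations by IGPL.

```python
# Toy single-attribute top-down specialization with a k-anonymity check.
# Taxonomy and data are invented; real TPTDS is distributed and IGPL-driven.
from collections import Counter

TAXONOMY = {"Any_Job": ["White_Collar", "Blue_Collar"],
            "White_Collar": ["Engineer", "Lawyer"],
            "Blue_Collar": ["Driver", "Welder"]}

records = ["Engineer", "Engineer", "Lawyer", "Driver", "Welder", "Welder"]
K = 2   # k-anonymity threshold

def generalize(value, cut_values):
    """Map a leaf value up to whichever cut value is its ancestor."""
    for cut in cut_values:
        frontier = [cut]
        while frontier:
            node = frontier.pop()
            if node == value:
                return cut
            frontier.extend(TAXONOMY.get(node, []))
    raise ValueError(value)

cut = ["Any_Job"]                      # start from the most general cut
while True:
    # try specializing every node on the current cut
    candidates = [cut[:i] + TAXONOMY[v] + cut[i + 1:]
                  for i, v in enumerate(cut) if v in TAXONOMY]
    valid = [c for c in candidates
             if min(Counter(generalize(r, c) for r in records).values()) >= K]
    if not valid:
        break
    cut = valid[0]   # a real TPTDS picks the candidate with the best IGPL

print(cut, [generalize(r, cut) for r in records])
```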

