Big Data Architectural Pattern to Ingest Multiple Sources and Standardization to Immunize Downstream Applications

Author(s):  
Imran Quadri Syed
2019
Vol 10 (4)
pp. 106
Author(s):  
Bader A. Alyoubi

Big Data is gaining rapid popularity in the e-commerce sector across the globe. There is a general consensus among experts that Saudi organisations have been late in adopting new technologies, and it is generally believed that the scarcity of research on the latest technologies specific to Saudi Arabia, a country culturally, socially, and economically different from the West, is one of the key factors behind this delay. Hence, to fill this gap to some extent and create awareness of Big Data technology, the primary goal of this research was to identify the impact of Big Data on e-commerce organisations in Saudi Arabia. The Internet has changed the business environment of Saudi Arabia too, and e-commerce is set to reach new heights thanks to the latest technological advancements. A qualitative research approach was used, gathering primary data through interviews with highly experienced professionals. Drawing on multiple sources of evidence, this research found that traditional databases are not capable of handling massive data and that Big Data is a promising technology for e-commerce companies in Saudi Arabia to adopt. Big Data’s predictive analytics will certainly help e-commerce companies gain better insight into consumer behaviour and thus offer customised products and services. The key finding of this research is that Big Data has a significant impact on e-commerce organisations in Saudi Arabia across verticals such as customer retention, inventory management, product customisation, and fraud detection.


Author(s):  
Ying Wang
Yiding Liu
Minna Xia

Big data is characterised by multiple sources and heterogeneity. On top of a Hadoop and Spark big data platform, this study builds a hybrid analysis system for forest fire. The platform combines big data analysis and processing technology and draws on research results from related technical fields, such as forest fire monitoring. In this system, Hadoop’s HDFS stores the various kinds of data, the Spark module provides the big data analysis methods, and visualization tools such as ECharts, ArcGIS, and Unity3D render the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform and to offer guidance for follow-up research and for the establishment of a forest fire monitoring and visualised early-warning big data platform. The experiment has two shortcomings: more data types should be selected, and compatibility would improve if the original data were converted to XML format. These problems are expected to be solved in follow-up research.
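
As a rough illustration of the pipeline this platform describes, the following PySpark sketch reads sensor records from HDFS, flags candidate fire points with a simple threshold rule, and aggregates them per grid cell for a visualization front end. The paths, column names, and thresholds are assumptions for illustration; the abstract does not specify the actual schemas.

```python
# Minimal PySpark sketch of the pipeline described above: read sensor
# records from HDFS, flag candidate fire points with a threshold rule,
# and aggregate them for a visualization front end. All paths, column
# names, and thresholds are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("forest-fire-detection").getOrCreate()

# Hypothetical HDFS location and schema: one row per sensor reading.
readings = spark.read.parquet("hdfs:///data/forest/sensor_readings")

# Flag readings with high brightness temperature and low humidity
# (assumed detection thresholds).
candidates = readings.where(
    (F.col("brightness_k") > 330.0) & (F.col("humidity_pct") < 20.0)
)

# Aggregate candidate fire points per grid cell for map rendering
# (e.g. by ECharts or ArcGIS on the front end).
fire_cells = (
    candidates.groupBy("grid_cell_id")
    .agg(
        F.count("*").alias("n_detections"),
        F.max("brightness_k").alias("peak_brightness_k"),
    )
)

fire_cells.write.mode("overwrite").parquet("hdfs:///data/forest/fire_cells")
```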


2021
Vol 8 (1)
Author(s):  
Hossein Ahmadvand
Fouzhan Foroutan
Mahmood Fathy

Abstract Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of that data, and it causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous works. To overcome it, the present work uses Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, taking two types of deadline as constraints. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase we used a set of datasets and applications. The experimental results show that the proposed approach surpasses the other scenarios when processing real datasets: DV-DVFS achieves up to a 15% improvement in energy consumption.
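
The core scheduling step the paper describes, estimating the frequency needed to meet a deadline before applying DVFS, can be sketched as follows. This is a minimal illustration assuming the usual first-order model in which processing time scales as cycles divided by frequency; the cycle estimate and the frequency list are made-up values, not the paper's experimental setup.

```python
# Illustrative sketch of deadline-aware DVFS frequency selection:
# estimate runtime at each available frequency and choose the lowest
# frequency that still meets the deadline. The cycle-count estimate
# and the frequency list are assumed values.

def pick_frequency(estimated_cycles: float, deadline_s: float,
                   available_freqs_hz: list[float]) -> float:
    """Return the lowest frequency that finishes within the deadline.

    Assumes processing time scales as cycles / frequency, the usual
    first-order model behind DVFS scheduling.
    """
    for f in sorted(available_freqs_hz):          # try slowest first
        if estimated_cycles / f <= deadline_s:    # meets the deadline?
            return f
    return max(available_freqs_hz)                # fall back to maximum

# Example: 3e9 cycles of work, a 2-second deadline, four P-states.
freq = pick_frequency(3e9, 2.0, [1.2e9, 1.6e9, 2.0e9, 2.4e9])
print(f"selected frequency: {freq/1e9:.1f} GHz")  # 1.6 GHz
```

Picking the lowest feasible frequency minimizes dynamic power while still honoring the deadline, which is the trade-off the paper's approach exploits.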


2021
Author(s):  
Roberto Carlos Fuenmayor

Abstract The concept of digital transformation rests on two principles: being data driven, exploiting every available data source, and being user focused. The objective is not only to consolidate data from multiple systems, but to apply an analytics approach that extracts insights from the aggregation of multiple sources and presents them to the user (field manager, production and surveillance engineer, region manager, and country manager) with simplicity, specificity, novelty and, most importantly, clarity. The idea is to liberate data across the whole upstream community, and for production operations people in particular, by providing a one-stop production digital platform that taps into unstructured data, transforms it into structured input for engineering models, and as a result delivers data analytics and generates insights. There are three main objectives: to have only one source of truth using cloud-based technology; to incorporate artificial intelligence models that fill gaps in production and operations parameters such as pressure and temperature; and to incorporate multiple solutions for the upstream community that help during the slow, medium, and fast loops of upstream operations. The new "way of working" supports multiple disciplines, such as the subsurface, facilities and operations, HSSE, and business planning teams, combining business process management and technical workflows to generate insights and create value that impacts the operators' profit and loss (P&L) statement. It tackles value pillars such as production optimization, reduced unplanned deferment, cost avoidance, and improved process cycle efficiency. The use of big data and artificial intelligence algorithms is key to understanding the production of wells and fields, and to anchoring decisions on data processed with automated engineering models, thus enabling better decision making across fast, medium, and slow loop actions.
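
One of the stated objectives, using artificial intelligence models to fill gaps in production parameters such as pressure and temperature, can be sketched with a standard regression approach. This is a minimal illustration under assumed column names and an assumed model choice, not the operator's actual workflow.

```python
# Minimal sketch of the gap-filling idea mentioned above: train a
# regression model on rows where a sensor value is present and use it
# to predict the missing values. Column names, the feature set, and
# the model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def fill_pressure_gaps(df: pd.DataFrame) -> pd.DataFrame:
    # Assumed predictor columns available on every row of interest.
    features = ["choke_pct", "wellhead_temp_c", "gas_rate_m3d"]
    known = df.dropna(subset=["tubing_pressure_bar"] + features)
    missing = df[df["tubing_pressure_bar"].isna()].dropna(subset=features)

    # Fit on rows where the pressure reading exists.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(known[features], known["tubing_pressure_bar"])

    # Predict the gaps and write them back.
    df = df.copy()
    df.loc[missing.index, "tubing_pressure_bar"] = model.predict(
        missing[features]
    )
    return df
```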


Author(s):  
Monica Nehemia ◽  
Tandokazi Zondani

Big data has gained popularity in recent years, attracting increased interest from public and private organisations as well as academics. The automation of business processes has led information systems to generate many types of data at various speeds. Big data is produced at a high rate from multiple sources and can become complex to manage, with challenges in collecting, manipulating, and storing it using traditional IS/IT. Big data is associated with both technical and non-technical challenges. Because of these challenges, organisations deploy enterprise architecture (EA) as an approach to holistically manage and mitigate the challenges associated with business and technology. An exploratory study was conducted to determine how EA could be used to manage big data in healthcare facilities, employing an interpretive approach with document analysis. The findings concern governance, internal and external big data sources, information technology infrastructure development, and big data skills. Through the different EA domains, big data challenges could be mitigated.


Data Mining
2013
pp. 2117-2131
Author(s):  
May Yuan
James Bothwell

The so-called Big Data Challenge poses not only issues with massive volumes of data, but also issues with the continuing data streams from multiple sources that monitor environmental processes or record social activities. Many statistical tools and data mining methods have been developed to reveal patterns embedded in large data sets. While patterns are critical to data analysis, deep insights will remain buried unless we develop means to associate spatiotemporal patterns with the dynamics of the spatial processes that drive the formation of those patterns in the data. This chapter reviews the literature on the conceptual foundation of space-time analytics for spatial processes, discusses the types of dynamics that have and have not been addressed in the literature, and identifies where new thinking is needed to systematically advance space-time analytics toward revealing the dynamics of spatial processes. The discussion is illustrated by an example of what space-time analytics can offer in response to the Big Data Challenge: the development of new space-time concepts and tools to analyze output from two common General Circulation Models for climate change prediction. Common approaches compare temperature changes at locations between the NCAR CCSM3 and the CNRM CM3, or animate time series of temperature layers to visualize the prediction. Instead, the new space-time analytics methods shown here can decipher differences in the spatial dynamics of the predicted temperature change in the model outputs, applying the concepts of change and movement to reveal warming, cooling, convergence, and divergence in temperature change across the globe.
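
The change-and-movement idea can be sketched numerically: difference a model temperature field in time to classify warming versus cooling, then examine the divergence of the change field's spatial gradient to flag where change converges or diverges. The sketch below uses random fields in place of CCSM3/CM3 output and is a simplified illustration, not the chapter's actual method.

```python
# Small NumPy sketch of the change/movement concepts discussed above:
# classify grid cells as warming or cooling from the temporal change in
# a model temperature field, then use the divergence of the change
# field's spatial gradient to flag convergent vs divergent change.
# The random fields stand in for CCSM3/CM3 output; the real analysis
# would read gridded model data instead.
import numpy as np

rng = np.random.default_rng(0)
temp_t0 = 15 + 5 * rng.standard_normal((90, 180))   # lat x lon, year 0
temp_t1 = temp_t0 + rng.standard_normal((90, 180))  # year N

change = temp_t1 - temp_t0            # positive: warming, negative: cooling
warming_mask = change > 0

# Spatial gradient of the change field, then its divergence: positive
# values mark cells the change pattern spreads away from, negative
# values mark cells it converges toward.
d_dlat, d_dlon = np.gradient(change)
divergence = np.gradient(d_dlat, axis=0) + np.gradient(d_dlon, axis=1)

print(f"warming cells: {warming_mask.mean():.0%}")
print(f"convergent cells (div < 0): {(divergence < 0).mean():.0%}")
```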


2020
Vol 54 (4)
pp. 409-435
Author(s):  
Paolo Manghi
Claudio Atzori
Michele De Bonis
Alessia Bardi

Purpose
Several online services offer functionalities to access information from "big research graphs" (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to consuming statistics for monitoring and feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections, address local topology-driven challenges, and therefore cannot be re-used in other contexts.
Design/methodology/approach
This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system, and results from real-case experiments.
Findings
GDup provides the functionalities required to deliver a fully fledged entity-deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators, and algorithms for identifying and merging duplicates to obtain an output disambiguated graph.
Originality/value
To our knowledge, GDup is the only system in the literature that offers an integrated, general-purpose solution for the deduplication of graphs while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders, and institutions.
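
GDup's internals are not reproduced in the abstract, but the general entity-deduplication workflow it implements, candidate blocking, pairwise matching, and duplicate merging, can be sketched as follows. The record fields, blocking key, and similarity threshold are illustrative assumptions; GDup's actual algorithms and Ground Truth handling are configurable and considerably richer.

```python
# Generic sketch of an entity-deduplication workflow of the kind GDup
# runs at scale: block records by a cheap key, compare pairs within a
# block only, and merge matches with union-find. Record fields and the
# similarity rule are illustrative assumptions.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": "A", "title": "Deduplication of Big Research Graphs"},
    {"id": "B", "title": "Deduplication of big research graphs."},
    {"id": "C", "title": "Open Science Monitoring in OpenAIRE"},
]

# Blocking: group records sharing a normalized key (first 3 title words)
# so that only plausible duplicates are ever compared.
blocks: dict[str, list[dict]] = {}
for r in records:
    key = " ".join(r["title"].lower().split()[:3])
    blocks.setdefault(key, []).append(r)

# Union-find structure to accumulate merge decisions.
parent = {r["id"]: r["id"] for r in records}
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

# Pairwise matching inside each block only.
for block in blocks.values():
    for a, b in combinations(block, 2):
        sim = SequenceMatcher(None, a["title"].lower(),
                              b["title"].lower()).ratio()
        if sim > 0.9:                   # assumed match threshold
            parent[find(a["id"])] = find(b["id"])

# Collect the disambiguated groups.
groups: dict[str, list[str]] = {}
for r in records:
    groups.setdefault(find(r["id"]), []).append(r["id"])
print(groups)   # {'B': ['A', 'B'], 'C': ['C']} -- A and B merged
```

Blocking keeps the pairwise comparison cost near-linear instead of quadratic, which is what makes this style of workflow viable on graphs with hundreds of millions of entities.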


2020
Vol 39 (10)
pp. 753-754
Author(s):  
Jiajia Sun
Daniele Colombo
Yaoguo Li
Jeffrey Shragge

Geophysicists seek to extract useful and potentially actionable information about the subsurface by interpreting various types of geophysical data together with prior geologic information. It is well recognized that reliable imaging, characterization, and monitoring of subsurface systems require integration of multiple sources of information from a multitude of geoscientific data sets. With increasing data volumes and computational power, new data types, constant development of inversion algorithms, and the advent of the big data era, Geophysics editors see multiphysics integration as an effective means of meeting some of the challenges arising from imaging subsurface systems with higher resolution and reliability as well as exploring geologically more complicated areas. To advance the field of multiphysics integration and to showcase its added value, Geophysics will introduce a new section “Multiphysics and Joint Inversion” in 2021. Submissions are accepted now.


2017
Vol 75 (3)
pp. 941-952
Author(s):  
P F E Addison
D J Collins
R Trebilco
S Howe
N Bax
...  

Abstract Sustainable management and conservation of the world’s oceans require effective monitoring, evaluation, and reporting (MER). Despite the growing political and social imperative for MER, marine practitioners face some persistent and emerging challenges in undertaking it. In 2015, a diverse group of marine practitioners came together to discuss the emerging challenges associated with marine MER and potential solutions to address them. Three emerging challenges were identified: (i) the need to incorporate environmental, social, and economic dimensions in evaluation and reporting; (ii) the implications of big data, which create challenges in data management and interpretation; and (iii) dealing with uncertainty throughout MER activities. We point to key solutions across MER activities: (i) integrating models into marine management systems to help understand, interpret, and manage the environmental and socio-economic dimensions of uncertain and complex marine systems; (ii) utilizing big data sources and new technologies to collect, process, store, and analyze data; and (iii) applying approaches to evaluate, account for, and report on the multiple sources and types of uncertainty. These solutions point toward the potential for a new wave of evidence-based marine management, through more innovative monitoring, rigorous evaluation, and transparent reporting. Effective collaboration and institutional support across the science–management–policy interface will be crucial to dealing with emerging challenges and implementing the tools and approaches embedded within these solutions.

