Big Data Technologies and Management

2022 ◽  
pp. 1734-1744
Author(s):  
Jayashree K. ◽  
Abirami R.

Developments in information technology and its prevalent growth across business, engineering, medicine, and scientific study have resulted in an information and data explosion. Knowledge discovery and decision making from such rapidly growing voluminous data are challenging tasks in terms of data organization and processing, an emerging trend known as big data computing. Big data has gained much attention from academia and the IT industry. It is a new paradigm that combines large-scale computation, new data-intensive techniques, and mathematical models to build data analytics. Thus, this chapter discusses the background of big data, covers its various applications in detail, and addresses related work and future directions.


Author(s):  
Sathishkumar S. ◽  
Devi Priya R. ◽  
Karthika K.

Big data computing in clouds is a new paradigm for next-generation analytics development. It enables large-scale organizations to share and explore large quantities of ever-increasing data of diverse types, using cloud computing technology as a back-end. Knowledge exploration and decision-making from this rapidly increasing volume of data demand efficient data organization, access, and timely processing, an evolving trend known as big data computing. This modern paradigm incorporates large-scale computing, new data-intensive techniques, and mathematical models to create data analytics for intrinsic information extraction. Cloud computing emerged as a service-oriented computing model that delivers infrastructure, platform, and applications as services from providers to consumers, meeting QoS parameters while enabling the archival and processing of large volumes of rapidly growing data quickly and economically.


Author(s):  
Ewa Niewiadomska-Szynkiewicz ◽  
Michał P. Karpowicz

Progress in the life and physical sciences and in technology depends on efficient data mining and modern computing technologies. The rapid growth of data-intensive domains requires continuous development of new solutions for network infrastructure, servers, and storage in order to address Big Data-related problems. The development of software frameworks, including smart calculation, communication management, and data decomposition and allocation algorithms, is clearly one of the major technological challenges we face. Reducing energy consumption is another challenge arising from the development of efficient HPC infrastructures. This paper addresses the vital problem of energy-efficient high-performance distributed and parallel computing. An overview of recent technologies for Big Data processing is presented, with attention focused on the most popular middleware and software platforms. Various energy-saving approaches are presented and discussed as well.


2020 ◽  
Vol 7 (1) ◽  
pp. 205395172093514 ◽  
Author(s):  
Laurence Barry ◽  
Arthur Charpentier

The aim of this article is to assess the impact of Big Data technologies on insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. It made visible regularities that were invisible at the individual level, further supporting the classificatory approach of insurance and the assumption that all members of a class are identical risks. The second part focuses on the reversal of perspective currently occurring in data analysis with predictive analytics, and how this conceptually contradicts the collective basis of insurance. The tremendous volume of data and the promise of personalization through accurate individual prediction deeply shake the homogeneity hypothesis behind pooling. The third part attempts to assess the extent of this shift in motor insurance. Onboard devices that collect continuous driving behavioural data could import this new paradigm into these products. An examination of the current state of research on models with telematics data shows, however, that the epistemological leap has, for now, not happened.
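The contrast the article draws can be made concrete with a small numerical sketch. The example below is purely illustrative and not the authors' models: it compares a class-based pooled claim frequency with an individualized prediction from telematics-style features using a Poisson GLM; the feature names, data, and parameters are all invented.

```python
# Illustrative sketch only (not the authors' models): class-based pooling
# versus individual prediction from hypothetical telematics features.
import numpy as np
import pandas as pd
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical portfolio: one traditional rating class plus telematics covariates.
policies = pd.DataFrame({
    "rating_class": rng.choice(["A", "B"], size=n),          # classic tariff cell
    "night_driving_share": rng.uniform(0, 0.5, size=n),      # telematics signal
    "harsh_braking_per_100km": rng.gamma(2.0, 1.0, size=n),  # telematics signal
})
# Hypothetical claim counts, loosely driven by the telematics signals.
lam = np.exp(-2.0 + 1.5 * policies["night_driving_share"]
             + 0.2 * policies["harsh_braking_per_100km"])
policies["claims"] = rng.poisson(lam)

# (1) Pooling view: every member of a rating class is assigned the class mean frequency.
class_rate = policies.groupby("rating_class")["claims"].mean()
print("Class-based expected frequencies:\n", class_rate)

# (2) Predictive view: a Poisson GLM on individual telematics features.
X = policies[["night_driving_share", "harsh_braking_per_100km"]]
glm = PoissonRegressor(alpha=1e-4).fit(X, policies["claims"])
policies["individual_rate"] = glm.predict(X)
print("Spread of individual rates within each class:\n",
      policies.groupby("rating_class")["individual_rate"].describe()[["mean", "std"]])
```

Under the pooling view every policyholder in a class is charged the class mean; the spread of individual predictions within each class is, informally, the gap that the article argues challenges the homogeneity hypothesis.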


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 409
Author(s):  
Balázs Bohár ◽  
David Fazekas ◽  
Matthew Madgwick ◽  
Luca Csabai ◽  
Marton Olbei ◽  
...  

In the era of Big Data, data collection underpins biological research more than ever before. In many cases it can be as time-consuming as the analysis itself, requiring the download of multiple public databases with different data structures and, in general, days of work before any biological question can be answered. To solve this problem, we introduce an open-source, cloud-based big data platform called Sherlock (https://earlham-sherlock.github.io/). Sherlock fills this gap by enabling biologists to store, convert, query, share and generate biological data, while ultimately streamlining bioinformatics data management. The Sherlock platform provides a simple interface to leverage big data technologies, such as Docker and PrestoDB. Sherlock is designed to analyse, process, query and extract information from extremely complex and large data sets. Furthermore, Sherlock is capable of handling differently structured data (interaction, localization, or genomic sequence) from several sources and converting them to a common, optimized storage format, for example the Optimized Row Columnar (ORC) format. This format facilitates Sherlock’s ability to quickly and easily execute distributed analytical queries on extremely large data files, as well as to share datasets between teams. The Sherlock platform is freely available on GitHub and contains specific loader scripts for structured data sources of genomics, interaction and expression databases. With these loader scripts, users are able to easily and quickly create and work with specific file formats, such as JavaScript Object Notation (JSON) or ORC. For computational biology and large-scale bioinformatics projects, Sherlock provides an open-source platform empowering data management, data analytics, data integration and collaboration through modern big data technologies.
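As an illustration of the kind of conversion step described above, the sketch below turns newline-delimited JSON records into an ORC file with pyarrow. It is a minimal sketch of the general technique, not Sherlock's own loader scripts; pyarrow and the file names are assumptions made here for illustration.

```python
# Minimal sketch (not Sherlock's loader scripts) of converting newline-delimited
# JSON records into the Optimized Row Columnar (ORC) format, which columnar
# query engines such as PrestoDB can scan efficiently. File names are hypothetical.
import pyarrow.json as pj
import pyarrow.orc as po

# Read newline-delimited JSON (one record per line) into an Arrow table.
table = pj.read_json("interactions.ndjson")

# Write the table out as ORC for fast, column-oriented analytical queries.
po.write_table(table, "interactions.orc")

# Read it back to confirm the schema survived the round trip.
print(po.read_table("interactions.orc").schema)
```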


2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

The revolution in technologies for storing and processing big data has led to data-intensive computing as a new paradigm. To extract valuable and precise knowledge from big data, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. It aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm in the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version is the result of a different technique and heuristic applied to the preceding variant. Following EclatV1, the filtered transaction technique is applied, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
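For readers unfamiliar with Eclat, the sketch below illustrates the core idea the paper builds on: a vertical (item → tidset) representation of the transaction database, with support counted by tidset intersection, expressed on Spark RDDs via PySpark. It is a simplified, assumed illustration of the technique rather than the paper's RDD-Eclat variants; the toy data and the driver-side 2-itemset step are for brevity only.

```python
# Simplified sketch of the Eclat idea on Spark RDDs: build a vertical layout
# (item -> tidset) and count support by intersecting tidsets. Not the paper's
# RDD-Eclat variants, only the underlying technique on toy data.
from pyspark import SparkContext

sc = SparkContext(appName="eclat-sketch")
min_support = 2

# Toy transaction database: (transaction id, items).
transactions = sc.parallelize([
    (0, ["bread", "milk"]),
    (1, ["bread", "butter"]),
    (2, ["bread", "milk", "butter"]),
    (3, ["milk", "butter"]),
])

# Vertical layout: item -> set of transaction ids (tidset), kept only if frequent.
tidsets = (transactions
           .flatMap(lambda tx: [(item, {tx[0]}) for item in tx[1]])
           .reduceByKey(lambda a, b: a | b)
           .filter(lambda kv: len(kv[1]) >= min_support))

frequent_1 = dict(tidsets.collect())

# Frequent 2-itemsets by intersecting tidsets of frequent items (done on the
# driver here for brevity; the paper distributes this work across equivalence classes).
items = sorted(frequent_1)
frequent_2 = {}
for i, a in enumerate(items):
    for b in items[i + 1:]:
        common = frequent_1[a] & frequent_1[b]
        if len(common) >= min_support:
            frequent_2[(a, b)] = len(common)

print("Frequent 1-itemsets:", {k: len(v) for k, v in frequent_1.items()})
print("Frequent 2-itemsets:", frequent_2)
sc.stop()
```

The equivalence-class partitioning heuristics mentioned in the abstract concern how this intersection work is grouped and distributed across Spark workers.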


2016 ◽  
Author(s):  
Marco Moscatelli ◽  
Matteo Gnocchi ◽  
Andrea Manconi ◽  
Luciano Milanesi

Motivation: Nowadays, advances in technology have resulted in a huge amount of data in both biomedical research and healthcare systems. This growing amount of data gives rise to the need for new research methods and analysis techniques. Analysis of these data offers new opportunities to define novel diagnostic processes. Therefore, greater integration between healthcare and biomedical data is essential to devise novel predictive models in the field of biomedical diagnosis. In this context, the digitalization of clinical exams and medical records is becoming essential to collect heterogeneous information. Analysis of these data by means of big data technologies will allow a more in-depth understanding of the mechanisms leading to diseases and will, at the same time, facilitate the development of novel diagnostics and personalized therapeutics. The recent application of big data technologies in the medical field offers new opportunities to integrate enormous amounts of medical and clinical information from population studies. Therefore, it is essential to devise new strategies aimed at storing and accessing the data in a standardized way. Moreover, it is important to provide suitable methods to manage these heterogeneous data. Methods: In this work, we present a new information technology infrastructure devised to efficiently manage huge amounts of heterogeneous data for disease prevention and precision medicine. A test set based on data produced by a clinical and diagnostic laboratory has been built to set up the infrastructure. When working with clinical data, it is essential to ensure the confidentiality of sensitive patient data; therefore, the set-up phase has been carried out using "anonymous data". To this end, specific techniques have been adopted with the aim of ensuring a high level of privacy when correlating the medical records with important secondary information (e.g., date of birth, place of residence). It should be noted that the rigidity of relational databases does not lend itself to the nature of these data; in our opinion, better results can be obtained using non-relational (NoSQL) databases. Starting from these considerations, the infrastructure has been developed on a NoSQL database with the aim of combining scalability and flexibility. In particular, MongoDB [1] has been used as it is better suited to managing different types of data at large scale. In doing so, the infrastructure is able to provide optimized management of huge amounts of heterogeneous data while ensuring high speed of analysis. Results: The presented infrastructure exploits big data technologies in order to overcome the limitations of relational databases when working with large and heterogeneous data. The infrastructure implements a set of interface procedures aimed at preparing the metadata for importing data into a NoSQL database. [Abstract truncated at 3,000 characters; the full version is available in the PDF file.]
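As a toy illustration of the schema flexibility that motivates the choice of a NoSQL store, the sketch below stores two differently structured, pseudonymized clinical records in MongoDB via pymongo. It is a minimal sketch under assumed names, not the authors' infrastructure: the connection string, database, collection, fields, and the hashing step standing in for their anonymization techniques are all hypothetical.

```python
# Minimal sketch (not the authors' infrastructure) of storing heterogeneous,
# pseudonymized clinical records in MongoDB with pymongo.
# Connection string, collection and field names are hypothetical.
import hashlib
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client["biomed"]["lab_records"]

def pseudonymize(patient_id: str) -> str:
    """Replace a direct identifier with a one-way hash (a simple stand-in for
    the anonymization techniques mentioned in the abstract)."""
    return hashlib.sha256(patient_id.encode()).hexdigest()

# Heterogeneous documents: different exams carry different fields, which is
# exactly what a schema-flexible NoSQL store accommodates.
records.insert_many([
    {"patient": pseudonymize("P001"), "exam": "blood_panel",
     "results": {"hb_g_dl": 13.5, "wbc_10e9_l": 6.1}},
    {"patient": pseudonymize("P002"), "exam": "mri",
     "region": "brain", "report": "no abnormalities detected"},
])

# Query across heterogeneous documents by their shared fields only.
for doc in records.find({"exam": "blood_panel"}, {"_id": 0, "results": 1}):
    print(doc)
```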

