Iterative MapReduce

Author(s):  
Utkarsh Srivastava ◽  
Ramanathan L.

Diabetes Mellitus has become a major public health issue in India. The latest statistics on diabetes reveal that 63 million people in India suffer from the disease, and this figure is likely to rise to 80 million by 2025. Given the rise of big data as a socio-technical phenomenon, there are various complications in analyzing big data and its related data handling issues. This chapter examines Hadoop, an open-source framework that permits the distributed processing of large datasets on clusters of computers, and shows how better results are produced by deploying Iterative MapReduce. The goal of this chapter is to analyze and extract the improved performance of data analysis in a distributed environment. Iterative MapReduce (i-MapReduce) plays a major role in optimizing analytics performance. The implementation is done on Cloudera Hadoop installed on top of the Hortonworks Data Platform (HDP) Sandbox.
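The iterative pattern at the heart of i-MapReduce can be illustrated with a minimal, in-memory sketch: a driver keeps resubmitting map and reduce phases until the output stops changing. The 1-D k-means example, the function names, and the convergence tolerance below are illustrative assumptions, not the chapter's Cloudera/HDP implementation.

```python
# Minimal in-memory sketch of the i-MapReduce pattern: a driver re-runs
# map and reduce phases until the result converges. Illustrated with 1-D
# k-means; all names here are assumptions, not the chapter's implementation.
from collections import defaultdict

def map_phase(points, centroids):
    # map: emit (nearest_centroid_index, point)
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield idx, p

def reduce_phase(pairs):
    # reduce: average the points assigned to each centroid
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: sum(ps) / len(ps) for idx, ps in groups.items()}

def iterative_mapreduce(points, centroids, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids))
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        if max(abs(a - b) for a, b in zip(centroids, new_centroids)) < tol:
            break  # converged: stop resubmitting jobs
        centroids = new_centroids
    return centroids

print(iterative_mapreduce([1.0, 1.2, 0.8, 8.0, 8.5, 9.1], [0.0, 10.0]))
```

In a real deployment the driver would submit each pass as a Hadoop job rather than call local functions, but the control flow is the same: run, compare with the previous result, and stop on convergence.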

Author(s):  
Ying Wang ◽  
Yiding Liu ◽  
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on the Hadoop and Spark big data platforms, a hybrid forest fire analysis system is built in this study. The platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module is used to provide various big data analysis methods, and visualization tools such as ECharts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform and to provide meaningful guidance for follow-up research and for establishing a forest fire monitoring and visualized early-warning big data platform. However, the experiment has two shortcomings: more data types should be selected, and compatibility would be better if the original data could be converted to XML format. It is expected that these problems can be solved in follow-up research.
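As a rough illustration of the storage-plus-analysis split described above, the following hedged PySpark sketch reads sensor records from HDFS and flags candidate fire points; the HDFS paths, column names, and thresholds are assumptions rather than the study's actual schema.

```python
# Hypothetical sketch: read sensor/satellite records from HDFS with Spark
# and flag candidate fire points. Paths, column names and thresholds are
# assumptions, not the study's actual data layout.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("forest-fire-detection").getOrCreate()

records = spark.read.csv(
    "hdfs:///data/forest/sensor_readings.csv",  # assumed HDFS location
    header=True,
    inferSchema=True,
)

# Flag readings whose brightness temperature is high and humidity is low.
candidates = records.filter(
    (col("brightness_temp_k") > 330) & (col("humidity_pct") < 30)
).select("longitude", "latitude", "brightness_temp_k", "timestamp")

# Persist the candidate points for the visualization layer to pick up.
candidates.write.mode("overwrite").parquet("hdfs:///data/forest/fire_candidates")
spark.stop()
```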


2020 ◽  
Vol 10 (4) ◽  
pp. 36
Author(s):  
Sajeewan Pratsri ◽  
Prachyanun Nilsook

With the continuously increasing amount of information in all areas, whether retrieved from inside or outside an organization, a platform should be provided to automate the whole process of collecting, storing, and processing Big Data. Building the tools for such a platform is itself a Big Data challenge. Furthermore, the security and privacy of Big Data and of Big Data analysis in organizations, government agencies, and educational institutions also influence the design of a Big Data platform for higher education institutes (HEi). Such a platform is a digital learning platform for online instruction and the use of digital media in educational reform, including modules that mediate functions between computers and humans. 1) The Big Data architecture is a framework consisting of Big Data Infrastructure (BDI); cloud-based Data Storage; High-Performance Computing (HPC), in which a computer system uses all of its resources for optimal efficiency; and a network system to detect the target devices on the network. When Big Data is then handled with Hadoop's tools and techniques, the platform can deliver the desired data analysis by retrieving existing information, for example the large volumes of student and teaching information that can be adopted for accurate forecasting.
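As a hedged illustration of the forecasting step mentioned at the end of the abstract, the sketch below fits a simple linear model to yearly enrolment figures with Spark ML; the table layout, column names, and figures are hypothetical and not part of the paper's design.

```python
# Hedged sketch: once student records sit in the platform's storage layer,
# Hadoop/Spark tooling can fit a simple forecast. The schema, numbers and
# the choice of a linear model are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("hei-enrolment-forecast").getOrCreate()

# Assumed schema: one row per academic year with total enrolment.
history = spark.createDataFrame(
    [(2016, 4200), (2017, 4350), (2018, 4600), (2019, 4800), (2020, 5050)],
    ["year", "enrolment"],
)

features = VectorAssembler(inputCols=["year"], outputCol="features")
model = LinearRegression(featuresCol="features", labelCol="enrolment").fit(
    features.transform(history)
)

# Forecast enrolment for a future year.
future = features.transform(spark.createDataFrame([(2025,)], ["year"]))
model.transform(future).select("year", "prediction").show()
spark.stop()
```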


2021 ◽  
Author(s):  
Fabian Kovacs ◽  
Max Thonagel ◽  
Marion Ludwig ◽  
Alexander Albrecht ◽  
Manuel Hegner ◽  
...  

BACKGROUND Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. The analysis of patient-related data in particular possesses huge potential to improve decision-making processes. However, most analytical approaches used today are highly time- and resource-consuming. OBJECTIVE The presented software solution, Conquery, is an open-source software tool providing advanced but intuitive data analysis without the need for specialized statistical training. Conquery aims to simplify big data analysis for novice database users in the medical sector. METHODS Conquery is a document-oriented distributed time-series database and analysis platform. Its main application is the analysis of per-person medical records by non-technical medical professionals. Complex analyses are realized in the Conquery frontend by dragging tree nodes into the query editor. Queries are evaluated in a column-oriented fashion by a bespoke distributed query engine for medical records. We present a custom compression scheme that uses both online-calculated and precomputed metadata and data statistics to facilitate low response times. RESULTS Conquery allows for easy navigation through the hierarchy and enables complex study-cohort construction while reducing the demand on time and resources. The UI of Conquery and a query output are exemplified by the construction of a relevant clinical cohort. CONCLUSIONS Conquery is an efficient and intuitive open-source software for performant and secure data analysis and aims to support decision-making processes in the healthcare sector.
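The pruning idea behind such precomputed statistics can be sketched in a few lines; the following Python fragment illustrates the general technique (per-block min/max metadata used to skip scans in a column store) and is not Conquery's actual Java implementation.

```python
# Illustrative sketch only: precomputed column statistics (min/max) let a
# column-oriented engine skip whole blocks during a range query. This
# mirrors the general technique, not Conquery's internals.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ColumnBlock:
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        # Metadata computed once and stored alongside the block.
        self.min_value = min(self.values)
        self.max_value = max(self.values)

def range_query(blocks, low, high):
    """Return matching values, scanning only blocks whose metadata overlaps."""
    hits = []
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # metadata proves no match: skip the scan entirely
        hits.extend(v for v in block.values if low <= v <= high)
    return hits

blocks = [ColumnBlock([3, 7, 9]), ColumnBlock([42, 55, 61]), ColumnBlock([70, 88])]
print(range_query(blocks, 40, 65))  # only the middle block is scanned
```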


Author(s):  
Roger S. Bivand

Abstract Twenty years have passed since Bivand and Gebhardt (J Geogr Syst 2(3):307–317, 2000. 10.1007/PL00011460) indicated that there was a good match between the then nascent open-source R programming language and environment and the needs of researchers analysing spatial data. Recalling the development of classes for spatial data presented in book form in Bivand et al. (Applied spatial data analysis with R. Springer, New York, 2008; Applied spatial data analysis with R, 2nd edn. Springer, New York, 2013), it is important to present the progress now occurring in the representation of spatial data, and possible consequences for spatial data handling and the statistical analysis of spatial data. Beyond this, it is imperative to discuss the relationships between R-spatial software and the larger open-source geospatial software community on whose work R packages crucially depend.


Author(s):  
Arpit Kumar Sharma ◽  
Arvind Dhaka ◽  
Amita Nandal ◽  
Kumar Swastik ◽  
Sunita Kumari

The meaning of the term “big data” can be inferred from its name itself (i.e., the collection of large structured or unstructured data sets). In addition to their huge quantity, these data sets are so complex that they cannot be analyzed using conventional data handling software and hardware tools. If processed judiciously, big data can prove to be a huge advantage for the industries using it. Due to this usefulness, studies are being conducted to create methods for handling big data. Knowledge extraction from big data is very important; without it, there is no purpose in accumulating such volumes of data. Cloud computing is a powerful tool that provides a platform for the storage and computation of massive amounts of data.


2020 ◽  
Author(s):  
Martin Wegmann ◽  
Jakob Schwalb-Willmann ◽  
Stefan Dech

This is a book about how ecologists can integrate remote sensing and GIS in their research. It will allow readers to get started with the application of remote sensing and to understand its potential and limitations. Using practical examples, the book covers all the necessary steps, from planning field campaigns to deriving ecologically relevant information through remote sensing and modelling of species distributions. An Introduction to Spatial Data Analysis introduces spatial data handling using the open-source software Quantum GIS (QGIS). In addition, readers will be guided through their first steps in the R programming language. The authors explain the fundamentals of spatial data handling and analysis, empowering the reader to turn data acquired in the field into actual spatial data. Readers will learn to process and analyse spatial data of different types and to interpret the data and results. After finishing this book, readers will be able to address questions such as “What is the distance to the border of the protected area?”, “Which points are located close to a road?” and “Which fraction of land cover types exist in my study area?” using different software and techniques.

This book is for novice spatial data users and does not assume any prior knowledge of spatial data itself or practical experience working with such data sets. Readers will likely include student and professional ecologists, geographers and any environmental scientists or practitioners who need to collect, visualize and analyse spatial data. The software used comprises the widely applied open-source scientific programs QGIS and R. All scripts and data sets used in the book will be provided online at book.ecosens.org.

This book covers specific methods including: what to consider before collecting in situ data; how to work with spatial data collected in situ; the difference between raster and vector data; how to acquire further vector and raster data; how to create relevant environmental information; how to combine and analyse in situ and remote sensing data; how to create useful maps for field work and presentations; how to use QGIS and R for spatial analysis; and how to develop analysis scripts.
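Two of the example questions above can be answered in a few lines of code; the sketch below is a Python analogue using geopandas (the book itself works in QGIS and R), and the file names, layer contents, and projected CRS are assumptions.

```python
# Python analogue of two of the book's example questions (the book itself
# uses R and QGIS): distance to a protected-area border, and which points
# lie close to a road. File names and the UTM zone are assumptions.
import geopandas as gpd

points = gpd.read_file("field_points.gpkg")        # in situ observations
protected = gpd.read_file("protected_area.gpkg")   # protected-area polygon
roads = gpd.read_file("roads.gpkg")                # road lines

# Reproject to a metric CRS so distances come out in metres (assumed UTM 32N).
points = points.to_crs(epsg=32632)
protected = protected.to_crs(epsg=32632)
roads = roads.to_crs(epsg=32632)

# Distance from each point to the protected-area border.
border = protected.unary_union.boundary
points["dist_to_border_m"] = points.geometry.distance(border)

# Points located within 500 m of any road.
near_road = points[points.geometry.distance(roads.unary_union) <= 500]

print(points[["dist_to_border_m"]].head())
print(f"{len(near_road)} points lie within 500 m of a road")
```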

