Development of a big data system to assess ecosystem services of surface soil

Author(s):  
Kyoung Jae Lim ◽  
Dongjun Lee ◽  
Jonggun Kim ◽  
Jae E Yang ◽  
Minhwan Shin

<p>A big data system plays a significant role in various fields. This technology has also been applied to environmental fields because it can discover hidden patterns among environmental factors. As massive data sets have been constructed over several decades, big data analysis has been widely used to extract useful information from different types of big data sets. In this study, we developed a big data system framework to assess the ecosystem services provided by surface soil. Among big data platforms, we used Amazon Web Services (AWS) because of its cost-efficiency and hardware flexibility. The big data system has five stages (i.e. data acquisition – data storage – data processing – data analysis – visualization). In the data acquisition step, soil sensors and an Internet of Things (IoT) system were used, and we collected existing soil property data provided by national institutes such as the Rural Development Administration (RDA), the Ministry of Environment (MOE), and the Ministry of Land, Infrastructure, and Transport (MOLIT). The AWS S3 platform, an object storage service that provides easy-to-use management features, was adopted as the data storage platform of the big data system. Amazon EMR, Amazon SageMaker, and Amazon QuickSight were used for the data processing, data analysis, and visualization steps, respectively. We tested whether the developed system could predict soil bulk density and replace a typical environmental model using models based on machine learning and deep learning. Both tests showed positive results: the developed models could predict soil properties and simulate natural phenomena as well as the typical environmental model. However, since the system is at an early development stage, it needs repeated testing with various soil properties in the future. If this system becomes fully functional, it will be helpful in improving soil environments.</p>
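The five-stage flow described above (acquisition → storage → processing → analysis → visualization) can be sketched as a chain of plain functions. This is a minimal local illustration only: in the actual system these stages map to IoT sensors, S3, EMR, SageMaker, and QuickSight, and every name and value below is an illustrative stand-in, not an AWS API call or the authors' code.

```python
def acquire():
    # Stand-in for IoT soil-sensor readings (moisture %, temperature degC).
    return [{"moisture": 31.0, "temp": 12.5}, {"moisture": 28.4, "temp": 13.1}]

def store(records, bucket):
    # Stand-in for persisting raw records to object storage such as S3.
    bucket.extend(records)
    return bucket

def process(bucket):
    # Stand-in for an EMR job: keep only complete records.
    return [r for r in bucket if "moisture" in r and "temp" in r]

def analyze(records):
    # Stand-in for a SageMaker model: here, a simple mean soil moisture.
    return sum(r["moisture"] for r in records) / len(records)

def visualize(metric):
    # Stand-in for a QuickSight dashboard value.
    return f"mean soil moisture: {metric:.1f}%"

bucket = []
summary = visualize(analyze(process(store(acquire(), bucket))))
```

The point of the sketch is the one-way data flow: each stage consumes only the previous stage's output, which is what lets the real stages be swapped for managed AWS services independently.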

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mohammed Anouar Naoui ◽  
Brahim Lejdel ◽  
Mouloud Ayad ◽  
Abdelfattah Amamra ◽  
Okba kazar

Purpose: The purpose of this paper is to propose a distributed deep learning architecture for smart cities in big data systems.

Design/methodology/approach: We propose a multilayer architecture to describe distributed deep learning for smart cities in big data systems. The components of our system are the smart city layer, the big data layer, and the deep learning layer. The smart city layer is responsible for the smart city's components, its Internet of Things, sensors, and effectors, and their integration into the system; the big data layer concerns data characteristics and data distribution over the system. The deep learning layer is the model of our system and is responsible for data analysis.

Findings: We apply the proposed architecture to a smart environment and to smart energy. For the smart environment, we study toluene forecasting in the Madrid smart city. For smart energy, we study wind energy forecasting in Australia. The proposed architecture can reduce execution time and improve deep learning models such as Long Short-Term Memory (LSTM).

Research limitations/implications: This research needs the application of other deep learning models, such as convolutional neural networks and autoencoders.

Practical implications: The findings of the research will be helpful in smart city architecture, providing a clear view of a smart city, its data storage, and its data analysis. Toluene forecasting in a smart environment can help decision-makers ensure environmental safety. The smart energy application of the proposed model can give a clear prediction of power generation.

Originality/value: The findings of this study are expected to contribute valuable information to decision-makers for a better understanding of the keys to smart city architecture and its relation to data storage, processing, and data analysis.
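The three-layer design in the abstract can be sketched as three cooperating classes. This is an illustrative toy only: the class names, the hourly readings, and the per-worker averaging stand in for the paper's IoT sensing, data distribution, and distributed LSTM, none of which are specified in code in the source.

```python
class SmartCityLayer:
    def collect(self):
        # Stand-in for IoT sensors/effectors, e.g. hourly toluene readings.
        return [1.2, 1.5, 1.1, 1.8, 1.4]

class BigDataLayer:
    def distribute(self, readings, n_workers=2):
        # Partition readings across workers, mimicking data distribution
        # over the system (round-robin striping).
        return [readings[i::n_workers] for i in range(n_workers)]

class DeepLearningLayer:
    def forecast(self, partitions):
        # Stand-in for a distributed model such as an LSTM: each worker
        # summarizes its shard, then the results are combined.
        worker_means = [sum(p) / len(p) for p in partitions]
        return sum(worker_means) / len(worker_means)

partitions = BigDataLayer().distribute(SmartCityLayer().collect())
prediction = DeepLearningLayer().forecast(partitions)
```

The layering matters more than the arithmetic: each layer exposes one method to the layer above, which is what allows the big data layer to redistribute work without the other layers changing.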


2020 ◽  
Vol 10 (4) ◽  
pp. 36
Author(s):  
Sajeewan Pratsri ◽  
Prachyanun Nilsook

As the amount of information continuously increases in all aspects, whether sourced from inside or outside an organization, a platform should be provided to automate the whole process of collecting, storing, and processing Big Data. Choosing the tools for building such a system is itself a Big Data challenge. Furthermore, the security and privacy of Big Data and of Big Data analysis in organizations, government agencies, and educational institutions also affect the design of a Big Data platform for a higher education institute (HEi). Such a platform is a digital learning platform for online instruction and the use of digital media for educational reform, including modules that mediate between computers and humans. The Big Data architecture is a framework for handling large volumes of data, consisting of Big Data Infrastructure (BDI), cloud-based data storage, High-Performance Computing (HPC) that uses all of a computer system's resources for optimal efficiency, and a network system to detect target devices. When Big Data is combined with Hadoop's tools and techniques, the Big Data platform can deliver the desired data analysis by retrieving existing information, for example student and teaching information, whose large volume can be exploited for accurate forecasting.


Large volumes of data are generated and stored in various fields; such collections are called big data. Big data in healthcare comprises huge clinical data sets of patient records maintained as Electronic Health Records (EHRs). More than 80% of clinical data is in unstructured formats and is stored in hundreds of forms. The challenge for data storage and analysis is to handle large data sets efficiently and scalably. The Hadoop MapReduce framework stores and operates on any kind of data speedily. It is not solely a storage system but also a platform for data storage as well as processing, and it is scalable and fault-tolerant. Prediction over the data sets is handled by a machine learning algorithm. This work focuses on the Extreme Learning Machine (ELM) algorithm, which offers an optimized way to find disease risk predictions by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm achieves the computational work well and obtains good results in both veracity and efficiency.
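The core ELM idea mentioned above is compact enough to sketch: hidden-layer weights are random and never trained, and only the output weights are fit, with a single least-squares solve. The class below is a minimal NumPy sketch of that general technique, not the paper's CS-SVM hybrid; all names and parameters are illustrative.

```python
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden-layer weights are drawn once at random and never updated.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Output weights come from one pseudo-inverse (least-squares) solve.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage: learn y = x0 + x1 from 200 random samples.
X = np.random.default_rng(1).uniform(-1, 1, size=(200, 2))
y = X[:, 0] + X[:, 1]
model = ELMRegressor(n_hidden=50).fit(X, y)
mse = float(np.mean((model.predict(X) - y) ** 2))
```

Because training is a single linear solve rather than iterative gradient descent, ELM scales to the large clinical data sets the abstract targets.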


Author(s):  
Abou_el_ela Abdou Hussein

Advances in web technologies have led, day by day, to tremendous growth in the volume of data generated. This mountain of huge, spread-out data sets leads to the phenomenon called big data: collections of massive, heterogeneous, unstructured, and complex data. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their limitations, so advances in computing architecture are required to handle both the data storage requisites and the heavy processing needed to analyze huge volumes and varieties of data economically. There are many technologies for manipulating big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework that is one of the prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey all big data characteristics, starting from the first three V's, which have been extended over time through research to more than fifty-six V's, and compare researchers' definitions to reach the best representation and precise clarification of all the big data V's. We highlight the challenges facing big data processing, how to overcome them using Hadoop, and Hadoop's use in processing big data sets as a solution to various problems in a distributed cloud-based environment. This paper mainly focuses on different components of Hadoop such as Hive, Pig, and HBase. We also give a complete description of Hadoop's pros and cons, and of improvements that address Hadoop's problems, through a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
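The MapReduce paradigm mentioned above is easiest to see in the classic word-count shape: mappers emit key-value pairs, the framework shuffles them into groups, and reducers aggregate each group. Below is a library-free Python simulation of those three phases; the function names are illustrative, and a real Hadoop job would run each phase distributed across nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit (word, 1) pairs, mirroring a Hadoop mapper.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Group values by key, mirroring the framework's shuffle/sort step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the grouped counts, mirroring a Hadoop reducer.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs big tools", "hadoop processes big data"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(pairs))
```

Because the mapper and reducer only see local records and local groups, the framework is free to parallelize both phases across machines, which is the property that lets Hadoop scale.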


2021 ◽  
Vol 105 ◽  
pp. 348-355
Author(s):  
Hou Xiang Liu ◽  
Sheng Han Zhou ◽  
Bang Chen ◽  
Chao Fan Wei ◽  
Wen Bing Chang ◽  
...  

The paper proposes a practice teaching mode based on analysis of a Didi data set. With the rapid development and wide application of big data analysis technology, more and more universities offer big data analysis courses. The theoretical knowledge of big data analysis is specialized and hard to understand, which may reduce students' interest in learning and their motivation; practice teaching plays an important role between theory learning and application. This paper first introduces the theoretical teaching part of the course and the theoretical methods it involves. Then the practice teaching content of the Didi data analysis case is briefly described. The study selects relevant evaluation indices to evaluate the teaching effect through a questionnaire survey and to verify the effectiveness of the teaching method. The results show that 78% of students think practical teaching can greatly improve their interest in learning, 89% think practical teaching helps them learn theoretical knowledge, 89% have basically mastered the big data analysis methods introduced in the course, and 90% think the teaching method proposed in this paper can greatly improve their practical ability. The teaching mode is effective: it can improve students' learning outcomes and practical ability in data analysis, and thereby the overall teaching effect.


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases have many advanced features in addition to the conventional RDBMS features; hence, "NoSQL" databases are popularly known as "Not only SQL" databases. A variety of NoSQL databases with different features for dealing with exponentially growing data-intensive applications are available, in both open source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.


2018 ◽  
Vol 60 (5-6) ◽  
pp. 321-326 ◽  
Author(s):  
Christoph Boden ◽  
Tilmann Rabl ◽  
Volker Markl

Abstract The last decade has been characterized by the collection and availability of unprecedented amounts of data due to rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. In order to process and analyze this data deluge, novel distributed data processing systems resting on the paradigm of data flow, such as Apache Hadoop, Apache Spark, or Apache Flink, were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, prohibiting large groups of data scientists and analysts from efficiently using this technology. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable declarative specification and the automatic parallelization of data analysis programs; the PEEL framework for transparent and reproducible benchmark experiments on distributed data processing systems; and approaches to foster the interpretability of machine learning models; and finally we provide an overview of the challenges to be addressed in the second phase of the BBDC.


2016 ◽  
Vol 49 (3) ◽  
pp. 1035-1041 ◽  
Author(s):  
Takanori Nakane ◽  
Yasumasa Joti ◽  
Kensuke Tono ◽  
Makina Yabashi ◽  
Eriko Nango ◽  
...  

A data processing pipeline for serial femtosecond crystallography at SACLA was developed, based on Cheetah [Barty et al. (2014). J. Appl. Cryst. 47, 1118–1131] and CrystFEL [White et al. (2016). J. Appl. Cryst. 49, 680–689]. The original programs were adapted for data acquisition through the SACLA API, thread and inter-node parallelization, and efficient image handling. The pipeline consists of two stages: the first, online stage can analyse all images in real time, with a latency of less than a few seconds, to provide feedback on hit rate and detector saturation. The second, offline stage converts hit images into HDF5 files and runs CrystFEL for indexing and integration. The size of the filtered compressed output is comparable to that of a synchrotron data set. The pipeline enables real-time feedback and rapid structure solution during beamtime.


2020 ◽  
Author(s):  
Byeongchul Lee ◽  
Kyoung Jae Lim ◽  
Jae E Yang ◽  
Dong Seok Yang ◽  
Jiyoeng Hong

<p>In the age of big data, constructing a database plays a vital role in various fields. In the agricultural and environmental fields in particular, real-time databases are useful because these fields are strongly affected by dynamic natural phenomena. To construct real-time databases in these fields, various sensors and Internet of Things (IoT) systems have been widely used. In this study, an IoT system was developed to construct a soil property database in real time, aiming toward a big data system that can assess the ecosystem services provided by soil resources. The IoT system consists of three types of soil sensors, main devices, sensor connectors, and subsidiary devices. It can measure soil temperature, moisture, and electrical conductivity (EC) at five-minute intervals. The devices were deployed at two test-beds near Chuncheon city in South Korea and have been undergoing testing for the stability and availability of the system. In a further study, we will add various soil sensors and functions to the developed IoT system to improve its availability. If the developed IoT system proves stable and functional, it can contribute to constructing a real-time soil property database and a big data system that assesses soil ecosystem services.</p>
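The five-minute sampling schedule described above can be sketched with a small record type and a timestamp generator. The field names and units below are assumptions for illustration, not the authors' actual sensor schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record for one soil-sensor reading; field names are
# illustrative stand-ins for the paper's temperature/moisture/EC data.
@dataclass
class SoilReading:
    timestamp: datetime
    temperature_c: float
    moisture_pct: float
    ec_ds_per_m: float  # electrical conductivity

def sampling_times(start, end, interval=timedelta(minutes=5)):
    # Yield the timestamps at which the IoT device would record readings.
    t = start
    while t <= end:
        yield t
        t += interval

start = datetime(2020, 1, 1, 0, 0)
times = list(sampling_times(start, start + timedelta(hours=1)))
reading = SoilReading(times[0], temperature_c=4.2, moisture_pct=31.0,
                      ec_ds_per_m=0.8)
```

One hour at five-minute intervals with inclusive endpoints yields 13 timestamps, so a day of continuous operation produces a few hundred records per sensor: small individually, but a big data problem once accumulated across sites and years.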

