Smartic: A smart tool for Big Data analytics and IoT

F1000Research ◽ 2022 ◽ Vol 11 ◽ pp. 17
Author(s): Shohel Sayeed ◽ Abu Fuad Ahmad ◽ Tan Choo Peng

The Internet of Things (IoT) is driving the physical and digital worlds of technology to converge. Real-time, massive-scale connectivity produces a large amount of versatile data, which is where Big Data comes into the picture. Big Data refers to large, diverse sets of information whose scale exceeds the capabilities of widely used database management systems and standard data processing software tools. Almost every big dataset is dirty: it may contain missing values, mistyped entries, inaccuracies, and many other issues that degrade Big Data analytics performance. One of the biggest challenges in Big Data analytics is discovering and repairing dirty data; failure to do so can lead to inaccurate analytics results and unreliable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performance across imputation methods. We propose a hybrid model for missing value imputation that combines ML and sample-based statistical techniques. We then proceeded with the best-imputed dataset, chosen based on ML model performance, for feature engineering and hyperparameter tuning, using k-means clustering and principal component analysis. Accuracy improved dramatically, with the XGBoost model achieving a root mean squared logarithmic error (RMSLE) of around 0.125. To mitigate overfitting, we used K-fold cross-validation.
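To make the evaluation loop concrete, the sketch below compares a few imputation strategies by the downstream RMSLE of an XGBoost model under K-fold cross-validation, in the spirit of the comparison the abstract describes. The synthetic data, hyperparameters, and choice of imputers are illustrative assumptions, not the authors' actual Smartic pipeline.

```python
# A hedged sketch, not the Smartic implementation: score each imputation
# strategy by the downstream RMSLE of an XGBoost model under K-fold CV.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = np.exp(0.5 * X[:, 0] + 0.25 * X[:, 1])       # strictly positive target, as RMSLE requires
X[rng.random(X.shape) < 0.10] = np.nan           # inject 10% missing values ("dirty" data)

imputers = {
    "mean": SimpleImputer(strategy="mean"),      # sample-based statistical imputation
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),            # ML-flavoured imputation
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # K-fold CV to curb overfitting
for name, imputer in imputers.items():
    pipe = make_pipeline(imputer, XGBRegressor(n_estimators=200, max_depth=4))
    fold_rmsle = []
    for train, test in cv.split(X):
        pipe.fit(X[train], y[train])
        pred = np.clip(pipe.predict(X[test]), 0, None)  # RMSLE is undefined for negatives
        fold_rmsle.append(np.sqrt(mean_squared_log_error(y[test], pred)))
    print(f"{name:>6}: mean RMSLE = {np.mean(fold_rmsle):.3f}")
```

The imputer with the lowest cross-validated RMSLE would be the one carried forward, mirroring the paper's "best-imputed dataset" selection step.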

2019 ◽ Vol 8 (S3) ◽ pp. 35-40
Author(s): S. Mamatha ◽ T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the volume of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, and web servers, and exists in both structured and unstructured form. The term "Big Data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analysis, and visualization. Big data comes in structured, unstructured, and semi-structured formats, and relational databases fail to store such multi-structured data. Apache Hadoop is an efficient, robust, reliable, and scalable framework to store, process, transform, and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce, and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
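To illustrate the MapReduce style the paper builds on, here is a minimal Hadoop Streaming sketch of a single clustering iteration using plain k-means assignment and centroid update; the paper's c-means variant would additionally carry fuzzy membership weights. The centroid values, file names, and invocation are placeholder assumptions.

```python
#!/usr/bin/env python3
# A hedged Hadoop Streaming sketch of one k-means-style iteration (the
# paper's c-means variant adds fuzzy membership weights). Hypothetical run:
#   hadoop jar hadoop-streaming.jar \
#       -mapper "cluster.py map" -reducer "cluster.py reduce" \
#       -input points.txt -output out/
import sys

CENTROIDS = [(0.0, 0.0), (5.0, 5.0)]  # in practice, read from HDFS each iteration

def mapper():
    # Each input line is "x,y"; emit "nearest_centroid_id \t x,y".
    for line in sys.stdin:
        x, y = map(float, line.strip().split(","))
        dists = [(cx - x) ** 2 + (cy - y) ** 2 for cx, cy in CENTROIDS]
        print(f"{dists.index(min(dists))}\t{x},{y}")

def reducer():
    # Input arrives grouped by key; emit the updated centroid per cluster.
    sums, counts = {}, {}
    for line in sys.stdin:
        key, point = line.strip().split("\t")
        x, y = map(float, point.split(","))
        sx, sy = sums.get(key, (0.0, 0.0))
        sums[key] = (sx + x, sy + y)
        counts[key] = counts.get(key, 0) + 1
    for key, (sx, sy) in sums.items():
        print(f"{key}\t{sx / counts[key]},{sy / counts[key]}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Re-running the job with the reducer's output as the next round's centroids gives the usual iterative clustering loop, with HDFS holding the intermediate state.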


Author(s): Vijander Singh ◽ Amit Kumar Bairwa ◽ Deepak Sinwar

As the digital world has advanced, data has been generated every second in numerous domains such as astronomy, social networking sites, medicine, transportation, e-commerce, scientific research, agriculture, and video and audio downloads. According to one survey, in 60 seconds YouTube gains 600+ new users and 7 billion queries are executed on Google. We can therefore say that an immense amount of structured, unstructured, and semi-structured data is produced every second across the cyber world, and it must be managed efficiently. Big data exhibits properties such as complexity, the 'V' factors, and multivariate content, and it must be stored, retrieved, and distributed. Systematically arranged data can serve as information in the digital world. In the past century, data sources were small enough to be managed with pen and paper. The next generation of data management tools included Microsoft Excel, Access, and database systems such as SQL, MySQL, and DB2.


Author(s): Prateek Gupta ◽ Sverre Steen ◽ Adil Rasheed

Abstract: A modern ship is fitted with numerous sensors and Data Acquisition Systems (DAQs), each of which can be viewed as a data-collection source node. These source nodes transfer data to one another and to one or more centralized systems. The centralized systems, or data-interpreter nodes, can be physically located onboard the vessel or onshore at a shipping data control center. The main purpose of a data-interpreter node is to assimilate the collected data and present or relay it concisely. The interpreted data can further be visualized and used as an integral part of a monitoring and decision support system. This paper presents a simple data processing framework based on big data analytics. The framework uses Principal Component Analysis (PCA) as a tool to process data gathered through in-service measurements onboard a ship during various operational conditions. Weather hindcast data is obtained from various sources to account for environmental loads on the ship. The proposed framework reduces the dimensionality of high-dimensional data and determines the correlations between data variables. The accuracy of the model is evaluated against data recorded during the voyage of a ship.
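The PCA step can be sketched as follows, assuming scikit-learn-style tooling; the synthetic channels stand in for the in-service and hindcast variables the paper assimilates. The explained variance ratios show how much variation the reduced dimensions retain, and the loadings reveal correlations between the original variables.

```python
# A hedged sketch of the PCA step; the sensor channels and their generating
# model are illustrative stand-ins for real shipboard and hindcast data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
speed = rng.normal(14.0, 2.0, 1000)                      # ship speed [kn]
power = 0.05 * speed ** 3 + rng.normal(0.0, 20.0, 1000)  # shaft power [kW], speed-correlated
wave_h = rng.gamma(2.0, 0.8, 1000)                       # hindcast significant wave height [m]
X = np.column_stack([speed, power, wave_h])

X_std = StandardScaler().fit_transform(X)   # PCA is sensitive to variable scales
pca = PCA(n_components=2).fit(X_std)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("loadings (rows = PCs, cols = variables):\n", pca.components_)
scores = pca.transform(X_std)               # reduced-dimension representation of the voyage data
```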


Author(s): Murad Khan ◽ Bhagya Nathali Silva ◽ Kijun Han

Big Data and deep computation are among the buzzwords of today's sophisticated digital world. Big Data has emerged with the rapid growth of digital data. This chapter addresses the problem of employing deep learning algorithms in Big Data analytics. Unlike traditional approaches, this chapter develops various solutions for employing advanced deep learning mechanisms with less complexity and finally presents a generic solution. Deep learning algorithms require less time to process large amounts of data across different contexts. However, extracting accurate features and classifying context into patterns with neural network algorithms demands considerable time and computational complexity. Therefore, integrating deep learning algorithms with neural networks can yield optimized solutions. Consequently, the aim of this chapter is to provide an overview of how advanced deep learning algorithms can be used to solve various existing challenges in Big Data analytics.


2019 ◽ Vol 10 (4) ◽ pp. 423-446
Author(s): Pankaj Sharma ◽ Ashutosh Joshi

Purpose: Big data analytics has emerged as one of the most used keywords in the digital world, and the surrounding hype has led many to believe it is the panacea for all evils. As insights into this new field grow and the world discovers novel ways to apply big data, the need for caution has become increasingly important. The purpose of this paper is to review the literature on big data applications for humanitarian relief and to highlight the challenges of using big data in humanitarian relief missions.

Design/methodology/approach: The paper reviews the literature on the application of big data in disaster relief operations. The literature review methodology follows Mayring (2004) and proceeds in four steps: material collection, descriptive analysis, category selection, and material evaluation.

Findings: The paper summarizes the challenges that can affect humanitarian logistical missions in cases of overdependence on big data tools, and emphasizes the need for caution when applying digital humanitarianism to relief operations.

Originality/value: Most published research focuses on the benefits of big data and the ways it will change the humanitarian relief horizon. This is an original paper that brings together the findings of numerous published works on the negative effects of big data in humanitarian missions.



