Preprocessing Big Data for Efficient Storage and Research

Big Data refers to datasets so large that they cannot be stored, managed, and analyzed with commonly used software systems. The emergence of smartphones, social networks, and online applications has led to the generation of massive amounts of structured, unstructured, and semi-structured data. Big data analytics has received considerable attention because it offers an opportunity to uncover the potential hidden in large amounts of data. Data preprocessing techniques, when applied prior to analytics, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining. This paper presents an efficient method for preprocessing data and partitioning a big dataset based on sensitivity parameters. The partitioned dataset can be uploaded to a public or private cloud based on the importance of the data in each partition, so the approach supports hybrid cloud storage and processing of big data. The experimental results show that the proposed method preprocesses and partitions data with high accuracy and reduced processing time.
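
The abstract does not say how records are cleaned or which sensitivity parameters are used, so the following is only a minimal sketch under stated assumptions: records are Python dictionaries, cleaning means dropping records with empty values and exact duplicates, and a record counts as sensitive when it carries any field from a caller-supplied set. The names `preprocess`, `partition_by_sensitivity`, and `sensitive_fields` are hypothetical and do not come from the paper.

```python
def preprocess(records):
    """Drop records with empty values and exact duplicates, keeping order.

    Assumption: field values are hashable (strings, numbers, and so on).
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records in which any field is None or an empty string.
        if any(value is None or value == "" for value in rec.values()):
            continue
        # Keys are unique within a record, so sorting the items is safe.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def partition_by_sensitivity(records, sensitive_fields):
    """Split records into (public, private) lists.

    Hypothetical rule: a record is private if it contains at least one
    field listed in sensitive_fields; otherwise it is public.
    """
    public, private = [], []
    for rec in records:
        if any(field in rec for field in sensitive_fields):
            private.append(rec)
        else:
            public.append(rec)
    return public, private


records = [
    {"id": 1, "name": "a", "diagnosis": "x"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "b"},   # exact duplicate, removed by preprocess
    {"id": 3, "name": ""},    # empty value, removed by preprocess
]
cleaned = preprocess(records)
public, private = partition_by_sensitivity(cleaned, {"diagnosis"})
```

In a hybrid cloud setting, the `private` list would be the candidate for private cloud storage and the `public` list for public cloud storage; the upload step itself is outside this sketch.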

SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data

Journal of Computer Science and Technology ◽

10.24215/16666038.18.e23 ◽

2018 ◽

Vol 18 (03) ◽

pp. e23 ◽

Cited By ~ 7

Author(s):

María José Basgall ◽

Waldo Hasperué ◽

Marcelo Naiouf ◽

Alberto Fernández ◽

Francisco Herrera

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Model Design ◽

Minority Class ◽

Imbalanced Classification ◽

Design And Implementation ◽

Learning Issues ◽

Intelligent Model

The volume of data in today's applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario imposes scalability requirements that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important problem within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually harder to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
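
To illustrate the neighborhood-based interpolation that the abstract attributes to SMOTE, here is a minimal single-machine sketch. It is not SMOTE-BD and does not use Spark; it assumes numeric feature tuples, Euclidean distance, and a brute-force neighbor search. The function name `smote` and its parameters (`n_new`, `k`, `seed`) are illustrative choices, not the authors' API.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority examples by interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority example and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force search for the k nearest neighbours (excluding base).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position along the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
```

Because every synthetic point is a convex combination of two minority examples, it stays inside the region spanned by the minority class. The brute-force neighbor search is quadratic in the number of minority examples, which is the kind of cost a distributed design would need to address.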

Cloud-Based Big Data Analysis Tools and Techniques Towards Sustainable Smart City Services

Advances in Computational Intelligence and Robotics - Decision Support Systems and Industrial IoT in Smart Grid, Factories, and Cities ◽

10.4018/978-1-7998-7468-3.ch004 ◽

2021 ◽

pp. 63-90

Author(s):

Suresh P. ◽

Keerthika P. ◽

Sathiyamoorthi V. ◽

Logeswaran K. ◽

Manjula Devi R. ◽

...

Keyword(s):

Big Data ◽

Real Time ◽

Smart City ◽

Traffic Management ◽

Smart Cities ◽

Big Data Analytics ◽

Streaming Data ◽

City Development ◽

Time Data ◽

Public And Private

Cloud computing and big data analytics are key parts of smart city development that can create reliable, secure, healthier, and more informed communities while producing tremendous amounts of data for the public and private sectors. Since the various sectors of smart cities generate enormous amounts of streaming data from sensors and other devices, storing and analyzing this huge volume of real-time data typically requires significant computing capacity. Most smart city solutions use a combination of core technologies such as computing, storage, databases, and data warehouses, and advanced technologies such as analytics on big data, real-time streaming data, artificial intelligence, machine learning, and the internet of things (IoT). This chapter presents a theoretical and experimental perspective on smart city services such as smart healthcare, water management, education, transportation and traffic management, and smart grid that are offered using big data management and cloud-based analytics services.

Improving Geospatial Big Data Analytics Approaches

Interdisciplinary Approaches to Spatial Optimization Issues - Advances in Geospatial Technologies ◽

10.4018/978-1-7998-1954-7.ch005 ◽

2021 ◽

pp. 82-90

Author(s):

Sana Rekik

Keyword(s):

Big Data ◽

Data Stream ◽

Web Search ◽

Big Data Analytics ◽

Future Events ◽

Data Stream Analysis ◽

Analysis System ◽

Changes Over Time ◽

Time Systems

The advent of geospatial big data has led to a paradigm shift in which most related applications have become data driven, and therefore intensive in both data and computation. This revolution has covered most domains, notably real-time systems such as web search engines, social networks, and tracking systems. The latter are linked to the high-velocity feature, which characterizes dynamic, fast-changing, and moving data streams. Therefore, the response time and speed of such queries, along with the space complexity, are among the requirements of data stream analysis systems that still need improvement through sophisticated algorithms. In this vein, this chapter discusses new approaches that can reduce complexity and costs in time and space while improving the efficiency and quality of responses in geospatial big data stream analysis, in order to efficiently detect changes over time, draw conclusions, and predict future events.

Using Big Data Analytics to Assist a Smart City to Prevent Cyber Security Threats

Examining the Socio-Technical Impact of Smart Cities - Advances in Human and Social Aspects of Technology ◽

10.4018/978-1-7998-5326-8.ch005 ◽

2021 ◽

pp. 107-124

Author(s):

Fenio Annansingh

Keyword(s):

Big Data ◽

Cyber Security ◽

Smart City ◽

Data Analytics ◽

Life Quality ◽

Big Data Analytics ◽

Review Of The Literature ◽

Smart Services ◽

Prevention Model

The concept of a smart city as a means to enhance the quality of life of citizens has been gaining importance globally in recent years. A smart city consists of city infrastructure, which includes smart services, devices, and institutions. Every second, these components of the smart city infrastructure generate data. This vast amount of data is called big data. This chapter explores the possibilities of using big data analytics to prevent cybersecurity threats in a smart city. It also analyzes how big data tools and concepts can solve cybersecurity challenges and detect and prevent attacks. A data analytics and cyber prevention model is developed using interviews and an extensive review of the literature. The chapter concludes by indicating that big data analytics allows a smart city to identify and solve cybersecurity challenges quickly and efficiently.

IoT Based Agriculture as a Cloud and Big Data Service

Securing the Internet of Things ◽

10.4018/978-1-5225-9866-4.ch069 ◽

2020 ◽

pp. 1499-1521

Author(s):

Sukhpal Singh Gill ◽

Inderveer Chana ◽

Rajkumar Buyya

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Cloud Services ◽

Cloud Environment ◽

Big Data Technologies ◽

Sensor Networking ◽

New Applications

Cloud computing has emerged as a new model for managing and delivering applications as services efficiently. The convergence of cloud computing with technologies such as wireless sensor networking, the Internet of Things (IoT), and Big Data analytics offers new applications of cloud services. This paper proposes a cloud-based autonomic information system for delivering Agriculture-as-a-Service (AaaS) through the use of cloud and big data technologies. The proposed system gathers information from various users through preconfigured devices and IoT sensors, processes it in the cloud using big data analytics, and provides the required information to users automatically. The performance of the proposed system has been evaluated in a cloud environment, and experimental results show that it offers better service and better Quality of Service (QoS) as measured by QoS parameters.

Mobile network quality of experience using big data analytics approach

2017 8th International Conference on Information Technology (ICIT) ◽

10.1109/icitech.2017.8079923 ◽

2017 ◽

Cited By ~ 3

Author(s):

Ayisat W. Yusuf-Asaju ◽

Zulkhairi B. Dahalin ◽

Azman Ta'a

Keyword(s):

Big Data ◽

Data Analytics ◽

Quality Of Experience ◽

Big Data Analytics ◽

Mobile Network ◽

Network Quality

CBDR: An efficient storage repository for cultural big data

Digital Scholarship in the Humanities ◽

10.1093/llc/fqz083 ◽

2019 ◽

Vol 35 (4) ◽

pp. 893-903 ◽

Cited By ~ 1

Author(s):

Seemu Sharma ◽

Seema Bawa

Keyword(s):

Big Data ◽

Data Analytics ◽

High Performance ◽

Big Data Analytics ◽

Data Repository ◽

Common People ◽

Data Repositories ◽

Storage And Retrieval ◽

Efficient Storage ◽

Cultural Data

Cultural data and information on the web are continuously increasing, evolving, and reshaping in the form of big data due to globalization, digitization, and widespread exploration, with common people realizing the importance of ancient values. Therefore, before this data becomes unwieldy and too complex to manage, its integration in the form of big data repositories is essential. This article analyzes the complexity of the growing cultural data and presents a Cultural Big Data Repository as an efficient way to store and retrieve cultural big data. The repository is highly scalable and provides integrated high-performance methods for big data analytics in cultural heritage. Experimental results demonstrate that the proposed repository delivers superior performance in terms of space as well as storage and retrieval time for Cultural Big Data.

Urban Planning and Smart City Decision Management Empowered by Real-Time Data Processing Using Big Data Analytics

Sensors ◽

10.3390/s18092994 ◽

2018 ◽

Vol 18 (9) ◽

pp. 2994 ◽

Cited By ~ 28

Author(s):

Bhagya Silva ◽

Murad Khan ◽

Changsu Jung ◽

Jihun Seo ◽

Diyan Muhammad ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Real Time ◽

Smart City ◽

Data Analytics ◽

Smart Cities ◽

Big Data Analytics ◽

Time Data ◽

Real Time Data

The Internet of Things (IoT), inspired by the tremendous growth of connected heterogeneous devices, has pioneered the notion of the smart city. Various components, e.g., smart transportation, smart community, smart healthcare, and smart grid, which are integrated within the smart city architecture, aim to enrich the quality of life (QoL) of urban citizens. However, real-time processing requirements and exponential data growth hinder smart city realization. Therefore, herein we propose a Big Data analytics (BDA)-embedded experimental architecture for smart cities. The BDA-embedded smart city serves two major purposes. Firstly, it facilitates the exploitation of urban Big Data (UBD) in planning, designing, and maintaining smart cities. Secondly, it employs BDA to manage and process voluminous UBD to enhance the quality of urban services. The three tiers of the proposed architecture are responsible for data aggregation, real-time data management, and service provisioning. Moreover, offline and online data processing tasks are further expedited by integrating data normalizing and data filtering techniques into the proposed work. By analyzing authenticated datasets, we obtained the threshold values required for urban planning and city operation management. Performance metrics for online and offline data processing on the proposed dual-node Hadoop cluster are obtained using the aforementioned authentic datasets. Throughput and processing time analyses performed against existing works indicate the superior performance of the proposed work. Hence, we can claim the applicability and reliability of implementing the proposed BDA-embedded smart city architecture in the real world.

Big data - a review in health sciences

Global Journal of Information Technology Emerging Technologies ◽

10.18844/gjit.v6i1.392 ◽

2016 ◽

Vol 6 (1) ◽

Author(s):

Ayça Kurnaz Türkben ◽

Emre Türkben ◽

Dilek Karahoca ◽

Adem Karahoca

Keyword(s):

Big Data ◽

Health Sector ◽

Health Sciences ◽

Health Science ◽

Mobile Technologies ◽

Review Of Literature ◽

Public And Private ◽

Review Study ◽

The Cost

Technologies are changing very fast, and data influences both technological change and global development. Data are obtained from social media, the Internet, and mobile technologies. For years, academics, researchers, and companies have analyzed such sources and information for their studies and work. The increasing use of mobile devices, social networks, and electronic customer records in the public and private sectors has led to an increase in data. The resulting massive amount of data is called big data. There are many definitions of big data in the literature, but put simply, big data is data of massive size that can be obtained from any environment. One such environment is the health sector, which has grown quickly because of the huge amount of data it holds, such as patients' electronic health records. The health sector has high costs, and because timing is critically important, decisions must be made quickly and correctly. The use of big data in health is therefore important for increasing the quality of service, enabling innovative health operations, and decreasing costs. This study presents a brief review of the literature from the last five years on the use of big data in health sciences. The content, methods, advantages, and difficulties of big data are discussed. Keywords: Health science, Big data, Medicine, data mining
