Preprocessing Big Data for Efficient Storage and Research

Big Data refers to datasets so large that they cannot be stored, managed, and analyzed using commonly used software systems. The emergence of smartphones, social networks, and online applications has led to the generation of massive amounts of structured, unstructured, and semi-structured data. Big data analytics has received considerable attention because it offers a great opportunity to uncover the potential in massive amounts of data. Data preprocessing techniques, when applied prior to analytics, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining. This paper therefore presents an efficient method for preprocessing data and for partitioning a big dataset based on sensitivity parameters. Each partition can be uploaded to a public or private cloud depending on the importance of the data it contains, so the approach supports hybrid cloud storage and processing of big data. The experimental results show that the proposed method preprocesses and partitions data with high accuracy and reduced processing time.
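As a rough illustration of partitioning by sensitivity, the sketch below cleans each record and then routes it to a private or a public partition. The abstract does not describe the paper's actual sensitivity parameters or preprocessing steps, so the field names in `SENSITIVE_FIELDS`, the `preprocess` cleaning rule, and the routing rule are all hypothetical assumptions made for this example.

```python
# Hypothetical sensitivity parameters: field names assumed to mark a record
# as sensitive. The paper's real parameters are not given in the abstract.
SENSITIVE_FIELDS = {"ssn", "diagnosis", "salary"}


def preprocess(record):
    """Assumed cleaning step: trim string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


def partition_by_sensitivity(records, sensitive_fields=SENSITIVE_FIELDS):
    """Split records into (private, public) lists.

    A record goes to the private partition if, after cleaning, it still
    contains at least one sensitive field; otherwise it goes to the public one.
    """
    private, public = [], []
    for record in records:
        cleaned = preprocess(record)
        if any(key in sensitive_fields for key in cleaned):
            private.append(cleaned)
        else:
            public.append(cleaned)
    return private, public


if __name__ == "__main__":
    sample = [
        {"name": " Ann ", "ssn": "123"},
        {"name": "Bob", "city": ""},
    ]
    private_part, public_part = partition_by_sensitivity(sample)
    print("private:", private_part)
    print("public:", public_part)
```

In a hybrid cloud setting, the private list would be sent to private storage and the public list to a public provider; the upload step is omitted here because it depends on the provider.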

2018 ◽  
Vol 18 (03) ◽  
pp. e23 ◽  
Author(s):  
María José Basgall ◽  
Waldo Hasperué ◽  
Marcelo Naiouf ◽  
Alberto Fernández ◽  
Francisco Herrera

The volume of data in today's applications has meant a change in the way Machine Learning issues are addressed. Indeed, the Big Data scenario involves scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually more complex to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is made to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
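The core SMOTE idea mentioned above, interpolating between a minority example and one of its nearest minority neighbors, can be sketched in a few lines. This is a minimal sequential, single-machine version in plain Python; it is not SMOTE-BD and does not reproduce its Spark-based distributed design. The function name `smote`, its parameters, and the random choice of base examples are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    For each new point, a base sample is drawn at random, one of its k
    nearest minority neighbours is chosen, and the new point is placed at a
    random position on the segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority_class, n_synthetic=3, k=2):
        print(point)
```

The brute-force neighbor search here is quadratic in the number of minority examples, which is exactly the cost that makes a distributed design attractive at Big Data scale.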


Author(s):  
Suresh P. ◽  
Keerthika P. ◽  
Sathiyamoorthi V. ◽  
Logeswaran K. ◽  
Manjula Devi R. ◽  
...  

Cloud computing and big data analytics are key parts of smart city development: they can create reliable, secure, healthier, and more informed communities while producing tremendous amounts of data for the public and private sectors. Since the various sectors of smart cities generate enormous amounts of streaming data from sensors and other devices, storing and analyzing this huge volume of real-time data typically requires significant computing capacity. Most smart city solutions use a combination of core technologies such as computing, storage, databases, and data warehouses, and advanced technologies such as analytics on big data, real-time streaming data, artificial intelligence, machine learning, and the internet of things (IoT). This chapter presents a theoretical and experimental perspective on smart city services, such as smart healthcare, water management, education, transportation and traffic management, and smart grid, that are offered using big data management and cloud-based analytics services.


Author(s):  
Sana Rekik

The advent of geospatial big data has led to a paradigm shift in which most related applications have become data driven, and therefore intensive in both data and computation. This revolution has covered most domains, notably real-time systems such as web search engines, social networks, and tracking systems. The latter are linked to the high-velocity feature, which characterizes the dynamism of fast-changing, moving data streams. Therefore, the response time and speed of such queries, along with the space complexity, are among the requirements of data stream analysis systems that still need improvement through sophisticated algorithms. In this vein, this chapter discusses new approaches that can reduce complexity and costs in time and space while improving the efficiency and quality of responses in geospatial big data stream analysis, in order to efficiently detect changes over time, draw conclusions, and predict future events.


Author(s):  
Fenio Annansingh

The concept of a smart city as a means to enhance citizens' quality of life has been gaining importance globally in recent years. A smart city consists of city infrastructure, which includes smart services, devices, and institutions. Every second, these components of the smart city infrastructure generate data. This vast amount of data is called big data. This chapter explores the possibilities of using big data analytics to prevent cybersecurity threats in a smart city. It also analyzes how big data tools and concepts can solve cybersecurity challenges and detect and prevent attacks. Using interviews and an extensive review of the literature, the chapter develops a data analytics and cyber prevention model. The chapter concludes by indicating that big data analytics allows a smart city to identify and solve cybersecurity challenges quickly and efficiently.


2020 ◽  
pp. 1499-1521
Author(s):  
Sukhpal Singh Gill ◽  
Inderveer Chana ◽  
Rajkumar Buyya

Cloud computing has emerged as a new model for managing and delivering applications as services efficiently. The convergence of cloud computing with technologies such as wireless sensor networking, the Internet of Things (IoT), and Big Data analytics offers new applications of cloud services. This paper proposes a cloud-based autonomic information system for delivering Agriculture-as-a-Service (AaaS) through the use of cloud and big data technologies. The proposed system gathers information from various users through preconfigured devices and IoT sensors, processes it in the cloud using big data analytics, and provides the required information to users automatically. The performance of the proposed system has been evaluated in a cloud environment, and the experimental results show that it offers better service, with improved Quality of Service (QoS) parameters.


2019 ◽  
Vol 35 (4) ◽  
pp. 893-903 ◽  
Author(s):  
Seemu Sharma ◽  
Seema Bawa

Cultural data and information on the web are continuously increasing, evolving, and reshaping in the form of big data due to globalization, digitization, and their vast exploration, with ordinary people realizing the importance of ancient values. Therefore, before this data becomes unwieldy and too complex to manage, its integration in the form of big data repositories is essential. This article analyzes the complexity of the growing cultural data and presents a Cultural Big Data Repository as an efficient way to store and retrieve cultural big data. The repository is highly scalable and provides integrated high-performance methods for big data analytics in cultural heritage. Experimental results demonstrate that the proposed repository performs better in terms of space as well as storage and retrieval time of Cultural Big Data.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2994 ◽  
Author(s):  
Bhagya Silva ◽  
Murad Khan ◽  
Changsu Jung ◽  
Jihun Seo ◽  
Diyan Muhammad ◽  
...  

The Internet of Things (IoT), inspired by the tremendous growth of connected heterogeneous devices, has pioneered the notion of the smart city. Various components integrated within the smart city architecture, e.g., smart transportation, smart community, smart healthcare, and smart grid, aim to enrich the quality of life (QoL) of urban citizens. However, real-time processing requirements and exponential data growth hinder smart city realization. Therefore, herein we propose a Big Data analytics (BDA)-embedded experimental architecture for smart cities. The BDA-embedded smart city serves two major purposes. Firstly, it facilitates the exploitation of urban Big Data (UBD) in planning, designing, and maintaining smart cities. Secondly, it employs BDA to manage and process voluminous UBD to enhance the quality of urban services. The three tiers of the proposed architecture are responsible for data aggregation, real-time data management, and service provisioning. Moreover, offline and online data processing tasks are further expedited by integrating data normalizing and data filtering techniques into the proposed work. By analyzing authenticated datasets, we obtained the threshold values required for urban planning and city operation management. Performance metrics for online and offline data processing on the proposed dual-node Hadoop cluster are obtained using the aforementioned authentic datasets. Throughput and processing time analyses performed against existing works indicate the superior performance of the proposed work. Hence, we can claim the applicability and reliability of implementing the proposed BDA-embedded smart city architecture in the real world.
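The abstract does not say which normalizing and filtering techniques the architecture uses. As one common possibility, the sketch below applies range-based filtering to discard implausible sensor readings, followed by min-max normalization. The function names, the example bounds, and the choice of min-max scaling are assumptions for illustration only.

```python
def filter_readings(readings, lower, upper):
    """Keep only readings inside the closed interval [lower, upper]."""
    return [r for r in readings if lower <= r <= upper]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; constant input maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical sensor readings; -999 stands in for a faulty measurement.
    raw = [10, 20, -999, 30]
    valid = filter_readings(raw, lower=0, upper=100)
    print(valid)
    print(min_max_normalize(valid))
```

Filtering before normalizing matters here: a single faulty reading such as -999 would otherwise dominate the range and compress all valid values toward one end of the scale.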


Author(s):  
Ayça Kurnaz Türkben ◽  
Emre Türkben ◽  
Dilek Karahoca ◽  
Adem Karahoca

Technology is changing very fast, and data influences both technological change and the development of the world. Data are obtained from social media, the Internet, and mobile technologies. For years, academics, researchers, and companies have used various sources and information in their studies and work. The increasing use of mobile devices, social networks, and electronic customer records in the public and private sectors has led to an increase in data. The resulting massive amount of data is called big data. There are many definitions of big data in the literature, but put simply, big data is data of massive size that can be obtained from any environment. One such environment is healthcare, which has grown rapidly because of the huge amount of data in this sector, such as patients' electronic health records. The health sector has high costs, and because timing is critically important, decisions must be made as quickly and correctly as possible. Accordingly, the use of big data in health is important for increasing the quality of service, enabling innovative health operations, and decreasing costs. This study presents a brief review of the literature from the last five years on the use of big data in health sciences. The content, methods, advantages, and difficulties of big data are discussed in this review. Keywords: Health science, Big data, Medicine, data mining

