A big data pipeline for temporospatial infrasound analysis

Bank marketers still struggle to find the best way to implement above-the-line credit card promotions, particularly promotions based on customer preferences at point of interest (POI) locations such as malls and shopping centers. Customers at those POIs, in turn, are keen to receive recommendations about what the bank is offering. In this paper we propose the architecture design and implementation of a big data platform that supports a bank's credit card campaign program by generating data and extracting topics from Twitter. We built a data pipeline that consists of a Twitter streamer, a text preprocessor, a topic extractor using Latent Dirichlet Allocation, and a dashboard that visualizes the recommendations. As a result, we successfully generated topics related to specific locations in Jakarta during certain time windows, which bank marketers can use as recommendations when creating promotion programs for their customers. We also present an analysis of computing power usage, which indicates that the strategy is well implemented on the big data platform.
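
As a rough illustration of the text-preprocessor stage described in this abstract, the sketch below cleans tweet text and groups the resulting tokens by location and time window, which is the shape of input a topic extractor such as Latent Dirichlet Allocation would consume. It is not the authors' implementation: the field names (`text`, `place`, `created_at`), the stopword list, and the 6-hour window are all assumptions made for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "at", "in", "on", "is", "to", "and", "of", "for"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Lowercase, drop URLs and @mentions, tokenize, and remove stopwords."""
    cleaned = URL_OR_MENTION.sub(" ", text.lower())
    return [tok for tok in TOKEN.findall(cleaned) if tok not in STOPWORDS]


def bucket_by_place_and_window(tweets, window_hours=6):
    """Group token lists by (place, date, start hour of the time window).

    Each tweet is assumed to be a dict with 'text', 'place', and an
    ISO 8601 'created_at' field (assumed names, not from the paper).
    """
    buckets = defaultdict(list)
    for tweet in tweets:
        ts = datetime.fromisoformat(tweet["created_at"])
        window_start = (ts.hour // window_hours) * window_hours
        key = (tweet["place"], ts.date().isoformat(), window_start)
        buckets[key].append(preprocess(tweet["text"]))
    return dict(buckets)
```

Each bucket then holds the tokenized tweets for one location and time window, so topics can be extracted per bucket rather than over the whole stream.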

Review of social media analytics process and Big Data pipeline

Social Network Analysis and Mining ◽

10.1007/s13278-018-0507-0 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 4

Author(s):

Hiba Sebei ◽

Mohamed Ali Hadj Taieb ◽

Mohamed Ben Aouicha

Keyword(s):

Social Media ◽

Big Data ◽

Social Media Analytics ◽

Data Pipeline

Automated Transverse Crack Mapping System with Optical Sensors and Big Data Analytics

Sensors ◽

10.3390/s20071838 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1838

Author(s):

Kwanghee Won ◽

Chungwook Sim

Keyword(s):

Big Data ◽

Optical Sensors ◽

Data Analytics ◽

Crack Detection ◽

Big Data Analytics ◽

Sensor Data ◽

Detection Methods ◽

Transverse Cracks ◽

Data Pipeline ◽

Localization Strategy

Transverse cracks on bridge decks provide a path for chloride penetration and are the major reason for deck deterioration. For this reason, collecting information on the widths and spacing of transverse cracks is important. In this study, we focused on developing a data pipeline for automated crack detection using non-contact optical sensors. We developed a data acquisition system that can acquire data quickly and simply without obstructing traffic. Because GPS is not always available and odometer sensor data can only provide relative positions along the direction of traffic, we focused on providing an alternative localization strategy that uses only optical sensors. In addition, to improve on existing crack detection methods, which mostly rely on the low-intensity and localized line-segment characteristics of cracks, we considered the direction and shape of the cracks to make our machine learning approach smarter. The proposed system may serve as a useful inspection tool for big data analytics because it is easy to deploy and provides multiple properties of cracks. If the system is deployed multiple times, the progression of crack deterioration, if any, can be checked and compared on both spatial and temporal scales.

Analysing and Predicting on Diseases using Data Pipeline in Hadoop

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1952362 ◽

2019 ◽

pp. 1288-1292

Author(s):

Arpna Joshi ◽

Chirag Singla ◽

Mr. Pankaj

Keyword(s):

Big Data ◽

Data Processing ◽

Real World ◽

Health Data ◽

Apache Spark ◽

Time Data ◽

Data Pipeline ◽

Data Ingestion ◽

Using Data ◽

Batch Data

A data pipeline is a set of actions performed from the time data becomes available for ingestion until value is obtained from that data. These actions include Extraction (getting the fields of value from the dataset), Transformation, and Loading (putting the data of value into a form that is useful for downstream use). In this big data project, we simulate a simple batch data pipeline. Our dataset of interest comes from https://www.githubarchive.org/, which records the health data of the US for the past 125 years. The objective of this Spark project is to create a small but real-world pipeline that downloads the data as it becomes available, applies various transformations, and loads the results into forms of storage for further use. In this project, Apache Kafka is used for data ingestion, Apache Spark for data processing, and Cassandra for storing the processed results.
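
The extract, transform, and load steps named in this abstract can be sketched as a minimal batch pipeline. This example uses only the Python standard library as a stand-in: CSV text replaces the ingested source and an in-memory SQLite table replaces Cassandra, so it does not reflect the Kafka/Spark stack the authors used. The column names (`disease`, `year`, `cases`) are hypothetical.

```python
import csv
import io
import sqlite3


def extract(raw_csv):
    """Extraction: parse raw CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(records):
    """Transformation: skip rows with a missing count and cast fields to typed values."""
    rows = []
    for rec in records:
        if not rec["cases"].strip():
            continue  # incomplete row, nothing of value to keep
        rows.append((rec["disease"].strip().lower(), int(rec["year"]), int(rec["cases"])))
    return rows


def load(rows, conn):
    """Loading: store the cleaned rows where later queries can reach them."""
    conn.execute("CREATE TABLE IF NOT EXISTS cases (disease TEXT, year INTEGER, cases INTEGER)")
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_csv, conn):
    """Run one batch end to end and return total cases per disease."""
    load(transform(extract(raw_csv)), conn)
    query = "SELECT disease, SUM(cases) FROM cases GROUP BY disease ORDER BY disease"
    return conn.execute(query).fetchall()
```

Keeping each stage as a separate function mirrors the separation of ingestion, processing, and storage in the abstract, so any one stage could be swapped for a distributed component without touching the others.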

Comparative analysis of real-time messages in big data pipeline architecture

International Journal of High Performance Computing and Networking ◽

10.1504/ijhpcn.2019.10027728 ◽

2019 ◽

Vol 15 (3/4) ◽

pp. 191

Author(s):

Aung Htein Maw ◽

Hla Yin Min ◽

Thandar Aung

Keyword(s):

Big Data ◽

Comparative Analysis ◽

Real Time ◽

Pipeline Architecture ◽

Data Pipeline

Big Data Pipeline with ML-Based and Crowd Sourced Dynamically Created and Maintained Columnar Data Warehouse for Structured and Unstructured Big Data

2020 3rd International Conference on Information and Computer Technologies (ICICT) ◽

10.1109/icict50521.2020.00018 ◽

2020 ◽

Author(s):

Kamran Ghane

Keyword(s):

Big Data ◽

Data Warehouse ◽

Data Pipeline

Performance Evaluation for Real-Time Messaging System in Big Data Pipeline Architecture

2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) ◽

10.1109/cyberc.2018.00047 ◽

2018 ◽

Cited By ~ 1

Author(s):

Thandar Aung ◽

Hla Yin Min ◽

Aung Htein Maw

Keyword(s):

Big Data ◽

Performance Evaluation ◽

Real Time ◽

Pipeline Architecture ◽

Data Pipeline

Comparative analysis of real-time messages in big data pipeline architecture

International Journal of High Performance Computing and Networking ◽

10.1504/ijhpcn.2019.106108 ◽

2019 ◽

Vol 15 (3/4) ◽

pp. 191

Author(s):

Thandar Aung ◽

Hla Yin Min ◽

Aung Htein Maw

Keyword(s):

Big Data ◽

Comparative Analysis ◽

Real Time ◽

Pipeline Architecture ◽

Data Pipeline

Telco Data Analytics using Open-Source Data Pipeline: Detailed Architecture and Technology Stack

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38644 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1717-1725

Author(s):

Abirami T

Keyword(s):

Big Data ◽

Data Analysis ◽

Load Balancing ◽

Open Source ◽

Data Storage ◽

Data Analytics ◽

Big Data Analytics ◽

Time Data ◽

Data Pipeline ◽

Real Time Data

Abstract: Open-source technology has influenced data analytics at each step, from data storage to data analysis and visualization. Open source for telco big data analytics enables sharp insights by enhancing problem discoverability and solution feasibility. This research paper discusses different open-source technology stacks for telco big data analytics that are used to deploy tools for data collection, data storage, data processing, data analysis, and data visualization. This open-source pipeline is a microservices architecture built with a modular technology stack and orchestrated by Kubernetes; it can ingest data from multiple sources, process real-time data, and provide business and network intelligence. The main idea behind using open-source technology in our architecture is to reduce cost and simplify management. Kubernetes is an industry-adopted open-source container orchestrator that offers fault tolerance, application scaling, and load balancing. The results can be displayed to telecom operators on an intuitive open-source dashboard such as Grafana. Our architecture is flexible and can be easily customized to the telecommunication industry's needs. Using the proposed architecture, the telecommunication sector can make quick decisions with nearly 30% lower CapEx, which is made possible by using COTS hardware. Index Terms: Big data analytics, Data pipeline architecture, Open Source technologies, Real-time data processing, Fault-tolerance, Load-balancing, Kubernetes, BDA, Open source dashboard
