A Design on Informal Big Data Topic Extraction System Based on Spark Framework

Situated in the main tracks of typhoons in the Northwestern Pacific Ocean, Taiwan frequently suffers disasters caused by heavy rainfall during typhoons. Accurate and timely typhoon rainfall prediction is therefore an important problem. The purpose of this study was to develop a Hadoop Spark distributed framework based on big-data technology to accelerate the computation of typhoon rainfall prediction models. This study used two machine-learning methods, deep neural networks (DNNs) and multiple linear regression (MLR), to establish rainfall prediction models and evaluate their prediction accuracy. The big-data technology used was the Hadoop Spark distributed cluster-computing framework, which consisted of the Hadoop Distributed File System, the MapReduce framework, and Spark, a new-generation technology used to improve the efficiency of distributed computing. The research area was Northern Taiwan, with four surface observation stations serving as the experimental sites. This study collected data on 271 typhoon events from 1961 to 2017. The following results were obtained: (1) in both the DNN and MLR models, prediction errors increased with prediction duration; and (2) the Hadoop Spark framework computed faster than the standalone systems (a single i7 central processing unit (CPU) and a single E3 CPU). When a model requires complex computation (e.g., DNN model parameter calibration), the big-data-based Hadoop Spark framework can provide a highly efficient computing environment. In summary, this study successfully combined the big-data Hadoop Spark framework with machine learning to develop rainfall prediction models with improved computing efficiency. The proposed system can therefore help address real-time typhoon rainfall prediction with high timeliness and accuracy.
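
The abstract does not say how the MLR models were distributed across the cluster. One common way to fit a linear regression on MapReduce or Spark is to compute the statistics X^T X and X^T y on each partition (map), sum them (reduce), and solve the normal equations once. The sketch below illustrates that pattern in plain Python under stated assumptions: the partition layout, the synthetic predictor and rainfall values, and all function names are hypothetical and are not taken from the study; a production system would more likely rely on Spark MLlib's built-in linear regression.

```python
"""Sketch: fitting a multiple linear regression (MLR) by summing
per-partition statistics, the map/reduce pattern a Spark job could use.

Assumptions (not from the study): plain Python lists stand in for Spark
partitions, and the predictors and rainfall values are synthetic.
"""
from functools import reduce


def partition_stats(rows):
    """Map step: accumulate X^T X and X^T y for one partition.

    Each row is (features, target); a constant 1.0 is prepended to the
    features so that the first weight is the intercept.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(left, right):
    """Reduce step: element-wise sum of two partitions' statistics."""
    (xtx_l, xty_l), (xtx_r, xty_r) = left, right
    xtx = [[u + v for u, v in zip(row_l, row_r)] for row_l, row_r in zip(xtx_l, xtx_r)]
    xty = [u + v for u, v in zip(xty_l, xty_r)]
    return xtx, xty


def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def fit_mlr(partitions):
    """Fit [intercept, coefficients...] from a list of row partitions."""
    xtx, xty = reduce(combine, (partition_stats(part) for part in partitions))
    return solve(xtx, xty)


if __name__ == "__main__":
    # Synthetic data generated from rainfall = 2 + 3*x1 + 0.5*x2, split into
    # two hypothetical partitions.
    parts = [
        [((1.0, 2.0), 6.0), ((2.0, 0.0), 8.0)],
        [((0.0, 4.0), 4.0), ((3.0, 1.0), 11.5)],
    ]
    print([round(w, 3) for w in fit_mlr(parts)])  # [2.0, 3.0, 0.5]
```

Because the per-partition statistics are only p × p and p in size, the reduce step moves very little data no matter how many rows each partition holds, which is why this pattern suits a cluster; the same answer is obtained whether the rows sit in one partition or many.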

A Data Cleaning Model for Electric Power Big Data Based on Spark Framework

International Journal of Database Theory and Application, Vol. 9 (3), pp. 137-150, 2016. DOI: 10.14257/ijdta.2016.9.3.15. Cited by ~2.

Author(s): Zhao-Yang Qu, Yong-Wen Wang, Chong Wang, Nan Qu, Jia Yan

Keyword(s): Big Data, Electric Power, Data Cleaning, Spark Framework

Enormous Information Examination using Big Data in a Distributed Environment with Profound Learning of Next Generation Interruption Identification Framework Enhancement

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol. 9 (1), pp. 1779-1784, 2019. DOI: 10.35940/ijitee.a4155.119119.

Keyword(s): Big Data, Short Term Memory, Neural System, Distributed Environment, Short Term, Term Memory, Identification Rate, Long Short Term Memory, Future Vision, Spark Framework

With the growing use of information technology in all areas of life, hacking has become more damaging than ever before. Moreover, as technology advances, the number of attacks is growing exponentially every month and the attacks are becoming increasingly sophisticated, so traditional intrusion detection systems (IDSs) become inefficient at detecting them. We achieve our results by using Networking Chatbot, a deep recurrent neural network, Long Short-Term Memory (LSTM) [2], over the Apache Spark framework; its input is flow traffic and traffic aggregation, and its output is a language of two words: normal or abnormal. The proposed approach combines the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection in flow analysis. We propose a model that describes the network's normal dynamic behavior from a sequence of millions of packets within their context and analyzes them in near real time to detect point, collective, and contextual anomalies. The experiments show a lower false-positive rate, a higher detection rate, and better point-anomaly detection. As for demonstrating contextual and collective anomaly detection, we discuss our case and the reasoning behind our hypothesis. However, because of hardware limitations, the experiments were performed on small random subsets of the dataset, so we offer our analysis and our vision for future work in the hope that a full demonstration will be carried out by other interested researchers with better hardware infrastructure than ours.
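
The abstract reports a lower false-positive rate and a higher detection rate without defining them. A minimal sketch, assuming the usual definitions with "abnormal" as the positive class (detection rate = TP / (TP + FN), false-positive rate = FP / (FP + TN)), shows how the model's two-word "normal"/"abnormal" output could be scored. The function name and the sample labels are hypothetical, and the LSTM itself is not reproduced here.

```python
"""Sketch: scoring a binary "normal"/"abnormal" detector.

Assumptions: "abnormal" is the positive class; detection rate is
TP / (TP + FN) and false-positive rate is FP / (FP + TN). The labels in the
example are invented for illustration.
"""


def detection_metrics(actual, predicted, positive="abnormal"):
    """Return (detection_rate, false_positive_rate) for paired label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1  # attack correctly flagged
            else:
                fn += 1  # attack missed
        elif guess == positive:
            fp += 1  # normal traffic wrongly flagged
        else:
            tn += 1  # normal traffic correctly passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    actual = ["normal", "abnormal", "abnormal", "normal", "normal", "abnormal"]
    predicted = ["normal", "abnormal", "normal", "abnormal", "normal", "abnormal"]
    dr, fpr = detection_metrics(actual, predicted)
    print(f"detection rate={dr:.2f}, false-positive rate={fpr:.2f}")
    # detection rate=0.67, false-positive rate=0.33
```

The two rates pull against each other: flagging more traffic as abnormal raises the detection rate but usually raises the false-positive rate too, which is why an IDS evaluation reports both.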

Classification of Big Data Using Spark Framework

Lecture Notes in Electrical Engineering - Proceedings of the Fourth International Conference on Microelectronics, Computing and Communication Systems, pp. 847-854, 2020. DOI: 10.1007/978-981-15-5546-6_70.

Author(s): Ritesh Jha, Vandana Bhattacharjee, Abhijit Mustafi

Keyword(s): Big Data, Spark Framework

Target Data Optimization based on Big Data-streaming for Two-stage Fuzzy Extraction System

Proceedings of the 2018 International Conference on Big Data Engineering and Technology - BDET 2018, 2018. DOI: 10.1145/3297730.3297731.

Author(s): Rui-Yang Chen

Keyword(s): Big Data, Extraction System, Data Streaming, Two Stage, Target Data, Data Optimization

Visualization of Big Data Text Analytics in Financial Industry: A Case Study of Topic Extraction for Italian Banks

SSRN Electronic Journal, 2019. DOI: 10.2139/ssrn.3490108.

Author(s): Živko Krstić, Sanja Seljan, Jovana Zoroja

Keyword(s): Big Data, Text Analytics, Financial Industry, Topic Extraction

Big Data-aware News Recommendation System According to Regional Twitter Users’ Interests

2021. DOI: 10.21203/rs.3.rs-392181/v1.

Author(s): Maryam Bagheri, Shahram Jamali, Reza Fotohi

Keyword(s): Big Data, Recommendation System, Research Area, Small Data, City Region, Home Pages, News Websites, Twitter Users, News Recommendation, Spark Framework

Nowadays, with the development of technology and Internet access available everywhere to everyone, interest in getting news from newspapers and other traditional media is decreasing. As newspapers move to electronic versions, news websites are becoming more popular. News websites can be accessed from anywhere, i.e., any country, city, or region. Presenting news according to where readers are located is therefore a worthwhile research area: faced with a wide variety of news topics, readers prefer websites whose home pages more often show the news they are interested in. Based on this idea, we present a technique for finding the favorite topics of Twitter users in particular geographical districts, giving news websites a way to increase their popularity. In this work we processed tweets. Tweets may seem like small data, but we found that processing them takes a lot of time because the algorithm is repeated many times and many searches must be performed. We therefore treated our work as a big data problem and implemented it in the Spark framework. Our technique has two phases: a Feature Extraction Phase and a Topic Discovery Phase. Our analysis shows that this technique achieves an accuracy between 68% and 76% across three evaluation settings: 3-fold, 5-fold, and 10-fold.
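
The abstract names the two phases without detailing them. The sketch below is a hypothetical illustration of how such a pipeline might be organized per region, not the authors' algorithm: each tweet is tokenized (feature extraction), and each region's topics are ranked by counting matches against small hand-written keyword lists (topic discovery). The keyword lists, region labels, and function names are assumptions, and a plain loop stands in for Spark. A small helper that splits n samples into k folds is included to show what a 3-, 5-, or 10-fold evaluation partitions.

```python
"""Sketch: a hypothetical two-phase regional topic counter for tweets.

Assumptions (not the authors' method): features are lower-cased word
tokens, and a region's topics are ranked by counting matches against small
hand-written keyword lists. Spark is not used; a plain loop stands in for it.
"""
import re
from collections import Counter, defaultdict

# Hypothetical topic vocabulary; a real system would learn or curate this.
TOPIC_KEYWORDS = {
    "sports": {"match", "goal", "league", "team"},
    "politics": {"election", "vote", "parliament", "minister"},
    "technology": {"phone", "app", "software", "ai"},
}


def extract_features(text):
    """Phase 1 (feature extraction): lower-case word tokens of one tweet."""
    return re.findall(r"[a-z]+", text.lower())


def discover_topics(tweets):
    """Phase 2 (topic discovery): rank topics per region by keyword hits.

    `tweets` is an iterable of (region, text) pairs. Returns a dict mapping
    each region with at least one match to its topics, most frequent first.
    """
    counts = defaultdict(Counter)
    for region, text in tweets:
        for token in extract_features(text):
            for topic, words in TOPIC_KEYWORDS.items():
                if token in words:
                    counts[region][topic] += 1
    return {region: [topic for topic, _ in c.most_common()] for region, c in counts.items()}


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    sample = [
        ("region-A", "Great goal in the league match tonight"),
        ("region-A", "New election vote next week"),
        ("region-B", "This phone app uses AI"),
    ]
    print(discover_topics(sample))
    # {'region-A': ['sports', 'politics'], 'region-B': ['technology']}
    print([len(fold) for fold in k_fold_indices(10, 3)])  # [4, 3, 3]
```

In a k-fold evaluation, each fold serves once as the test set while the remaining k - 1 folds are used for fitting, and the reported accuracy is typically averaged over the k runs.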
