A Design on Informal Big Data Topic Extraction System Based on Spark Framework

2016 ◽  
Vol 5 (11) ◽  
pp. 521-526
Author(s):  
Kiejin Park
PLoS ONE ◽  
2018 ◽  
Vol 13 (8) ◽  
pp. e0201933 ◽  
Author(s):  
Jungwon Yoon ◽  
Jong Wook Kim ◽  
Beakcheol Jang

Author(s):  
S Nirmala Sugirtha Rajini ◽  
K Anuradha ◽  
S Umadevi ◽  
E Mercy Beulah

Atmosphere ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 870 ◽  
Author(s):  
Chih-Chiang Wei ◽  
Tzu-Hao Chou

Situated in the main tracks of typhoons in the Northwestern Pacific Ocean, Taiwan frequently encounters disasters from heavy rainfall during typhoons. Accurate and timely typhoon rainfall prediction is an imperative topic that must be addressed. The purpose of this study was to develop a Hadoop Spark distributed framework based on big-data technology to accelerate the computation of typhoon rainfall prediction models. This study used two machine-learning methods, deep neural networks (DNNs) and multiple linear regressions (MLRs), to establish rainfall prediction models and evaluate rainfall prediction accuracy. The Hadoop Spark distributed cluster-computing framework was the big-data technology used. It consisted of the Hadoop Distributed File System, the MapReduce framework, and Spark, which was used as a new-generation technology to improve the efficiency of the distributed computing. The research area was Northern Taiwan, where four surface observation stations served as the experimental sites. This study collected 271 typhoon events (from 1961 to 2017). The following results were obtained: (1) in machine-learning computation, prediction errors increased with prediction duration in the DNN and MLR models; and (2) the Hadoop Spark system was faster than the standalone systems (a single i7 central processing unit (CPU) and a single E3 CPU). When complex computation is required in a model (e.g., DNN model parameter calibration), the big-data-based Hadoop Spark framework can be used to establish highly efficient computation environments. In summary, this study successfully used the big-data Hadoop Spark framework with machine learning to develop rainfall prediction models with effectively improved computing efficiency. Therefore, the proposed system can solve problems regarding real-time typhoon rainfall prediction with high timeliness and accuracy.


2016 ◽  
Vol 9 (3) ◽  
pp. 137-150 ◽  
Author(s):  
Zhao-Yang Qu ◽  
Yong-Wen Wang ◽  
Chong Wang ◽  
Nan Qu ◽  
Jia Yan

With the growing use of information technology in all areas of life, hacking has become more harmful than ever before. Moreover, as technology develops, the number of attacks keeps growing exponentially and attacks become increasingly sophisticated, so a conventional intrusion detection system (IDS) becomes inefficient at detecting them. We achieve those results by using Networking Chatbot, a deep recurrent neural network: Long Short-Term Memory (LSTM) [2] over the Apache Spark framework, which takes flow traffic and traffic aggregation as input and outputs a language of two words, normal or abnormal. The newly proposed approach merges the concepts of language processing, contextual analysis, distributed deep learning, big data, and anomaly detection in flow analysis. We propose a model that describes the network's dynamic normal behavior from a sequence of a large number of packets within their context and analyzes them in near real time to detect point, collective, and contextual anomalies. Experiments show a lower false-positive rate, a higher detection rate, and better detection of point anomalies. As for proof of contextual and collective anomaly detection, we discuss our claim and the reasoning behind our hypothesis. However, because of hardware limitations, the experiments were run on small random subsets of the dataset, so we share our experiments and our vision for future work in the hope that a full proof will be carried out by other interested researchers who have better hardware infrastructure than ours.


2021 ◽  
Author(s):  
Maryam Bagheri ◽  
Shahram Jamali ◽  
Reza Fotohi

Nowadays, with the development of technology and Internet access available everywhere to everyone, interest in getting news from newspapers and other traditional media is decreasing. The popularity of news websites is therefore rising as newspapers move to electronic versions. News websites can be accessed from anywhere, i.e., any country, city, region, etc. Presenting news according to where the reader is from is therefore a possible research area: faced with a variety of news topics, readers prefer websites whose home pages more often show the news they are interested in. Based on this idea, we present a technique for finding the favorite topics of Twitter users in particular geographical districts, giving news websites a way to increase their popularity. In this work we processed tweets. Tweets may appear to be small data, but we found that processing them takes a lot of time, because the algorithm is repeated many times and many searches must be performed. We therefore categorized our work as big data. To address this problem, we developed our work in the Spark framework. Our technique includes two phases: a Feature Extraction Phase and a Topic Discovery Phase. Our analysis shows that this technique achieves accuracy between 68% and 76% in three setups: 3-fold, 5-fold, and 10-fold.
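
The abstract above does not say how its two phases are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes feature extraction means tokenizing tweets and dropping stop words, and topic discovery means picking the most frequent terms per district. The names `STOP_WORDS`, `extract_features`, `discover_topics`, and the sample tweets are all hypothetical. It uses plain Python; in a Spark job the per-district counting would be distributed across the cluster rather than done in one loop.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "for", "at"}


def extract_features(text):
    """Feature extraction (assumed): lowercase, tokenize, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def discover_topics(tweets, top_n=2):
    """Topic discovery (assumed): the top_n most frequent terms for each district."""
    counts = defaultdict(Counter)
    for district, text in tweets:
        counts[district].update(extract_features(text))
    return {
        district: [term for term, _ in counter.most_common(top_n)]
        for district, counter in counts.items()
    }


# Made-up sample data: (district, tweet text) pairs.
tweets = [
    ("north", "The football match tonight is huge"),
    ("north", "Football fans fill the stadium"),
    ("south", "Election debate on the economy"),
    ("south", "The economy dominates the election"),
    ("south", "Economy and election news"),
]

topics = discover_topics(tweets)
for district, terms in topics.items():
    print(district, terms)
```

Raw term frequency is used here only because it is the simplest stand-in; the accuracy figures quoted in the abstract refer to the authors' own technique, not to this sketch.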

