Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

2018 ◽  
Vol 12 (4) ◽  
pp. 1099-1117 ◽  
Author(s):  
Yi Zhang ◽  
Jie Lu ◽  
Feng Liu ◽  
Qian Liu ◽  
Alan Porter ◽  
...  
2020 ◽  
Vol 29 (07n08) ◽  
pp. 2040005
Author(s):  
Zhen Li ◽  
Dan Qu ◽  
Yanxia Li ◽  
Chaojie Xie ◽  
Qi Chen

Deep learning technology has advanced the development of neural machine translation (NMT). End-to-end (E2E) modeling has become the mainstream approach in NMT. It uses word vectors as the initial values of the input layer, so the quality of the word vector model directly affects the accuracy of E2E-NMT. Researchers have proposed many approaches to learning word representations and have achieved significant results. However, the drawbacks of these methods still limit the performance of E2E-NMT systems. This paper focuses on word embedding technology and proposes the PW-CBOW word vector model, which can represent semantic information better. We apply these word vector models to the IWSLT14 German-English, WMT14 English-German, and WMT14 English-French corpora. The results show that the PW-CBOW word vector model can improve the performance of the latest E2E-NMT systems.


2019 ◽  
Vol 9 (16) ◽  
pp. 3414 ◽  
Author(s):  
Ren-Hung Hwang ◽  
Min-Chun Peng ◽  
Van-Linh Nguyen ◽  
Yu-Lun Chang

Recently, deep learning has been successfully applied to network security assessment and intrusion detection systems (IDSs), with breakthroughs such as using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) to classify malicious traffic. However, these state-of-the-art systems face tremendous challenges in satisfying real-time analysis requirements because of the major delay of flow-based data preprocessing, i.e., the time required to accumulate packets into particular flows and then extract features. If malicious traffic can be detected at the packet level, detection time will be significantly reduced, which makes online real-time malicious traffic detection based on deep learning very promising. With the goal of accelerating the whole detection process through packet-level classification, which has not been studied in the literature, we propose a novel approach to building a malicious traffic classification system based primarily on word embedding and the LSTM model. Specifically, we propose a novel word embedding mechanism to extract the semantic meaning of packets and adopt LSTM to learn the temporal relations among fields in the packet header and then classify whether an incoming packet is normal or part of malicious traffic. The evaluation results on ISCX2012, USTC-TFC2016, an IoT dataset from Robert Gordon University, and an IoT dataset collected on our Mirai Botnet show that our approach is competitive with prior work that detects malicious traffic at the flow level. As network traffic grows year by year, this first attempt can inspire the research community to exploit the advantages of deep learning to build effective IDSs without suffering significant detection delay.


Symmetry ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1553
Author(s):  
Harun Yasar ◽  
Zeynep Hilal Kilimci

Exchange rate forecasting has been an important topic for investors, researchers, and analysts. In this study, financial sentiment analysis (FSA) and time series analysis (TSA) are combined to form a prediction model for the US Dollar/Turkish Lira exchange rate. The proposed hybrid model is constructed in three stages: obtaining and modeling text data for FSA, obtaining and modeling numerical data for TSA, and blending the two models like a symmetry. To our knowledge, this is the first study in the literature that uses social media platforms as a source for FSA and blends them with TSA methods. To perform FSA, the word embedding methods Word2vec, GloVe, and fastText and deep learning models such as CNN, RNN, and LSTM are used. To the best of our knowledge, this study is the first attempt to perform FSA by combining deep learning models with word embedding methods for both Turkish and English texts. For TSA, simple exponential smoothing, Holt–Winters, Holt’s linear, and ARIMA models are employed. Finally, with the proposed model, any user who wants to forecast the US Dollar/Turkish Lira exchange rate will be able to make a more consistent and robust forecast.
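
The simplest of the TSA models named in this abstract, simple exponential smoothing, can be sketched as follows. This is a minimal illustration of the standard recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1), not the authors' implementation. The function names, the choice to initialise the level with the first observation, and the sample values are assumptions made here for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    Assumption: the level is initialised with the first observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must be non-empty")
    levels = [float(series[0])]
    for x in series[1:]:
        # Blend the new observation with the previous smoothed level.
        levels.append(alpha * x + (1.0 - alpha) * levels[-1])
    return levels


def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    rates = [10.0, 20.0, 30.0]
    print(simple_exponential_smoothing(rates, alpha=0.5))
    print(forecast_next(rates, alpha=0.5))
```

A larger alpha weights recent observations more heavily, while a smaller alpha produces a smoother, slower-reacting forecast.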


Webology ◽  
2021 ◽  
Vol 18 (2) ◽  
pp. 1011-1022
Author(s):  
Saja Naeem Turky ◽  
Ahmed Sabah Ahmed AL-Jumaili ◽  
Rajaa K. Hasoun

Abstractive summarization is the process of producing a brief and coherent summary that contains the main concepts of the original text. For scientific texts, summarization has generally been restricted to extractive techniques. Abstractive methods that use deep learning have proven very effective in summarizing articles in general domains, such as news documents. Because neural frameworks have difficulty learning domain-specific knowledge, especially in NLP tasks, they have rarely been applied to documents from a particular domain such as the medical domain. In this study, an abstractive summarization system is proposed. The proposed system is applied to the COVID-19 dataset, a collection of scientific documents related to the coronavirus and associated illnesses; 12000 samples from this dataset are used in this work. The suggested model is an abstractive summarization model that reads abstracts of COVID-19 papers and then creates summaries in the style of a single-statement headline. The text summarization model is based on an LSTM architecture. The proposed model uses a GloVe model for word embedding, which converts the input sequence to vector form; these vectors then pass through LSTM layers to produce the summary. The results indicate that using an LSTM together with a GloVe model for word embedding improves the summarization system's performance. The system was evaluated with ROUGE metrics and achieved 43.6, 36.7, and 43.6 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
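
For readers unfamiliar with the metrics reported in this abstract, the n-gram overlap behind ROUGE-1 and ROUGE-2 can be sketched as below. This is a simplified illustration under stated assumptions (lowercasing, whitespace tokenisation, no stemming or stopword handling), so it will not reproduce the scores of an official ROUGE toolkit. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of clipped n-gram overlap.

    Assumption: text is lowercased and split on whitespace only.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the dataset.
    print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
    print(rouge_n("the cat sat", "the cat sat on the mat", n=2))
```

ROUGE-L, the third score in the abstract, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.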


2021 ◽  
Vol 27 (11) ◽  
pp. 1203-1221
Author(s):  
Amal Rekik ◽  
Salma Jamoussi

Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. Analyzing such data requires several requirements to be addressed because of its large volume and evolving nature. For this purpose, we propose in this paper a new evolving clustering method that takes into account the incremental nature of the data and meets its principal requirements. Our method uses a deep learning technique to learn incrementally from unlabelled examples that are generated at high speed and need to be clustered instantly. To evaluate the performance of our method, we have conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.

