EXTRACTING TOPICS FROM A TV CHANNEL'S FACEBOOK PAGE USING CONTEXTUALIZED DOCUMENT EMBEDDING

Abstract. Topic models extract meaningful words from a text collection, allowing for a better understanding of the data. However, the resulting topics are often not coherent enough and are thus harder to interpret. Adding more contextual knowledge to the model can enhance coherence. In recent years, neural network-based topic models have become available, and neural models have advanced thanks to BERT-based representations. In this study, we propose a model that extracts topics from news on the Aljazeera Facebook page. Our approach combines the neural topic model (ProdLDA) with the Arabic pre-trained BERT transformer model (AraBERT). The proposed model produces more expressive and consistent topics than ELMo using different topic model algorithms (ProdLDA and LDA), reaching a topic coherence of 0.883.


It all starts with entities: A Salient entity topic model

Natural Language Engineering ◽

10.1017/s1351324919000585 ◽

2019 ◽

Vol 26 (5) ◽

pp. 531-549

Author(s):

Chuan Wu ◽

Evangelos Kanoulas ◽

Maarten de Rijke

Keyword(s):

Topic Model ◽

State Of The Art ◽

Topic Models ◽

Generation Process ◽

Qualitative And Quantitative Analysis ◽

Generative Process ◽

Qualitative And Quantitative ◽

Proposed Model ◽

Topic Distribution ◽

Document Generation

Abstract. Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.


Fault Diagnosis of Transformer Windings Based on Decision Tree and Fully Connected Neural Network

Energies ◽

10.3390/en14061531 ◽

2021 ◽

Vol 14 (6) ◽

pp. 1531

Author(s):

ZhenHua Li ◽

Yujie Zhang ◽

Ahmed Abu-Siada ◽

Xingxin Chen ◽

Zhenxing Li ◽

...

Keyword(s):

Neural Network ◽

Decision Tree ◽

Classification Model ◽

Response Analysis ◽

Decision Tree Classification ◽

Lumped Parameter ◽

Proposed Model ◽

Industry Practice ◽

Transformer Model ◽

Fully Connected

While frequency response analysis (FRA) is a well-matured technique widely used in current industry practice to assess the mechanical integrity of power transformers, interpretation of FRA signatures is still challenging, despite the research efforts in this area. This paper presents a method for reliable quantitative and qualitative analysis of transformer FRA signatures based on a decision tree classification model and a fully connected neural network. Several levels of six different fault types are obtained using a lumped parameter-based transformer model. Results show that the proposed model performs well in the training and validation stages and generalizes well.


Document Informed Neural Autoregressive Topic Models with Distributional Prior

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016505 ◽

2019 ◽

Vol 33 ◽

pp. 6505-6512 ◽

Cited By ~ 2

Author(s):

Pankaj Gupta ◽

Yatin Chaudhary ◽

Florian Buettner ◽

Hinrich Schütze

Keyword(s):

Topic Model ◽

State Of The Art ◽

Topic Models ◽

Language Modeling ◽

Context Information ◽

Short Text ◽

Neuron Networks ◽

Proposed Model ◽

Actual Meaning ◽

Biological Neuron

We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., “networks” used in the contexts of artificial neural networks vs. biological neuron networks. Generative topic models infer topic-word distributions, taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe. We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.


Oil Saturation Log Prediction Using Neural Network in New Steamflood Area

Proc. Indon. Petrol. Assoc., Digital Technical Conference, 2020 ◽

10.29118/ipa20-g-307 ◽

2020 ◽

Author(s):

A. Syahputra

Keyword(s):

Neural Network ◽

Water Saturation ◽

Prediction Method ◽

Neural Model ◽

Training Model ◽

Acquisition Time ◽

Oil Saturation ◽

Full Field ◽

Remaining Oil ◽

Observation Wells

Surveillance is very important in managing a steamflood project. Under the current surveillance plan, temperature and steam ID logs are acquired on observation wells at least every year, while the CO log (oil saturation log or SO log) is acquired every 3 years. Based on those surveillance logs, a dynamic full-field reservoir model is updated quarterly. Typically, a high depletion rate occurs in a new steamflood area as a function of drainage activities and steamflood injection. Because of the different acquisition times, there is a possibility of misalignment or information gaps between the remaining oil maps (i.e., net pay, average oil saturation, or hydrocarbon pore thickness maps) and the steam chest map, for example a case of high remaining oil in a high steam saturation interval. The methodology used to predict the oil saturation log is a neural network. In this method, open-hole observation well logs (static reservoir logs such as vshale, porosity, effective water saturation, and pay/non-pay interval), dynamic reservoir logs (temperature, steam saturation, and oil saturation), and acquisition time are used as inputs. A case study of a new steamflood area with 16 patterns and a single reservoir target used 6 active observation wells, 15 complete log sets (temperature, steam ID, and CO log), and 19 incomplete log sets (temperature and steam ID only) acquired from 2014 to 2019. The data were divided as follows: ~80% of the complete log sets for training the neural network model and ~20% of the complete log sets for testing it. In testing, the neural model achieved an R2 score of 0.86 with an RMS of 5% oil saturation. In this testing step, the predicted oil saturation log is compared to actual data. Only a small portion of the data shows different oil saturation values, and the overall shapes of the oil saturation logs match. The neural network model is then used to predict the oil saturation log for the 19 incomplete log sets.
The oil saturation log prediction method can fill data gaps to better describe the depletion process in a new steamflood area. This method also helps to align the steam map and remaining oil to support reservoir management in a steamflood project.
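
The evaluation described in this abstract (an approximately 80/20 split of the complete log sets, then an R2 score and an RMS error on the held-out part) can be illustrated with a minimal Python sketch. The function names and the ordered, non-random split are assumptions made for illustration; the abstract does not say how the split was drawn or how the scores were computed.

```python
import math


def split_80_20(items):
    """Split a list into ~80% training and ~20% testing parts.

    Assumption: a simple ordered split; the abstract does not say
    whether the split was random or ordered.
    """
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rms_error(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical usage: 15 complete log sets give 12 for training, 3 for testing.
train_ids, test_ids = split_80_20(list(range(15)))
```

With 15 complete log sets, this split yields 12 training sets and 3 testing sets, which matches the approximate proportions stated in the abstract.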


Hybrid Deep Neural Model for Duplicate Question Detection in Transliterated Bi-lingual Data

Recent Patents on Computer Science ◽

10.2174/2213275912666190710152709 ◽

2019 ◽

Vol 12 ◽

Author(s):

Seema Rani ◽

Avadhesh Kumar ◽

Naresh Kumar

Keyword(s):

Question Answering ◽

Neural Model ◽

Manhattan Distance ◽

Semantic Matching ◽

English Only ◽

Detection Model ◽

Proposed Model ◽

Reported Study ◽

Social Media Platforms ◽

Efficient Information

Background: Duplicate content often corrupts the filtering mechanism in online question answering. Moreover, as users are usually more comfortable asking questions in their native language, transliteration adds to the challenges of detecting duplicate questions. This compromises response time and increases answer overload. Thus, it has become crucial to build intelligent semantic filters that match linguistically disparate questions by meaning. Objective: Most of the research on duplicate question detection has been done on mono-lingual, mainly English, Q&A platforms. The aim is to build a model which extends the cognitive capabilities of machines to interpret, comprehend and learn features for semantic matching in transliterated bi-lingual Hinglish (Hindi + English) data acquired from different Q&A platforms. Method: In the proposed DQDHinglish (Duplicate Question Detection) model, language transformation (transliteration and translation) is first applied to convert the bi-lingual transliterated question into mono-lingual, English-only text. Next, a hybrid of a Siamese neural network containing two identical Long Short-Term Memory (LSTM) models and a multi-layer perceptron network is proposed to detect semantically similar question pairs. The Manhattan distance function is used as the similarity measure. Result: A dataset was prepared by scraping 100 question pairs from various social media platforms, such as Quora and TripAdvisor. The performance of the proposed model was evaluated on the basis of accuracy and F-score. The proposed DQDHinglish model achieves a validation accuracy of 82.40%. Conclusion: A deep neural model was introduced to find a semantic match between an English question and a Hinglish (Hindi + English) question, so that questions with similar intent can be combined to enable fast and efficient information processing and delivery. A dataset was created and the proposed model was evaluated on the basis of performance accuracy.
To the best of our knowledge, this work is the first reported study on transliterated Hinglish semantic question matching.
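
As a rough illustration of the Manhattan-distance similarity mentioned in the Method, the sketch below compares two fixed-length vectors standing in for the outputs of the two LSTM branches. Mapping the distance to a score in (0, 1] with exp(-distance) is a common choice in Siamese LSTM work and is an assumption here; the abstract does not state which mapping the authors used, and the function names are hypothetical.

```python
import math


def manhattan_distance(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def manhattan_similarity(u, v):
    """Map the L1 distance to a similarity in (0, 1].

    Assumption: exp(-distance), so identical vectors score 1.0 and
    the score decays toward 0 as the vectors move apart.
    """
    return math.exp(-manhattan_distance(u, v))


# Hypothetical encodings of two questions (not real model outputs).
question_a = [0.2, 0.7, 0.1]
question_b = [0.25, 0.6, 0.1]
score = manhattan_similarity(question_a, question_b)
```

A pair would then be flagged as duplicate when the score exceeds a chosen threshold; the threshold itself would have to be tuned on validation data.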


Nodule Detection with Convolutional Neural Network Using Apache Spark and GPU Frameworks

Applied Sciences ◽

10.3390/app11062838 ◽

2021 ◽

Vol 11 (6) ◽

pp. 2838

Author(s):

Nikitha Johnsirani Venkatesan ◽

Dong Ryeol Shin ◽

Choon Sung Nam

Keyword(s):

Neural Network ◽

Radiation Dose ◽

Convolutional Neural Network ◽

Model Performance ◽

Performance Comparison ◽

Apache Spark ◽

Training Time ◽

Learning Framework ◽

Proposed Model

In the pharmaceutical field, early detection of lung nodules is indispensable for increasing patient survival. We can enhance the quality of medical images by intensifying the radiation dose. A high radiation dose provokes cancer, which forces experts to use limited radiation. Using abrupt radiation generates noise in CT scans. We propose an optimal Convolutional Neural Network model in which Gaussian noise is removed for better classification and increased training accuracy. Experimental demonstration on the LUNA16 dataset of size 160 GB shows that our proposed method exhibits superior results. Classification accuracy, specificity, sensitivity, precision, recall, F1 measurement, and area under the ROC curve (AUC) are taken as evaluation metrics of the model performance. We conducted a performance comparison of our proposed model on several platforms, such as Apache Spark, GPU, and CPU, to reduce the training time without compromising accuracy. Our results show that Apache Spark, integrated with a deep learning framework, is suitable for parallel training computation with high accuracy.


Natural Disasters Intensity Analysis and Classification Based on Multispectral Images Using Multi-Layered Deep Convolutional Neural Network

Sensors ◽

10.3390/s21082648 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2648

Author(s):

Muhammad Aamir ◽

Tariq Ali ◽

Muhammad Irfan ◽

Ahmad Shaf ◽

Muhammad Zeeshan Azam ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Natural Disasters ◽

Deep Convolutional Neural Network ◽

Multispectral Images ◽

Learning Techniques ◽

Proposed Model ◽

Disaster Intensity ◽

And Performance

Natural disasters not only disturb the human ecological system but also destroy the properties and critical infrastructures of human societies and even lead to permanent change in the ecosystem. Disaster can be caused by naturally occurring events such as earthquakes, cyclones, floods, and wildfires. Many deep learning techniques have been applied by various researchers to detect and classify natural disasters to overcome losses in ecosystems, but detection of natural disasters still faces issues due to the complex and imbalanced structures of images. To tackle this problem, we propose a multilayered deep convolutional neural network. The proposed model works in two blocks: Block-I convolutional neural network (B-I CNN), for detection and occurrence of disasters, and Block-II convolutional neural network (B-II CNN), for classification of natural disaster intensity types with different filters and parameters. The model is tested on 4428 natural images and performance is calculated and expressed as different statistical values: sensitivity (SE), 97.54%; specificity (SP), 98.22%; accuracy rate (AR), 99.92%; precision (PRE), 97.79%; and F1-score (F1), 97.97%. The overall accuracy for the whole model is 99.92%, which is competitive and comparable with state-of-the-art algorithms.
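
The statistics reported in this abstract (sensitivity, specificity, accuracy, precision, and F1-score) all derive from confusion-matrix counts. The following Python sketch shows the standard binary definitions; treating the task as a single positive-vs-negative problem is a simplifying assumption, since the paper classifies several disaster intensity types, and the counts in the usage line are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives.
    """
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
```

For a multi-class setting, these values are typically computed per class and then averaged.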


Predicting the Energy Consumption of a Robot in an Exploration Task Using Optimized Neural Networks

Electronics ◽

10.3390/electronics10080920 ◽

2021 ◽

Vol 10 (8) ◽

pp. 920

Author(s):

Liesle Caballero ◽

Álvaro Perafan ◽

Martha Rinaldy ◽

Winston Percybrooks

Keyword(s):

Neural Network ◽

Energy Consumption ◽

Mobile Robot ◽

Energy Budget ◽

Dynamic Models ◽

Pearson Correlation ◽

Experimental Conditions ◽

Grid Map ◽

Proposed Model ◽

Exploration Task

This paper deals with the problem of determining a useful energy budget for a mobile robot in a given environment without having to carry out experimental measurements for every possible exploration task. The proposed solution uses machine learning models trained on a subset of possible exploration tasks but able to make predictions on untested scenarios. Additionally, the proposed model does not use any kinematic or dynamic models of the robot, which are not always available. The method is based on a neural network with hyperparameter optimization to improve performance. A Tabu List optimization strategy is used to determine the hyperparameter values (number of layers and number of neurons per layer) that minimize the percentage relative absolute error (%RAE) while maximizing the Pearson correlation coefficient (R) between predicted data and actual data measured under a number of experimental conditions. Once the optimized artificial neural network is trained, it can be used to predict the performance of an exploration algorithm on arbitrary variations of a grid map scenario. Based on such a prediction, it is possible to know the energy needed for the robot to complete the exploration task. A total of 128 tests were carried out using a robot executing two exploration algorithms in a grid map with the objective of locating a target whose location is not known a priori by the robot. The experimental energy consumption was measured and compared with the prediction of our model. A success rate of 96.093% was obtained, measured as the percentage of tests where the energy budget suggested by the model was enough to actually carry out the task when compared to the actual energy consumed in the test, suggesting that the proposed model could be useful for energy budgeting in actual mobile robot applications.
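
The two quantities used to guide the hyperparameter search, %RAE and the Pearson correlation coefficient R, can be sketched in Python as follows. The %RAE formula used here (total absolute error divided by the total absolute deviation of the actual values from their mean, times 100) is the usual textbook definition and is assumed rather than confirmed by the abstract; the function names are hypothetical.

```python
import math


def percent_rae(actual, predicted):
    """Percentage relative absolute error.

    Assumed definition: 100 * sum|p - a| / sum|a - mean(a)|.
    """
    mean_actual = sum(actual) / len(actual)
    abs_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    return 100.0 * abs_error / abs_dev


def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical energy values, not taken from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 4.0, 6.0, 8.0]
rae = percent_rae(measured, estimated)
r = pearson_r(measured, estimated)
```

The example values show why both scores are tracked: predictions that are perfectly correlated with the measurements (R close to 1) can still carry a large %RAE when they are systematically scaled.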


Tomato Leaf Disease Diagnosis Based on Improved Convolution Neural Network by Attention Module

Agriculture ◽

10.3390/agriculture11070651 ◽

2021 ◽

Vol 11 (7) ◽

pp. 651

Author(s):

Shengyi Zhao ◽

Yun Peng ◽

Jizhan Liu ◽

Shuo Wu

Keyword(s):

Neural Network ◽

High Performance ◽

Model Comparison ◽

Research Direction ◽

Disease Diagnosis ◽

Tomato Leaf ◽

Identification Accuracy ◽

Main Research ◽

Proposed Model ◽

Complex Features

Crop disease diagnosis is of great significance to crop yield and agricultural production. Deep learning methods have become the main research direction for diagnosing crop diseases. This paper proposes a deep convolutional neural network that integrates an attention mechanism, which can better adapt to the diagnosis of a variety of tomato leaf diseases. The network structure mainly includes residual blocks and attention extraction modules. The model can accurately extract complex features of various diseases. Extensive comparative experiments show that the proposed model achieves an average identification accuracy of 96.81% on the tomato leaf diseases dataset. This shows that the model has significant advantages in terms of network complexity and real-time performance compared with other models. Moreover, in the model comparison experiment on the public grape leaf diseases dataset, the proposed model also achieves better results, with an average identification accuracy of 99.24%. This confirms that adding the attention module allows the complex features of a variety of diseases to be extracted more accurately with fewer parameters. The proposed model provides a high-performance solution for crop diagnosis in real agricultural environments.


Recurrent neural networks with long term temporal dependencies in machine tool wear diagnosis and prognosis

SN Applied Sciences ◽

10.1007/s42452-021-04427-5 ◽

2021 ◽

Vol 3 (4) ◽

Author(s):

Jianlei Zhang ◽

Yukun Zeng ◽

Binil Starly

Keyword(s):

Neural Network ◽

Tool Wear ◽

Machine Tool ◽

Recurrent Neural Network ◽

Machine Tools ◽

Prediction Performance ◽

Sequential Data ◽

Diagnosis And Prognosis ◽

Proposed Model

Abstract. Data-driven approaches for machine tool wear diagnosis and prognosis have been gaining attention in the past few years. The goal of our study is to advance the adaptability, flexibility, prediction performance, and prediction horizon for online monitoring and prediction. This paper proposes the use of a recent deep learning method based on the Gated Recurrent Neural Network architecture, including Long Short-Term Memory (LSTM), which tries to capture longer-term dependencies than the regular Recurrent Neural Network method for modeling sequential data, together with a mechanism to realize online diagnosis and prognosis and remaining useful life (RUL) prediction from indirect measurements collected during the manufacturing process. Existing models are usually tool-specific and can hardly be generalized to other scenarios, such as different tools or operating environments. Unlike current methods, the proposed model requires no prior knowledge about the system and thus can be generalized to different scenarios and machine tools. With inherent memory units, the proposed model can also capture long-term dependencies while learning from sequential data such as those collected by condition monitoring sensors, which means it can accommodate machine tools with varying life and increase prediction performance. To prove the validity of the proposed approach, we conducted multiple experiments on a milling machine cutting tool and applied the model for online diagnosis and RUL prediction. Without loss of generality, we incorporated a system transition function and a system observation function into the neural net and trained it with signal data from a minimally intrusive vibration sensor. The experimental results showed that our LSTM-based model achieved the best overall accuracy among the compared methods, with minimal Mean Square Error (MSE) for both tool wear prediction and RUL prediction.
