A Densely Connected GRU Neural Network Based on Coattention Mechanism for Chinese Rice-Related Question Similarity Matching

Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep-learning-based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured features. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention manipulating strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.

Efficient natural language classification algorithm for detecting duplicate unsupervised features

Informatics and Automation - Информатика и автоматизация ◽

10.15622/ia.2021.3.5 ◽

2021 ◽

Vol 20 (3) ◽

pp. 623-653

Author(s):

Saud Altaf ◽

Sofia Iqbal ◽

Muhammad Waseem Soomro

Keyword(s):

Natural Language ◽

Short Term Memory ◽

Short Term ◽

Vocabulary Size ◽

Language Understanding ◽

Inverse Document Frequency ◽

Classification Technique ◽

Document Frequency ◽

Text Features ◽

Long Short Term Memory

This paper focuses on capturing the meaning of text with Natural Language Understanding (NLU) features in order to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two types of datasets: Bosch bug reports and Wikipedia articles. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results report the performance of Term Frequency–Inverse Document Frequency (TF-IDF) features on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.
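
As a rough illustration of the lexical baseline mentioned in this abstract, the sketch below scores pairs of short reports by the cosine similarity of their TF-IDF vectors. It is a minimal standard-library example, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the sample reports, and the names `tfidf_vectors` and `cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps weights finite and positive.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical bug reports: the first two describe the same problem.
reports = [
    "app crashes when saving file",
    "app crashes when saving a file to disk",
    "login page shows wrong logo",
]
vecs = tfidf_vectors(reports)
sim_dup = cosine(vecs[0], vecs[1])    # likely duplicates
sim_other = cosine(vecs[0], vecs[2])  # unrelated reports
```

A pair would be flagged as a duplicate when its score exceeds a threshold tuned on labelled data. A purely lexical score like this one cannot match paraphrases that share no words, which is the gap the NLU features are meant to close.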

Object Oriented Modelling in Information Systems Based on Related Text Data

IFIP Advances in Information and Communication Technology - Artificial Intelligence Applications and Innovations ◽

10.1007/978-3-642-23960-1_26 ◽

2011 ◽

pp. 212-218

Author(s):

Kolyo Onkov

Keyword(s):

Information Systems ◽

Object Oriented ◽

Text Data ◽

Related Text ◽

Object Oriented Modelling

Design of Vibration Frequency Method with Fine-Tuned Factor for Fault Detection of Three Phase Induction Motor

Journal of Innovative Image Processing - December 2019 ◽

10.36548/jiip.2021.1.005 ◽

2021 ◽

Vol 3 (1) ◽

pp. 52-65

Author(s):

Thomas Amanuel ◽

Amanuel Ghirmay ◽

Huruy Ghebremeskel ◽

Robel Ghebrehiwet ◽

Weldekidan Bahlibi

Keyword(s):

Induction Motor ◽

Vibration Frequency ◽

Vibration Monitoring ◽

Slight Variation ◽

Frequency Method ◽

Monitoring Method ◽

Stator Current ◽

Research Article ◽

Proposed Model ◽

Fault Parameters

This research article focuses on industrial applications and demonstrates how current and vibration analysis can be used to diagnose problems in induction motor drives. The induction motor faults are detected by monitoring the current together with the proposed fine-tuned vibration frequency method. The stator short circuit fault, broken rotor bar fault, air gap eccentricity, and bearing fault are the common faults that occur in an induction motor. The detection process of the proposed method is based on sidebands around the supply frequency in the stator current and vibration signals. Moreover, it is very challenging to diagnose problems that occur due to the complex electromagnetic and mechanical characteristics of an induction motor with vibration measures. The design of an accurate model to measure vibration and stator current is analyzed in this research article. The proposed method shows how efficiently the root cause of a problem can be diagnosed by combining current and vibration monitoring. The proposed model of the induction motor and its circuit environment is developed in MATLAB and verified to perform accurate detection and diagnosis of motor fault parameters. All stator faults are turn-to-turn faults; further, the broken rotor bar and eccentricity faults are structured in each test. The output response (torque and stator current) is simulated by using a modified winding procedure (MWP) approach and tuning the winding geometrical parameter. The proposed model in the MATLAB Simulink environment is highly symmetrical, so it can easily detect the signal components at fault frequencies that occur due to slight variation and improper motor installation. Finally, this research article compares the proposed method with other existing methods.
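
The abstract does not state which sideband frequencies are monitored. As a hedged illustration, the sketch below uses the textbook relation for broken rotor bar sidebands in the stator current, f_sb = (1 ± 2ks) f_s, where s is the slip and f_s the supply frequency. The function names and the example motor (50 Hz supply, 4 poles, 1440 rpm) are assumptions, and this is not the fine-tuned method proposed in the article.

```python
def slip(supply_hz, poles, rotor_rpm):
    """Per-unit slip from the supply frequency, pole count, and rotor speed."""
    sync_rpm = 120.0 * supply_hz / poles  # synchronous speed in rpm
    return (sync_rpm - rotor_rpm) / sync_rpm


def broken_bar_sidebands(supply_hz, s, harmonics=1):
    """Return (lower, upper) sideband pairs (1 -/+ 2ks) * f_s for k = 1..harmonics."""
    return [
        ((1 - 2 * k * s) * supply_hz, (1 + 2 * k * s) * supply_hz)
        for k in range(1, harmonics + 1)
    ]


# Hypothetical motor: 50 Hz supply, 4 poles, running at 1440 rpm.
s = slip(50.0, 4, 1440.0)
lower, upper = broken_bar_sidebands(50.0, s)[0]
```

For this example the slip is 0.04, so the first sideband pair sits near 46 Hz and 54 Hz. A diagnostic routine would look for raised spectral amplitude at those frequencies.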

The Classification System of Literary Works Based on K-Means Clustering

Journal of Interconnection Networks ◽

10.1142/s0219265921410012 ◽

2021 ◽

pp. 2141001

Author(s):

Sanqiang Wei ◽

Hongxia Hou ◽

Hua Sun ◽

Wei Li ◽

Wenxia Song

Keyword(s):

Clustering Algorithm ◽

Performance Ratio ◽

Levels Of Abstraction ◽

Text Documents ◽

Text Data ◽

Literary Works ◽

Accuracy Comparison ◽

Word Classification ◽

Text Features ◽

And Performance

The plots of certain literary works are very complicated and hinder readers from understanding them. Tools should therefore be proposed to support readers' comprehension of complex literary works by providing them with the most important information. A human reader must capture multiple levels of abstraction and meaning to formulate an understanding of a document. Hence, in this paper, an Improved K-means clustering algorithm (IKCA) is proposed for literary word classification. For text data, the words that can express the exact semantics of a class are generally better features. This paper uses the proposed technique to capture numerous cluster centroids for every class and then selects the high-frequency words in the centroids as the text features for classification. Furthermore, neural networks are used to classify text documents and K-means is used to cluster them, so that the model combines unsupervised and supervised techniques to identify the similarity between documents. The numerical results show that the suggested model improves on the existing algorithm and the K-means algorithm in quality comparison, with an accuracy comparison of ALA and IKCA of 95.2%, a clustering time of less than 2 hours, a success rate of 97.4%, and a performance ratio of 98.1%.

Content Noise Detection Model Using Deep Learning in Web Forums

Sustainability ◽

10.3390/su12125074 ◽

2020 ◽

Vol 12 (12) ◽

pp. 5074

Author(s):

Jiyoung Woo ◽

Jaeseok Yun

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Model ◽

Detection Model ◽

Proposed Model ◽

Web Forum ◽

Web Forums ◽

Conventional Machine ◽

Text Features ◽

Deep Learning Model

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. In this regard, as the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data to the opinion analysis. Here, in this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features from posted content using text mining techniques from the perspective of linguistics were extracted, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparison analysis on deep neural networks using the two different recurrent neural network (RNN) models of the simple RNN and long short-term memory (LSTM) network was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model affords significant improvements over the accuracy of conventional machine learning associated with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.

Development of Reading Comprehension System for Kannada Text Documents

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1008.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 42-45

Keyword(s):

Reading Comprehension ◽

Natural Language ◽

Language Processing ◽

Primary Language ◽

Grammatical Structure ◽

Text Documents ◽

Inverse Document Frequency ◽

Document Frequency ◽

Proposed Model ◽

The Given

Reading Comprehension (RC) plays an important role in Natural Language Processing (NLP) as it reads and understands text written in natural language. Reading Comprehension systems comprehend the given document and answer questions in the context of that document. This paper proposes a Reading Comprehension system for Kannada documents. The RC system analyses text in the Kannada script and allows users to pose questions to it in Kannada. This system is aimed at people whose primary language is Kannada, who would otherwise have difficulty parsing vast Kannada documents for the information they require. This paper discusses the proposed model, built using Term Frequency - Inverse Document Frequency (TF-IDF), and its performance in extracting answers from the context document. The proposed model captures the grammatical structure of Kannada to provide the most accurate answers to the user.

Webshell Detection Based on Executable Data Characteristics of PHP Code

Wireless Communications and Mobile Computing ◽

10.1155/2021/5533963 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Zulie Pan ◽

Yuanchao Chen ◽

Yu Chen ◽

Yi Shen ◽

Xuanzhen Guo

Keyword(s):

State Of The Art ◽

Recall Rate ◽

Remote Access ◽

Detection Accuracy ◽

Controlled Experiments ◽

Data Set ◽

Detection Model ◽

Proposed Model ◽

Text Features ◽

And Control

A webshell is a malicious backdoor that allows remote access and control to a web server by executing arbitrary commands. The wide use of obfuscation and encryption technologies has greatly increased the difficulty of webshell detection. To this end, we propose a novel webshell detection model leveraging the grammatical features extracted from the PHP code. The key idea is to combine the executable data characteristics of the PHP code with static text features for webshell classification. To verify the proposed model, we construct a cleaned data set of webshell consisting of 2,917 samples from 17 webshell collection projects and conduct extensive experiments. We have designed three sets of controlled experiments, the results of which show that the accuracy of the three algorithms has reached more than 99.40%, the highest reached 99.66%, the recall rate has been increased by at least 1.8%, the most increased by 6.75%, and the F1 value has increased by 2.02% on average. It not only confirms the efficiency of the grammatical features in webshell detection but also shows that our system significantly outperforms several state-of-the-art rivals in terms of detection accuracy and recall rate.

English poems categorization using text mining and rough set theory

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v9i4.1898 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1701-1710

Author(s):

Saif Ali Alsaidi ◽

Ahmed T. Sadeq ◽

Hasanen S. Abdullah

Keyword(s):

Text Mining ◽

Set Theory ◽

Rough Set ◽

Text Categorization ◽

Rough Set Theory ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Text Data ◽

Data Set ◽

Proposed Model

In recent years, text mining has been an important topic because of the growth of digital text data from many sources, such as government documents, email, social media, and websites. English poems are one kind of text data, and categorizing them is a text categorization task: a method that classifies documents into one or more predefined categories based on their text content. In this paper, we address the problem of assigning an English poem to one of the English poem categories by using text mining techniques and a machine learning algorithm. Our data set consists of seven poem categories and is divided into two parts: training (learning) data and testing data. In the proposed model, we apply text preprocessing to the document files to reduce the number of features and the dimensionality. The preprocessing converts each poem to features and removes irrelevant features by using text mining steps (tokenization, stop-word removal, and stemming). To reduce the feature vector of the remaining features, we use two methods for feature selection, and we use rough set theory as the machine learning algorithm to perform the categorization. The proposed model achieves a classification success rate of 88%.
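
The three preprocessing steps named in this abstract (tokenization, stop-word removal, and stemming) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hand-picked sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the abstract does not say which tools the authors used.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "are", "was", "were"}


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize on letters, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


features = preprocess("The birds were singing in the darkened woods")
# features == ["bird", "sing", "darken", "wood"]
```

The resulting token list is what a feature selection step would then prune before classification.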

Novel Automated K-means++ Algorithm for Financial Data Sets

Mathematical Problems in Engineering ◽

10.1155/2021/5521119 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Guoyu Du ◽

Xuehua Li ◽

Lanjie Zhang ◽

Libo Liu ◽

Chaohua Zhao

Keyword(s):

Linear Time ◽

Sparse Matrix ◽

Data Sets ◽

Volume Data ◽

Text Data ◽

Data Set ◽

Inverse Document Frequency ◽

Document Frequency ◽

Initial Cluster

The K-means algorithm has been extensively investigated in the field of text clustering because of its linear time complexity and adaptation to sparse matrix data. However, it has two main problems, namely, the determination of the number of clusters and the location of the initial cluster centres. In this study, we propose an improved K-means++ algorithm based on the Davies-Bouldin index (DBI) and the largest sum of distance called the SDK-means++ algorithm. Firstly, we use the term frequency-inverse document frequency to represent the data set. Secondly, we measure the distance between objects by cosine similarity. Thirdly, the initial cluster centres are selected by comparing the distance to existing initial cluster centres and the maximum density. Fourthly, clustering results are obtained using the K-means++ method. Lastly, DBI is used to obtain optimal clustering results automatically. Experimental results on real bank transaction volume data sets show that the SDK-means++ algorithm is more effective and efficient than two other algorithms in organising large financial text data sets. The F-measure value of the proposed algorithm is 0.97. The running time of the SDK-means++ algorithm is reduced by 42.9% and 22.4% compared with that for K-means and K-means++ algorithms, respectively.
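
The last step of the procedure above, using the Davies-Bouldin index (DBI) to choose a clustering automatically, can be sketched as follows. The code computes the standard DBI for candidate labelings and keeps the one with the lowest value. It is a hedged illustration rather than the SDK-means++ algorithm itself: it uses Euclidean distance on toy 2-D points instead of cosine distance on TF-IDF vectors, the candidate labelings are written by hand instead of produced by K-means++, and all names are assumptions.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index; lower values mean better-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    cents = {label: centroid(pts) for label, pts in clusters.items()}
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = {
        label: sum(math.dist(p, cents[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data with three visible groups along the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
# Hand-written candidate labelings for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
```

On this toy data the three-cluster labeling has the lower index, so it is the one selected. In practice, each candidate labeling would come from running the clustering step once per value of k.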
