Analysis of Textual Data Based on Inductive Learning Techniques

This paper introduces knowledge discovery methods for textual data based on inductive learning techniques. The author discusses three methods for extracting features from textual data: the first uses a key concept dictionary, the second a key phrase pattern dictionary, and the third a named entity extractor. These features are used to generate rules that represent relationships between the features and text classes; the rules are expressed as a fuzzy decision tree. The features are also used to train a classification model based on a support vector machine (SVM), which can classify new textual data into the text classes with high accuracy. Lastly, the paper introduces two application tasks based on these methods and verifies their effectiveness.

Ablation Analysis to Select Wearable Sensors for Classifying Standing, Walking, and Running

Sensors ◽

10.3390/s21010194 ◽

2020 ◽

Vol 21 (1) ◽

pp. 194

Author(s):

Sarah Gonzalez ◽

Paul Stegall ◽

Harvey Edwards ◽

Leia Stirling ◽

Ho Chit Siu

Keyword(s):

Activity Recognition ◽

Principal Components ◽

Classification Accuracy ◽

Wearable Sensors ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Measurement Units ◽

The Difference

The field of human activity recognition (HAR) often uses wearable sensors and machine learning techniques to identify the actions of a subject. This paper considers the recognition of standing, walking, and running using a support vector machine (SVM) trained on principal components derived from wearable sensor data. An ablation analysis is performed to select the subset of sensors that yields the highest classification accuracy. The paper also compares principal components across trials to assess the similarity of the trials. Five subjects were instructed to stand, walk, run, and sprint on a self-paced treadmill, and the data were recorded using surface electromyography sensors (sEMGs), inertial measurement units (IMUs), and force plates. When all of the sensors were included, the SVM achieved over 90% classification accuracy using only the first three principal components of the data, with the classes stand, walk, and run/sprint (a combined run and sprint class). Sensors placed only on the lower leg produced higher accuracies than sensors placed on the upper leg. There was a small decrease in accuracy when the force plates were ablated, but the difference may not be operationally relevant. Using only accelerometers without sEMGs was shown to decrease the accuracy of the SVM.
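
The ablation idea in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: a nearest-centroid classifier stands in for the SVM, the principal component step is omitted, and the sensor names (`imu`, `semg`, `force`) and all readings are made-up toy values. The sketch scores every non-empty subset of sensor groups and ranks the subsets by accuracy.

```python
from itertools import combinations


def nearest_centroid_accuracy(train, test, keep):
    """Accuracy of a nearest-centroid classifier that sees only the sensor groups in `keep`.

    Assumption: nearest-centroid replaces the SVM used in the paper.
    """
    def project(sample):
        # Concatenate the readings of the kept sensor groups into one vector.
        return [v for name in keep for v in sample[name]]

    sums, counts = {}, {}
    for features, label in train:
        vec = project(features)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

    def predict(features):
        vec = project(features)
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    return sum(predict(f) == y for f, y in test) / len(test)


def ablation(train, test, sensors):
    """Score every non-empty subset of sensors; best accuracy first, smaller subsets first on ties."""
    results = []
    for r in range(1, len(sensors) + 1):
        for subset in combinations(sensors, r):
            results.append((nearest_centroid_accuracy(train, test, subset), subset))
    return sorted(results, key=lambda item: (-item[0], len(item[1])))


# Hypothetical toy readings, one value per sensor group.
sensors = ("imu", "semg", "force")
train = [
    ({"imu": [0.1], "semg": [0.5], "force": [1.0]}, "stand"),
    ({"imu": [0.2], "semg": [0.1], "force": [1.1]}, "stand"),
    ({"imu": [1.0], "semg": [0.4], "force": [1.5]}, "walk"),
    ({"imu": [1.1], "semg": [0.2], "force": [1.6]}, "walk"),
    ({"imu": [2.0], "semg": [0.3], "force": [2.4]}, "run"),
    ({"imu": [2.1], "semg": [0.6], "force": [2.5]}, "run"),
]
test_set = [
    ({"imu": [0.15], "semg": [0.3], "force": [1.05]}, "stand"),
    ({"imu": [1.05], "semg": [0.3], "force": [1.55]}, "walk"),
    ({"imu": [2.05], "semg": [0.3], "force": [2.45]}, "run"),
]
ranking = ablation(train, test_set, sensors)
```

Exhaustive enumeration is only practical for a handful of sensor groups; with many sensors, a greedy remove-one-at-a-time loop would be the usual substitute.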

A visual terrain classification method for mobile robots’ navigation based on convolutional neural network and support vector machine

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331220987917 ◽

2021 ◽

pp. 014233122098791

Author(s):

Wanli Wang ◽

Botao Zhang ◽

Kaiqi Wu ◽

Sergey A Chepinskiy ◽

Anton A Zhilenkov ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Mobile Robots ◽

Convolutional Neural Network ◽

Hybrid Method ◽

Classification Accuracy ◽

Support Vector ◽

High Classification Accuracy ◽

Enhance Efficiency ◽

Multi Class Classification

In this paper, a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Considering the limited computing resources on mobile robots and the requirement for high classification accuracy, the proposed hybrid method combines a convolutional neural network with a support vector machine to maintain high classification accuracy while improving efficiency. The key idea is that the convolutional neural network performs a multi-class classification while the support vector machine simultaneously performs a two-class classification. The two-class classification performed by the support vector machine targets the one kind of terrain that users are most concerned with. The results of the two classifications are consolidated to obtain the final classification result. The convolutional neural network used in this method is modified for on-board use on mobile robots and has a simple architecture in order to enhance efficiency. The convolutional neural network and the support vector machine are trained and tested using RGB images of six kinds of common terrain. Experimental results demonstrate that this method can help robots classify terrains accurately and efficiently. The proposed method therefore has significant potential for on-board use on mobile robots.

A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for a classification model and to identify malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm for classification problems, called Filter-Wrapper Feature Selection (FWFS), which addresses the limitations of existing methods. In the proposed model, a front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the highly ranked feature subset, while a subsequent Binary Genetic Algorithm (BGA) accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without reducing classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of FWFS is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using several classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results indicate that the proposed method significantly reduces the number of features and outperforms the existing methods. For microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on diffuse large B-cell lymphoma (DLBCL). For UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using a support vector machine (SVM).
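
The two-stage filter-then-wrapper structure described here can be sketched as follows. This is an illustration under stated assumptions, not the FWFS algorithm itself: plain mutual information stands in for CMIM in the filter stage, the genetic algorithm is a minimal textbook variant (tournament selection, one-point crossover, bit-flip mutation, elitism), and the `toy_fitness` function is a hypothetical placeholder for the Naive Bayes accuracy that the paper uses as fitness.

```python
import random
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X; Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def filter_stage(columns, labels, top_k):
    """Indices of the top_k feature columns ranked by mutual information with the labels.

    Assumption: plain MI ranking replaces CMIM.
    """
    ranked = sorted(
        range(len(columns)),
        key=lambda j: mutual_information(columns[j], labels),
        reverse=True,
    )
    return ranked[:top_k]


def binary_ga(n_bits, fitness, generations=30, pop_size=20, p_mut=0.1, seed=0):
    """Minimal binary GA over bit masks; returns the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def pick():
            # Tournament of size two.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            cut = rng.randint(1, n_bits - 1) if n_bits > 1 else 0
            child = mum[:cut] + dad[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy data: three binary feature columns and a binary label (hypothetical values).
columns = [[0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [0, 0, 1, 1]
kept = filter_stage(columns, labels, top_k=2)


def toy_fitness(mask):
    # Placeholder for classifier accuracy: reward informative features, penalise subset size.
    gain = sum(mutual_information(columns[j], labels) for j, bit in zip(kept, mask) if bit)
    return gain - 0.05 * sum(mask)


best_mask = binary_ga(len(kept), toy_fitness)
```

In a fuller version, the fitness function would train and cross-validate a classifier on the masked features, which is where most of the computational cost of a wrapper method lies.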

Fuzzy Decision-Tree-Based Analysis of Databases

Handbook of Research on Fuzzy Information Processing in Databases ◽

10.4018/978-1-59904-853-6.ch031 ◽

2011 ◽

pp. 760-783

Author(s):

Malcolm Beynon

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Inductive Learning ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Fuzzy Environment ◽

The United Kingdom ◽

Fuzzy Representation ◽

Fuzzy Decision Trees ◽

Low Pay

The general fuzzy decision tree approach encapsulates the benefits of being an inductive learning technique to classify objects, utilising the richness of the data being considered, as well as the readability and interpretability that accompanies its operation in a fuzzy environment. This chapter offers a description of fuzzy decision tree based research, including the exposition of small and large fuzzy decision trees to demonstrate their construction and practicality. The two large fuzzy decision trees described are associated with a real application, namely, the identification of workplace establishments in the United Kingdom that pay a noticeable proportion of their employees less than the legislated minimum wage. Two separate fuzzy decision tree analyses are undertaken on a low-pay database, which utilise different numbers of membership functions to fuzzify the continuous attributes describing the investigated establishments. The findings demonstrate the sensitivity of results when there are changes in the compactness of the fuzzy representation of the associated data.
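
To make concrete what it means to fuzzify a continuous attribute with different numbers of membership functions, here is a minimal sketch. It assumes evenly spaced triangular membership functions over a fixed range; the chapter's actual membership functions are not described in this abstract, so the shape, spacing, and the function name `fuzzify` are illustrative assumptions.

```python
def fuzzify(x, lo, hi, n_sets):
    """Membership degrees of x in n_sets evenly spaced triangular fuzzy sets over [lo, hi].

    Values outside the range are clamped, so the outermost sets act as shoulders.
    """
    if n_sets < 2:
        raise ValueError("need at least two fuzzy sets")
    x = min(max(x, lo), hi)
    step = (hi - lo) / (n_sets - 1)
    degrees = []
    for i in range(n_sets):
        peak = lo + i * step
        # Membership falls linearly from 1 at the peak to 0 one step away.
        degrees.append(max(0.0, 1.0 - abs(x - peak) / step))
    return degrees


# The same attribute value under a coarse and a finer fuzzy representation.
coarse = fuzzify(2.5, 0.0, 10.0, 3)
fine = fuzzify(2.5, 0.0, 10.0, 5)
```

With three sets the value 2.5 belongs partly to the first two sets, while with five sets it sits exactly on the peak of the second set, which hints at why results can be sensitive to the number of membership functions.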

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate and to contrast it with the results from classical methods (e.g., polls and surveys). The data were collected from Twitter because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the fact that some candidates were omitted, even though they held a significant number of votes.
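
The abstract does not state the formula used to turn tweet counts into a voting intention. Purely as an illustration of the kind of calculation involved, the sketch below assumes one plausible choice: each candidate's share of positive tweets, normalised so the shares sum to 100%. The function name, the formula, and the counts for the placeholder candidates A, B and C are all assumptions, not values from the paper.

```python
def voting_intention(counts):
    """Map {candidate: (positive, negative)} tweet counts to shares in percent.

    Assumed formula: positive ratio per candidate, normalised across candidates.
    """
    ratios = {}
    for name, (positive, negative) in counts.items():
        total = positive + negative
        ratios[name] = positive / total if total else 0.0
    norm = sum(ratios.values())
    if norm == 0:
        raise ValueError("no positive tweets for any candidate")
    return {name: 100.0 * ratio / norm for name, ratio in ratios.items()}


# Hypothetical counts for three placeholder candidates.
example = voting_intention({"A": (60, 40), "B": (30, 70), "C": (10, 90)})
```

Any such normalisation only covers the candidates that were tracked, which is consistent with the abstract's remark that omitting candidates explains some of the differences from the actual results.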

A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network

BMC Bioinformatics ◽

10.1186/s12859-019-3039-3 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 4

Author(s):

Jianghui Wen ◽

Yeshu Liu ◽

Yu Shi ◽

Haoran Huang ◽

Bing Deng ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Classification Accuracy ◽

Messenger Rna ◽

Biological Activities ◽

Classification Model ◽

Support Vector ◽

Non Coding Rna ◽

Recognition Ability ◽

The Difference

Background: Long-chain non-coding RNA (lncRNA) is closely related to many biological activities. Since its sequence structure is similar to that of messenger RNA (mRNA), it is difficult to distinguish between the two based only on sequence biometrics. Therefore, it is particularly important to construct a model that can effectively identify lncRNA and mRNA. Results: First, the difference in the k-mer frequency distribution between lncRNA and mRNA sequences is considered in this paper, and the sequences are transformed into a k-mer frequency matrix. Moreover, k-mers with more species are screened by relative entropy. The classification model for lncRNA and mRNA sequences is then built by feeding the k-mer frequency matrix into a convolutional neural network and training it. Finally, the optimal k-mer combination for the classification model is determined and compared with other machine learning methods in humans, mice and chickens. The results indicate that the proposed model has the highest classification accuracy. Furthermore, the recognition ability of this model is verified on a single sequence. Conclusion: We established a classification model for lncRNA and mRNA based on k-mers and a convolutional neural network. The classification accuracy of the model with 1-mers, 2-mers and 3-mers was the highest, with an accuracy of 0.9872 in humans, 0.8797 in mice and 0.9963 in chickens, which is better than those of random forest, logistic regression, decision tree and support vector machine.
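
The k-mer frequency and relative entropy steps mentioned in this abstract can be sketched in a few lines. The sketch assumes sequences over the alphabet `ACGT`, uses the standard Kullback-Leibler definition of relative entropy with a small smoothing constant, and flattens the 1-mer, 2-mer and 3-mer frequencies into a single vector; the exact matrix layout and screening rule used in the paper are not given here, so these choices and the function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_frequencies(sequence, k, alphabet="ACGT"):
    """Relative frequency of every possible k-mer over `alphabet` in `sequence`."""
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of alphabet symbols contribute to the total.
    total = sum(counts[m] for m in all_kmers)
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


def relative_entropy(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) in bits; eps avoids division by zero."""
    return sum(p[m] * log2((p[m] + eps) / (q[m] + eps)) for m in p if p[m] > 0)


def feature_vector(sequence, ks=(1, 2, 3)):
    """Concatenated k-mer frequencies for each k in `ks` (assumed layout)."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(sequence, k).values())
    return vec


# Toy sequences, not real transcripts.
p = kmer_frequencies("AAAA", 1)
q = kmer_frequencies("ACGT", 1)
divergence = relative_entropy(p, q)
```

For k = 1, 2 and 3 over four symbols the concatenated vector has 4 + 16 + 64 = 84 entries, which is small enough to feed into a compact network.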

EXPOSING DIGITAL VIDEO LOGO-REMOVAL FORGERY BY INCONSISTENCY OF BLUR

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001410008317 ◽

2010 ◽

Vol 24 (07) ◽

pp. 1027-1046 ◽

Cited By ~ 4

Author(s):

YUTING SU ◽

JING ZHANG ◽

YU HAN ◽

JING CHEN ◽

QINGZHONG LIU

Keyword(s):

Classification Accuracy ◽

Digital Video ◽

Support Vector ◽

Forgery Detection ◽

Computation Cost ◽

Statistical Property ◽

High Classification Accuracy ◽

Novel Approach ◽

Fine Classification ◽

Reference Areas

A novel approach for detecting video logo-removal forgery is proposed, based on measuring inconsistency of blur. Our approach rests on the assumption that if a digital video undergoes logo-removal forgery, the blurriness of the forged region is expected to differ from that of the nontampered parts of the video. Blurriness is first estimated by analyzing the spatial and temporal statistical properties of logo areas, and suspicious areas are roughly located; features are then extracted, and a fine classification is performed by applying a support vector machine (SVM) to the extracted features. If the suspicious areas and the reference areas are classified into different classes, the video is judged to be forged. Experimental results show that our method is robust to lossy video compression for logo-removal forgery detection, with the advantages of high classification accuracy and low computation cost.

Machine Learning Techniques for Land Use/Land Cover Classification of Medium Resolution Optical Satellite Imagery Focusing on Temporary Inundated Areas

Journal of Environmental Geography ◽

10.2478/jengeo-2020-0005 ◽

2020 ◽

Vol 13 (1-2) ◽

pp. 43-52

Author(s):

Boudewijn van Leeuwen ◽

Zalán Tobak ◽

Ferenc Kovács

Keyword(s):

Neural Network ◽

Machine Learning ◽

Land Use ◽

Land Cover ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Land Use Land Cover ◽

Learning Techniques

Classification of multispectral optical satellite data using machine learning techniques to derive land use/land cover thematic data is important for many applications. Comparing the latest algorithms, our research aims to determine the best option to classify land use/land cover, with special focus on temporarily inundated land in a flat area in the south of Hungary. These inundations disrupt agricultural practices and can cause large financial losses. Sentinel 2 data with a high temporal and medium spatial resolution are classified using open source implementations of a random forest, a support vector machine and an artificial neural network. Each classification model is applied to the same data set and the results are compared qualitatively and quantitatively. The accuracy of the results is high for all methods and does not show large overall differences. A quantitative spatial comparison demonstrates that the neural network gives the best results, but that all models are strongly influenced by atmospheric disturbances in the image.

Research on Automatic Classification Method of Ethnic Music Emotion Based on Machine Learning

Journal of Mathematics ◽

10.1155/2022/7554404 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Zijin Wu

Keyword(s):

Folk Music ◽

Classification Accuracy ◽

Fine Tuning ◽

Classification Model ◽

Classification Method ◽

Support Vector ◽

Ethnic Music ◽

Huge Impact ◽

Speed Up

With the development of the country’s economy, culture and art are flourishing. However, the diversification of artistic expression has not benefited folk music. On the contrary, it has had a huge impact, and some national music is even in danger of being lost. This article addresses the recognition and classification of emotions in folk music and seeks the model that achieves the highest possible classification accuracy. After the Support Vector Machine (SVM) was chosen as the classification method, a variety of feature extraction approaches were tried, with good results. The article also explores the pretraining and reverse fine-tuning process of the Deep Belief Network (DBN), using the DBN to learn fused features of music. Folk music emotions are then recognized and classified according to the abstract features it learns. The DBN is improved by adding “Dropout” to each Restricted Boltzmann Machine (RBM) and adjusting the increment criterion for the weights and biases. The improved network can avoid overfitting and speeds up training. Experiments show that classification accuracy improves when the fusion features proposed in this paper are used.

An Improved Word Representation for Deep Learning Based NER in Indian Languages

Information ◽

10.3390/info10060186 ◽

2019 ◽

Vol 10 (6) ◽

pp. 186 ◽

Cited By ~ 1

Author(s):

Ajees A P ◽

Manju K ◽

Sumam Mary Idicula

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Indian Languages ◽

Named Entity ◽

Text Document ◽

Learning Techniques ◽

Word Representation

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location, organization and so forth. NER plays an important role in many Natural Language Processing applications such as information retrieval, question answering and machine translation. Resolving the ambiguities of lexical items in a text document is a challenging task. NER in Indian languages is always a complex task due to their morphological richness and agglutinative nature. Even though different solutions have been proposed for NER, it is still an unsolved problem. Traditional approaches to Named Entity Recognition applied hand-crafted features to classical machine learning techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM) and Conditional Random Field (CRF). The introduction of deep learning techniques to the NER problem changed the scenario, and state-of-the-art results have since been achieved using deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic and morphological information. We propose a deep learning based entity extraction system for Indian languages using a novel combined word representation that includes character-level, word-level and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results highlight the improvement that we obtained over existing pre-trained word representations.
