Scalable Multilabel Learning Based on Feature and Label Dimensionality Reduction

The data-driven management of real-life systems based on a trained model, which in turn is based on the data gathered from its daily usage, has attracted a lot of attention because it realizes scalable control for large-scale and complex systems. To obtain a model within an acceptable computational cost that is restricted by practical constraints, the learning algorithm may need to identify essential data that carries important knowledge on the relation between the observed features representing the measurement value and labels encoding the multiple target concepts. This results in an increased computational burden owing to the concurrent learning of multiple labels. A straightforward approach to address this issue is feature selection; however, it may be insufficient to satisfy the practical constraints because the computational cost for feature selection can be impractical when the number of labels is large. In this study, we propose an efficient multilabel feature selection method to achieve scalable multilabel learning when the number of labels is large. The empirical experiments on several multilabel datasets show that the multilabel learning process can be boosted without deteriorating the discriminating power of the multilabel classifier.

Download Full-text

Multi-Objective Binary PSO with Kernel P System on GPU

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.3.3282 ◽

2018 ◽

Vol 13 (3) ◽

pp. 323-336 ◽

Cited By ~ 3

Author(s):

Naeimeh Elkhani ◽

Ravie Chandren Muniyandi ◽

Gexiang Zhang

Keyword(s):

Feature Selection ◽

Computational Cost ◽

Feature Selection Method ◽

Binary Particle Swarm Optimization ◽

P System ◽

Time Cost ◽

Multi Objective ◽

Almost All ◽

Binary Pso ◽

Independent Tasks

Computational cost is a big challenge for almost all intelligent algorithms which are run on CPU. In this regard, our proposed kernel P system multi-objective binary particle swarm optimization feature selection and classification method should perform with an efficient time that we aimed to settle via using potentials of membrane computing in parallel processing and nondeterminism. Moreover, GPUs perform better with latency-tolerant, highly parallel and independent tasks. In this study, to meet all the potentials of a membrane-inspired model particularly parallelism and to improve the time cost, feature selection method implemented on GPU. The time cost of the proposed method on CPU, GPU and Multicore indicates a significant improvement via implementing method on GPU.

Download Full-text

A fast method for statistical grammar induction

Natural Language Engineering ◽

10.1017/s1351324998001983 ◽

1998 ◽

Vol 4 (3) ◽

pp. 191-209 ◽

Cited By ~ 5

Author(s):

WIDE R. HOGENHOUT ◽

YUJI MATSUMOTO

Keyword(s):

Computational Complexity ◽

Language Processing ◽

Large Scale ◽

Learning Algorithm ◽

Computational Cost ◽

Computational Effort ◽

Fast Method ◽

Statistical Induction ◽

Stochastic Context Free Grammars ◽

Context Free

The statistical induction of stochastic context free grammars from bracketed corpora with the Inside Outside Algorithm is an appealing method for grammar learning, but the computational complexity of this algorithm has made it impossible to generate a large scale grammar. Researchers from natural language processing and speech recognition have suggested various methods to reduce the computational complexity and, at the same time, guide the learning algorithm towards a solution by, for example, placing constraints on the grammar. We suggest a method that strongly reduces that computational cost of the algorithm without placing constraints on the grammar. This method can in principle be combined with any of the constraints on grammars that have been suggested in earlier studies. We show that it is feasible to achieve results equivalent to earlier research, but with much lower computational effort. After creating a small grammar, the grammar is incrementally increased while rules that have become obsolete are removed at the same time. We explain the modifications to the algorithm, give results of experiments and compare these to results reported in other publications.

Download Full-text

Experimental Analysis for Semantic based Large Scale Service Composition using Deep Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j1061.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 4280-4283

Keyword(s):

Deep Learning ◽

Web Services ◽

Web Service ◽

Service Composition ◽

Web Application ◽

Large Scale ◽

Learning Algorithm ◽

Computational Cost ◽

Web Service Composition ◽

User Requirement

In Service Oriented Architecture (SOA) web services plays important role. Web services are web application components that can be published, found, and used on the Web. Also machine-to-machine communication over a network can be achieved through web services. Cloud computing and distributed computing brings lot of web services into WWW. Web service composition is the process of combing two or more web services to together to satisfy the user requirements. Tremendous increase in the number of services and the complexity in user requirement specification make web service composition as challenging task. The automated service composition is a technique in which Web Service Composition can be done automatically with minimal or no human intervention. In this paper we propose a approach of web service composition methods for large scale environment by considering the QoS Parameters. We have used stacked autoencoders to learn features of web services. Recurrent Neural Network (RNN) leverages uses the learned features to predict the new composition. Experiment results show the efficiency and scalability. Use of deep learning algorithm in web service composition, leads to high success rate and less computational cost.

Download Full-text

IoT Botnet Attack Detection Based on Optimized Extreme Gradient Boosting and Feature Selection

Sensors ◽

10.3390/s20216336 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6336 ◽

Cited By ~ 1

Author(s):

Mnahi Alqahtani ◽

Hassan Mathkour ◽

Mohamed Maher Ben Ismail

Keyword(s):

Feature Selection ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Attack Detection ◽

Gradient Boosting ◽

Fisher Score ◽

Detection Approach ◽

Extreme Gradient Boosting ◽

Iot Devices

Nowadays, Internet of Things (IoT) technology has various network applications and has attracted the interest of many research and industrial communities. Particularly, the number of vulnerable or unprotected IoT devices has drastically increased, along with the amount of suspicious activity, such as IoT botnet and large-scale cyber-attacks. In order to address this security issue, researchers have deployed machine and deep learning methods to detect attacks targeting compromised IoT devices. Despite these efforts, developing an efficient and effective attack detection approach for resource-constrained IoT devices remains a challenging task for the security research community. In this paper, we propose an efficient and effective IoT botnet attack detection approach. The proposed approach relies on a Fisher-score-based feature selection method along with a genetic-based extreme gradient boosting (GXGBoost) model in order to determine the most relevant features and to detect IoT botnet attacks. The Fisher score is a representative filter-based feature selection method used to determine significant features and discard irrelevant features through the minimization of intra-class distance and the maximization of inter-class distance. On the other hand, GXGBoost is an optimal and effective model, used to classify the IoT botnet attacks. Several experiments were conducted on a public botnet dataset of IoT devices. The evaluation results obtained using holdout and 10-fold cross-validation techniques showed that the proposed approach had a high detection rate using only three out of the 115 data traffic features and improved the overall performance of the IoT botnet attack detection process.

Download Full-text

A straightforward feature selection method based on mean ratio for classifiers

Intelligent Decision Technologies ◽

10.3233/idt-200186 ◽

2021 ◽

pp. 1-12

Author(s):

Emmanuel Tavares ◽

Alisson Marques Silva ◽

Gray Farias Moita ◽

Rodrigo Tomas Nogueira Cardoso

Keyword(s):

Feature Selection ◽

Computational Cost ◽

Feature Selection Method ◽

Predictive Ability ◽

Large Data ◽

Research Area ◽

Data Sets ◽

Promising Alternative ◽

The Mean ◽

Mathematical Operations

Feature Selection (FS) is currently a very important and prominent research area. The focus of FS is to identify and to remove irrelevant and redundant features from large data sets in order to reduced processing time and to improve the predictive ability of the algorithms. Thus, this work presents a straightforward and efficient FS method based on the mean ratio of the attributes (features) associated with each class. The proposed filtering method, here called MRFS (Mean Ratio Feature Selection), has only equations with low computational cost and with basic mathematical operations such as addition, division, and comparison. Initially, in the MRFS method, the average from the data sets associated with the different outputs is computed for each attribute. Then, the calculation of the ratio between the averages extracted from each attribute is performed. Finally, the attributes are ordered based on the mean ratio, from the smallest to the largest value. The attributes that have the lowest values are more relevant to the classification algorithms. The proposed method is evaluated and compared with three state-of-the-art methods in classification using four classifiers and ten data sets. Computational experiments and their comparisons against other feature selection methods show that MRFS is accurate and that it is a promising alternative in classification tasks.

Download Full-text

UNSUPERVISED FEATURE SELECTION USING INCREMENTAL LEAST SQUARES

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622011004671 ◽

2011 ◽

Vol 10 (06) ◽

pp. 967-987 ◽

Cited By ~ 14

Author(s):

RONG LIU ◽

ROBERT RALLO ◽

YORAM COHEN

Keyword(s):

Feature Selection ◽

Least Squares ◽

Selection Process ◽

Real Life ◽

Feature Selection Method ◽

Least Square ◽

Feature Subset ◽

Selection Algorithm ◽

Forward Selection ◽

Unsupervised Feature Selection

An unsupervised feature selection method is proposed for analysis of datasets of high dimensionality. The least square error (LSE) of approximating the complete dataset via a reduced feature subset is proposed as the quality measure for feature selection. Guided by the minimization of the LSE, a kernel least squares forward selection algorithm (KLS-FS) is developed that is capable of both linear and non-linear feature selection. An incremental LSE computation is designed to accelerate the selection process and, therefore, enhances the scalability of KLS-FS to high-dimensional datasets. The superiority of the proposed feature selection algorithm, in terms of keeping principal data structures, learning performances in classification and clustering applications, and robustness, is demonstrated using various real-life datasets of different sizes and dimensions.

Download Full-text

Feature Selection using K-Means Genetic Clustering To Predict Rheumatoid Arthritis Disease

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6043.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 7020-7023

Keyword(s):

Rheumatoid Arthritis ◽

Genetic Algorithm ◽

Feature Selection ◽

Data Analysis ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Real Life ◽

Common Disease ◽

Rheumatoid Arthritis Disease

In our Society, Aging society plays serious problems in health and medical care. When compared to other diseases in the real life Rheumatoid Arthritis disease is a common disease, Rheumatoid Arthritis is a disease that causes pain in musculoskeletal system that affect the quality of the people. Rheumatoid Arthritis is onset at middle age, but can affect children and young adults. If the disease is not monitored and treated as early as possible, it can cause serious joint deformities. Cluster analysis is an unsupervised learning technique in data mining for identifying or exploring out the structure of data without known about class label. Many clustering algorithms were proposed to analyze high volume of data, but many of them not evaluate cluster’s quality because of inconvenient features presented in the dataset. Feature selection is a prime task in data analysis in case of high dimensional dataset. Optimal subsets of features are enough to cluster the data. In this study, Rheumatoid Arthritis clinical data were analyzed to predict the patient affected with Rheumatoid Arthritis disease. In this study, KMeans clustering algorithm was used to predict the patient affected with Rheumatoid Arthritis Disease. Genetic algorithm is used to filter the feature and at the end of the process it finds optimal clusters for k-Means clustering algorithm. Based on the initial centroid , K-Means algorithm may have the chance of producing empty cluster. K-means does not effectively handle the outliers or noisy data in the dataset. K-means algorithm when combined with Genetic Algorithm shows high performance quality of clustering and fast evolution process when compared with K-Means alone. In this paper, to diagnosis Rheumatoid Arthritis disease we use machine learning algorithm FSKG. A predictive FSKG model is explored that diagnoses rheumatoid arthritis. After completing data analysis and pre-processing operations, Genetic Algorithm and K-Means Clustering Algorithm are integrated to choose correct features among all the features. Experimental Results from this study imply improved accuracy when compared to k-means algorithm for rheumatoid disease prediction.

Download Full-text

A Novel Method of Constrained Feature Selection by the Measurement of Pairwise Constraints Uncertainty

10.21203/rs.3.rs-31751/v1 ◽

2020 ◽

Author(s):

Kamal Berahmand ◽

Mehrdad Rostami ◽

Saman Forouzandeh

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Learning Algorithm ◽

Side Information ◽

Computational Cost ◽

Cost Savings ◽

Learning Task ◽

Pairwise Constraint ◽

Target Class ◽

Partially Labeled Data

Abstract In recent years, with the development of science and technology, there were considerable advances in datasets in various sciences, and many features are also shown for these datasets nowadays. With a high-dimensional dataset, many features are generally redundant and/or irrelevant for a provided learning task, which has adverse effects with regard to computational cost and/or performance. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of available features with the lowest redundancy with each other and the highest relevancy to the target class, which is the same objective as the feature selection over entirely labeled data. By appropriate reduction of the dimensions, in addition to time-cost savings, performance increases as well. In this paper, side information such as pairwise constraint is used to rank and reduce the dimensions. In the proposed method, the authors deal with checking the quality (strength or uncertainty) of the pairwise constraint. Usually, the quality of the pair of constraints on the dimension reduction is not calculated. In the first step, the strength matrix is created through a similarity matrix and uncertainty region. And then, by using the strength and similarity matrices, a new constraint feature selection ranking is proposed. The performance of the presented method was compared to the performance of the state-of-the-art, and well-known semi-supervised feature selection approaches on eight datasets. The findings indicate that the proposed approach improves previous related approaches with respect to the accuracy of constrained clustering. In particular, the numerical results showed that the presented approach improved the classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, it can be said that the proposed method has reduced the computational complexity of the machine learning algorithm despite increasing the classification accuracy.

Download Full-text

A novel feature selection method for large-scale data sets1

Intelligent Data Analysis ◽

10.3233/ida-2005-9302 ◽

2005 ◽

Vol 9 (3) ◽

pp. 237-251 ◽

Cited By ~ 1

Author(s):

Wei-Chou Chen ◽

Ming-Chun Yang ◽

Shian-Shyong Tseng

Keyword(s):

Feature Selection ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Large Scale Data ◽

Scale Data

Download Full-text

Evolutionary Multilabel Feature Selection Using Promising Feature Subset Generation

Journal of Sensors ◽

10.1155/2018/3419213 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12

Author(s):

Jaesung Lee ◽

Wangduk Seo ◽

Ho Han ◽

Dae-Won Kim

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Search Space ◽

Sensor Data ◽

Limited Resources ◽

Feature Subset ◽

Selection Methods ◽

Multilabel Learning ◽

Information Harvesting ◽

Generation Procedure

Recent progress in the development of sensor devices improves information harvesting and allows complex but intelligent applications based on learning hidden relations between collected sensor data and objectives. In this scenario, multilabel feature selection can play an important role in achieving better learning accuracy when constrained with limited resources. However, existing multilabel feature selection methods are search-ineffective because generated feature subsets frequently include unimportant features. In addition, only a few feature subsets compared to the search space are considered, yielding feature subsets with low multilabel learning accuracy. In this study, we propose an effective multilabel feature selection method based on a novel feature subset generation procedure. Experimental results demonstrate that the proposed method can identify better feature subsets than conventional methods.

Download Full-text