A New Approach for Supervised Dimensionality Reduction

This article develops a new approach for supervised dimensionality reduction. This approach considers both global and local structures of a labelled data set and maximizes a new objective that includes the effects from both of them. The objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated based on a few benchmark data sets and image databases. Its performance is also compared with a few other existing approaches for dimensionality reduction. Testing results show that, on average, this new approach can achieve more accurate results for dimensionality reduction than existing approaches.

Download Full-text

A new approach to the fuzzy c-means clustering algorithm by automatic weights and local clustering

10.24271/psr.18 ◽

2021 ◽

Vol 3 (1) ◽

pp. 1-7

Author(s):

Yadgar Sirwan Abdulrahman

Keyword(s):

Clustering Algorithm ◽

Similarity Criterion ◽

Real Data ◽

Well Being ◽

Classical Solutions ◽

Data Sets ◽

Data Set ◽

New Approach ◽

Fuzzy C Means Clustering ◽

Global And Local

Clustering is one of the essential strategies in data analysis. In classical solutions, all features are assumed to contribute equally to the data clustering. Of course, some features are more important than others in real data sets. As a result, essential features will have a more significant impact on identifying optimal clusters than other features. In this article, a fuzzy clustering algorithm with local automatic weighting is presented. The proposed algorithm has many advantages such as: 1) the weights perform features locally, meaning that each cluster's weight is different from the rest. 2) calculating the distance between the samples using a non-euclidian similarity criterion to reduce the noise effect. 3) the weight of the features is obtained comparatively during the learning process. In this study, mathematical analyzes were done to obtain the clustering centers well-being and the features' weights. Experiments were done on the data set range to represent the progressive algorithm's efficiency compared to other proposed algorithms with global and local features

Download Full-text

A Visual and VAE Based Hierarchical Indoor Localization Method

Sensors ◽

10.3390/s21103406 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3406

Author(s):

Jie Jiang ◽

Yin Zou ◽

Lidong Chen ◽

Yujie Fang

Keyword(s):

Image Retrieval ◽

Indoor Localization ◽

Data Sets ◽

Indoor Environments ◽

Global Features ◽

Data Set ◽

Data Annotation ◽

Wide Range ◽

Annotation Costs ◽

Global And Local

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in searching space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increase data annotation costs and complicate the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network variational autoencoder (VAE) with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for the image retrieval at the level of the scene map in order to obtain candidate images, and are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. RGB images only are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.

Download Full-text

Global and Local Structures of Bifurcation Curves of ODE with Nonlinear Diffusion

International Journal of Differential Equations ◽

10.1155/2018/5053415 ◽

2018 ◽

Vol 2018 ◽

pp. 1-7

Author(s):

Tetsutaro Shibata

Keyword(s):

Mathematical Model ◽

Eigenvalue Problem ◽

Nonlinear Diffusion ◽

Nonlinear Term ◽

Nonlinear Eigenvalue Problem ◽

Maximum Norm ◽

Bifurcation Parameter ◽

Local Structures ◽

Global And Local ◽

The Mathematical Model

We consider the nonlinear eigenvalue problem Duu′′+λfu=0, u(t)>0, t∈I≔(0,1), u(0)=u(1)=0, where D(u)=uk, f(u)=u2n-k-1+sin⁡u, and λ>0 is a bifurcation parameter. Here, n∈N and k (0≤k<2n-1) are constants. This equation is related to the mathematical model of animal dispersal and invasion, and λ is parameterized by the maximum norm α=uλ∞ of the solution uλ associated with λ and is written as λ=λ(α). Since f(u) contains both power nonlinear term u2n-k-1 and oscillatory term sin⁡u, it seems interesting to investigate how the shape of λ(α) is affected by f(u). The purpose of this paper is to characterize the total shape of λ(α) by n and k. Precisely, we establish three types of shape of λ(α), which seem to be new.

Download Full-text

Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

Information Discovery and Delivery ◽

10.1108/idd-09-2018-0045 ◽

2019 ◽

Vol 47 (3) ◽

pp. 154-170

Author(s):

Janani Balakumar ◽

S. Vijayarani Mohan

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Classification ◽

Support Vector ◽

Data Sets ◽

Selection Algorithm ◽

Data Set ◽

Content Type ◽

Benchmark Data ◽

Bee Colony

Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.

Download Full-text

Weber's Ratio, Multidimensional Scaling and Incomplete Data Sets: New Light on an Old Problem

Proceedings of the Human Factors Society Annual Meeting ◽

10.1177/154193128803201713 ◽

1988 ◽

Vol 32 (17) ◽

pp. 1183-1187

Author(s):

J. G. Kreifeldt ◽

S. H. Levine ◽

M. C. Chuang

Keyword(s):

Multidimensional Scaling ◽

Incomplete Data ◽

Data Sets ◽

Observation Data ◽

Data Set ◽

New Approach ◽

Sensory Modalities ◽

Minimal Difference ◽

Typical Measurement ◽

Measurement Context

Sensory modalities exhibit a characteristic known as Weber's ratio which remarks that when two stimuli are compared for a difference: (1) there is some minimal nonzero difference which can be differentiated and (2) this minimal difference is a nearly constant proportion of the magnitude of the stimuli. Both of these would, in a typical measurement context, appear to be system defects. We have found through simulation explorations that in fact these are apparently the characteristics required by a system designed to extract an adequate amount of information from an incomplete observation data set according to a new approach to measurement.

Download Full-text

Learning to rank with click-through features in a reinforcement learning framework

International Journal of Web Information Systems ◽

10.1108/ijwis-12-2015-0046 ◽

2016 ◽

Vol 12 (4) ◽

pp. 448-476 ◽

Cited By ~ 2

Author(s):

Amir Hosein Keyhanipour ◽

Behzad Moshiri ◽

Maryam Piroozmand ◽

Farhad Oroumchian ◽

Ali Moeini

Keyword(s):

Reinforcement Learning ◽

Learning To Rank ◽

Training Data ◽

High Dimensionality ◽

Compact Representation ◽

Second Phase ◽

Data Sets ◽

Data Set ◽

Content Type ◽

Benchmark Data

Purpose Learning to rank algorithms inherently faces many challenges. The most important challenges could be listed as high-dimensionality of the training data, the dynamic nature of Web information resources and lack of click-through data. High dimensionality of the training data affects effectiveness and efficiency of learning algorithms. Besides, most of learning to rank benchmark datasets do not include click-through data as a very rich source of information about the search behavior of users while dealing with the ranked lists of search results. To deal with these limitations, this paper aims to introduce a novel learning to rank algorithm by using a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the existing click-through information in the data set or even from data sets without any explicit click-through information. Design/methodology/approach The proposed ranking algorithm (QRC-Rank) applies RL techniques on a set of calculated click-through features. QRC-Rank is as a two-steps process. In the first step, Transformation phase, a compact benchmark data set is created which contains a set of click-through features. These feature are calculated from the original click-through information available in the data set and constitute a compact representation of click-through information. To find most effective click-through feature, a number of scenarios are investigated. The second phase is Model-Generation, in which a RL model is built to rank the documents. This model is created by applying temporal difference learning methods such as Q-Learning and SARSA. Findings The proposed learning to rank method, QRC-rank, is evaluated on WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms the state-of-the-art learning to rank methods such as SVMRank, RankBoost, ListNet and AdaRank based on the precision and normalized discount cumulative gain evaluation criteria. The use of the click-through features calculated from the training data set is a major contributor to the performance of the system. Originality/value In this paper, we have demonstrated the viability of the proposed features that provide a compact representation for the click through data in a learning to rank application. These compact click-through features are calculated from the original features of the learning to rank benchmark data set. In addition, a Markov Decision Process model is proposed for the learning to rank problem using RL, including the sets of states, actions, rewarding strategy and the transition function.

Download Full-text

Critical evaluation of web-based prediction tools for human protein subcellular localization

Briefings in Bioinformatics ◽

10.1093/bib/bbz106 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1628-1640 ◽

Cited By ~ 9

Author(s):

Yinan Shen ◽

Yijie Ding ◽

Jijun Tang ◽

Quan Zou ◽

Fei Guo

Keyword(s):

Subcellular Localization ◽

Critical Evaluation ◽

Human Protein ◽

Data Sets ◽

Protein Subcellular Localization ◽

Data Set ◽

Prediction Tools ◽

Benchmark Data ◽

Subcellular Localization Prediction ◽

Localization Prediction

Abstract Human protein subcellular localization has an important research value in biological processes, also in elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. The purpose of this paper is to summarize the progress of research on the subcellular localization of human proteins in recent years, including commonly used data sets proposed by the predecessors and the performance of all selected prediction tools against the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by the value of accuracy, relative to the other methods. Meanwhile, we build a new data set using the latest version of Uniprot database and construct a new GO-based prediction method HumLoc-LBCI in this paper. Then, we test all selected prediction tools on the new data set. Finally, we discuss the possible development directions of human protein subcellular localization. Availability: The codes and data are available from http://www.lbci.cn/syn/.

Download Full-text

Non-Covalent Interactions Atlas Benchmark Data Sets 3: Repulsive Contacts

10.26434/chemrxiv.13414931.v1 ◽

2020 ◽

Author(s):

Kristian Kříž ◽

Martin Nováček ◽

Jan Řezáč

Keyword(s):

Organic Molecules ◽

Energy Surface ◽

Molecular Complexes ◽

Data Sets ◽

Interaction Energies ◽

Data Set ◽

Benchmark Data ◽

Mechanical Methods ◽

Non Covalent Interactions ◽

Covalent Interactions

The new R739×5 data set from the Non-Covalent Interactions Atlas series (www.nciatlas.org) focuses on repulsive contacts in molecular complexes, covering organic molecules, sulfur, phosphorus, halogens and noble gases. Information on the repulsive parts of the potential energy surface is crucial for the development of robust empirically parametrized computational methods. We use the new data set of highly accurate CCSD(T)/CBS interaction energies to test existing DFT and semiempirical quantum-mechanical methods. On the example of the PM6 method, we analyze the source of the error and its relation to the difficulties in the description of conformational energies, and we also devise an immediately applicable correction that fixes the most serious uncorrected issues previously encountered in practical calculations.

Download Full-text

Queries related to COVID-19: a more effective retrieval through finetuned ALBERT with BM25L question answering system

World Journal of Engineering ◽

10.1108/wje-01-2021-0059 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Deepthi Godavarthi ◽

Mary Sowjanya A.

Keyword(s):

Question Answering ◽

Relevant Information ◽

Medical Community ◽

Data Sets ◽

Data Set ◽

Content Type ◽

Benchmark Data ◽

Open Research ◽

First Time ◽

Scientific Questions

Purpose The purpose of this paper is to build a better question answering (QA) system that can furnish more improved retrieval of answers related to COVID-19 queries from the COVID-19 open research data set (CORD-19). As CORD-19 has an up-to-date collection of coronavirus literature, text mining approaches can be successfully used to retrieve answers pertaining to all coronavirus-related questions. The existing a lite BERT for self-supervised learning of language representations (ALBERT) model is finetuned for retrieving all COVID relevant information to scientific questions posed by the medical community and to highlight the context related to the COVID-19 query. Design/methodology/approach This study presents a finetuned ALBERT-based QA system in association with Best Match25 (Okapi BM25) ranking function and its variant BM25L for context retrieval and provided high scores in benchmark data sets such as SQuAD for answers related to COVID-19 questions. In this context, this paper has built a QA system, pre-trained on SQuAD and finetuned it on CORD-19 data to retrieve answers related to COVID-19 questions by extracting semantically relevant information related to the question. Findings BM25L is found to be more effective in retrieval compared to Okapi BM25. Hence, finetuned ALBERT when extended to the CORD-19 data set provided accurate results. Originality/value The finetuned ALBERT QA system was developed and tested for the first time on the CORD-19 data set to extract context and highlight the span of the answer for more clarity to the user.

Download Full-text

A SURVEY ON THE CURES FOR THE CURSE OF DIMENSIONALITY IN BIG DATA

Asian Journal of Pharmaceutical and Clinical Research ◽

10.22159/ajpcr.2017.v10s1.19755 ◽

2017 ◽

Vol 10 (13) ◽

pp. 355 ◽

Cited By ~ 1

Author(s):

Reshma Remesh ◽

Pattabiraman. V

Keyword(s):

Dimensionality Reduction ◽

Input Data ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

High Dimensional ◽

Data Sets ◽

Learning Approaches ◽

Data Set ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques

Dimensionality reduction techniques are used to reduce the complexity for analysis of high dimensional data sets. The raw input data set may have large dimensions and it might consume time and lead to wrong predictions if unnecessary data attributes are been considered for analysis. So using dimensionality reduction techniques one can reduce the dimensions of input data towards accurate prediction with less cost. In this paper the different machine learning approaches used for dimensionality reductions such as PCA, SVD, LDA, Kernel Principal Component Analysis and Artificial Neural Network have been studied.

Download Full-text