LEADS-FRAG: A Benchmark Data Set for Assessment of Fragment Docking Performance

2020 ◽  
Vol 60 (12) ◽  
pp. 6544-6554
Author(s):  
Laura Chachulski ◽  
Björn Windshügel
2002 ◽  
Vol 137 (2) ◽  
pp. 111-126 ◽  
Author(s):  
Yoshinori Nakahara ◽  
Kenya Suyama ◽  
Jun Inagawa ◽  
Ryuji Nagaishi ◽  
Setsumi Kurosawa ◽  
...  

2018 ◽  
Vol 14 (4) ◽  
pp. 20-37 ◽  
Author(s):  
Yinglei Song ◽  
Yongzhong Li ◽  
Junfeng Qu

This article develops a new approach for supervised dimensionality reduction. The approach considers both the global and local structures of a labelled data set and maximizes a new objective that combines the effects of both. The objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated on several benchmark data sets and image databases, and its performance is compared with existing approaches for dimensionality reduction. Testing results show that, on average, the new approach achieves more accurate dimensionality reduction than existing approaches.
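The abstract does not spell out the objective, but "maximize a supervised objective by solving an eigenvalue problem" follows a standard pattern: build scatter matrices from the labels and take the top generalized eigenvectors. The sketch below uses plain LDA-style between-class (global) and within-class (local) scatter as a stand-in; the paper's actual objective mixing the two structures is an assumption not reproduced here.

```python
import numpy as np

def supervised_projection(X, y, n_components=1):
    """Sketch: supervised dimensionality reduction via an eigenvalue
    problem. Uses LDA-style scatter matrices as a generic stand-in for
    the paper's global/local objective (which the abstract does not
    specify)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-class scatter (global structure)
    Sw = np.zeros((d, d))  # within-class scatter (local structure)
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Xc - mc).T @ (Xc - mc)
    # Approximately maximize the objective by solving Sb v = lambda Sw v.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]  # keep the largest eigenvalues
    return eigvecs.real[:, order[:n_components]]
```

Projecting with `X @ supervised_projection(X, y, k)` reduces the data to `k` dimensions along directions that separate the labelled classes.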


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose: Owing to the huge volume of documents available on the internet, text classification is a necessary task for handling them. To achieve optimal classification results, feature selection is used as an important stage that curtails the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.

Design/methodology/approach: This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed ABCFS algorithm is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To assess the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used.

Findings: Experiments were conducted on real and benchmark data sets. The real data set was collected as documents stored on a personal computer, and the benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results demonstrate that the proposed feature selection algorithm improves text document classification accuracy.

Originality/value: This paper proposes the new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here, ABCFS selects features from unstructured text documents, whereas existing ABC-based feature selection work targets structured data. The proposed algorithm classifies documents automatically based on their content.
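The abstract does not detail ABCFS's operators, but artificial-bee-colony feature selection generally means searching over binary feature masks scored by a classifier. The following is a deliberately simplified sketch of that wrapper loop (only the employed-bee neighbor search, and a nearest-centroid fitness instead of the paper's SVM); every name and parameter here is an illustrative assumption.

```python
import numpy as np

def abc_feature_select(X, y, n_bees=10, n_iter=30, seed=0):
    """Simplified ABC-style wrapper feature selection over binary masks.
    Fitness: nearest-centroid training accuracy on the selected features
    (a lightweight stand-in for the paper's SVM classifier)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0
        Xs = X[:, mask]
        cents = {c: Xs[y == c].mean(axis=0) for c in np.unique(y)}
        pred = [min(cents, key=lambda c: np.linalg.norm(row - cents[c]))
                for row in Xs]
        return float(np.mean(np.array(pred) == y))

    # Food sources: one random feature mask per employed bee.
    masks = rng.random((n_bees, d)) < 0.5
    fits = np.array([fitness(m) for m in masks])
    for _ in range(n_iter):
        for i in range(n_bees):
            # Employed bee: flip one random feature, keep if fitness improves.
            cand = masks[i].copy()
            j = rng.integers(d)
            cand[j] = ~cand[j]
            f = fitness(cand)
            if f > fits[i]:
                masks[i], fits[i] = cand, f
    return masks[np.argmax(fits)]
```

For text classification the columns of `X` would be term weights (e.g. TF-IDF), and the returned boolean mask picks the surviving vocabulary.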


2008 ◽  
Vol 2 (4) ◽  
pp. 584-594 ◽  
Author(s):  
J. Geoffrey Chase ◽  
Aaron LeCompte ◽  
Geoffrey M. Shaw ◽  
Amy Blakemore ◽  
Jason Wong ◽  
...  

2007 ◽  
Vol 40 (2) ◽  
pp. 48-53 ◽  
Author(s):  
Seamus M. McGovern ◽  
Surendra M. Gupta

2001 ◽  
Vol 11 (03) ◽  
pp. 271-279 ◽  
Author(s):  
ROELOF K BROUWER

This paper proposes a max-product threshold unit (maptu) that can successfully perform dichotomous classification of pattern vectors. Maptu, with weight vector w, classifies a pattern vector x by comparing the max-product composition x max-prod w to the threshold 0.5. Results obtained by other methods on benchmark classification data are used for comparison with the maptu method. The benchmark data consist of the Australian credit, cervical cell, diabetes and iris data sets.
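The decision rule stated in the abstract is compact enough to show directly: compute the max-product composition of the pattern with the weight vector and threshold it at 0.5. A minimal sketch, assuming inputs and weights in [0, 1] as is usual for max-product compositions:

```python
import numpy as np

def maptu_classify(X, w, theta=0.5):
    """Max-product threshold unit: for each pattern x, compute
    max_i(x_i * w_i) and compare it to the threshold (0.5 in the paper).
    Returns the binary class labels and the raw composition scores."""
    scores = np.max(X * w, axis=1)       # x max-prod w, per pattern
    return (scores >= theta).astype(int), scores
```

The weight vector itself would be learned from training data; the learning procedure is not described in the abstract, so only the classification step is sketched here.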


2016 ◽  
Vol 12 (4) ◽  
pp. 448-476 ◽  
Author(s):  
Amir Hosein Keyhanipour ◽  
Behzad Moshiri ◽  
Maryam Piroozmand ◽  
Farhad Oroumchian ◽  
Ali Moeini

Purpose: Learning to rank algorithms inherently face many challenges, most importantly the high dimensionality of the training data, the dynamic nature of Web information resources and the lack of click-through data. High dimensionality of the training data affects the effectiveness and efficiency of learning algorithms. Moreover, most learning to rank benchmark data sets do not include click-through data, a very rich source of information about users' search behavior when dealing with ranked lists of search results. To address these limitations, this paper introduces a novel learning to rank algorithm that uses a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the click-through information present in the data set, or even from data sets without any explicit click-through information.

Design/methodology/approach: The proposed ranking algorithm (QRC-Rank) applies RL techniques to a set of calculated click-through features. QRC-Rank is a two-step process. In the first step, the Transformation phase, a compact benchmark data set is created that contains a set of click-through features. These features are calculated from the original click-through information available in the data set and constitute a compact representation of it. To find the most effective click-through features, a number of scenarios are investigated. The second phase is Model-Generation, in which an RL model is built to rank the documents. This model is created by applying temporal difference learning methods such as Q-Learning and SARSA.

Findings: The proposed method, QRC-Rank, is evaluated on the WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms state-of-the-art learning to rank methods such as SVMRank, RankBoost, ListNet and AdaRank on the precision and normalized discounted cumulative gain evaluation criteria. The use of click-through features calculated from the training data set is a major contributor to the performance of the system.

Originality/value: This paper demonstrates the viability of the proposed features, which provide a compact representation of the click-through data in a learning to rank application. These compact click-through features are calculated from the original features of the learning to rank benchmark data set. In addition, a Markov decision process model is proposed for the learning to rank problem using RL, including the sets of states, actions, the rewarding strategy and the transition function.
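The core step of the Model-Generation phase is the temporal-difference update. The sketch below shows one tabular Q-Learning update; the state/action encoding is a placeholder (in QRC-Rank a state would be a partial ranking and an action the choice of the next document, details beyond what the abstract states), and the reward would come from the click-through features.

```python
def q_learning_update(Q, s, a, r, s_next, actions_next,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-Learning (temporal difference) update:
    Q[s,a] <- Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]).
    Q is a dict keyed by (state, action); unseen pairs default to 0."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions_next),
                    default=0.0)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q[(s, a)]
```

SARSA, the other method named in the abstract, differs only in the target: it uses the Q-value of the action actually taken next rather than the maximum over next actions.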


2009 ◽  
Vol 49 (9) ◽  
pp. 2077-2081 ◽  
Author(s):  
Katja Hansen ◽  
Sebastian Mika ◽  
Timon Schroeter ◽  
Andreas Sutter ◽  
Antonius ter Laak ◽  
...  

2016 ◽  
Author(s):  
Ranil Basnayake ◽  
Erik Bollt ◽  
Nicholas Tufillaro ◽  
Jie Sun ◽  
Michelle Gierach

Abstract. We illustrate the utility of variational destriping for ocean color images from both multispectral and hyperspectral sensors. In particular, we examine data from a filter spectrometer, the Visible Infrared Imaging Radiometer Suite (VIIRS) on the Suomi National Polar-orbiting Partnership (NPP) orbiter, and an airborne grating spectrometer, the Jet Propulsion Laboratory's (JPL) hyperspectral Portable Remote Imaging Spectrometer (PRISM) sensor. We solve the destriping problem with a variational regularization method, assigning spatially varying weights to preserve the other features of the image during destriping. The target functional penalizes "the neighborhood of stripes" (strictly, directionally uniform features) while promoting data fidelity, and the functional is minimized by solving the Euler-Lagrange equations with an explicit finite difference scheme. We demonstrate the accuracy of our method on a benchmark data set representing the sea surface temperature off the coast of Oregon, USA. Technical details, such as how to impose continuity across data gaps using inpainting, are also described.
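The minimization strategy described (Euler-Lagrange equations solved by an explicit finite difference scheme) can be sketched with a stripped-down functional: directional smoothness across the stripe direction plus a data-fidelity term. The spatially varying weights and the inpainting across data gaps from the paper are omitted, so this is a minimal illustration under those assumptions, not the authors' exact scheme.

```python
import numpy as np

def destripe(f, n_iter=200, dt=0.2, lam=0.1):
    """Sketch of variational destriping: gradient descent on
    E(u) = |du/dx|^2 (across the stripe direction, axis 0)
         + lam * (u - f)^2 (data fidelity),
    i.e. an explicit Euler step on the Euler-Lagrange equation
    u_xx - lam * (u - f) = 0. dt must satisfy the usual explicit
    stability bound (dt <= 0.5 for this 1-D second difference)."""
    u = f.astype(float).copy()
    for _ in range(n_iter):
        # Second difference across the stripe direction (Neumann edges).
        u_pad = np.pad(u, ((1, 1), (0, 0)), mode='edge')
        u_xx = u_pad[2:, :] - 2 * u + u_pad[:-2, :]
        u += dt * (u_xx - lam * (u - f))
    return u
```

Smaller `lam` smooths stripes more aggressively at the cost of fidelity to the raw image; the paper instead varies the weighting spatially so that genuine directional features are preserved.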

