A Novel Biased Diversity Ranking Model for Query-Oriented Multi-Document Summarization

2013 ◽  
Vol 380-384 ◽  
pp. 2811-2816
Author(s):  
Kai Lei ◽  
Yi Fan Zeng

Query-oriented multi-document summarization (QMDS) attempts to generate a concise piece of text by extracting sentences from a target document collection, with the aim of not only conveying the key content of that corpus but also satisfying the information needs expressed by the query. Due to its great practical value, QMDS has been intensively studied in recent decades. Three properties are considered crucial for a good summary: relevance, prestige, and low redundancy (or so-called diversity). Unfortunately, most existing work either disregards diversity or handles it with non-optimized heuristics, usually based on greedy sentence selection. Inspired by the manifold-ranking process, which deals with query-biased prestige, and the DivRank algorithm, which captures query-independent diversity ranking, we propose in this paper a novel biased diversity ranking model, named ManifoldDivRank, for query-sensitive summarization tasks. The top-ranked sentences discovered by our algorithm not only enjoy high query-oriented prestige but, more importantly, are dissimilar to each other. Experimental results on the DUC2005 and DUC2006 benchmark data sets demonstrate the effectiveness of our proposal.
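
The query-biased propagation at the heart of manifold ranking can be sketched as a simple fixed-point iteration; this is an illustrative version of the generic manifold-ranking step, not the paper's exact ManifoldDivRank update (the similarity matrix `W`, query-bias vector `y`, and damping factor `alpha` are assumed names):

```python
import numpy as np

def manifold_rank(W, y, alpha=0.85, iters=100):
    """Query-biased manifold ranking sketch: scores propagate over the
    sentence-similarity graph W while being pulled toward the query
    relevance vector y."""
    y = np.asarray(y, dtype=float)
    d = W.sum(axis=1).astype(float)
    d[d == 0] = 1.0                           # guard isolated nodes
    S = W / np.sqrt(np.outer(d, d))           # symmetric normalization
    f = y.copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y  # propagate + query bias
    return f
```

With `alpha < 1` the iteration converges, and sentences close to query-relevant sentences in the graph inherit high scores. DivRank-style diversity would further penalize nodes near already highly ranked ones, which this sketch omits.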

2014 ◽  
Vol 574 ◽  
pp. 728-733
Author(s):  
Shu Xia Lu ◽  
Cai Hong Jiao ◽  
Le Tong ◽  
Yang Fan Zhou

The Core Vector Machine (CVM) can deal with large data sets by finding the minimum enclosing ball (MEB), but one drawback is that CVM is very sensitive to outliers. To tackle this problem, we propose a novel Position-Regularized Core Vector Machine (PCVM). In the proposed PCVM, the data points are regularized by assigning a position-based weighting. Experimental results on several benchmark data sets show that the performance of PCVM is much better than that of CVM.
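
The abstract does not spell out the position-based weighting; one simple way to realize the idea, sketched here as an assumption rather than the paper's actual scheme, is to weight each point by its distance from the data centroid so that likely outliers contribute less:

```python
import numpy as np

def position_weights(X):
    """Illustrative position-based weighting: points far from the data
    centroid (likely outliers) receive smaller weights."""
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    return 1.0 / (1.0 + dist)   # weight decays smoothly with distance
```

Such weights could then scale each point's contribution when the minimum enclosing ball is fitted, damping the influence of outliers.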


Author(s):  
NEERAJ SAHU ◽  
D. S. RAJPUT ◽  
R. S. THAKUR ◽  
G. S. THAKUR

This paper presents clustering-based document classification and analysis of data. The proposed approach is based on both unsupervised and supervised document classification, and it proceeds through the steps of document collection, text preprocessing, feature selection, indexing, clustering, and results analysis. The Twenty Newsgroups data set [20] is used in the experiments, and the results are analyzed using SAS 9.0 analytical software. The experimental results show that the proposed approach outperforms existing methods.
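
The preprocessing and indexing steps in such pipelines typically reduce documents to weighted term vectors. A minimal TF-IDF sketch (assuming documents are already tokenized into word lists; not the paper's specific implementation):

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal TF-IDF indexing sketch: docs is a list of token lists;
    returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out
```

Terms appearing in every document get weight zero, so clustering then operates on the discriminative vocabulary.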


2011 ◽  
Vol 40 ◽  
pp. 469-521 ◽  
Author(s):  
A. Rahman ◽  
V. Ng

Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers over competing approaches, as well as the effectiveness of our two extensions.
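
The cluster-ranking decision can be caricatured in a few lines; this is a deliberately crude sketch of the control flow only, with a placeholder `score` function standing in for the learned ranker, and the `threshold` test standing in for joint anaphoricity modeling:

```python
def resolve(anaphor, clusters, score, threshold=0.5):
    """Cluster-ranking sketch: rank all preceding clusters for the
    anaphor; return None (start a new entity) if no cluster scores
    high enough -- a crude stand-in for anaphoricity determination."""
    best = max(clusters, key=lambda c: score(anaphor, c), default=None)
    if best is None or score(anaphor, best) < threshold:
        return None   # non-anaphoric: the mention starts a new entity
    return best
```

Ranking whole clusters, rather than individual preceding mentions, lets the scorer use entity-level features (e.g. agreement across all mentions in the cluster).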


2014 ◽  
Vol 644-650 ◽  
pp. 2009-2012 ◽  
Author(s):  
Hai Tao Zhang ◽  
Bin Jun Wang

In order to solve the low efficiency of KNN- and K-Means-like algorithms in classification, a novel extension distance on intervals is proposed to measure the similarity between testing data and a class domain. The method constructs representatives for the data points in less time than traditional methods; these representatives replace the original dataset as the basis of classification. In effect, building a model of representatives makes classification faster. Experimental results on two benchmark data sets verify the effectiveness and applicability of the proposed work. The model-based method using extension distance can effectively build data models that represent the whole training data, thereby reducing the high cost of classifying new instances.
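
The extension distance from extenics theory measures how far a point lies from an interval, going negative inside it; classification then assigns a test point to the class whose feature interval fits it best. A minimal sketch (the per-class intervals here are assumed one-dimensional for illustration):

```python
def extension_distance(x, a, b):
    """Extension distance of point x to the interval <a, b>:
    negative inside the interval, zero on the boundary, positive outside."""
    return abs(x - (a + b) / 2) - (b - a) / 2

def classify(x, class_intervals):
    """Assign x to the class whose interval it fits best
    (smallest extension distance)."""
    return min(class_intervals,
               key=lambda c: extension_distance(x, *class_intervals[c]))
```

Because each class is summarized by an interval rather than all its training points, classifying a new instance costs O(#classes) instead of O(#training points).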


2013 ◽  
Vol 2013 ◽  
pp. 1-5
Author(s):  
Huawen Liu ◽  
Zhonglong Zheng ◽  
Jianmin Zhao ◽  
Ronghua Ye

Multilabel learning is now receiving increasing attention from a variety of domains, and many learning algorithms have been proposed. However, multilabel learning may also suffer from high dimensionality, and little attention has been paid to this issue. In this paper, we propose a new ensemble learning algorithm for multilabel data. The main characteristic of our method is that it exploits features with local discriminative capability for each label to serve the purpose of classification. Specifically, for each label, the discriminative capabilities of features on positive and negative data are estimated, and the top features with the highest capabilities are retained. Finally, a binary classifier for each label is constructed on these top features. Experimental results on benchmark data sets show that the proposed method outperforms four popular, previously published multilabel learning algorithms.
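
The per-label selection step can be sketched as follows; the scoring rule here (gap between feature means on positive and negative instances) is an assumed stand-in for whatever discriminative-capability estimate the paper actually uses:

```python
import numpy as np

def top_features_per_label(X, Y, k):
    """For each label (column of the 0/1 matrix Y), score every feature
    by the gap between its mean on positive and negative instances,
    and keep the indices of the top-k features."""
    selected = {}
    for j in range(Y.shape[1]):
        pos, neg = X[Y[:, j] == 1], X[Y[:, j] == 0]
        score = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
        selected[j] = np.argsort(score)[-k:]   # top-k feature indices
    return selected
```

A separate binary classifier is then trained per label on its own reduced feature set, which is the binary-relevance decomposition the abstract describes.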


2012 ◽  
Vol 490-495 ◽  
pp. 1372-1376
Author(s):  
Qing Feng Liu

The fuzzy C-means (FCM) algorithm is an iterative algorithm in which the desired number of clusters C and the initial clustering seeds have to be pre-defined. The seeds are modified at each stage of the algorithm, and for each object a degree of membership in each of the clusters is estimated. In this paper, an extension of the FCM clustering algorithm based on an intuitionistic extension index, denoted the E-FCM algorithm, is proposed. Comparing the performance of the two algorithms, experimental results on three benchmark data sets show that the E-FCM algorithm outperforms the FCM algorithm.
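
The membership estimation the abstract refers to is the standard FCM update, in which each point's degree of membership in a cluster depends on its distance to that cluster's seed relative to all other seeds. A minimal sketch of that update (the fuzzifier `m` defaults to the common value 2; the E-FCM extension index is not modeled here):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update: returns an (n_points, n_clusters)
    matrix of membership degrees, each row summing to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                       # avoid divide-by-zero
    inv = d ** (-2.0 / (m - 1.0))               # inverse relative distance
    return inv / inv.sum(axis=1, keepdims=True)
```

FCM alternates this update with recomputing each seed as the membership-weighted mean of the data until the memberships stabilize.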


2018 ◽  
Vol 30 (5) ◽  
pp. 1426-1447 ◽  
Author(s):  
Lingling Zhang ◽  
Jun Liu ◽  
Minnan Luo ◽  
Xiaojun Chang ◽  
Qinghua Zheng

Due to the difficulty of collecting labeled images for hundreds of thousands of visual categories, zero-shot learning, where unseen categories have no labeled images in the training stage, has attracted increasing attention. In the past, many studies focused on transferring knowledge from seen to unseen categories by projecting all category labels into a semantic space. However, the label embeddings could not adequately express the semantics of categories. Furthermore, the common semantics of seen and unseen instances cannot be captured accurately, because the distributions of these instances may be quite different. To address these issues, we propose a novel deep semisupervised method that jointly considers the heterogeneity gap between different modalities and the correlation among unimodal instances. This method replaces the original labels with the corresponding textual descriptions to better capture category semantics, and it overcomes the problem of distribution difference by minimizing the maximum mean discrepancy between seen and unseen instance distributions. Extensive experimental results on two benchmark data sets, CU200-Birds and Oxford Flowers-102, indicate that our method achieves significant improvements over previous methods.
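
The maximum mean discrepancy (MMD) mentioned above compares two sample sets through kernel mean embeddings; it is near zero when the two distributions match. A minimal sketch of the standard (biased) squared-MMD estimate under an RBF kernel, not the paper's full training objective:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between sample sets X and Y
    under an RBF kernel exp(-gamma * ||a - b||^2); ~0 when the two
    empirical distributions coincide."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

Minimizing this quantity between seen- and unseen-instance features pulls their embedded distributions together, which is how the abstract's distribution-difference problem is addressed.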


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To overcome it, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, considering two types of deadlines as our constraints. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we use a set of data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets, and that DV-DVFS can achieve up to 15% improvement in energy consumption.
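
The frequency-estimation step can be sketched as a simple capacity check; this is an assumed simplification (work measured in CPU cycles, a discrete set of available frequencies), not the paper's actual estimator:

```python
def pick_frequency(cycles, deadline, freqs):
    """Pick the lowest available CPU frequency (cycles/sec) that still
    finishes `cycles` of estimated work before `deadline` seconds.
    Falls back to the maximum frequency if no setting can meet it."""
    needed = cycles / deadline                  # minimum required rate
    candidates = [f for f in freqs if f >= needed]
    return min(candidates) if candidates else max(freqs)
```

Running at the lowest deadline-feasible frequency is what saves energy: dynamic power grows superlinearly with frequency, so any slack beyond the deadline is wasted headroom.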


2021 ◽  
Vol 17 (3) ◽  
pp. 1548-1561
Author(s):  
Kristian Kříž ◽  
Martin Nováček ◽  
Jan Řezáč
