Sequence clustering and labeling for unsupervised query intent discovery

Author(s):  
Jackie Chi Kit Cheung ◽  
Xiao Li
2014 ◽  
Vol 36 (3) ◽  
pp. 636-642 ◽  
Author(s):  
Lu BAI ◽  
Jia-Feng GUO ◽  
Lei CAO ◽  
Xue-Qi CHENG

Author(s):  
Quan Zou ◽  
Gang Lin ◽  
Xingpeng Jiang ◽  
Xiangrong Liu ◽  
Xiangxiang Zeng

2016 ◽  
Vol 15 (4) ◽  
pp. 479-485 ◽  
Author(s):  
Amy Coward ◽  
Dervla T.D. Kenna ◽  
Claire Perry ◽  
Kate Martin ◽  
Michel Doumith ◽  
...  

Author(s):  
Ming Cao ◽  
Qinke Peng ◽  
Ze-Gang Wei ◽  
Fei Liu ◽  
Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and a growing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied to sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of the number of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) that groups similar sequences using Edlib, a C/C++ library for fast, exact semi-global sequence alignment. edClust was tested on three large-scale sequence databases and compared with several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and achieves a higher SS. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
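
The general idea behind greedy, seed-based heuristic clustering can be illustrated with a minimal sketch. This is not the edClust implementation: it assumes a plain global Levenshtein distance written in pure Python in place of Edlib's semi-global alignment, and the function names `edit_distance` and `greedy_cluster`, the longest-first ordering, and the `identity_threshold` parameter are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Global Levenshtein distance via a two-row dynamic program.

    Assumption: this stands in for the alignment step; edClust itself
    relies on Edlib's semi-global alignment instead.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def greedy_cluster(sequences: List[str],
                   identity_threshold: float = 0.9) -> List[Tuple[str, List[str]]]:
    """Assign each sequence to the first seed it is similar enough to.

    Sequences are visited longest first (a hypothetical ordering choice);
    a sequence that matches no existing seed becomes a new seed.
    """
    clusters: List[Tuple[str, List[str]]] = []
    for seq in sorted(sequences, key=len, reverse=True):
        for seed, members in clusters:
            longest = max(len(seq), len(seed)) or 1  # guard against empty input
            identity = 1.0 - edit_distance(seq, seed) / longest
            if identity >= identity_threshold:
                members.append(seq)
                break
        else:
            # No seed was close enough, so this sequence starts a new cluster.
            clusters.append((seq, [seq]))
    return clusters
```

Because every sequence is compared only against cluster seeds rather than against all other sequences, the number of alignments stays low. The trade-off is that the result depends on the visiting order and on how accurately each seed comparison is made, which is where the choice of alignment routine matters.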


2009 ◽  
Vol 6 (7) ◽  
pp. 1368-1372 ◽  
Author(s):  
Khalid Jaber ◽  
Nur'Aini Abdul Ras ◽  
Rosni Abdullah

Author(s):  
N. Allott ◽  
P. Halstead ◽  
P. Fazackerley
