Faster Compressed Suffix Trees for Repetitive Text Collections

Highly-available models and IPv4 have garnered improbable interest from both statisticians and experts in the last several years. Here, we show the emulation of suffix trees. We motivate an algorithm for suffix trees, which we use to demonstrate that e-business and replication can interact to solve this challenge.

Clone detection using rolling hashing, suffix trees and dagification: A case study

2012 6th International Workshop on Software Clones (IWSC) ◽ 10.1109/iwsc.2012.6227862 ◽ 2012 ◽ Cited By ~ 3

Author(s): Mikkel Jonsson Thomsen ◽ Fritz Henglein

Keyword(s): Clone Detection ◽ Suffix Trees

Accessor Variety Criteria for Chinese Word Extraction

Computational Linguistics ◽ 10.1162/089120104773633394 ◽ 2004 ◽ Vol 30 (1) ◽ pp. 75-93 ◽ Cited By ~ 55

Author(s): Haodi Feng ◽ Kang Chen ◽ Xiaotie Deng ◽ Weimin Zheng

Keyword(s): Iterative Methods ◽ Chinese Text ◽ Simple Rule ◽ Chinese Characters ◽ Chinese Word ◽ Text Collections ◽ Large Corpus

We are interested in the problem of word extraction from Chinese text collections. We define a word to be a meaningful string composed of several Chinese characters. For example, ‘percent’ and ‘more and more’ are not recognized by some people as traditional Chinese words. However, in our work, they are words because they are very widely used and have specific meanings. We start from the viewpoint that a word is a distinguished linguistic entity that can be used in many different language environments. We consider the characters directly before a string (predecessors) and the characters directly after a string (successors) to be important factors for determining the independence of the string. We call such characters accessors of the string, count the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use these counts to measure the context independence of a string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction and is comparable to, and for long words outperforms, other iterative methods.
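The accessor idea described in the abstract can be illustrated with a minimal Python sketch. It counts the distinct predecessors and successors of every substring up to a fixed length. Several details are assumptions of this sketch and are not stated in the abstract: the function names `accessor_counts` and `accessor_variety`, the `max_len` cutoff, the `<s>` and `</s>` boundary markers (each sentence boundary is treated as one extra accessor), and the choice to combine the two counts by taking their minimum.

```python
from collections import defaultdict

# Hypothetical sentence-boundary markers; treating a boundary as a single
# accessor is an assumption of this sketch.
START, END = "<s>", "</s>"

def accessor_counts(sentences, max_len=4):
    """Map each substring (up to max_len characters) to a pair
    (number of distinct predecessors, number of distinct successors)."""
    preds = defaultdict(set)
    succs = defaultdict(set)
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                s = sent[i:j]
                # Character directly before the string, or the start marker.
                preds[s].add(sent[i - 1] if i > 0 else START)
                # Character directly after the string, or the end marker.
                succs[s].add(sent[j] if j < n else END)
    return {s: (len(preds[s]), len(succs[s])) for s in preds}

def accessor_variety(counts, s):
    """Combine the two counts with min (an assumption of this sketch);
    unseen strings score 0."""
    left, right = counts.get(s, (0, 0))
    return min(left, right)

# Toy corpus using Latin letters in place of Chinese characters.
counts = accessor_counts(["abcd", "xbcy", "zbcd"])
print(counts["bc"], accessor_variety(counts, "bc"))
```

In this toy corpus, "bc" appears after three different characters and before two, so it looks more independent of its context than "bcd", which is always followed by a sentence end. A real extraction pipeline would apply a threshold to such scores over a large corpus; the threshold and any further filtering are left out here.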
