Faster Compressed Suffix Trees for Repetitive Text Collections

Author(s): Gonzalo Navarro, Alberto Ordóñez
Author(s): Peter Organisciak, Grace Therrell, Maggie Ryan, Benjamin MacDonald Schmidt

Author(s): Luke Gallagher, Antonio Mallia, J. Shane Culpepper, Torsten Suel, B. Barla Cambazoglu

2013, Vol. 462-463, pp. 243-246
Author(s): Chang Guang Shi

Highly available models and IPv4 have garnered improbable interest from both statisticians and experts in recent years. Here, we present an emulation of suffix trees. We motivate an algorithm for suffix trees, which we use to demonstrate that e-business and replication can interact to solve this challenge.


2004, Vol. 30 (1), pp. 75-93
Author(s): Haodi Feng, Kang Chen, Xiaotie Deng, Weimin Zheng

We are interested in the problem of word extraction from Chinese text collections. We define a word as a meaningful string composed of several Chinese characters. For example, ‘percent’ and ‘more and more’ are not recognized as traditional Chinese words by some people. In our work, however, they are words because they are widely used and have specific meanings. We start from the view that a word is a distinct linguistic entity that can be used in many different language environments. We consider the characters that directly precede a string (predecessors) and the characters that directly follow it (successors) to be important factors in determining the independence of the string. We call such characters the accessors of the string, count the distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use these counts to measure the context independence of the string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction: it is comparable to other iterative methods and, for long words, outperforms them.
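The accessor counting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `accessor_variety` and `av_score` are hypothetical, each sentence boundary is treated as a single sentinel accessor, and combining the predecessor and successor counts with `min` is an assumption of this sketch, since the abstract does not state how the two counts are combined.

```python
from collections import defaultdict


def accessor_variety(sentences, max_len=4):
    """Count distinct predecessors and successors for every substring.

    Hypothetical helper. `sentences` is an iterable of strings (for example,
    Chinese sentences already split at punctuation); `max_len` caps the length
    of candidate strings. Assumption: a sentence start or end is represented by
    the sentinels "<s>" and "</s>", so each boundary counts as one accessor.
    Returns {substring: (num_distinct_predecessors, num_distinct_successors)}.
    """
    left = defaultdict(set)   # substring -> distinct characters seen before it
    right = defaultdict(set)  # substring -> distinct characters seen after it
    for sent in sentences:
        n = len(sent)
        for i in range(n):
            # Enumerate every candidate sent[i:j] of length 1..max_len.
            for j in range(i + 1, min(n, i + max_len) + 1):
                s = sent[i:j]
                left[s].add(sent[i - 1] if i > 0 else "<s>")
                right[s].add(sent[j] if j < n else "</s>")
    return {s: (len(left[s]), len(right[s])) for s in left}


def av_score(counts):
    """Combine both sides into one score.

    Assumption of this sketch: take the minimum, so a string scores high only
    if it appears in many different contexts on BOTH sides.
    """
    return {s: min(l, r) for s, (l, r) in counts.items()}
```

For instance, with the toy corpus `["abc", "xabcy", "zabc"]`, the string `abc` has three distinct predecessors and two distinct successors, so it scores 2, while `ab` is always followed by `c` and scores only 1. This mirrors the intuition in the abstract that a string which is always followed by the same character is probably part of a longer word.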

