Research the Key Technologies of the Mongolian Full-Text Retrieval Based on Lucene

2013 ◽  
Vol 347-350 ◽  
pp. 2185-2190
Author(s):  
Guo Qiang Ding ◽  
Min Lin

Building on an in-depth understanding of Lucene full-text retrieval technology, this paper applies it to Mongolian text search. First, it identifies several key issues that must be addressed to implement Mongolian text search and gives corresponding solutions for achieving Mongolian full-text retrieval in Lucene. Second, it provides a fast, accurate, and comprehensive Mongolian full-text search service, which plays a key role in promoting the development of Mongolian search engines.

2011 ◽  
Vol 135-136 ◽  
pp. 369-374
Author(s):  
Yang Sen Zhang ◽  
Gai Juan Huang

In this paper, we design and implement an efficient full-text retrieval system for the basic annotation People's Daily Corpus, based on inverted index technology. Based on the characteristics of the corpus data, we thoroughly analyze the methods and strategies for implementing the system. After comparing various schemes, we propose a three-level index structure of Chinese character, word, and address set, and give the design of the dictionary structure for each index level. After converting the unstructured People's Daily Corpus into indexed, structured data, we implement a full-text search algorithm that corresponds to the proposed index structure. Experimental results show that the proposed search algorithm achieves the target of "ten million Chinese characters, response within a second" and improves the speed of full-text search over the People's Daily Corpus.
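
To illustrate the general idea of a character-level inverted index with address sets, here is a minimal Python sketch. It is not the authors' three-level structure: the function names `build_char_index` and `search`, the toy corpus string, and the use of in-memory dictionaries are all assumptions made for illustration, and the word level is omitted entirely.

```python
from collections import defaultdict


def build_char_index(text):
    """Map each character to the list of positions (its address set) in text."""
    index = defaultdict(list)
    for pos, ch in enumerate(text):
        index[ch].append(pos)
    return index


def search(index, query):
    """Return the start positions of `query`, using only the index."""
    if not query:
        return []
    # Candidate start positions come from the first character's address set.
    candidates = set(index.get(query[0], []))
    # Keep a candidate only if every later character occurs at the right offset.
    for offset, ch in enumerate(query[1:], start=1):
        positions = set(index.get(ch, []))
        candidates = {start for start in candidates if start + offset in positions}
    return sorted(candidates)


# Hypothetical toy corpus, not taken from the People's Daily Corpus.
corpus = "人民日报语料库是人民日报的标注语料"
index = build_char_index(corpus)
hits = search(index, "人民日报")
```

A query is answered by intersecting address sets at shifted offsets, so the raw text never has to be scanned at query time. A production system would store the address sets on disk and add the word-level dictionary described in the abstract.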


2014 ◽  
Vol 952 ◽  
pp. 355-358 ◽  
Author(s):  
Chun Feng Xu ◽  
Hong Hua Xu

As an important branch of modern information retrieval technology, full-text search is not only an important tool for dealing with unstructured data but also one of the mainstream technologies of search engines. This paper begins with an in-depth study of the working principles and processes of the search engine model and, building on that background, discusses Lucene's architecture. The main emphasis is placed on some basic algorithms for Chinese word segmentation and relevance ranking. Finally, we build a Lucene-based full-text retrieval system by applying these technologies.
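
The abstract does not say which segmentation algorithms the paper covers. As one commonly taught baseline, forward maximum matching can be sketched as follows. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions for illustration only, and the sketch does not use Lucene's analyzer API.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real system would load a large lexicon.
toy_dictionary = {"全文", "检索", "全文检索", "系统", "搜索", "引擎", "搜索引擎"}
tokens = forward_max_match("全文检索系统", toy_dictionary)
```

The greedy strategy is simple and fast but can mis-segment ambiguous strings, which is one reason segmentation quality affects relevance ranking downstream.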


2012 ◽  
Vol 02 (04) ◽  
pp. 106-109 ◽  
Author(s):  
Rujia Gao ◽  
Danying Li ◽  
Wanlong Li ◽  
Yaze Dong

Author(s):  
Mary Holstege

To a search engine, indexes are specified by the content: the words, phrases, and characters that are actually present tell the search engine what inverted indexes to create. Other external knowledge can be applied to add to this inventory of indexes. For example, knowledge of the document language can lead to indexes for word stems or decompounding. These can unify different content into the same index or split the same content into multiple indexes. That is, different words manifest in the content can be unified under a single search key, and the same word can have multiple manifestations under different search keys. Turning this around, the indexes represent the retrievable information content in the document. Full-text search is not an either/or, yes/no system, but one of relative fit (scoring). Precision balances against recall, mediated by scoring. The search engine perspective offers a different way to think about markup: as a specification of the retrievable information content of the document; as something that can, with additional information, unify different markup or provide multiple distinct views of the same markup; as something that can be present to greater or lesser degrees, with a goodness of match (scoring); and as a specification that can be adjusted to balance precision and recall. What does this search engine perspective on markup mean, concretely? Can we use it to reframe some persistent conundrums, such as vocabulary resolution and overlap? Let's see.
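
The point about unifying different words under a single search key can be made concrete with a small sketch. It builds two inverted indexes over the same content, one keyed by exact word and one keyed by stem. The suffix-stripping `toy_stem` function is a deliberately crude stand-in for a real language-aware stemmer, and all names and sample documents here are hypothetical.

```python
from collections import defaultdict

# Crude suffix list, checked in order; an assumption for illustration only.
SUFFIXES = ("ing", "es", "s")


def toy_stem(word):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_indexes(docs):
    """Build an exact-word index and a stem index over the same documents."""
    exact = defaultdict(set)
    stems = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            exact[word].add(doc_id)            # the word as it appears
            stems[toy_stem(word)].add(doc_id)  # the same word under a stem key
    return exact, stems


docs = {1: "indexing markup", 2: "search indexes", 3: "searching documents"}
exact, stems = build_indexes(docs)
```

Here "indexing" and "indexes" are distinct keys in the exact index but share the key "index" in the stem index, so a stemmed query trades some precision for higher recall.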


Author(s):  
Cornelia Gyorodi ◽  
Robert Gyorodi ◽  
George Pecherle ◽  
George Mihai Cornea

In this article we explain how to create a search engine using MySQL full-text search. The ever-increasing demands of the web require cheap yet elaborate search options. One of the most important requirements for a search engine is the ability to order its result set by relevance and to provide the user with suggestions in the case of a spelling mistake or a small result set. To meet this requirement, we use MySQL full-text search. This option is suitable for small to medium-scale websites. To provide sounds-like capabilities, a second table is created that contains a bag of words from the main table together with the corresponding metaphone of each word. When a suggestion is needed, this table is queried for the metaphone of the searched word, and a suggestion is computed from the result set.
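
The second-table lookup described above might be sketched as follows. Two substitutions are made and should be read as assumptions: Python's built-in `sqlite3` stands in for MySQL, and a simplified Soundex-style key stands in for metaphone, which is not in the Python standard library. The table name `word_keys`, the function names, and the vocabulary are hypothetical.

```python
import sqlite3

# Consonant groups for a simplified Soundex-style code (not metaphone).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(word):
    """First letter plus up to three consonant codes, padded with zeros."""
    word = word.lower()
    if not word:
        return ""
    key = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        # Skip uncoded letters and immediate repeats of the same code.
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]


# Second table: each vocabulary word stored with its phonetic key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_keys (word TEXT PRIMARY KEY, phonetic TEXT)")
vocabulary = ["search", "retrieval", "index", "engine"]
conn.executemany("INSERT INTO word_keys VALUES (?, ?)",
                 [(w, phonetic_key(w)) for w in vocabulary])


def suggest(conn, misspelled):
    """Return known words whose phonetic key matches the misspelled word."""
    rows = conn.execute(
        "SELECT word FROM word_keys WHERE phonetic = ? ORDER BY word",
        (phonetic_key(misspelled),))
    return [row[0] for row in rows]
```

For example, `suggest(conn, "serch")` looks up the key shared with "search". Ranking several candidates, for instance by edit distance or word frequency, is left out of this sketch.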


Author(s):  
Pavel Šimek ◽  
Jiří Vaněk ◽  
Jan Jarolímek

The majority of Internet users use the global network to search for information with full-text search engines such as Google, Yahoo!, or Seznam. Website operators try, with the help of various optimization techniques, to reach the top positions in the results of full-text search engines. This is where Search Engine Optimization and Search Engine Marketing become very important, because ordinary users usually follow links only on the first few result pages for given keywords, and in catalogs they primarily use the links placed higher in the hierarchy of each category. The key to success is the application of optimization methods that deal with keywords, the structure and quality of content, domain names, individual pages, and the quantity and reliability of backlinks. The process is demanding, long-lasting, and without a guaranteed outcome. A website operator without advanced analytical tools cannot identify the contribution of the individual documents that make up the entire web site. If website operators want an overview of their documents and of the web site as a whole, it is appropriate to quantify these positions in a specific way, depending on specific keywords. This is the purpose of quantifying the competitive value of documents, which in turn determines the global competitive value of a web site. Quantification of competitive values is performed on a specific full-text search engine, and the results can be, and often are, different for each engine. According to published reports from the ClickZ agency or Market Share, Google is the most widely used search engine by number of searches among English-speaking users, with a market share of more than 80%. The overall procedure for quantifying competitive values is general; however, the initial step, the analysis of keywords, depends on the choice of full-text search engine.


2011 ◽  
Vol 21 (2) ◽  
pp. 191-196
Author(s):  
Tatsuma KAWANAKA ◽  
Yukiharu WATAGAMI ◽  
Takehiko MURAKAWA ◽  
Masaru NAKAGAWA
