Focused Crawling Using Latent Semantic Indexing – An Application for Vertical Search Engines

Author(s):  
George Almpanidis ◽  
Constantine Kotropoulos ◽  
Ioannis Pitas
2020 ◽  
Vol 17 (5) ◽  
pp. 742-749
Author(s):  
Fawaz Al-Anzi ◽  
Dia AbuZeina

The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, the technique has drawbacks, notably the high dimensionality of the space and the semantic looseness of the representation. Latent Semantic Indexing (LSI) was therefore suggested to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. LSI has been employed effectively in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose an innovative method that search engines can use to find better-matched content among the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out using an Arabic-language data collection that contains 800 medical-related documents with more than 47,222 unique words. The proposed method was assessed using a small test set that contains five medical keywords. The results show that the proposed method outperforms standard LSI.
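
For readers unfamiliar with the baseline the abstract above compares against, the following is a minimal sketch of standard LSI retrieval with cosine similarity. It is not the extension proposed in the paper. The term list, the counts in the matrix `A`, the function name `lsi_scores`, and the choice `k=2` are all illustrative assumptions.

```python
import numpy as np

# Toy vocabulary and term-document matrix (rows = terms, columns = documents).
# All counts are made up for illustration.
TERMS = ["heart", "cardiac", "surgery", "diet", "nutrition"]
A = np.array([
    [2, 0, 1, 0],  # heart
    [0, 2, 1, 0],  # cardiac
    [1, 1, 2, 0],  # surgery
    [0, 0, 0, 2],  # diet
    [0, 0, 0, 1],  # nutrition
], dtype=float)


def lsi_scores(query_terms, k=2):
    """Cosine similarity between a keyword query and each document in a rank-k latent space."""
    # Truncated SVD: A is approximated by U_k * diag(s_k) * V_k^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, docs_k = U[:, :k], s[:k], Vt[:k, :].T  # one row of docs_k per document

    # Fold the query into the latent space: q_hat = diag(s_k)^-1 * U_k^T * q.
    q = np.array([1.0 if t in query_terms else 0.0 for t in TERMS])
    q_hat = (U_k.T @ q) / s_k

    # Cosine similarity, guarding against zero-length vectors.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_hat)
    return (docs_k @ q_hat) / np.where(norms > 0, norms, 1.0)


if __name__ == "__main__":
    for j, score in enumerate(lsi_scores({"heart"})):
        print(f"document {j}: {score:.3f}")
```

In this toy collection the second document never mentions "heart", yet it scores close to 1 for that query because it shares the latent "cardiology" dimension with the documents that do, while the diet document scores near 0. That is the kind of conceptual term-document association the abstract refers to.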


2007 ◽  
Vol 32 (6) ◽  
pp. 886-908 ◽  
Author(s):  
G. Almpanidis ◽  
C. Kotropoulos ◽  
I. Pitas

Author(s):  
N. Blynova

Latent semantic indexing (LSI) is becoming more and more popular in copywriting, gradually replacing texts written on the principles of SEO. LSI came into demand in the 2010s, when popular search engines switched to a qualitatively new way of ranking materials and sites. The difference between the SEO and LSI approaches is that search engines rank SEO materials by keywords, while LSI materials are ranked by how fully the topic is covered and how useful the article will be to the reader. Consequently, in addition to keywords and phrases, the associative core is involved here. Materials written for people have replaced texts created for the search engine. The article describes the algorithm for creating the associative and thematic core and the ways in which this can be done. The basic steps that help to create an LSI text are also shown. The author underlines that, because a significant amount of information must be presented and the topic must be covered with maximum expertise, text writers accustomed to working on the principles of SEO have to learn to write within a new paradigm. The owners of websites that host articles created on LSI principles have discovered the advantages of this way of presenting information, since their resources have become better indexed and take the leading positions in search results. Algorithms such as “Baden-Baden”, “Korolev” and “Panda” have positively influenced the Internet environment as a whole, since over-optimized texts, which were stuffed with keywords and of little use to the reader, have now dropped to the last positions in the search results.
Ranking according to the LSI method allows specialists to create texts that are not only useful and expert but also lexically rich, using expressive and figurative means of the language, which was not possible in SEO materials. The article highlights that the use of neural networks should bring the presentation of information even closer to the consumer’s needs, yielding techniques that allow materials written in ordinary language to take the leading positions without the need to incorporate key phrases into the text. We believe that the LSI method, which has proven itself in copywriting, is capable of unlocking the potential of media texts, which are now being written on the principles of SEO.


Author(s):  
Ni Made Ari Lestari ◽  
Made Sudarma

E-commerce is the sale and purchase of goods through electronic systems such as the Internet, the WWW, or other computer networks. E-commerce involves electronic data interchange and automated data collection systems. Every e-commerce site provides a search field in which users enter the items they want. E-commerce sites such as Tokopedia, Lazada, MatahariMall, and Amazon provide only conventional search engine technology. With a conventional search engine, the longer the input sentence, the larger and broader the set of search results becomes. With semantic indexing technology, by contrast, the longer and clearer the description of the desired goods, the fewer and more accurate the results, which helps the user make a decision. This study discusses how to build a search engine for an e-commerce website using Latent Semantic Indexing. The process starts with text mining methods for word processing, then applies the Levenshtein distance method for automatic word correction, and finally uses Latent Semantic Indexing to process the input and produce the output.
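
The automatic word correction step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of Levenshtein distance, not the study's implementation. The helper `correct_word` and the sample vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def correct_word(word: str, vocabulary: list[str]) -> str:
    """Hypothetical helper: return the vocabulary entry closest to word (first entry wins ties)."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(correct_word("shirtt", ["shirt", "shoes", "skirt"]))
```

A search front end would typically run such a correction on each query token before the query reaches the indexing stage, so that a misspelled product name still maps onto a known term.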


2008 ◽  
Vol 7 (1) ◽  
pp. 182-191 ◽  
Author(s):  
Sebastian Klie ◽  
Lennart Martens ◽  
Juan Antonio Vizcaíno ◽  
Richard Côté ◽  
Phil Jones ◽  
...  

2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, using it to select a feature subspace is probably not the most appropriate choice for text categorization, since the method orders extracted features according to their variance, not their classification power. We propose a method based on support vector machines to extract features and select an LSI subspace suited for classification. Experimental results indicate that the method improves classification performance with a more compact representation.
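
The observation in the abstract above, that the highest-variance latent dimension need not be the most useful one for classification, can be illustrated with a small synthetic sketch. It uses a two-class Fisher score as a simple stand-in for "classification power"; this is an assumption for illustration and is not the SVM-based criterion the paper proposes. The data, the centering step, and the function name `fisher_scores` are likewise assumptions.

```python
import numpy as np


def fisher_scores(Z, y):
    """Per-dimension two-class Fisher score: squared gap between class means over summed class variances."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    gap = (Z0.mean(axis=0) - Z1.mean(axis=0)) ** 2
    spread = Z0.var(axis=0) + Z1.var(axis=0) + 1e-12  # small constant avoids division by zero
    return gap / spread


# Synthetic data: feature 0 is high-variance noise, feature 1 is a low-variance class signal.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = np.column_stack([
    rng.normal(0.0, 5.0, n),      # large variance, unrelated to the class
    y + rng.normal(0.0, 0.2, n),  # small variance, separates the classes
])

# SVD of the centered data; columns of Z are latent dimensions ordered by variance.
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
Z = U * s

variances = Z.var(axis=0)
scores = fisher_scores(Z, y)

if __name__ == "__main__":
    print("variance per latent dimension:", variances)
    print("Fisher score per latent dimension:", scores)
```

Here the first latent dimension carries almost all of the variance but almost no class information, while the second separates the two classes well. A variance-ordered truncation to one dimension would therefore discard the feature a classifier needs, which is the motivation for a classification-aware selection of the subspace.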


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It finds application in several domains, including biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.

