Focused Crawling Using Latent Semantic Indexing – An Application for Vertical Search Engines

Enhanced Latent Semantic Indexing Using Cosine Similarity Measures for Medical Application

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/5/7 ◽

2020 ◽

Vol 17 (5) ◽

pp. 742-749

Author(s):

Fawaz Al-Anzi ◽

Dia AbuZeina

Keyword(s):

Language Processing ◽

Search Engines ◽

Dimensional Space ◽

Similarity Measures ◽

Medical Application ◽

Latent Semantic Indexing ◽

Arabic Language ◽

Cosine Similarity ◽

Semantic Indexing ◽

Cosine Similarity Measures

The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, this technique faces challenges such as high dimensionality and the semantic looseness of the representation. Consequently, Latent Semantic Indexing (LSI) was proposed to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. LSI has been employed effectively in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose an innovative method that search engines can use to find better-matched content among the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out using an Arabic-language data collection that contains 800 medical documents with more than 47,222 unique words. The proposed method was assessed using a small test set of five medical keywords. The results show that the proposed method outperforms standard LSI.
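
The abstract compares the proposed cosine-similarity extension against standard LSI but does not describe the extension itself. As a point of reference only, the following is a minimal sketch of the standard LSI baseline: a truncated singular value decomposition (SVD) of a term-document matrix, folding a query into the latent space, and ranking documents by cosine similarity. The function name `lsi_rank`, the four-term toy corpus, and the choice k = 2 are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lsi_rank(term_doc: np.ndarray, query: np.ndarray, k: int):
    """Standard LSI retrieval (hypothetical helper, not the paper's method).

    term_doc: (n_terms, n_docs) term-document matrix.
    query:    (n_terms,) query vector over the same vocabulary.
    k:        number of latent dimensions kept after truncated SVD.
    Returns (ranking, sims): document indices sorted by decreasing cosine
    similarity, and the similarity of each document to the query.
    """
    # term_doc = U @ diag(s) @ Vt; keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Document coordinates in the latent space, one row per document.
    docs_k = (np.diag(s_k) @ Vt_k).T
    # Fold the query in exactly as a document column would be: q^T U_k.
    q_k = query @ U_k
    # Cosine similarity, guarding against zero-length vectors.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    sims = (docs_k @ q_k) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims


# Hypothetical toy corpus: rows are terms, columns are documents d0..d3.
A = np.array([[1, 0, 0, 0],   # "heart"
              [1, 1, 0, 0],   # "cardiac"
              [0, 0, 1, 0],   # "liver"
              [0, 0, 1, 1]],  # "hepatic"
             dtype=float)
query = np.array([1, 0, 0, 0], dtype=float)  # query containing only "heart"

ranking, sims = lsi_rank(A, query, k=2)
print(ranking, np.round(sims, 3))
```

Document d1 contains only "cardiac", so its cosine with the query in the raw term space is 0; in the two-dimensional latent space it scores essentially 1.0, like d0, because "heart" and "cardiac" co-occur in d0, while d2 and d3 score about 0. This is the kind of conceptual term-document association the abstract attributes to LSI.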

Combining text and link analysis for focused crawling—An application for vertical search engines

Information Systems ◽

10.1016/j.is.2006.09.004 ◽

2007 ◽

Vol 32 (6) ◽

pp. 886-908 ◽

Cited By ~ 51

Author(s):

G. Almpanidis ◽

C. Kotropoulos ◽

I. Pitas

Keyword(s):

Search Engines ◽

Link Analysis ◽

Focused Crawling ◽

Vertical Search

Latent semantic indexing (LSI) and its impact on copywriting

Комунікації та комунікативні технології (Communications and Communicative Technologies) ◽

10.15421/291901 ◽

2019 ◽

pp. 4-12

Author(s):

N. Blynova

Keyword(s):

Search Engines ◽

Ordinary Language ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

New Paradigm ◽

Keywords And Phrases ◽

The Media ◽

Internet Environment ◽

The Difference ◽

Key Phrases

Latent semantic indexing (LSI) is becoming increasingly popular in copywriting, gradually replacing texts written according to SEO principles. LSI came into use in the 2010s, when popular search engines switched to a qualitatively new way of ranking materials and sites. The difference between the SEO and LSI approaches is that search engines rank SEO materials by keywords, whereas LSI texts are ranked by how fully the topic is covered and how useful the article is to the reader. Consequently, in addition to keywords and phrases, an associative core is involved. Materials written for people have replaced texts created for search engines. The article describes the algorithm for building the associative and thematic core and the ways in which this can be done. The basic steps for creating an LSI text are also presented. The author underlines that, because LSI texts must present a significant amount of information with maximum expertise in covering the topic, writers accustomed to working on SEO principles have to learn to write within a new paradigm. Owners of websites that host articles created on LSI principles have discovered the advantages of this way of presenting information, since their resources are indexed better and take leading positions in search results. Algorithms such as “Baden-Baden”, “Korolev”, and “Panda” have had a positive influence on the Internet environment as a whole, since over-optimized texts, which were stuffed with keywords and were of little use to the reader, have now dropped to the bottom of the search results.
The new LSI-based ranking method allows specialists to create texts that are not only useful and expert but also lexically rich, using the expressive and figurative means of the language, which could not be expected in SEO materials. The article highlights that the use of neural networks should bring the presentation of information even closer to consumers' needs, leading to techniques that allow materials written in ordinary language to take leading positions without the need to incorporate key phrases into the text. We believe that the LSI method, which has proved itself in copywriting, can unlock the potential of media texts that are currently written on SEO principles.

Perencanaan Search Engine E-commerce dengan Metode Latent Semantic Indexing Berbasis Multiplatform (Designing a Multiplatform E-commerce Search Engine Using the Latent Semantic Indexing Method)

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2017.v08.i01.p04 ◽

2017 ◽

pp. 31

Author(s):

Ni Made Ari Lestari ◽

Made Sudarma

Keyword(s):

Search Engine ◽

Search Engines ◽

Word Processing ◽

Latent Semantic Indexing ◽

Electronic Data Interchange ◽

Levenshtein Distance ◽

Electronic Systems ◽

Semantic Indexing ◽

Automated Data Collection ◽

Data Interchange

E-commerce comprises sale and purchase transactions that take place through electronic systems such as the Internet, the WWW, or other computer networks. E-commerce involves electronic data interchange and automated data collection systems. Every e-commerce site provides a search field for the items the user wants. E-commerce sites such as Tokopedia, Lazada, MatahariMall, and Amazon rely on regular search engine technology. With such conventional search engines, the longer the input query, the broader and more numerous the returned search results. With semantic indexing, however, the longer and clearer the description of the desired goods, the fewer and more accurate the results, matching the input and helping the user make decisions. This study discusses how to build a search engine for an e-commerce website using Latent Semantic Indexing. The process starts with text mining methods for word processing, then applies the Levenshtein distance method for automatic word correction, and finally uses Latent Semantic Indexing for information processing and output.
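
The abstract names the Levenshtein distance as the step that automatically repairs misspelled query words but does not give the correction rule. A minimal sketch, assuming each query word is replaced by the closest word in a known vocabulary when the edit distance is small enough: `levenshtein` is the standard dynamic-programming edit distance, while `correct_word`, the `max_distance=2` threshold, and the product vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_word(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Hypothetical correction rule: snap to the nearest vocabulary word,
    but keep the original word if even the nearest one is too far away."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


# Hypothetical product vocabulary for an e-commerce search index.
vocab = ["laptop", "phone", "shoes", "watch"]
query = "cheap shoos"
print(" ".join(correct_word(w, vocab) for w in query.split()))
```

The threshold keeps legitimate words that are absent from the vocabulary (here "cheap") from being forced onto an unrelated product term, while a one-letter typo such as "shoos" is repaired to "shoes" before the query reaches the LSI stage.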

A NEW APPROACH TOWARDS VERTICAL SEARCH ENGINES - Intelligent Focused Crawling and Multilingual Semantic Techniques

Proceedings of the 6th International Conference on Web Information Systems and Technology ◽

10.5220/0002777901810186 ◽

2010 ◽

Keyword(s):

Search Engines ◽

Focused Crawling ◽

New Approach ◽

Vertical Search ◽

Semantic Techniques

Support vector machine for customized email filtering based on improving latent semantic indexing

2005 International Conference on Machine Learning and Cybernetics ◽

10.1109/icmlc.2005.1527599 ◽

2005 ◽

Cited By ~ 1

Author(s):

Qing Yang ◽

Fang-Min Li

Keyword(s):

Support Vector Machine ◽

Latent Semantic Indexing ◽

Support Vector ◽

Semantic Indexing ◽

Email Filtering

Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing

Journal of Proteome Research ◽

10.1021/pr070461k ◽

2008 ◽

Vol 7 (1) ◽

pp. 182-191 ◽

Cited By ~ 33

Author(s):

Sebastian Klie ◽

Lennart Martens ◽

Juan Antonio Vizcaíno ◽

Richard Côté ◽

Phil Jones ◽

...

Keyword(s):

Large Scale ◽

Latent Semantic Indexing ◽

Semantic Indexing

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.181-182.830 ◽

2011 ◽

Vol 181-182 ◽

pp. 830-835

Author(s):

Min Song Li

Keyword(s):

Support Vector Machine ◽

Text Categorization ◽

Latent Semantic Indexing ◽

Classification Performance ◽

Compact Representation ◽

Support Vector ◽

Semantic Features ◽

Semantic Indexing ◽

Feature Extraction Method ◽

Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate method for selecting a feature subspace for text categorization, since it orders the extracted features by their variance rather than by their classification power. We propose a method based on a support vector machine to extract latent semantic features and select those suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
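
The key point above is that LSI orders its latent features by variance, which need not match how well each feature separates the classes. The abstract does not give the paper's exact SVM-based procedure; the following is only a minimal sketch of one common way to rank features by classification power, assuming a linear SVM (hinge loss with L2 regularization, no bias term, trained by plain full-batch subgradient descent) whose absolute weights serve as feature scores. The two synthetic "latent features", the hyperparameters, and all function names are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM without bias, trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (x_i . w)))."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        violators = y * (X @ w) < 1  # points inside the margin or misclassified
        hinge_grad = -(y[violators][:, None] * X[violators]).sum(axis=0) / n
        w -= lr * (lam * w + hinge_grad)
    return w


def rank_by_variance(X):
    """Order features by variance, largest first (the criterion the abstract
    attributes to LSI)."""
    return np.argsort(-X.var(axis=0))


def rank_by_svm_weight(X, y):
    """Order features by |w| of a trained linear SVM, largest first."""
    return np.argsort(-np.abs(train_linear_svm(X, y)))


# Synthetic stand-in for LSI document features with labels y in {-1, +1}.
rng = np.random.default_rng(0)
n = 200
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
noisy = rng.normal(0.0, 3.0, n)            # high variance, unrelated to y
informative = y + rng.normal(0.0, 0.1, n)  # low variance, separates the classes
X = np.column_stack([noisy, informative])

print("variance order:  ", rank_by_variance(X))
print("SVM-weight order:", rank_by_svm_weight(X, y))
```

Variance ranks the noisy column first, whereas the SVM weights rank the informative column first, which is the mismatch the abstract describes; keeping only the top-ranked features by SVM weight is what yields a more compact representation.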

Adaptive label-driven scaling for latent semantic indexing

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08 ◽

10.1145/1390334.1390525 ◽

2008 ◽

Cited By ~ 1

Author(s):

Xiaojun Quan ◽

Enhong Chen ◽

Qiming Luo ◽

Hui Xiong

Keyword(s):

Latent Semantic Indexing ◽

Semantic Indexing

Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima and require many parameters to be tuned, and the documents must be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic, enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show that ASOS-LSI is significantly superior to five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
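
The distinctive part of ASOS-LSI is the accelerated SOS metaheuristic, which the abstract does not detail. To illustrate only the LSI document representation being clustered, the following minimal sketch projects documents into a k-dimensional LSI space and normalizes them so that Euclidean distance tracks cosine similarity. It then swaps in plain k-means with deterministic farthest-point initialization in place of SOS; this is not the paper's algorithm. The toy corpus, k = 2, and all function names are hypothetical.

```python
import numpy as np


def lsi_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Project documents (columns of term_doc) into a k-dimensional LSI space
    and L2-normalize each row."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k, :]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.where(norms == 0, 1.0, norms)


def kmeans(X: np.ndarray, n_clusters: int, iters: int = 50) -> np.ndarray:
    """Plain k-means, used here only as a stand-in for SOS."""
    # Farthest-point initialization: start from X[0], then repeatedly add the
    # point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each document to its nearest center, then recompute centers.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical term-document matrix: rows are terms, columns are d0..d5.
A = np.array([[1, 0, 1, 0, 0, 0],   # "heart"
              [1, 1, 0, 0, 0, 0],   # "cardiac"
              [0, 0, 0, 1, 0, 1],   # "liver"
              [0, 0, 0, 1, 1, 0]],  # "hepatic"
             dtype=float)

labels = kmeans(lsi_doc_vectors(A, k=2), n_clusters=2)
print(labels)
```

Documents d1 ("cardiac" only) and d2 ("heart" only) share no term, yet in the LSI space both fall into the same cluster as d0, because the two terms co-occur there; the same holds for the liver/hepatic documents. This is the benefit of a semantic, lower-dimensional representation over the raw vector space model.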
