Research and Application of Automated Search Engine Based on Machine Learning

Author(s):  
Jinghang Fan ◽  
Xu Gao ◽  
Teng Wang ◽  
Ruiying Liu ◽  
Yuxue Yang
2007 ◽  
Vol 30 ◽  
pp. 181-212 ◽  
Author(s):  
S. P. Ponzetto ◽  
M. Strube

Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.
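
As a rough illustration of what a path-based relatedness measure over a semantic network can look like, the sketch below runs a breadth-first search over a tiny category graph and scores two concepts by the inverse of their path length. The graph, the node names, and the scoring formula `1 / (1 + path length)` are assumptions made for this example. They are not Wikipedia data and not necessarily the measures evaluated in the paper.

```python
from collections import deque

# Hypothetical toy category graph (undirected adjacency lists).
# These nodes and links are invented for illustration only.
CATEGORY_GRAPH = {
    "car": ["vehicles"],
    "bicycle": ["vehicles"],
    "vehicles": ["car", "bicycle", "transport"],
    "transport": ["vehicles", "technology"],
    "technology": ["transport", "computer"],
    "computer": ["technology"],
}


def shortest_path_length(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_relatedness(graph, a, b):
    """Assumed scoring: shorter paths give higher relatedness in (0, 1]."""
    length = shortest_path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


if __name__ == "__main__":
    print(path_relatedness(CATEGORY_GRAPH, "car", "bicycle"))
    print(path_relatedness(CATEGORY_GRAPH, "car", "computer"))
```

In this toy graph, "car" and "bicycle" share a direct parent category and therefore score higher than "car" and "computer", which are connected only through a longer chain of categories.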


2011 ◽  
Vol 460-461 ◽  
pp. 747-753
Author(s):  
Ying Shi Kang ◽  
Hai Ning Wang

With the rapid development of internet technology, focusing product design on individual users, emphasizing interaction design for the Web, and improving the user experience have become an inevitable trend in Web design and a focus of personalized search engine design. This paper proposes an optimized algorithm for building user models for product design websites. To represent the design dimensions of Web pages presented by a browser, the algorithm introduces a concept of freshness. By analyzing users' Web browsing behavior, the model is updated using machine learning methods. Finally, the performance and effectiveness of the algorithm are analyzed and evaluated through a simulation experiment.


TEM Journal ◽  
2021 ◽  
pp. 1377-1384
Author(s):  
Dominika Krasňanská ◽  
Silvia Komara ◽  
Mária Vojtková

Keyword analysis is a way to gain insight into market behaviour. It is a detailed analysis of words and phrases that are relevant to the selected area. Keyword analysis should be the first step in any search engine optimization, as it reveals what keywords users enter into search engines when searching the Internet. The keyword categorization process takes up almost half of the total analysis time, as it is not automated. There is currently no known tool in the online advertising market that facilitates keyword categorization. The main goal of this paper is to streamline the process of keyword analysis using selected statistical methods of machine learning applied in the categorization of a specific example.
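
The abstract does not say which statistical methods are used, so the following is only a minimal sketch of how keyword categorization could be automated. It swaps in a multinomial Naive Bayes classifier with Laplace smoothing as a stand-in. The training phrases, the two category names, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled keywords; phrases and categories are invented.
TRAINING = [
    ("buy running shoes online", "transactional"),
    ("cheap running shoes price", "transactional"),
    ("order trail shoes discount", "transactional"),
    ("how to clean running shoes", "informational"),
    ("what are trail shoes", "informational"),
    ("how to choose shoes size", "informational"),
]


def train(examples):
    """Count tokens per category and collect the shared vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def classify(keyword, model):
    """Return the category with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in keyword.lower().split():
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("buy trail shoes", model))
    print(classify("how to wash shoes", model))
```

A real keyword set would need far more labeled examples and a held-out evaluation; the point here is only that a small labeled sample can replace part of the manual sorting.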


2021 ◽  
Author(s):  
Felipe Cujar-Rosero ◽  
David Santiago Pinchao Ortiz ◽  
Silvio Ricardo Timaran Pereira ◽  
Jimmy Mateo Guerrero Restrepo

This paper presents the final results of a research project that aimed to build a Semantic Search Engine that uses an Ontology and a model trained with Machine Learning to support the semantic search of research projects in the System of Research of the University of Nariño. FENIX, as this Engine is called, was built following a methodology with these stages: appropriation of knowledge; installation and configuration of tools, libraries and technologies; collection, extraction and preparation of research projects; and design and development of the Semantic Search Engine. The work produced three main results: a) the complete construction of the Ontology with classes, object properties (predicates), data properties (attributes) and individuals (instances) in Protégé, SPARQL queries with Apache Jena Fuseki, and the respective coding with Owlready2 using Jupyter Notebook with Python within the Anaconda virtual environment; b) the successful training of the model, for which Machine Learning and, specifically, Natural Language Processing tools and algorithms such as spaCy, NLTK, Word2vec and Doc2vec were used, again in Jupyter Notebook with Python within the Anaconda virtual environment and with Elasticsearch; and c) the creation of FENIX, which manages and unifies the queries for the Ontology and for the Machine Learning model. The tests showed that FENIX returned satisfactory results in all the searches that were carried out.


2011 ◽  
Vol 50-51 ◽  
pp. 644-648
Author(s):  
Xiao Qing Zhou ◽  
Xiao Ping Tang

Traditional search engines cannot correctly search the massive amount of information hidden in the Deep Web. Classifying Web databases is the key step in integrating and retrieving them. This article proposes a machine-learning-based method for classifying Web databases. The experiments indicate that the approach achieves very good classification results after training on only a few samples and that, as the number of training samples increases, the classifier's performance remains stable, with accuracy and recall fluctuating only within a very small range.


Healthcare ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 156
Author(s):  
Abdullah Bin Shams ◽  
Ehsanul Hoque Apu ◽  
Ashiqur Rahman ◽  
Md. Mohsin Sarker Raihan ◽  
Nazeeba Siddika ◽  
...  

Misinformation, such as on coronavirus disease 2019 (COVID-19) drugs, vaccination, or presentation of its treatment from untrusted sources, has had dramatic consequences for public health. Authorities have deployed several surveillance tools to detect and slow down the rapid spread of misinformation online. Large quantities of unverified information are available online, and at present there is no real-time tool to alert a user about false information during online health inquiries over a web search engine. To bridge this gap, we propose a web search engine misinformation notifier extension (SEMiNExt). Natural language processing (NLP) and machine learning algorithms have been successfully integrated into the extension. This enables SEMiNExt to read the user query from the search bar, classify the veracity of the query, and notify the user of the query's authenticity, all in real time, to prevent the spread of misinformation. Our results show that SEMiNExt works best with an artificial neural network (ANN), reaching an accuracy of 93%, an F1-score of 92%, a precision of 92% and a recall of 93% when 80% of the data is used for training. Moreover, the ANN is able to predict with very high accuracy even for a small training data size. This is very important for the early detection of new misinformation from a small data sample available online, which can significantly reduce the spread of misinformation and maximize public health safety. The SEMiNExt approach introduces the possibility of improving online health management systems by showing misinformation notifications in real time, enabling safer web-based searching on health-related issues.
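
For readers less familiar with the four metrics reported above, the sketch below shows how accuracy, precision, recall and F1-score are derived from the confusion-matrix counts of a binary classifier. The counts in the usage example are hypothetical and are not taken from the SEMiNExt experiments; the function name `classification_metrics` is likewise an assumption of this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for a 100-query test set (not from the paper).
    for name, value in classification_metrics(tp=45, fp=5, fn=15, tn=35).items():
        print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is consistent with the reported figures (precision 92%, recall 93%, F1 92% after rounding).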


Author(s):  
Siji Jose Pulluparambil ◽  
Subrahmanya Bhat

Purpose: Google Search is currently the most preferred search engine worldwide, making it one of the websites with the highest traffic. It helps people discover the content they are searching for in the large repository of the World Wide Web. Google has become so dominant in the search engine market that it is the single most important variable to consider when optimizing a website for search. Google uses many ranking algorithms to make the search process more precise. Google has the vision “to provide access to the world's information in one click”. Machine learning is the most popular methodology applied in predicting future outcomes or organizing information to assist people in making required decisions. ML algorithms are trained over instances or examples, through which they analyze the available historical data and learn from past experience. By repeatedly training over the samples, the patterns in the data can be identified in order to make predictions about the future. Google, as an organization, can be seen as a pioneer in ML and, as a technology product, as a use case for machine learning. Here, a case analysis has been prepared on a few applications of machine learning in the products and services of Google. Within this paper, we highlight their technological history, services with machine learning applications, financial plans, and challenges. The paper also examines the various Google products that apply ML, such as Google Maps, Gmail, Google Photos, and Google Assistant, and reviews the algorithms used in each service. Approach: A detailed survey of secondary data is used for analysing the data. Findings: Based on the developed case study, it is evident that Google is using machine learning algorithms with a few artificial intelligence features to enhance the quality of the services it provides. Originality: A new way of analysis was performed to identify the methods used in the organization’s services.
Paper Type: Descriptive Case Study Research

