web documents
Recently Published Documents


TOTAL DOCUMENTS: 732 (FIVE YEARS 71)

H-INDEX: 28 (FIVE YEARS 3)

2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

In distributed information retrieval systems, information on the web should be ranked based on a combination of multiple features. Linear combination of ranks has been the dominant approach due to its simplicity and efficiency. Such a combination scheme in a distributed infrastructure requires that the ranks produced by different resources or agents be comparable to each other. The main challenge is how to transform the raw rank values of different criteria appropriately so that they are comparable before any combination. In this manuscript, we demonstrate how to rank Web documents based on their resource-provided information stream and how to combine and incorporate several ranking schemes at once. The system was tested on the queries provided by the Text Retrieval Conference (TREC), and our experimental results showed that it is robust and efficient compared with similar platforms that used offline data resources.
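The abstract does not say which transform makes the raw values comparable. As a minimal sketch, assuming min-max normalization and fixed weights (both are illustrative choices, as are the feature names and scores), a linear combination could look like this:

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one feature's raw scores to [0, 1] (assumed transform)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Every document ties on this feature, so it carries no signal.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(features: Dict[str, Dict[str, float]],
            weights: Dict[str, float]) -> List[str]:
    """Rank documents by a weighted sum of normalized feature scores."""
    normalized = {name: min_max_normalize(s) for name, s in features.items()}
    docs = set().union(*(s.keys() for s in features.values()))
    totals = {
        doc: sum(weights[name] * normalized[name].get(doc, 0.0)
                 for name in features)
        for doc in docs
    }
    # Highest combined score first; ties broken by document id.
    return sorted(totals, key=lambda d: (-totals[d], d))


# Hypothetical raw scores on very different scales.
features = {
    "text_match": {"d1": 12.0, "d2": 30.0, "d3": 21.0},
    "link_score": {"d1": 0.9, "d2": 0.1, "d3": 0.5},
}
weights = {"text_match": 0.7, "link_score": 0.3}
ranking = combine(features, weights)
```

Without the normalization step, `text_match` would dominate the sum simply because its raw values are larger, which is the comparability problem the abstract describes.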


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bin Li ◽  
Ting Zhang

To capture the scene information of an ordinary football game more comprehensively, an algorithm for collecting that information from web documents is proposed. The commonly used T-graph web crawler model collects the sample nodes of a specific topic in the football game scene information; after the crawling stage, the edge document information of that topic is collected. A feature item extraction algorithm based on semantic analysis then extracts the feature items of the football game scene information according to their similarity, and these form a web document. By constructing a complex network and introducing the local contribution and overlap coefficient of the community discovery feature selection algorithm, the features of the web document are selected to realize the collection of football game scene information. Experimental results show that the algorithm has high topic collection capability and low computational cost, that the average accuracy of equilibrium is always around 98%, and that it has strong quantification capabilities for web crawlers and communities.
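The abstract does not define its overlap coefficient. One common set-based definition is the Szymkiewicz-Simpson coefficient, |A ∩ B| / min(|A|, |B|). The sketch below assumes that definition and uses hypothetical feature-item sets; the paper's community-based variant may differ.

```python
from typing import Set


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Szymkiewicz-Simpson overlap: shared items over the smaller set's size."""
    if not a or not b:
        return 0.0  # convention chosen here for empty sets
    return len(a & b) / min(len(a), len(b))


# Hypothetical feature items extracted from two crawled pages.
page_a = {"goal", "penalty", "referee", "stadium"}
page_b = {"goal", "penalty", "transfer"}
similarity = overlap_coefficient(page_a, page_b)
```

Because the denominator is the smaller set, a short page whose feature items are fully contained in a longer page scores 1.0.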


2021 ◽  
Author(s):  
Welton Santos ◽  
Elverton Fazzion ◽  
Diego Dias ◽  
Marcelo Guimarães ◽  
Elisa Tuler ◽  
...  

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Falah Al-akashi ◽  
Diana Inkpen

What is a real-time agent, how does it remedy ongoing daily frustrations for users, and how does it improve retrieval performance on the World Wide Web? These are the main questions we focus on in this manuscript. In many distributed information retrieval systems, information in agents should be ranked based on a combination of multiple criteria. Linear combination of ranks has been the dominant approach due to its simplicity and effectiveness. Such a combination scheme in a distributed infrastructure requires that the ranks in resources or agents be comparable to each other before they are combined. The main challenge is transforming the raw rank values of different criteria appropriately to make them comparable before any combination. The different ways in which agents rank documents make this strategy difficult. In this research, we demonstrate how to rank Web documents based on resource-provided information and how to combine the ranking schemes of several resources at once. The proposed system was implemented specifically on data provided by agents to create a comparable combination of different attributes. The proposed approach was tested on the queries provided by the Text Retrieval Conference (TREC). Experimental results showed that our approach is effective and robust compared with offline search platforms.
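When agents return ranked lists of different lengths, raw positions are not directly comparable. One simple option, shown here purely as an assumption and not as the authors' method, is to map each position to a score relative to its list length before taking the weighted sum. The agent names, lists, and weights are hypothetical.

```python
from typing import Dict, List


def rank_to_score(ranked: List[str]) -> Dict[str, float]:
    """Map positions in one list to (0, 1]; the top item scores 1.0."""
    n = len(ranked)
    return {doc: (n - i) / n for i, doc in enumerate(ranked)}


def merge(agent_lists: Dict[str, List[str]],
          weights: Dict[str, float]) -> List[str]:
    """Merge per-agent rankings by a weighted sum of position scores."""
    totals: Dict[str, float] = {}
    for agent, ranked in agent_lists.items():
        for doc, score in rank_to_score(ranked).items():
            totals[doc] = totals.get(doc, 0.0) + weights[agent] * score
    # Documents missing from an agent's list contribute nothing for that agent.
    return sorted(totals, key=lambda d: (-totals[d], d))


# Hypothetical agents returning lists of different lengths.
agent_lists = {
    "agent_a": ["d1", "d2", "d3", "d4"],
    "agent_b": ["d3", "d1"],
}
weights = {"agent_a": 0.6, "agent_b": 0.4}
merged = merge(agent_lists, weights)
```

Here `d3` is only third for `agent_a` but first for `agent_b`, so it ends up ahead of `d2` in the merged list.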


Author(s):  
Youngseok Lee ◽  
Jungwon Cho

In this paper, we propose a web document ranking method that uses topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. With the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents, so users can efficiently search web pages whose data is constantly updated. In addition, the efficiency of data retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope of the method to big-data-based web pages and improving it for application to various websites, more effective information retrieval is expected to become possible.
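The abstract does not describe how redundant documents are detected. As a rough sketch, assuming each page already has a topic-probability vector from some topic model (the vectors, page names, and the 0.95 threshold below are all hypothetical), near-duplicates could be filtered by cosine similarity:

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def drop_redundant(topic_vectors: Dict[str, Sequence[float]],
                   threshold: float = 0.95) -> List[str]:
    """Keep a page only if it is not too similar to any page kept so far."""
    kept: List[str] = []
    for page, vec in topic_vectors.items():
        if all(cosine(vec, topic_vectors[k]) < threshold for k in kept):
            kept.append(page)
    return kept


# Hypothetical topic vectors; in practice they would come from a topic model.
topic_vectors = {
    "page1": [0.80, 0.10, 0.10],
    "page2": [0.79, 0.11, 0.10],  # nearly the same topic mix as page1
    "page3": [0.10, 0.10, 0.80],
}
kept = drop_redundant(topic_vectors)
```

This greedy pass compares each page against every kept page, which is quadratic in the worst case; a high-speed crawler would likely need an index or hashing scheme instead.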


2021 ◽  
Author(s):  
Alaa Hussainalsaid

This thesis proposes automatic classification of the emotional content of web documents using Natural Language Processing (NLP) algorithms. We used online articles and general documents, such as general web pages and news articles, to verify the performance of the algorithm. The experiments used sentiment analysis to extract the sentiment of web documents. We used unigram and bigram approaches, which are special cases of the N-gram model with N=1 and N=2, respectively. The unigram model estimates the probability of each word in the corpus independently, whereas the bigram model estimates the probability of a word conditioned on the previous word. Our results show that the unigram model performs better than the bigram model in terms of automatic classification of the emotional content of web documents.
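The difference between the two models can be illustrated with plain maximum-likelihood estimates. This is a minimal sketch on a made-up sentence, with no smoothing; the thesis's actual classifier and corpus are not described in the abstract.

```python
from collections import Counter
from typing import List


def unigram_prob(tokens: List[str], word: str) -> float:
    """P(word): relative frequency of the word, ignoring context."""
    return Counter(tokens)[word] / len(tokens)


def bigram_prob(tokens: List[str], prev: str, word: str) -> float:
    """P(word | prev): how often `word` follows `prev`."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count `prev` only where it has a successor (i.e. not as the last token).
    prev_count = sum(1 for t in tokens[:-1] if t == prev)
    return pair_counts[(prev, word)] / prev_count if prev_count else 0.0


# Made-up example text.
tokens = "the film was good and the cast was great".split()
p_good = unigram_prob(tokens, "good")
p_good_given_was = bigram_prob(tokens, "was", "good")
```

In this toy text "good" is one of nine tokens, but it follows one of the two occurrences of "was", so the bigram estimate is much higher than the unigram one.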


Author(s):  
Arjun Roy ◽  
Pavlos Fafalios ◽  
Asif Ekbal ◽  
Xiaofei Zhu ◽  
Stefan Dietze

Author(s):  
Santosh Kumar ◽  
Ravi Kumar

The internet is enormous and growing exponentially. Finding relevant information in such a huge information source is becoming very difficult. Millions of web pages are returned in response to a user's ordinary query. Displaying these web pages without ranking makes it very challenging for the user to find the relevant results of a query. This paper proposes a novel approach that utilizes web content, usage, and structure data to prioritize web documents. The proposed approach has applications in several major areas, such as web personalization, adaptive website development, recommendation systems, search engine optimization, and business intelligence solutions. Further, the proposed approach has been compared experimentally with other approaches (WDPGA, WDPSA, and WDPII), and it has been observed that, at the cost of a small trade-off in time, it has an edge over these approaches.


2021 ◽  
pp. 39-51
Author(s):  
Mary Ann Fitzgerald

This qualitative study describes strategies employed by sophisticated adult World Wide Web users as they evaluate authentic Web information with the purpose of adapting these strategies for children in K-12 settings. The participants in this study followed think-aloud protocols and answered interview questions about two Web documents containing numerous misinformation devices. Evaluative strategies from these verbalizations were extracted and analyzed. Findings include a list of strategies and a description of three evaluative “styles.” Finally, suggestions for the use and teaching of these strategies in elementary school through middle school are made.

