Web Documents: Latest Research Papers

A Scalable Real Time Agent-Based Information Retrieval Engine

International Journal of Software Innovation ◽

10.4018/ijsi.292022 ◽

2022 ◽

Vol 10 (1) ◽

pp. 0-0

Keyword(s):

Information Retrieval ◽

Web Documents ◽

Distributed Information ◽

Agent Based ◽

Retrieval Systems ◽

Main Challenge ◽

Combination Scheme ◽

Information Retrieval Systems ◽

Information Stream ◽

Retrieval Engine

In distributed information retrieval systems, information on the web should be ranked based on a combination of multiple features. Linear combination of ranks has been the dominant approach due to its simplicity and efficiency. Such a combination scheme in a distributed infrastructure requires that ranks in resources or agents are comparable to each other. The main challenge is how to transform the raw rank values of different criteria appropriately to make them comparable before any combination. In this manuscript, we demonstrate how to rank Web documents based on their resource-provided information stream and how to combine and incorporate several ranking schemes at once. The system was tested on the queries provided by the Text Retrieval Conference (TREC), and our experimental results showed that it is robust and efficient compared with similar platforms that used offline data resources.
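
The comparability problem described in this abstract can be illustrated with a minimal sketch. It assumes min-max normalization as the transform and fixed weights for the linear combination; the abstract names neither, so both choices, along with the function names and the sample scores, are illustrative assumptions rather than the authors' method.

```python
from typing import Dict, List, Sequence


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one agent's raw scores to [0, 1] so they are comparable across agents."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally: this agent carries no ranking signal.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_linear(
    score_lists: Sequence[Dict[str, float]], weights: Sequence[float]
) -> List[str]:
    """Weighted linear combination of normalized scores; returns doc ids, best first."""
    combined: Dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    # Sort by descending combined score, breaking ties by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical raw scores from two agents that use different scales.
agent_a = {"d1": 12.0, "d2": 3.0, "d3": 7.5}
agent_b = {"d1": 0.2, "d2": 0.9, "d3": 0.4}
ranking = combine_linear([agent_a, agent_b], weights=[0.6, 0.4])
print(ranking)
```

Without the normalization step, agent_a's larger raw scale would dominate the sum regardless of the weights, which is the difficulty the abstract points to.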

An Algorithm of Scene Information Collection in General Football Matches Based on Web Documents

Security and Communication Networks ◽

10.1155/2021/5801631 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Bin Li ◽

Ting Zhang

Keyword(s):

Semantic Analysis ◽

Computational Cost ◽

Web Crawler ◽

Web Documents ◽

Web Document ◽

Football Game ◽

Average Accuracy ◽

Local Contribution ◽

Scene Information ◽

The Web

In order to obtain the scene information of the ordinary football game more comprehensively, an algorithm of collecting the scene information of the ordinary football game based on web documents is proposed. The commonly used T-graph web crawler model is used to collect the sample nodes of a specific topic in the football game scene information and then collect the edge document information of the football game scene information topic after the crawling stage of the web crawler. Using the feature item extraction algorithm of semantic analysis, according to the similarity of the feature items, the feature items of the football game scene information are extracted to form a web document. By constructing a complex network and introducing the local contribution and overlap coefficient of the community discovery feature selection algorithm, the features of the web document are selected to realize the collection of football game scene information. Experimental results show that the algorithm has high topic collection capabilities and low computational cost, the average accuracy of equilibrium is always around 98%, and it has strong quantification capabilities for web crawlers and communities.

Semantically Time Tracking of Events from Web Documents

10.1145/3470482.3479627 ◽

2021 ◽

Author(s):

Welton Santos ◽

Elverton Fazzion ◽

Diego Dias ◽

Marcelo Guimarães ◽

Elisa Tuler ◽

...

Keyword(s):

Web Documents

A New Approach of Intelligent Data Retrieval Paradigm

Artificial Intelligence Advances ◽

10.30564/aia.v3i2.3219 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Falah Al-akashi ◽

Diana Inkpen

Keyword(s):

Data Retrieval ◽

Main Question ◽

Retrieval Performance ◽

Web Documents ◽

New Approach ◽

Distributed Information ◽

Retrieval Systems ◽

Main Challenge ◽

Combination Scheme ◽

Information Retrieval Systems

What is a real-time agent, how does it remedy ongoing daily frustrations for users, and how does it improve retrieval performance on the World Wide Web? These are the main questions we focus on in this manuscript. In many distributed information retrieval systems, information in agents should be ranked based on a combination of multiple criteria. Linear combination of ranks has been the dominant approach due to its simplicity and effectiveness. Such a combination scheme in a distributed infrastructure requires that the ranks in resources or agents are comparable to each other before they are combined. The main challenge is transforming the raw rank values of different criteria appropriately to make them comparable before any combination. Different ways of ranking agents make this strategy difficult. In this research, we demonstrate how to rank Web documents based on resource-provided information and how to combine the ranking schemes of several resources at once. The proposed system was implemented specifically on data provided by agents to create a comparable combination for different attributes. The proposed approach was tested on the queries provided by the Text Retrieval Conference (TREC). Experimental results showed that our approach is effective and robust compared with offline search platforms.

Web document classification using topic modeling based document ranking

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2386-2392 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2386

Author(s):

Youngseok Lee ◽

Jungwon Cho

Keyword(s):

Topic Modeling ◽

High Speed ◽

Data Retrieval ◽

Web Pages ◽

Web Documents ◽

Document Ranking ◽

Web Document ◽

New Information ◽

Data Update ◽

Ranking Technique

In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the efficiency of data retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope of the method to big data based web pages and improving it for application to various websites, it is expected that more effective information retrieval will be possible.

Automatic classification of the emotional content of web documents

10.32920/ryerson.14653809.v1 ◽

2021 ◽

Author(s):

Alaa Hussainalsaid

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Automatic Classification ◽

Emotional Content ◽

Web Pages ◽

Web Documents ◽

N Gram

This thesis proposes automatic classification of the emotional content of web documents using Natural Language Processing (NLP) algorithms. We used online articles and general documents, such as general web pages and news articles, to verify the performance of the algorithm. The experiments used sentiment analysis to extract the sentiment of web documents. We used unigram and bigram approaches, which are special cases of the N-gram model where N=1 and N=2, respectively. The unigram model estimates the probability of each word in the corpus independently, whereas the bigram model estimates the probability of a word occurring given the previous word. Our results show that the unigram model performs better than the bigram model in automatic classification of the emotional content of web documents.
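
The difference between the two models can be shown with a small sketch. It uses plain, unsmoothed maximum-likelihood estimates over a toy token list; the thesis's actual features, smoothing, and classifier are not described in this abstract, so the function names and the sample sentence here are illustrative assumptions only.

```python
from collections import Counter
from typing import List


def unigram_prob(tokens: List[str], word: str) -> float:
    """P(word): relative frequency of the word, independent of context."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens)


def bigram_prob(tokens: List[str], prev: str, word: str) -> float:
    """P(word | prev): how often `word` follows `prev`, out of all uses of `prev`."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Only tokens that have a successor can start a bigram.
    prev_count = Counter(tokens[:-1])[prev]
    if prev_count == 0:
        return 0.0
    return pair_counts[(prev, word)] / prev_count


# Hypothetical toy corpus.
tokens = "the film was good the film was bad".split()
p_good = unigram_prob(tokens, "good")
p_good_given_was = bigram_prob(tokens, "was", "good")
print(p_good, p_good_given_was)
```

The bigram estimate conditions on the preceding word, so it needs many more observations per parameter than the unigram estimate; data sparsity of this kind is one common reason a unigram model can outperform a bigram model on small corpora.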

Automatic classification of the emotional content of web documents

10.32920/ryerson.14653809 ◽

2021 ◽

Author(s):

Alaa Hussainalsaid

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Automatic Classification ◽

Emotional Content ◽

Web Pages ◽

Web Documents ◽

N Gram

This thesis proposes automatic classification of the emotional content of web documents using Natural Language Processing (NLP) algorithms. We used online articles and general documents, such as general web pages and news articles, to verify the performance of the algorithm. The experiments used sentiment analysis to extract the sentiment of web documents. We used unigram and bigram approaches, which are special cases of the N-gram model where N=1 and N=2, respectively. The unigram model estimates the probability of each word in the corpus independently, whereas the bigram model estimates the probability of a word occurring given the previous word. Our results show that the unigram model performs better than the bigram model in automatic classification of the emotional content of web documents.

Exploiting stance hierarchies for cost-sensitive stance detection of Web documents

Journal of Intelligent Information Systems ◽

10.1007/s10844-021-00642-z ◽

2021 ◽

Author(s):

Arjun Roy ◽

Pavlos Fafalios ◽

Asif Ekbal ◽

Xiaofei Zhu ◽

Stefan Dietze

Keyword(s):

Web Documents

WDPMA

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2021040101 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-24

Author(s):

Santosh Kumar ◽

Ravi Kumar

Keyword(s):

Business Intelligence ◽

Information Source ◽

Relevant Information ◽

The Internet ◽

Web Pages ◽

Web Content ◽

Web Documents ◽

Website Development ◽

Novel Approach ◽

Off Time

The internet is huge and growing exponentially. Finding relevant information in such a large information source is becoming very difficult. Millions of web pages are returned in response to a user's ordinary query. Displaying these web pages without ranking makes it very challenging for the user to find the relevant results of a query. This paper proposes a novel approach that utilizes web content, usage, and structure data to prioritize web documents. The proposed approach has applications in several major areas, such as web personalization, adaptive website development, recommendation systems, search engine optimization, and business intelligence solutions. Further, the proposed approach has been compared experimentally with other approaches (WDPGA, WDPSA, and WDPII), and it has been observed that, with a small trade-off in time, it has an edge over these approaches.

Critical Thinking: Tools for Internet Information Evaluation

IASL Annual Conference Proceedings ◽

10.29173/iasl8180 ◽

2021 ◽

pp. 39-51

Author(s):

Mary Ann Fitzgerald

Keyword(s):

Elementary School ◽

Qualitative Study ◽

World Wide ◽

Web Documents ◽

Think Aloud Protocols ◽

Web Information ◽

Internet Information ◽

Adult World ◽

K 12 ◽

Thinking Tools

This qualitative study describes strategies employed by sophisticated adult World Wide Web users as they evaluate authentic Web information with the purpose of adapting these strategies for children in K-12 settings. The participants in this study followed think-aloud protocols and answered interview questions about two Web documents containing numerous misinformation devices. Evaluative strategies from these verbalizations were extracted and analyzed. Findings include a list of strategies and a description of three evaluative “styles.” Finally, suggestions for the use and teaching of these strategies in elementary school through middle school are made.
