UCrawler: A learning-based web crawler using a URL knowledge base

Author(s):  
Wei Wang ◽  
Lihua Yu

Focused crawlers, as fundamental components of vertical search engines, crawl web pages related to a specific topic. Existing focused crawlers commonly suffer from two problems: low crawling efficiency and subject migration. In this paper, we propose a learning-based focused crawler that uses a URL knowledge base. To improve the accuracy of the similarity measure, topic similarity is computed from the parent page content, the anchor information, and the URL content, and the URL content is learned and updated iteratively and continuously. Within the crawler, we implement a crawling mechanism that combines content analysis with a simple link-analysis strategy, which decreases computational complexity and avoids the locality problem of crawling. Experimental results show that the proposed algorithm achieves better precision than traditional methods, including the shark-search and best-first search algorithms, and avoids the local-optimum problem of crawling.
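
As an illustration of the kind of combined relevance score the abstract describes, the sketch below weights cosine similarities between the topic and the parent page content, the anchor text, and the URL tokens. The tokenization, the `url_priority` helper, and the 0.4/0.4/0.2 weights are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a combined URL relevance score (not the authors' code).
# Assumes bag-of-words vectors; the weights w_parent, w_anchor, w_url are arbitrary.
import math
import re
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def url_priority(topic: str, parent_text: str, anchor_text: str, url: str,
                 w_parent: float = 0.4, w_anchor: float = 0.4, w_url: float = 0.2) -> float:
    """Combine parent-page, anchor, and URL similarities into one crawl priority."""
    t = tokens(topic)
    return (w_parent * cosine(t, tokens(parent_text))
            + w_anchor * cosine(t, tokens(anchor_text))
            + w_url * cosine(t, tokens(url)))

# Example: score one candidate link for the topic "machine learning"
print(url_priority("machine learning",
                   "Tutorials on machine learning and data mining",
                   "deep learning course",
                   "https://example.org/machine-learning/intro"))
```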

Author(s):  
B Sathiya ◽  
T.V. Geetha

The prime textual sources used for ontology learning are a domain corpus and large, dynamic text from web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that utilizes different sources of text, namely a corpus, web pages, and the massive probabilistic knowledge base Probase, for effective automated construction of an ontology. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web-page-based two-level semantic query formation methodology using lexical syntactic patterns (LSP), together with a novel scoring measure, Fitness, built on Probase, is proposed. In addition, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring and the Domain- and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
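
The paper defines its own COS (Co-occurrence Strength) measure over corpus and web evidence; the snippet below is only a generic stand-in, a Dice-style sentence-level co-occurrence score, to make the idea of scoring a concept pair concrete.

```python
# Hedged illustration of a sentence-level co-occurrence strength between two concepts.
# The paper defines its own COS measure; this Dice-style score is only a stand-in.
from typing import List

def co_occurrence_strength(concept_a: str, concept_b: str, sentences: List[str]) -> float:
    """Dice-style score: how often the two concepts appear in the same sentence."""
    a_hits = sum(1 for s in sentences if concept_a in s.lower())
    b_hits = sum(1 for s in sentences if concept_b in s.lower())
    both = sum(1 for s in sentences if concept_a in s.lower() and concept_b in s.lower())
    return 2.0 * both / (a_hits + b_hits) if (a_hits + b_hits) else 0.0

corpus = [
    "The engine drives the pump through a gearbox.",
    "A pump moves fluid through the pipeline.",
    "The engine requires regular maintenance.",
]
print(co_occurrence_strength("engine", "pump", corpus))
```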


The Dark Web ◽  
2018 ◽  
pp. 359-374
Author(s):  
Dilip Kumar Sharma ◽  
A. K. Sharma

ICT plays a vital role in human development through information extraction and includes computer networks and telecommunication networks. One of the important modules of ICT is computer networks, which are the backbone of the World Wide Web (WWW). Search engines are computer programs that browse and extract information from the WWW in a systematic and automatic manner. This paper examines the three main components of search engines: the Extractor, a web crawler that starts with a URL; the Analyzer, an indexer that processes the words on each web page and stores the resulting index in a database; and the Interface Generator, a query handler that understands the needs and preferences of the user. The paper concentrates on the information available on the surface web through general web pages and on the hidden information behind query interfaces, called the deep web. It emphasizes the extraction of relevant information so that the preferred content appears as the first result of the user's search query, and discusses aspects of the deep web along with an analysis of a few existing deep web search engines.
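
A toy end-to-end sketch of the three components named above, with page fetching stubbed by an in-memory store; the function names and the tiny corpus are illustrative and are not taken from the paper.

```python
# Toy sketch of the three components named in the paper: Extractor (crawler),
# Analyzer (indexer), and Interface Generator (query handler). Fetching is stubbed
# with an in-memory page store; a real crawler would issue HTTP requests.
import re
from collections import defaultdict

PAGES = {  # stand-in for the WWW
    "http://example.org/a": "deep web search engines index hidden databases",
    "http://example.org/b": "surface web pages are reachable by ordinary crawlers",
}

def extractor(seed_urls):
    """Crawler: start from seed URLs and yield (url, content) pairs."""
    for url in seed_urls:
        yield url, PAGES.get(url, "")

def analyzer(documents):
    """Indexer: build an inverted index from words to the URLs containing them."""
    index = defaultdict(set)
    for url, text in documents:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
    return index

def interface_generator(index, query):
    """Query handler: return URLs matching every query term."""
    terms = re.findall(r"[a-z]+", query.lower())
    hits = [index.get(t, set()) for t in terms]
    return set.intersection(*hits) if hits else set()

index = analyzer(extractor(PAGES.keys()))
print(interface_generator(index, "deep web"))
```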


2012 ◽  
Vol 433-440 ◽  
pp. 5214-5217
Author(s):  
Hai Huang

Short-term traffic flow forecasting places high demands on the response time and accuracy of the forecasting method because the result is used directly for real-time traffic guidance. After introducing the fuzzy neural network model for short-term traffic flow forecasting together with its detailed procedures, this paper adopts the particle swarm optimization (PSO) algorithm to train the fuzzy neural network. Its global searching and optimization capability helps to overcome the shortcomings of traditional fuzzy neural network training, such as low efficiency and convergence to local optima. A case study is also given in which the PSO algorithm trains the fuzzy neural network for traffic flow forecasting. The result shows that the average squared error is 0.932 when the PSO algorithm is used for network training, compared with 3.926 when it is not; the result is therefore more accurate, and the training procedure requires less time. This demonstrates that the method is feasible and efficient.
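
To make the training loop concrete, the sketch below runs a generic particle swarm optimization over the parameters of a deliberately simple autoregressive predictor on synthetic flow counts. It is not the paper's fuzzy neural network, and the swarm settings and data are arbitrary assumptions.

```python
# Hedged sketch of PSO-style training: each particle encodes the parameters of a
# tiny forecasting model and the swarm minimizes mean squared error on synthetic
# traffic counts. The paper trains a fuzzy neural network; this stand-in model is
# deliberately simple (a linear two-step autoregressive predictor).
import random

history = [120, 135, 150, 160, 155, 170, 180, 175, 190, 200]  # synthetic flow counts

def mse(weights):
    """Forecast each point from the two previous ones and measure squared error."""
    w1, w2, bias = weights
    errors = [(w1 * history[t - 1] + w2 * history[t - 2] + bias - history[t]) ** 2
              for t in range(2, len(history))]
    return sum(errors) / len(errors)

def pso(dim=3, particles=20, iters=200, inertia=0.7, c1=1.5, c2=1.5):
    swarm = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in swarm]          # personal best positions
    gbest = min(pbest, key=mse)            # global best position
    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - p[d])
                             + c2 * r2 * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            if mse(p) < mse(pbest[i]):
                pbest[i] = p[:]
        gbest = min(pbest, key=mse)
    return gbest

best = pso()
print("best weights:", best, "MSE:", round(mse(best), 3))
```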


2016 ◽  
Vol 22 (4) ◽  
pp. 529-539 ◽  
Author(s):  
Limao ZHANG ◽  
Xianguo WU ◽  
Lieyun DING ◽  
Miroslaw J. SKIBNIEWSKI ◽  
Yujie LU

This paper presents an innovative approach that integrates Building Information Modeling (BIM) and expert systems to address deficiencies in the traditional safety risk identification process in tunnel construction. A BIM-based Risk Identification Expert System (B-RIES), composed of three main built-in subsystems (BIM extraction, knowledge base management, and risk identification), is proposed. The engineering parameter information related to risk factors is first extracted from the BIM of a specific project, where the Industry Foundation Classes (IFC) standard plays a bridging role between the BIM data and tunnel construction safety risks. An integrated knowledge base, consisting of a fact base, a rule base, and a case base, is then established to systematize the fragmented explicit and tacit knowledge. Finally, a hybrid inference approach, combining case-based reasoning and rule-based reasoning, is developed to improve the flexibility and comprehensiveness of the system's reasoning capacity. B-RIES is used to overcome the low efficiency of traditional information extraction, reduce the dependence on domain experts, and facilitate knowledge sharing and communication among dispersed clients and domain experts. The identification of a safety hazard regarding water gushing in a metro station in China is presented as a case study. The results demonstrate the feasibility of B-RIES and its application effectiveness.
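
The hybrid inference idea, rules first with a case-based fallback, might look roughly like the sketch below. The parameter names, rules, and stored cases are hypothetical and only illustrate how a fact base, rule base, and case base could interact; they are not drawn from B-RIES.

```python
# Hedged sketch of hybrid inference: rule-based reasoning first, with a case-based
# fallback. The parameters, rules, and cases are hypothetical placeholders.

rule_base = [
    # (condition over extracted BIM parameters, identified risk)
    (lambda f: f.get("groundwater_level_m", 0) > 5 and f.get("soil_type") == "sand",
     "water gushing risk"),
    (lambda f: f.get("cover_depth_m", 99) < 6,
     "ground settlement risk"),
]

case_base = [
    ({"groundwater_level_m": 6.2, "soil_type": "sand", "cover_depth_m": 9}, "water gushing risk"),
    ({"groundwater_level_m": 2.0, "soil_type": "clay", "cover_depth_m": 5}, "ground settlement risk"),
]

def similarity(a, b):
    """Crude case similarity: count of matching attribute values."""
    return sum(1 for k in a if k in b and a[k] == b[k])

def identify_risks(facts):
    """Apply the rule base first; fall back to the most similar stored case."""
    risks = [risk for cond, risk in rule_base if cond(facts)]
    if not risks:
        _, risk = max(case_base, key=lambda c: similarity(facts, c[0]))
        risks.append(risk)
    return risks

facts = {"groundwater_level_m": 7.1, "soil_type": "sand", "cover_depth_m": 12}
print(identify_risks(facts))
```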


2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
JUAN D. VELÁSQUEZ ◽  
VASILE PALADE

Understanding web users' browsing behaviour in order to adapt a web site to the needs of a particular user is a key issue for many commercial companies that do their business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository, which contains patterns extracted from web logs and web pages by applying various web mining tools, and a Rule Repository, which contains rules that describe how the discovered patterns are used to build navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. A comprehensive real-world experiment is carried out on the web site of a bank.
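
A minimal sketch of how a pattern repository and a rule repository might be combined to produce a navigation recommendation; the patterns, rules, and bank-site page names are invented for illustration and do not reproduce the paper's repositories.

```python
# Hedged sketch: match the current navigation session against mined patterns and
# apply the rule attached to each matching pattern. All entries are invented.

pattern_repository = {
    # pattern id -> frequently observed navigation sequence mined from web logs
    "P1": ["home", "loans", "loan-calculator"],
    "P2": ["home", "accounts", "online-banking"],
}

rule_repository = {
    # pattern id -> recommendation derived from that pattern
    "P1": "Recommend the mortgage pre-approval page",
    "P2": "Recommend the mobile banking app page",
}

def recommend(session):
    """Return recommendations for patterns whose prefix matches the current session."""
    hits = []
    for pid, pattern in pattern_repository.items():
        if pattern[:len(session)] == session and pid in rule_repository:
            hits.append(rule_repository[pid])
    return hits

print(recommend(["home", "loans"]))
```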


2021 ◽  
Vol 21 (2) ◽  
pp. 105-120
Author(s):  
K. S. Sakunthala Prabha ◽  
C. Mahesh ◽  
S. P. Raja

A topic-precise crawler is a special-purpose web crawler that downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure produces an inaccurate relevance score if the topic term does not occur directly in the web page. The semantic-based similarity measure provides a precise relevance score even when only synonyms of the given topic occur in the web page, but if the topic is not available in the ontology, semantic focused crawlers also produce inaccurate relevance scores. This paper overcomes these shortcomings with a hybrid string-matching algorithm that combines the semantic similarity-based measure with a probabilistic similarity-based measure. The experimental results reveal that this algorithm increases the efficiency of focused web crawlers and achieves a better Harvest Rate (HR), Precision (P), and Irrelevance Ratio (IR) than existing focused web crawlers.
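
A minimal sketch of a hybrid relevance score in the spirit of the abstract: a cosine term score blended with a synonym-aware semantic score, so a page can still be judged relevant when the topic term itself is absent. The toy synonym table stands in for an ontology, the equal weighting is an assumption, and the paper's probabilistic component is not reproduced here.

```python
# Hedged sketch of a hybrid relevance score: cosine similarity blended with a
# synonym-aware score. The synonym table and the 0.5/0.5 weighting are assumptions.
import math
import re
from collections import Counter

SYNONYMS = {"car": {"automobile", "vehicle"}, "film": {"movie", "cinema"}}

def bow(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_score(topic, page):
    t, p = bow(topic), bow(page)
    dot = sum(t[w] * p[w] for w in set(t) & set(p))
    norm = math.sqrt(sum(v * v for v in t.values())) * math.sqrt(sum(v * v for v in p.values()))
    return dot / norm if norm else 0.0

def semantic_score(topic, page):
    """Fraction of topic terms that appear in the page directly or as a synonym."""
    page_words = set(bow(page))
    topic_words = list(bow(topic))
    hits = sum(1 for w in topic_words
               if w in page_words or SYNONYMS.get(w, set()) & page_words)
    return hits / len(topic_words) if topic_words else 0.0

def hybrid_relevance(topic, page, alpha=0.5):
    return alpha * cosine_score(topic, page) + (1 - alpha) * semantic_score(topic, page)

page = "This site reviews every new automobile released this year."
print(hybrid_relevance("car", page))   # nonzero even though 'car' never appears
```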


2021 ◽  
Vol 13 (6) ◽  
pp. 1-13
Author(s):  
Guangxuan Chen ◽  
Guangxiao Chen ◽  
Lei Zhang ◽  
Qiang Liu

In order to solve the problems of repeated acquisition, data redundancy, and low efficiency in website forensics, this paper proposes an incremental acquisition method oriented to dynamic websites. The method achieves incremental collection from dynamically updated websites through web page acquisition and parsing, URL deduplication, web page denoising, web page content extraction, and hashing. Experiments show that the algorithm has relatively high acquisition precision and recall, and can be combined with other data to perform effective digital forensics on dynamically updated real-time websites.
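
The incremental idea, collecting a page again only when its deduplicated URL maps to changed content, could be sketched as below. Fetching and content extraction are stubbed, and the URLs and page texts are placeholders rather than the paper's data.

```python
# Hedged sketch of incremental acquisition: URLs are de-duplicated and the extracted
# content is hashed, so a page is re-acquired only when its hash changes between runs.
import hashlib

seen_hashes = {}  # url -> SHA-256 of previously extracted content

def extract_main_content(html: str) -> str:
    """Stub for denoising / content extraction; a real system would strip ads, nav, etc."""
    return html.strip()

def incremental_collect(snapshot: dict) -> list:
    """Return the URLs whose extracted content is new or has changed."""
    changed = []
    for url, html in snapshot.items():
        digest = hashlib.sha256(extract_main_content(html).encode("utf-8")).hexdigest()
        if seen_hashes.get(url) != digest:
            seen_hashes[url] = digest
            changed.append(url)
    return changed

run1 = {"http://site/news/1": "<p>first article</p>", "http://site/news/2": "<p>second</p>"}
run2 = {"http://site/news/1": "<p>first article (updated)</p>", "http://site/news/2": "<p>second</p>"}
print(incremental_collect(run1))  # both URLs collected on the first run
print(incremental_collect(run2))  # only the updated page is collected again
```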

