IHWC: intelligent hidden web crawler for harvesting data in urban domains

Author(s):  
Sawroop Kaur ◽  
Aman Singh ◽  
G. Geetha ◽  
Xiaochun Cheng

Abstract: Due to the massive size of the hidden web, searching, retrieving and mining rich, high-quality data can be a daunting task. Moreover, because this data sits behind forms, it cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data, but effective techniques, as well as applications to special cases, still need to be explored to achieve a good harvest rate. One such special area is atmospheric science, where hidden web crawling is rarely implemented and the crawler must traverse a huge portion of the web to narrow the search down to specific data. In this study, an intelligent hidden web crawler for harvesting data in urban domains (IHWC) is implemented to address the related problems of domain classification, prevention of exhaustive searching, and URL prioritization. The crawler also performs well in curating pollution-related data. It targets relevant web pages and discards irrelevant ones by applying rejection rules, and to achieve more accurate results for a focused crawl, IHWC visits the websites with the highest priority for a given topic first. The crawler fulfills the dual objective of developing an effective hidden web crawler that can focus on diverse domains and of checking its integration into searching pollution data in smart cities. Since one objective of smart cities is to reduce pollution, the crawled data can be used to find the causes of pollution, and the crawler can help users check the level of pollution in a specific area. The harvest rate of the crawler is compared with pioneering existing work. With an increase in dataset size, the presented crawler can add significant value to emission accuracy. Our results demonstrate the accuracy and harvest rate of the proposed framework, which efficiently collects hidden web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
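
The abstract names two concrete mechanisms, rejection rules and URL prioritization, without giving code. A minimal sketch of how a best-first focused crawler can combine the two is shown below; the rejection patterns, topic keywords, relevance threshold, and helper names (fetch, extract_links) are all illustrative assumptions, not the authors' implementation.

```python
import heapq
import re

# Illustrative rejection rules and topic keywords -- assumptions,
# not the authors' actual rule set.
REJECT_PATTERNS = [re.compile(p) for p in (r"\.(jpg|png|css|js)$", r"/login", r"/ads?/")]
TOPIC_KEYWORDS = {"pollution", "air quality", "emission", "smog"}

def is_rejected(url: str) -> bool:
    """Rejection rules: discard URLs that cannot lead to relevant pages or forms."""
    return any(p.search(url) for p in REJECT_PATTERNS)

def relevance(text: str) -> float:
    """Crude topic score: fraction of topic keywords present in the text."""
    text = text.lower()
    return sum(kw in text for kw in TOPIC_KEYWORDS) / len(TOPIC_KEYWORDS)

def crawl(seeds, fetch, extract_links, budget=1000):
    """Best-first crawl; heapq is a min-heap, so scores are negated."""
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen, harvested = set(seeds), []
    while frontier and budget:
        _score, url = heapq.heappop(frontier)
        if is_rejected(url):
            continue
        page = fetch(url)                 # caller-supplied HTTP fetch
        budget -= 1
        if relevance(page.text) > 0.25:   # threshold is an assumption
            harvested.append(url)
        for link, anchor in extract_links(page):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance(anchor), link))
    return harvested
```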

2018 ◽  
Vol 7 (3) ◽  
pp. 1119
Author(s):  
Jyoti Mor ◽  
Dr Dinesh Rai ◽  
Dr Naresh Kumar

In a large collection of web pages, it is difficult for search engines to keep their online repositories updated. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine's database. This results in over-utilization of shared network resources such as bandwidth and CPU cycles. This paper proposes an architecture that reduces the utilization of shared network resources with the help of an advanced XML-based approach. This focused-crawling architecture is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described, which is capable of reducing the load on the network and mitigating the problems arising from the residency of a mobile agent at the remote server.
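
The abstract does not specify the XML schema behind the "advanced XML-based approach"; the sketch below assumes a hypothetical domain descriptor and shows how such a descriptor could configure a focused crawler to skip low-quality pages before download, which is where the bandwidth saving would come from.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain descriptor; the paper's actual XML schema is not given.
DOMAIN_XML = """
<domain name="atmospheric-science">
  <keyword weight="1.0">pollution</keyword>
  <keyword weight="0.6">aerosol</keyword>
  <exclude>sports</exclude>
</domain>
"""

def load_domain(xml_text):
    """Parse the descriptor into weighted keywords and exclusion terms."""
    root = ET.fromstring(xml_text)
    keywords = {k.text: float(k.get("weight", 1.0)) for k in root.iter("keyword")}
    excludes = {e.text for e in root.iter("exclude")}
    return keywords, excludes

def page_score(text, keywords, excludes):
    """Download only pages whose weighted keyword score clears a threshold."""
    text = text.lower()
    if any(x in text for x in excludes):
        return 0.0
    return sum(w for kw, w in keywords.items() if kw in text)

keywords, excludes = load_domain(DOMAIN_XML)
print(page_score("study of urban pollution and aerosol levels", keywords, excludes))
```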


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The WWW contains a huge amount of information from different areas. This information may be present in the form of web pages, media, articles (research journals/magazines), blogs, and so on. A major portion of it resides in web databases that can only be retrieved by submitting queries through the interface offered by the specific database, and is therefore called the Hidden Web. An important issue is how to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
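
The paper's overlap-avoidance mechanism is described only at the architecture level. One common way to guarantee that parallel crawlers never download the same page twice is to partition URLs by host, so that each Hidden Web site (its search form and result pages) is owned by exactly one crawler; the sketch below illustrates that idea, with the crawler count and hashing choice being assumptions.

```python
import hashlib
from urllib.parse import urlparse

NUM_CRAWLERS = 4  # illustrative; the paper's configuration is not specified

def assign_crawler(url: str) -> int:
    """Partition by host so every URL has exactly one owner -- no overlaps.

    Hashing the host (rather than the full URL) keeps each Hidden Web site,
    and hence each search form and its result pages, on a single crawler.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.md5(host.encode()).hexdigest()
    return int(digest, 16) % NUM_CRAWLERS

# Example: both URLs from the same site land on the same crawler instance.
print(assign_crawler("http://books.example.com/search?q=a"))
print(assign_crawler("http://books.example.com/search?q=b"))
```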


Electronics ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 218
Author(s):  
Ala’ Khalifeh ◽  
Khalid A. Darabkh ◽  
Ahmad M. Khasawneh ◽  
Issa Alqaisieh ◽  
Mohammad Salameh ◽  
...  

The advent of various wireless technologies has paved the way for the realization of new infrastructures and applications for smart cities. Wireless Sensor Networks (WSNs) are among the most important of these technologies. WSNs are widely used in various applications in our daily lives. Due to their cost-effectiveness and rapid deployment, WSNs can be used for securing smart cities by providing remote monitoring and sensing for many critical scenarios, including hostile environments, battlefields, or areas subject to natural disasters such as earthquakes, volcano eruptions, and floods, or to large-scale accidents such as nuclear plant explosions or chemical plumes. The purpose of this paper is to propose a new framework in which WSNs are adopted for remote sensing and monitoring in smart city applications. We propose using Unmanned Aerial Vehicles as data mules to offload the sensor nodes and transfer the monitoring data securely to the remote control center for further analysis and decision making. Furthermore, the paper provides insight into the implementation challenges in realizing the proposed framework. In addition, the paper provides an experimental evaluation of the proposed design in outdoor environments, in the presence of different types of obstacles common to typical outdoor fields. The experimental evaluation revealed several inconsistencies between the performance metrics advertised in the hardware-specific data-sheets and those measured in the field; in particular, we found mismatches between the advertised coverage distance and signal strength and our experimental measurements. It is therefore crucial that network designers and developers conduct field tests and device performance assessments before designing and implementing a WSN for a real field setting.
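
A field test of the kind the authors recommend boils down to comparing data-sheet or model predictions against measured values. The sketch below uses the standard log-distance path-loss model to generate the "expected" side of such a comparison; the reference RSSI, path-loss exponent, and measurement values are hypothetical placeholders, not the paper's data.

```python
import math

def expected_rssi(d_m, rssi_1m=-40.0, n=2.7):
    """Log-distance path-loss model: RSSI(d) = RSSI(1 m) - 10 * n * log10(d).

    rssi_1m and the path-loss exponent n are environment-dependent
    assumptions (n ~ 2.7 suits an outdoor field with obstacles).
    """
    return rssi_1m - 10 * n * math.log10(d_m)

# Hypothetical field measurements (distance in metres, measured RSSI in dBm).
measurements = [(10, -68), (50, -88), (100, -97)]
for d, measured in measurements:
    print(f"{d:4d} m: model {expected_rssi(d):6.1f} dBm, measured {measured} dBm")
```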


Author(s):  
Bassel Al Homssi ◽  
Akram Al-Hourani ◽  
Kagiso Magowe ◽  
James Delaney ◽  
Neil Tom ◽  
...  

2020 ◽  
Vol 10 (1) ◽  
pp. 1-16
Author(s):  
Isaac Nyabisa Oteyo ◽  
Mary Esther Muyoka Toili

Abstract: Researchers in the bio-sciences are increasingly harnessing technology to improve processes that were traditionally pegged on pen and paper and highly manual. The pen-and-paper approach is used mainly to record and capture data from experiment sites. This method is typically slow and prone to errors. Moreover, bio-science research activities are often undertaken in remote and distributed locations, and the timeliness and quality of the data collected are essential. The manual method is too slow to collect quality data and relay it in a timely manner, and capturing data manually and relaying it in real time is a daunting task, since the collected data has to be associated with the respective specimens (objects or plants). In this paper, we seek to improve specimen labelling and data collection guided by the following questions: (1) How can data collection in bio-science research be improved? (2) How can specimen labelling be improved in bio-science research activities? We present WebLog, a prototype application that helps researchers generate specimen labels and collect data from experiment sites. The application converts the object (specimen) identifiers into quick response (QR) codes and uses them to label the specimens. Once a specimen label is successfully scanned, the application automatically invokes the data entry form, and the collected data is immediately sent to the server in electronic form for analysis.
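
The abstract describes converting specimen identifiers into QR codes for labels. A minimal sketch of that step using the third-party Python qrcode package is given below; the identifier format and file layout are assumptions, since WebLog's actual stack is not stated.

```python
# pip install qrcode[pil]  -- third-party library, chosen here for illustration.
import qrcode

def label_specimen(specimen_id: str, out_dir: str = ".") -> str:
    """Encode a specimen identifier as a QR code image for printing as a label."""
    img = qrcode.make(specimen_id)
    path = f"{out_dir}/{specimen_id}.png"
    img.save(path)
    return path

# Hypothetical identifier format; a real scheme would encode site/plot/plant.
print(label_specimen("SITE03-PLOT12-PLANT007"))
```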


Smart Cities ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 662-685
Author(s):  
Stephan Olariu

Under present-day practices, the vehicles on our roadways and city streets are mere spectators that witness traffic-related events without being able to participate in mitigating their effects. This paper lays the theoretical foundations of a framework for harnessing the on-board computational resources of vehicles stuck in urban congestion in order to assist transportation agencies with preventing or dissipating congestion through large-scale signal re-timing. Our framework is called VACCS: Vehicular Crowdsourcing for Congestion Support in Smart Cities. What makes this framework unique is the suggestion that, in such situations, vehicles have the potential to cooperate with various transportation authorities to solve problems that would otherwise either take an inordinate amount of time to solve or could not be solved at all for lack of adequate municipal resources. VACCS offers direct benefits to both the driving public and the Smart City. By developing timing plans that respond to current traffic conditions, overall traffic flow will improve, carbon emissions will be reduced, and the economic impacts of congestion on citizens and businesses will be lessened. It is expected that drivers will be willing to donate under-utilized on-board computing resources in their vehicles to develop improved signal timing plans in return for the direct benefits of time savings and reduced fuel consumption. VACCS allows the Smart City to respond dynamically to traffic conditions while simultaneously reducing the investments in computational resources that traditional adaptive traffic signal control systems would require.


2021 ◽  
Vol 13 (2) ◽  
pp. 176
Author(s):  
Peng Zheng ◽  
Zebin Wu ◽  
Jin Sun ◽  
Yi Zhang ◽  
Yaoqin Zhu ◽  
...  

As the volume of remotely sensed data grows significantly, content-based image retrieval (CBIR) becomes increasingly important, especially for cloud computing platforms that facilitate processing and storing big data in a parallel and distributed way. This paper proposes a novel parallel CBIR system for a hyperspectral image (HSI) repository on cloud computing platforms, guided by unmixed spectral information, i.e., endmembers and their associated fractional abundances, to retrieve hyperspectral scenes. However, existing unmixing methods suffer an extremely high computational burden when extracting meta-data from large-scale HSI data. To address this limitation, we implement a distributed and parallel unmixing method that operates on cloud computing platforms to accelerate the unmixing processing flow. In addition, we implement a global standard distributed HSI repository equipped with a large spectral library in a software-as-a-service mode, providing users with HSI storage, management, and retrieval services through web interfaces. Furthermore, the parallel implementation of unmixing processing is incorporated into the CBIR system to establish the parallel unmixing-based content retrieval system. The performance of our proposed parallel CBIR system was verified in terms of both unmixing efficiency and accuracy.
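
The paper retrieves scenes by their unmixed spectral information. One simple way to turn endmember abundances into retrieval meta-data is to summarize each scene as a vector of mean fractional abundances and compare scenes by vector distance; the sketch below illustrates that idea with random data and is not the paper's actual retrieval metric.

```python
import numpy as np

def abundance_signature(abundance_maps):
    """Summarize an HSI scene by its mean fractional abundance per endmember.

    abundance_maps: array of shape (num_endmembers, H, W) from any unmixing
    method; averaging is one simple choice of scene-level meta-data.
    """
    return abundance_maps.reshape(abundance_maps.shape[0], -1).mean(axis=1)

def scene_distance(sig_a, sig_b):
    """Retrieve scenes by Euclidean distance between abundance signatures."""
    return float(np.linalg.norm(sig_a - sig_b))

# Random per-pixel abundances (Dirichlet, so they sum to 1) stand in for
# real unmixing output: shape (H, W, endmembers) -> (endmembers, H, W).
rng = np.random.default_rng(0)
query = abundance_signature(rng.dirichlet(np.ones(4), size=(64, 64)).transpose(2, 0, 1))
other = abundance_signature(rng.dirichlet(np.ones(4), size=(64, 64)).transpose(2, 0, 1))
print(scene_distance(query, other))
```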


2004 ◽  
Vol 31 (3) ◽  
pp. 319 ◽  
Author(s):  
Jane Catherine Kitson

Sooty shearwaters (tītī, muttonbird, Puffinus griseus) are highly abundant migratory seabirds that return to breeding colonies in New Zealand. The Rakiura Māori annual chick harvest on islands adjacent to Rakiura (Stewart Island) is one of the last large-scale customary uses of native wildlife in New Zealand. This study aimed to establish whether the rate at which muttonbirders can extract chicks from their breeding burrows indicates population trends of sooty shearwaters. Harvest rates increased slightly with increasing chick densities on Putauhinu Island. Birders' harvest rates vary in their sensitivity to changing chick density; a monitoring panel therefore requires careful screening to ensure that the harvest rates of the birders selected are sensitive to chick density and represent a cross-section of different islands. Though harvest rates can provide only a general index of population change, they offer an inexpensive and feasible way to measure population trends. Detecting trends is the first step to assessing the long-term sustainability of the harvest.
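
The study's core quantity is how harvest rate responds to chick density, which for a single birder reduces to a regression slope. The sketch below computes that sensitivity with a least-squares fit over hypothetical numbers; the paper's actual data and analysis are more involved.

```python
import numpy as np

def harvest_sensitivity(chick_density, harvest_rate):
    """Least-squares slope of harvest rate against chick density.

    A birder whose slope is clearly positive is 'sensitive' to density and
    is a candidate for a monitoring panel; all values here are hypothetical.
    """
    slope, _intercept = np.polyfit(chick_density, harvest_rate, 1)
    return slope

density = np.array([0.4, 0.6, 0.8, 1.0, 1.2])    # chicks per burrow entrance
rate = np.array([14.0, 17.5, 19.0, 23.5, 26.0])  # chicks harvested per hour
print(f"slope: {harvest_sensitivity(density, rate):.1f} chicks/hr per unit density")
```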


Author(s):  
Alessandro Achille ◽  
Giovanni Paolini ◽  
Glen Mbeng ◽  
Stefano Soatto

Abstract: We introduce an asymmetric distance in the space of learning tasks and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for one task and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset, and allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity and Shannon and Fisher information. However, unlike some of those frameworks, it can be applied to large-scale models and real-world datasets. Our framework is the first to measure complexity in a way that accounts for the effect of the optimization scheme, which is critical in deep learning.

