Learning to Crawl

2020 · Vol 34 (04) · pp. 6046-6053
Author(s): Utkarsh Upadhyay, Robert Busa-Fekete, Wojciech Kotlowski, David Pal, Balazs Szorenyi

Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. (2018) under the assumption that, for each webpage, both the changes and the requests follow Poisson processes with known rates. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between the observations. Finally, we show that the explore-and-commit algorithm achieves an O(√T) regret with a carefully chosen exploration horizon. Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of parameters.
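The estimation task described here, inferring a Poisson change rate from single-bit "changed since last refresh" signals, has a natural maximum-likelihood formulation. The sketch below is a minimal illustration under that formulation, not the paper's estimator: it assumes exponential inter-change times, solves the monotone score equation by bisection, and its function name and interface are hypothetical.

```python
import numpy as np

def estimate_change_rate(intervals, changed, tol=1e-10):
    """Maximum-likelihood estimate of a Poisson change rate from
    single-bit observations: changed[i] is True iff the page changed
    at least once during the elapsed time intervals[i] between refreshes.

    Log-likelihood: sum_i [ c_i*log(1 - exp(-r*t_i)) - (1 - c_i)*r*t_i ].
    Its derivative is monotone decreasing in r, so bisection finds the root.
    """
    t = np.asarray(intervals, dtype=float)
    c = np.asarray(changed, dtype=bool)
    if not c.any():
        return 0.0          # no change ever observed: MLE at the boundary
    if c.all():
        return np.inf       # every interval saw a change: MLE diverges

    def score(r):           # derivative of the log-likelihood w.r.t. r
        e = np.exp(-r * t[c])
        return np.sum(t[c] * e / (1.0 - e)) - np.sum(t[~c])

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:    # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: four refresh intervals and the observed one-bit change signals.
print(estimate_change_rate([1.0, 2.0, 0.5, 4.0], [True, False, False, True]))
```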

2022 · Vol 2022 · pp. 1-10
Author(s): WenNing Wu, ZhengHong Deng

Wi-Fi-enabled information terminals have become enormously faster and more powerful because of this technology's rapid advancement, and the field of artificial intelligence (AI) has flourished alongside it. AI has been used in a wide range of societal contexts and has had a significant impact on the realm of education. Using big data to support multistage views of every subject of opinion helps to recognize the unique characteristics of each aspect and improves the suitability of social network governance. As public opinion in colleges and universities becomes an increasingly important vehicle for expressing public opinion, this paper aims to explore the analysis of public opinion based on a web crawler and a CNN (Convolutional Neural Network) model. Web crawler methodology is used to gather the data posted by college and university students and to label it along different dimensions. Because CNNs have robust data analysis capability, the proposed model uses a CNN to analyse public opinion. The data are preprocessed with an oversampling method to maximize the effect of classification. By associating descriptions and making comprehensive use of information such as user influence, comment stance, topic, and time of comments, the model suggests guidance for various schemes, helping to enhance the effectiveness and targeting of social network governance. The experiments were carried out in Python, and the suggested methodology predicted the positive and negative opinions of the students with a lower rate of error than existing methodologies.
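As a rough illustration of the pipeline the abstract outlines (crawled student comments, oversampling to balance classes, then a CNN classifier), here is a minimal PyTorch sketch. The architecture, dimensions, and names such as OpinionCNN are assumptions for illustration; the paper's actual network is not specified in this abstract.

```python
import torch
import torch.nn as nn

class OpinionCNN(nn.Module):
    """Minimal 1D-CNN text classifier (hypothetical architecture)."""
    def __init__(self, vocab_size, embed_dim=64, n_filters=32, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                  # global max pooling
        return self.fc(x)                        # (batch, n_classes) logits

def oversample(tokens, labels):
    """Naive oversampling: resample each class with replacement up to the
    majority-class count so the training set is balanced."""
    counts = torch.bincount(labels)
    target = counts.max().item()
    idx = []
    for cls in range(len(counts)):
        cls_idx = (labels == cls).nonzero(as_tuple=True)[0]
        reps = torch.randint(len(cls_idx), (target,))
        idx.append(cls_idx[reps])
    idx = torch.cat(idx)
    return tokens[idx], labels[idx]

# Toy usage: 8 fake tokenized comments with imbalanced labels.
tokens = torch.randint(1, 100, (8, 20))
labels = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
bal_tokens, bal_labels = oversample(tokens, labels)
logits = OpinionCNN(vocab_size=100)(bal_tokens)
```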


Author(s): Ramandeep Kaur, Navpreet Kaur

Cloud computing can essentially be described as the delivery of a computing environment in which different resources are provided as a service to a client, or to different tenants, over the web. Task scheduling concentrates on improving the efficient use of resources and hence on reducing task completion time. It is used to allot certain tasks to specific resources at a specific time instance. A wide range of techniques has been presented to solve the problems of scheduling various tasks. Good task scheduling improves the productive use of resources and yields a lower response time, so that the execution of submitted tasks completes within the least possible time. This paper discusses a study of the priority-, length- and deadline-based task scheduling algorithms used in cloud computing.
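As a concrete toy instance of the deadline-based family of algorithms the paper surveys, the sketch below assigns tasks to virtual machines in earliest-deadline-first order; the task and VM attributes are hypothetical simplifications, not a method from the paper.

```python
import heapq

def edf_schedule(tasks, vms):
    """Earliest-deadline-first assignment of tasks to VMs (toy model).

    tasks: list of (task_id, length, deadline), length in instructions.
    vms:   list of (vm_id, mips) processing rates.
    Returns a list of (task_id, vm_id, start, finish) tuples.
    """
    # Each VM becomes free at some time; always pick the one free earliest.
    free_at = [(0.0, vm_id, mips) for vm_id, mips in vms]
    heapq.heapify(free_at)
    schedule = []
    for task_id, length, deadline in sorted(tasks, key=lambda t: t[2]):
        start, vm_id, mips = heapq.heappop(free_at)
        finish = start + length / mips
        schedule.append((task_id, vm_id, start, finish))
        heapq.heappush(free_at, (finish, vm_id, mips))
    return schedule

# Example: three tasks with deadlines, two VMs with different speeds.
print(edf_schedule([("t1", 400, 5.0), ("t2", 200, 2.0), ("t3", 600, 9.0)],
                   [("vm1", 100), ("vm2", 200)]))
```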


10.29007/2k64 · 2018
Author(s): Pat Prodanovic, Cedric Goeury, Fabrice Zaoui, Riadh Ata, Jacques Fontaine, ...

This paper presents a practical methodology developed for shape optimization studies of hydraulic structures using environmental numerical modelling codes. The methodology starts by defining the optimization problem and identifying relevant problem constraints. The design variables in shape optimization studies are the configurations of structures (such as the length or spacing of groins, or the orientation and layout of breakwaters) whose optimal form is not known a priori. The optimization problem is solved numerically by coupling an optimization algorithm to a numerical model. The coupled system is able to define, test and evaluate a multitude of new shapes, which are internally generated and then simulated using the numerical model. The developed methodology is tested on an example of the optimum design of a fish passage, where the design variables are the length and the position of slots. In this paper an objective function is defined with a specified target, and the numerical optimizer is asked to retrieve the target solution; such a definition of the objective function is used to validate the developed tool chain. This work uses the numerical model TELEMAC-2D from the TELEMAC-MASCARET suite of numerical solvers for the solution of the shallow water equations, coupled with various numerical optimization algorithms available in the literature.
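The coupling described above, where an optimizer repeatedly proposes shapes that a numerical model evaluates against a specified target, reduces to black-box minimization of a target-matching objective. The sketch below mimics that validation setup with scipy and a stand-in run_model function; driving TELEMAC-2D itself, through its input files or API, is elided.

```python
import numpy as np
from scipy.optimize import minimize

def run_model(design):
    """Stand-in for a numerical model run (e.g. a TELEMAC-2D simulation).
    Takes design variables (slot length, slot position) and returns a
    vector of simulated quantities; here, a cheap analytic surrogate."""
    length, position = design
    return np.array([length * np.exp(-position), length + position**2])

# Validation trick from the paper: build the target from a known design,
# then check that the optimizer retrieves that design.
target = run_model(np.array([2.0, 1.5]))

def objective(design):
    # Least-squares mismatch between simulated output and the target.
    return float(np.sum((run_model(design) - target) ** 2))

result = minimize(objective, x0=np.array([1.0, 1.0]), method="Nelder-Mead")
print(result.x)  # should recover approximately [2.0, 1.5]
```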


2020 · Vol 6 (1)
Author(s): Spyridoula Vazou, Collin A. Webster, Gregory Stewart, Priscila Candal, Cate A. Egan, ...

Background/Objective: Movement integration (MI) involves infusing physical activity into normal classroom time. A wide range of MI interventions have succeeded in increasing children’s participation in physical activity. However, no previous research has attempted to unpack the various MI intervention approaches. Therefore, this study aimed to systematically review, qualitatively analyze, and develop a typology of MI interventions conducted in primary/elementary school settings. Subjects/Methods: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to identify published MI interventions. Irrelevant records were removed first by title, then by abstract, and finally by full text, resulting in 72 studies being retained for qualitative analysis. A deductive approach, using previous MI research as an a priori analytic framework, was used alongside inductive techniques to analyze the data. Results: Four types of MI interventions were identified and labeled based on their design: student-driven, teacher-driven, researcher-teacher collaboration, and researcher-driven. Each type was further refined based on the MI strategies (movement breaks, active lessons, other: opening activity, transitions, reward, awareness), the level of intrapersonal and institutional support (training, resources), and the delivery (dose, intensity, type, fidelity). Nearly half of the interventions were researcher-driven, which may undermine the sustainability of MI as a routine practice by teachers in schools. An imbalance is evident in the MI strategies, with transitions, opening and awareness activities, and rewards having received limited study. Delivery should be further examined, with a strong focus on reporting fidelity. Conclusions: There are distinct approaches that are most often employed to promote the use of MI, and these approaches may often lack a minimum standard for reporting MI intervention details. This typology may be useful for effectively translating the evidence into practice in real-life settings and for better understanding and studying MI interventions.


2021 · pp. 0310057X2097665
Author(s): Natasha Abeysekera, Kirsty A Whitmore, Ashvini Abeysekera, George Pang, Kevin B Laupland

Although a wide range of medical applications for three-dimensional printing technology have been recognised, little has been described about its utility in critical care medicine. The aim of this review was to identify three-dimensional printing applications related to critical care practice. A scoping review of the literature was conducted via a systematic search of three databases. A priori specified themes included airway management, procedural support, and simulation and medical education. The search identified 1544 articles, of which 65 were included. Ranging across many applications, most were published since 2016 in discipline-specific journals outside critical care. Most studies related to the application of three-dimensionally printed models for simulation and reported good fidelity; however, several studies reported that the models poorly represented human tissue characteristics. Randomised controlled trials found some models were equivalent to commercial airway-related skills trainers. Several studies relating to the use of three-dimensionally printed model simulations for spinal and neuraxial procedures reported a high degree of realism, including ultrasonography applications of three-dimensional printing technologies. This scoping review identified several novel applications for three-dimensional printing in critical care medicine. Three-dimensional printing technologies have been under-utilised in critical care and provide opportunities for future research.


2016 · Vol 12 (S325) · pp. 145-155
Author(s): Fionn Murtagh

This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects, such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that take account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach and to demonstrate computational benefits, we address the well-known photometric redshift or ‘photo-z’ problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
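The binning idea, matching objects at a coarse measurement precision and refining where possible, can be illustrated with a toy cluster-wise nearest-neighbour regressor for photo-z estimation. The sketch below bins photometric values by rounding at decreasing precision and predicts the mean spectroscopic redshift of the matching bin; it is a schematic of the binning notion only, not the paper's p-adic or hierarchical implementation.

```python
import numpy as np
from collections import defaultdict

def fit_bins(photometry, spec_z, decimals):
    """Group training objects whose photometric features agree when
    rounded to `decimals` places; store the mean spec-z per group."""
    groups = defaultdict(list)
    for row, z in zip(photometry, spec_z):
        groups[tuple(np.round(row, decimals))].append(z)
    return {k: float(np.mean(v)) for k, v in groups.items()}

def predict(models, query, precisions):
    """Try the finest precision first, then fall back to coarser bins,
    generalizing nearest-neighbour regression to shared-bin matching."""
    for d in precisions:                      # e.g. (2, 1, 0)
        key = tuple(np.round(query, d))
        if key in models[d]:
            return models[d][key]
    return None                               # no bin matched at any precision

# Toy usage with 2-band photometry and known spectroscopic redshifts.
photo = np.array([[19.12, 18.75], [19.11, 18.74], [21.50, 20.90]])
z = np.array([0.31, 0.33, 0.78])
precisions = (2, 1, 0)
models = {d: fit_bins(photo, z, d) for d in precisions}
print(predict(models, np.array([19.13, 18.76]), precisions))
```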


2016 · Vol 2016 · pp. 1-9
Author(s): Kelin Lu, K. C. Chang, Rui Zhou

This paper addresses the problem of distributed fusion when the conditional independence assumptions on sensor measurements or local estimates are not met. A new data fusion algorithm called Copula fusion is presented. The proposed method is grounded in Copula statistical modeling and Bayesian analysis. The primary advantage of the Copula-based methodology is that it can reveal the unknown correlation, allowing one to build joint probability distributions with potentially arbitrary underlying marginals and a desired intermodal dependence. The proposed fusion algorithm requires no a priori knowledge of communications patterns or network connectivity. The simulation results show that Copula fusion yields a consistent estimate for a wide range of process noises.
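The core copula construction, imposing a chosen dependence structure on arbitrary marginals, can be sketched in a few lines. The example below draws correlated samples through a Gaussian copula with assumed exponential and gamma marginals; the paper's fusion algorithm involves Bayesian updating across sensors beyond what this illustrates.

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(rho, marginals, n, seed=0):
    """Draw n joint samples whose dependence is a Gaussian copula with
    correlation rho and whose marginals are the given frozen scipy
    distributions (arbitrary, as the copula construction allows)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)  # latent Gaussians
    u = stats.norm.cdf(z)                                  # uniform marginals
    return np.column_stack([m.ppf(u[:, i]) for i, m in enumerate(marginals)])

# Example: correlated exponential and gamma variables (assumed marginals).
samples = gaussian_copula_sample(0.7, [stats.expon(scale=2.0),
                                       stats.gamma(a=3.0)], n=1000)
print(np.corrcoef(samples.T)[0, 1])  # dependence induced by the copula
```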


ReCALL · 1999 · Vol 11 (S1) · pp. 31-39
Author(s): Pierre-Yves Foucou, Natalie Kübler

In this paper, we present the Web-based CALL environment (or WALL) which is currently being experimented with at the University of Paris 13 in the Computer Science Department of the Institut Universitaire de Technologie. Our environment is being developed to teach computer science (CS) English to French-speaking CS students, and will be extended to other languages for specific purposes such as, for example, English or French for banking, law, economics or medicine, where on-line resources are available.

English, and more precisely CS English, is, for our students, a necessary tool, and not an object of study. The learning activities must therefore stimulate the students' interest and reflection about language phenomena. Our pedagogical objective, relying on acquisition research (Wokusch 1997), consists in linking various texts together with other documents, such as different types of dictionaries or other types of texts, so that knowledge can be acquired using various appropriate contexts.

Language teachers are not supposed to be experts in fields such as computer science or economics. We aim at helping them to make use of the authentic documents that are related to the subject area in which they teach English. As shown in Foucou and Kübler (1998), the wide range of resources available on the Web can be processed to obtain corpora, i.e. teaching material. Our Web-based environment therefore provides teachers with a series of tools which enable them to access information about the selected specialist subject, select appropriate specialised texts, produce various types of learning activities and evaluate students' progress.

Commonly used textbooks for specialised English offer a wide range of learning activities, but they are based on documents that very quickly become obsolete, and that are sometimes widely modified. Moreover, they are not adaptable to the various levels of language of the students. From the students' point of view, working on obsolete texts that are either too easy or too difficult can quickly become demotivating, not to say boring.

In the next section, we present the general architecture of the teaching/learning environment; the method of accessing and using it, for teachers as well as for students, is then described. The following section deals with the actual production of exercises and their limits. We conclude and present some possible research directions.


2017 · Vol 4 (1) · pp. 95-110
Author(s): Deepika Punj, Ashutosh Dixit

In order to manage the vast information available on the web, the crawler plays a significant role. The working of a crawler should be optimized to get maximum and unique information from the World Wide Web. In this paper, an architecture for a migrating crawler is proposed which is based on URL ordering, URL scheduling and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that URLs go to the optimum agent for downloading; to ensure this, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed to keep the database unique. To reduce matching time, documents are matched on the basis of their meta information only. The agents of the proposed migrating crawler work more efficiently than a traditional single crawler by providing ordering and scheduling of URLs.
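To make the ordering-plus-deduplication design concrete, here is a minimal frontier sketch: URLs are prioritized by a structural score (a hypothetical choice that prefers shallow paths) and fetched documents are deduplicated by a hash of their meta information. This mirrors the paper's design at a high level only and is not the authors' code.

```python
import hashlib
import heapq
from urllib.parse import urlparse

def structure_score(url):
    """Hypothetical URL-structure ordering: prefer shallow paths and
    penalize query strings, so site-level pages are crawled first."""
    parts = urlparse(url)
    depth = parts.path.count("/")
    return depth + (2 if parts.query else 0)

class Frontier:
    def __init__(self):
        self._heap = []
        self._seen_meta = set()   # hashes of document meta information

    def add(self, url):
        heapq.heappush(self._heap, (structure_score(url), url))

    def next_url(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

    def is_duplicate(self, meta):
        """Match documents on meta information only (cheap vs. full text)."""
        digest = hashlib.sha1(meta.encode("utf-8")).hexdigest()
        if digest in self._seen_meta:
            return True
        self._seen_meta.add(digest)
        return False

f = Frontier()
f.add("https://example.org/a/b/c?q=1")
f.add("https://example.org/index.html")
print(f.next_url())   # the shallow URL comes out first
```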


2005 · Vol 10 (4) · pp. 517-541
Author(s): Mike Thelwall

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
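The frequency analysis described reduces to simple token counting once pages are fetched and stripped of markup; the snippet below is a stdlib-only sketch over already-extracted text, not the study's actual tooling.

```python
import re
from collections import Counter

def top_words(texts, n=50):
    """Count word frequencies across crawled page texts (markup already
    removed) and return the n most frequent lowercase tokens."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

# Toy usage over two extracted page texts.
pages = ["The university will offer new courses.",
         "Courses will be offered by the university."]
print(top_words(pages, n=5))
```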

