Efficient watcher based web crawler design

Purpose – The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach – In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, where each unit is responsible for performing a specific crawling process. Findings – Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows crawlers to visit the updated and newly added web pages, but also solves the crawlers' overlapping and communication problems. Originality/value – The proposed WBC performs all crawling processes in the sense that it detects all updated and newly added pages automatically, without explicit human intervention and without downloading entire web sites.
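The abstract does not say how the watcher file builds its report, so the following is only a minimal sketch of one plausible approach: hash every page on the server, compare against the previous snapshot, and list the addresses of pages that are new or changed. The function names (`snapshot`, `build_report`, `urls_to_crawl`), the restriction to `*.html` files and the report layout are all assumptions made for illustration, not the authors' design.

```python
import hashlib
from pathlib import Path


def snapshot(site_root: Path) -> dict[str, str]:
    """Map each page path (relative to the site root) to a SHA-256 hash of its content."""
    pages = {}
    for path in sorted(site_root.rglob("*.html")):  # assumption: pages are .html files
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        pages[path.relative_to(site_root).as_posix()] = digest
    return pages


def build_report(previous: dict[str, str], current: dict[str, str],
                 base_url: str) -> dict[str, list[str]]:
    """List the addresses of pages added or updated since the previous snapshot."""
    added = [p for p in current if p not in previous]
    updated = [p for p in current if p in previous and previous[p] != current[p]]
    return {
        "added": [f"{base_url}/{p}" for p in added],
        "updated": [f"{base_url}/{p}" for p in updated],
    }


def urls_to_crawl(report: dict[str, list[str]]) -> list[str]:
    """A crawler that reads the report fetches only these addresses."""
    return report["added"] + report["updated"]
```

Under these assumptions the crawler never needs to download unchanged pages: it requests the report and fetches only the addresses returned by `urls_to_crawl`.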

Marketing and delivering information literacy on the web, yesterday and today, from 2009 to 2012

Library Hi Tech News ◽

10.1108/lhtn-03-2014-0019 ◽

2014 ◽

Vol 31 (4) ◽

pp. 10-13 ◽

Cited By ~ 1

Author(s):

Sharon Q. Yang

Keyword(s):

Information Literacy ◽

Web Sites ◽

Academic Libraries ◽

Design Methodology ◽

Status Quo ◽

Content Type ◽

The Status ◽

Practical Implications ◽

The Web ◽

Made In

Purpose – This study aims to ascertain the trends and changes in how academic libraries market and deliver information literacy (IL) on the web. Design/methodology/approach – The author compares the findings from two separate studies that scanned the Web sites for IL-related activities in 2009 and 2012, respectively. Findings – Academic libraries intensified their efforts to promote and deliver IL on the web between 2009 and 2012. There was a significant increase in IL-related activities on the web in the three-year period. Practical implications – The findings describe the status quo and changes in IL-related activities on the libraries’ Web sites. This information may help librarians to know what they have been doing and whether there is room for improvement. Originality/value – This is the only study that spans three years in measuring the progress librarians made in marketing and delivering IL on the Web.

Complementing consumer magazine brands with internet extensions?

Internet Research ◽

10.1108/10662240910981371 ◽

2009 ◽

Vol 19 (4) ◽

pp. 408-424 ◽

Cited By ~ 11

Author(s):

Anssi Tarkiainen ◽

Hanna‐Kaisa Ellonen ◽

Olli Kuivalainen

Keyword(s):

Web Sites ◽

Web Site ◽

Design Methodology ◽

Brand Extensions ◽

Content Type ◽

Print Version ◽

Consumer Magazine ◽

The Relationship ◽

The Web

Purpose – The purpose of this paper is to increase understanding of the effects of web site extension on the parent‐magazine brand in the context of experiential goods, and to identify factors that are related to success. Design/methodology/approach – The paper focuses on the relationship between consumers' experiences on magazine web sites and their loyalty towards the print magazine. Findings – There are different ways in which the web site can complement the print version. The first mechanism is related to engaging in more frequent communication with the magazine's readers, and the second is related to consumer‐initiated interaction between other readers. In both cases something is offered that cannot be obtained from the print magazine, but is assumed to complement it. Originality/value – The paper increases understanding of brand extensions with regard to experiential goods, but more research is needed on the factors that are related to extension success.

Website-reflected operating characteristics of wineries’ wine clubs

International Journal of Wine Business Research ◽

10.1108/ijwbr-04-2013-0013 ◽

2014 ◽

Vol 26 (4) ◽

pp. 244-258

Author(s):

Nicholas C. Williamson ◽

Joy Bhadury

Keyword(s):

Empirical Research ◽

Research Design ◽

Web Sites ◽

Design Methodology ◽

Evolutionary Process ◽

Great Majority ◽

Operating Characteristics ◽

Content Type ◽

The Usa ◽

The Web

Purpose – The purpose of this empirical research is to identify the distinguishing operating characteristics of wineries that use what is alleged to be the most profitable channel of distribution for marketing wine in the USA: the wine club. Design/methodology/approach – The research design entails the contrasting of the Web site-reflected operating features of wineries that support wine clubs with wineries that do not. Findings – Support was found for the great majority of operating features identified in the literature as likely characterizing the operations of wineries with wine clubs. A notable exception concerns the lack of confirmation of hypotheses concerning “Wine 2.0” variables. Research limitations/implications – In the apparent pursuit of higher profits, owners and managers of wineries with wine clubs more frequently adopt operating features that expose them to objective competitive comparisons than do owners and managers of other wineries. The former are also more prone to advertise on their Web sites a variety of offers that collectively constitute a more valuable quid pro quo in their relationships with consumer buyers than appears to be the case with other wineries. Strategically, results demonstrate that a winery’s adoption of a wine club is not a part of an evolutionary process of wineries in general. Originality/value – There has been no other published empirical research that concerned the identification of distinguishing operating features of wineries that use what has been argued to be the most profitable channel for marketing wine at retail in the USA: the wine club channel. Winery owners and managers will find particular value in the results and implications of the research.

A webometric analysis of major keywords and expressions in biochemistry using LexiURL Searcher

The Electronic Library ◽

10.1108/el-03-2014-0054 ◽

2015 ◽

Vol 33 (6) ◽

pp. 1163-1173 ◽

Cited By ~ 1

Author(s):

Kobra Taram ◽

Abbas Doulani

Keyword(s):

Web Sites ◽

Web Site ◽

Design Methodology ◽

Information Access ◽

Link Analysis ◽

Content Type ◽

Common Thread ◽

The Common ◽

Mesh Database ◽

The Web

Purpose – The purpose of this paper is to explore webometric analysis of keywords and expressions of the biochemistry field of study via LexiURL Searcher. Design/methodology/approach – Interfaces for assisting users with information access have received considerable attention. Along with the extraction of data on Web sites for webometric purposes (e.g. link analysis, ranking of Web sites, etc.), LexiURL Searcher presents some information on the arrangement of links among different Web sites. Such capability enables users to identify one or more Web sites around their intended subject and, accordingly, explore all Web sites linked with their identified Web site(s). LexiURL Searcher has preceded webometric analysis by considering the main expressions and keywords derived from the MeSH database. Findings – The worldwide survey indicated that links from countries such as England, Japan, Germany, Australia and Canada were among the Web sites that are most used in biochemistry. Alternatively, other countries such as Singapore, Thailand and Poland had the most advantageous links to the outside world, whereas South Africa, New Zealand and The Netherlands had the least link effect. Biochemistry, being a specialized domain, would benefit greatly from site linking and would provide users the most assistance in information processing. Originality/value – Most webometric studies remain on the level of link analysis and Web site statuses; however, this paper gives information on the common thread Web sites based on a standard thesaurus.

Wayback machine: reincarnation to vanished online citations

Program electronic library and information systems ◽

10.1108/prog-07-2013-0039 ◽

2015 ◽

Vol 49 (2) ◽

pp. 205-223

Author(s):

B T Sampath Kumar ◽

D Vinay Kumar ◽

K.R. Prithviraj

Keyword(s):

Half Life ◽

Design Methodology ◽

Editorial Staff ◽

Web Pages ◽

Error Message ◽

Scholarly Journals ◽

Content Type ◽

Life Period ◽

Depth Study ◽

The Web

Purpose – The purpose of this paper is to determine the rate of loss of online citations used as references in scholarly journals. It also intends to recover the vanished online citations using the Wayback Machine and to calculate the half-life period of online citations. Design/methodology/approach – The study selected three journals published by Emerald publication. All 389 articles published in these three scholarly journals were selected. A total of 15,211 citations were extracted, of which 13,281 were print citations and only 1,930 were online citations. The online citations so extracted were then tested to determine whether they were active or missing on the Web. W3C Link Checker was used to check the existence of online citations. The online citations that returned an HTTP error message when tested for accessibility were then entered into the search box of the Wayback Machine to recover the vanished online citations. Findings – The study found that only 12.69 percent (1,930 out of 15,211) of citations were online citations, and the percentage of online citations varied from a low of 9.41 in 2011 to a high of 17.52 in 2009. Another notable finding of the research was that 30.98 percent of online citations were not accessible (vanished) and the remaining 69.02 percent of online citations were still accessible (active). The HTTP 404 error message – “page not found” – was the overwhelming message encountered and represented 62.98 percent of all HTTP error messages. It was found that the Wayback Machine had archived only 48.33 percent of the vanished web pages, leaving 51.67 percent still unavailable. The half-life of online citations increased from 5.40 years to 11.73 years after the vanished online citations were recovered. Originality/value – This is a systematic and in-depth study on the recovery of vanished online citations cited in journal articles spanning a period of five years. The findings of the study will be helpful to researchers, authors, publishers, and editorial staff in recovering vanishing online citations using the Wayback Machine.
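The percentages reported in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the share of online citations from the stated counts and combines two of the stated percentages to estimate the share of online citations still unavailable after Wayback Machine recovery. The `half_life` helper assumes simple exponential decay; the abstract does not state which half-life formula or elapsed time the authors used, so that function is an illustrative assumption and is not expected to reproduce the 5.40 or 11.73 year figures.

```python
import math


def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def half_life(elapsed_years: float, surviving_fraction: float) -> float:
    """Half-life under an assumed exponential decay model (not taken from the paper)."""
    if not 0 < surviving_fraction < 1:
        raise ValueError("surviving_fraction must lie strictly between 0 and 1")
    return elapsed_years * math.log(0.5) / math.log(surviving_fraction)


# Counts stated in the abstract.
total_citations = 15211
print_citations = 13281
online_citations = 1930

online_share = percent(online_citations, total_citations)   # 12.69
print_share = percent(print_citations, total_citations)     # 87.31

# 30.98% of online citations vanished; 51.67% of those were not in the archive.
still_missing_share = round(30.98 * 51.67 / 100, 2)          # 16.01
```

In other words, if the reported percentages are taken at face value, roughly 16 percent of all online citations remained unavailable even after consulting the Wayback Machine.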

Influence of language and file type on the web visibility of top European universities

Aslib Journal of Information Management ◽

10.1108/ajim-02-2013-0018 ◽

2014 ◽

Vol 66 (1) ◽

pp. 96-116 ◽

Cited By ~ 2

Author(s):

Enrique Orduña-Malea ◽

Jose Luis Ortega ◽

Isidro F. Aguillo

Keyword(s):

Search Engine ◽

Web Sites ◽

Design Methodology ◽

Content Type ◽

European Universities ◽

File Formats ◽

Google Search ◽

European University ◽

The Web ◽

Web Visibility

Purpose – The purpose of this paper is to detect whether both file type (a set of rich and web files) and language (English, Spanish, German, French and Italian) influence the web visibility of European universities. Design/methodology/approach – A webometrics analysis of the top 200 European universities (as ranked in the Ranking web of World Universities) was carried out by a manual query for each official URL identified by using the Google search engine (April 2012). A correlation analysis between visibility and file format page count is offered according to language. Finally, a prediction of visibility is shown by using the SMOreg function. Findings – The results indicate that Spanish and English are the languages that correlate most highly with web visibility. This correlation becomes greater – though moderate – when considering only PDF files. Research limitations/implications – The results are limited due to the low correlation between overall page count and visibility. The lack of an accurate search engine that would assist in link counting procedures makes this process difficult. Originality/value – An observed increase in correlation – although moderate – while analysing PDF files (in English and Spanish) is considered to be meaningful. This may indirectly confirm that specific file formats and languages generate different web visibility behaviour on European university web sites.
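The correlation step described in this abstract can be illustrated with a short sketch. The abstract does not name the correlation coefficient used, so Pearson's r is an assumption here, and the numbers in the usage note are made-up toy values, not data from the study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

For example, with hypothetical PDF page counts `[10, 20, 30, 40]` and hypothetical visibility scores `[1, 2, 3, 4]` for four universities, `pearson_r` returns a value of 1 (up to floating-point rounding), since the toy values are perfectly linear. Real data would be computed per language and file format, as the abstract describes.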

Staff resources

The Bottom Line Managing Library Finances ◽

10.1108/08880450510597550 ◽

2005 ◽

Vol 18 (2) ◽

pp. 95-97 ◽

Cited By ~ 1

Author(s):

John Maxymuk

Keyword(s):

Staff Development ◽

Web Sites ◽

Design Methodology ◽

Information Resources ◽

Online Environment ◽

Content Type ◽

Finding Aids ◽

Types Of Information ◽

Training Organization ◽

The Web

Purpose – To show that despite libraries' tendencies to focus all their efforts – even in the online environment – on developing tools, resources, and finding aids for their patrons, some have also used the web to develop resources for staff needs. Design/methodology/approach – Surveys a number of library web sites and highlights online resources that have been developed to assist library staff in areas of training, organization, and professional development. Findings – Ranging from online instruction for new staff, listings of library policies and passwords, and resources for staff development, many libraries have begun to use their web sites to provide valuable information for staff too. Originality/value – The examples presented in this column can provide guidance for any library beginning to use their web site to provide information resources for their staff. Several types of information are presented showing both the range of information of use to staff and a variety of methods to convey that information.

Web Crawling on News Web Page using Different Frameworks

International Journal of Scientific Research in Science and Technology ◽

10.32628/cseit2174120 ◽

2021 ◽

pp. 513-519

Author(s):

Harshala Bhoir ◽

K. Jayamalini

Keyword(s):

Web Sites ◽

Parse Tree ◽

Web Pages ◽

Web Crawling ◽

Web Page ◽

Web Crawler ◽

Information Searching ◽

System Use ◽

Xpath Expression ◽

The Web

Nowadays the Internet is widely used to find required information, and searching the web for useful information has become more difficult. A web crawler helps to extract the relevant and irrelevant links from the web and downloads web pages programmatically. This paper implements a web crawler with the Scrapy and Beautiful Soup Python frameworks to crawl news on news web sites. Scrapy is a web crawling framework that allows programmers to create spiders that define how a certain site or a group of sites will be scraped. It has built-in support for extracting data from HTML sources using XPath and CSS expressions. Beautiful Soup is a framework that extracts data from web pages. It provides a few simple methods for navigating, searching and modifying a parse tree, and it automatically converts incoming documents to Unicode and outgoing documents to UTF-8. The proposed system uses the Beautiful Soup and Scrapy frameworks to crawl news web sites. This paper also compares the Scrapy and Beautiful Soup 4 web crawler frameworks.
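To make the extraction step concrete without depending on either framework, the sketch below uses only Python's standard-library `html.parser` in place of Scrapy or Beautiful Soup. It walks an HTML document and collects the text and absolute address of every link, which is the core operation a news crawler performs before following links. The class name `HeadlineLinkParser`, the helper `extract_links` and the sample page are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((title, self._href))
            self._href = None


def extract_links(html: str, base_url: str) -> list:
    """Parse `html` and return the links it contains, resolved against `base_url`."""
    parser = HeadlineLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical news page used only to exercise the parser.
SAMPLE_PAGE = """
<html><body>
  <h2><a href="/news/budget-2021">Budget   announced</a></h2>
  <h2><a href="https://example.org/news/weather">Weather update</a></h2>
  <a name="top">no link here</a>
</body></html>
"""
```

Calling `extract_links(SAMPLE_PAGE, "https://example.org/")` yields two pairs, one per headline link, with the relative address resolved to an absolute one. Scrapy and Beautiful Soup offer the same kind of extraction through XPath or CSS expressions and parse-tree navigation, with far less hand-written state tracking.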

A tribute to Gwen Leighty

The Bottom Line Managing Library Finances ◽

10.1108/08880450610663627 ◽

2006 ◽

Vol 19 (2) ◽

pp. 84-86

Author(s):

Jennifer Paustenbaugh

Keyword(s):

Web Sites ◽

Academic Libraries ◽

Design Methodology ◽

Fund Raising ◽

Personal Knowledge ◽

Valuable Contribution ◽

Content Type ◽

History Of ◽

Development Network ◽

Fund Raiser

Purpose – The purpose of the paper is to provide a tribute to the life and work of library fund‐raiser Gwen Leighty. Design/methodology/approach – The paper uses personal knowledge and references to Academic Libraries Advancement and Development Network (ALADN) and LIBDEV web sites. Findings – The paper finds that fundraising is connecting with people and the journey that each development officer must make while raising funds for their library. Originality/value – The paper presents a brief history of ALADN and the valuable contribution one person made to the cause of library fund‐raising.

Return to relevance

OCLC Systems & Services ◽

10.1108/oclc-10-2014-0035 ◽

2015 ◽

Vol 31 (1) ◽

pp. 2-6

Author(s):

Robert Fox

Keyword(s):

Literature Review ◽

Web Sites ◽

Web Site ◽

Design Methodology ◽

Library Science ◽

User Needs ◽

Content Type ◽

Web Presence ◽

Design Philosophy ◽

Information Professionals

Purpose – In order to continue to respond to patron needs in a relevant way, it is necessary to continuously reevaluate the central message that the library website is intended to convey. It is necessary to question assumptions, listen to user needs, and shift our paradigm to make the library web presence as effective as possible. Design/methodology/approach – This is a regular viewpoint column. A basic literature review was done prior to the column being written. Findings – The library Web site remains, in many respects, the “first face” of the library for patrons. To remain relevant, traditional methodologies used in library science may need to be set aside or catered to the needs of the patron. Originality/value – Various methods regarding design philosophy are explored which may be of use to information professionals responsible for the design and content of the library Web sites.
