web harvesting
Recently Published Documents

TOTAL DOCUMENTS: 24 (five years: 5)
H-INDEX: 2 (five years: 0)

2021 ◽  
Vol 6 (1) ◽  
pp. 202
Author(s):  
I Gede Surya Rahayuda ◽  
Ni Putu Linda Santiari

Publishing scientific articles online in journals is a must for researchers and academics. When choosing a target journal, the researcher must review important information on the journal's website, such as its indexing, scope, fees, quartile and other details. This information is generally not gathered on a single page but spread over several pages of the journal's website. This becomes cumbersome when researchers have to compare information across several journals; moreover, the information in these journals may change at any time. In this research, a web harvesting design is developed to retrieve information from journal websites. With web harvesting, information spread across several pages can be collected in one place, and researchers need not worry about the information changing, because the information collected is always the most recent. The harvesting technique works by taking the URL of a page together with a start marker in the page's source code, from which information retrieval begins, and an end marker, at which retrieval stops. The harvesting tool was successfully developed as a web application based on the Bootstrap framework. Test data were taken from the websites of several scientific journals. The information collected includes name, description, accreditation, indexing, scope, publication rate, publication charge, template and quartile. Based on tests carried out using black-box testing, all the implemented features behave as expected.
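The marker-based retrieval step described in the abstract can be sketched as follows. This is only an illustration: the function names, the example page snippet, and the start/end markers are hypothetical, and the actual system was built as a Bootstrap web application rather than a standalone script.

```python
# Minimal sketch of marker-based web harvesting: given a page URL,
# a start marker, and an end marker, return the text between them.
from urllib.request import urlopen

def harvest(html: str, start_marker: str, end_marker: str) -> str:
    """Return the fragment of html between start_marker and end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return ""
    return html[start:end].strip()

def harvest_url(url: str, start_marker: str, end_marker: str) -> str:
    """Fetch a journal page and extract one piece of information from it."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return harvest(html, start_marker, end_marker)

# Example against a static snippet (no network access needed):
page = '<div class="scope">Computer Science</div>'
print(harvest(page, '<div class="scope">', '</div>'))  # Computer Science
```

In practice one such (URL, start marker, end marker) triple would be stored per piece of information (scope, fee, quartile, ...), and re-running the harvester collects the latest values from every configured page.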


2021 ◽  
pp. 131-141
Author(s):  
Lija Jacob ◽  
K. T. Thomas
Keyword(s):  
Web Data ◽  

2020 ◽  
Vol 10 (01) ◽  
pp. 88-112
Author(s):  
Marinos Papadopoulos ◽  
Maria Bottis ◽  
M. A. Paraskevi (Vicky) Ganatsiou ◽  
Christos Zampakolas

2019 ◽  
Vol 12 (2) ◽  
pp. 178-189
Author(s):  
Maria Bottis ◽  
Marinos Papadopoulos ◽  
Christos Zampakolas ◽  
Paraskevi Ganatsiou
Keyword(s):  

2019 ◽  
Vol 09 (03) ◽  
pp. 369-395
Author(s):  
Maria Bottis ◽  
Marinos Papadopoulos ◽  
Christos Zampakolas ◽  
Paraskevi Ganatsiou

Author(s):  
Eduardo Wirthmann Ferreira

This article explores current salary levels and related questions using public recruitment data from the CharityJobs website. According to CharityJob, the site is the United Kingdom's busiest one for charity, fundraising, NGO and not-for-profit jobs. Data collection took place between 4 September and 20 November 2016 using basic techniques of web scraping (also known as web harvesting or web data extraction), a software technique for extracting information from websites. The whole process is documented at: https://rpubs.com/EduardoWF/charityjobs. The source code in RMarkdown is available for download under the GNU General Public License. Everything was prepared with the open-source, freely accessible statistical computing software R (version 3.2.0 - http://cran.r-project.org/) and the RStudio IDE (version 0.99.441 - http://www.rstudio.com/). In addition to presenting these powerful tools and data-exploration techniques, I hope this article helps the public, especially applicants and workers in civil-society organisations, get an update on salaries and trends in the sector. The jobs analysed here are mostly UK-based and published by UK-based organisations; the results are therefore not meant to represent the entire sector worldwide. I still hope that this analysis can make a positive contribution to the evolution of the work of civil-society organisations in both the southern and the northern hemispheres. This article is based on public data, is my sole responsibility and can in no way be taken to reflect the views of CharityJobs' staff.
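The author's analysis was done in R (the RMarkdown source is linked above). As a rough, language-neutral illustration of the kind of summary the article computes over scraped job listings, here is a minimal Python sketch; the job titles and salary figures are entirely made up, not taken from the CharityJobs data.

```python
# Hypothetical example: summarising scraped salary data.
# In the article this step is done in R over real CharityJobs listings.
import csv
import io
import statistics

raw = """title,salary
Fundraising Manager,32000
Programme Officer,28000
Communications Lead,30000
"""

rows = list(csv.DictReader(io.StringIO(raw)))
salaries = [int(r["salary"]) for r in rows]

print(f"jobs analysed: {len(salaries)}")
print(f"median salary: {statistics.median(salaries)}")
```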


2018 ◽  
pp. 4619-4620
Author(s):  
Wolfgang Gatterbauer
Keyword(s):  

The Dark Web ◽  
2018 ◽  
pp. 199-226 ◽  
Author(s):  
B. Umamageswari ◽  
R. Kalpana

Web mining is performed on huge amounts of data extracted from the World Wide Web. Many researchers have developed state-of-the-art approaches for web data extraction. So far, the literature has focused mainly on techniques for data region extraction. Applications fed with the extracted data often require fetching data spread across multiple web pages, which should be crawled automatically. For this to happen, we need to extract not only the data regions but also the navigation links. Many data extraction techniques are designed around specific HTML tags, which calls into question their universal applicability for carrying out information extraction from differently formatted web pages. This chapter surveys the web data extraction techniques available for different kinds of data-rich pages, classifies them, and compares them across many useful dimensions.
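The chapter's point that both data regions and navigation links must be extracted can be illustrated with a minimal sketch using Python's standard html.parser. The `class="record"` attribute and the `rel="next"` link convention here are hypothetical markup, not taken from the chapter, and real pages would need more robust handling (nested tags, multiple classes).

```python
# Sketch: extract data regions AND the navigation link to the next page,
# so a crawler can follow listings spread across multiple pages.
from html.parser import HTMLParser

class RegionAndLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []      # text content of data regions
        self.next_url = None   # navigation link to the next page
        self._in_record = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "record":
            self._in_record = True
        if tag == "a" and attrs.get("rel") == "next":
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_record = False

    def handle_data(self, data):
        if self._in_record and data.strip():
            self.records.append(data.strip())

page = ('<div class="record">Item A</div>'
        '<div class="record">Item B</div>'
        '<a rel="next" href="/page/2">Next</a>')
p = RegionAndLinkParser()
p.feed(page)
print(p.records)   # ['Item A', 'Item B']
print(p.next_url)  # /page/2
```

A crawler would loop: parse the current page, store the records, then fetch `next_url` until it is absent.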


IFLA Journal ◽  
2017 ◽  
Vol 43 (4) ◽  
pp. 379-390
Author(s):  
Jhonny Antonio Pabón Cadavid

The evolution of legal deposit shows changes and challenges in collecting, providing access to and using documentary heritage. Legal deposit emerged in New Zealand at the beginning of the 20th century with the aim of preserving print publications, mainly for the use of a privileged part of society. In the 21st century legal deposit has evolved to include safeguarding electronic resources and providing access to the documentary heritage for all New Zealanders. The National Library of New Zealand has acquired new functions for the proper stewardship of digital heritage. E-deposit and web harvesting are two new mechanisms for collecting New Zealand publications. The article proposes that legal deposit, framed through human rights and multiculturalism, should involve different communities of heritage in web curation.


