Web scraping
Recently Published Documents

TOTAL DOCUMENTS: 302 (five years: 227)
H-INDEX: 7 (five years: 3)

2021 ◽  
Vol 15 (30) ◽  
Author(s):  
Juan Carlos Gómez Sánchez ◽  
Francisco José Martínez López

With social concern for environmental sustainability now evident, European bodies have recognized that acting at early ages can mitigate long-term environmental consequences; accordingly, the 57th session of the UN General Assembly designated the period 2004-2014 as the decade in which education would be the foundation of environmental awareness. This article aims to respond to the benchmarks set by Orden ECD/65/2015 concerning the promotion of cross-curricular competences. To that end, a teaching proposal is presented for the Information and Communication Technologies subject in the first year of the Spanish baccalaureate, in which students must implement an App Inventor-based app in order to work with climate variables from different databases using application programming interface (API) and web scraping techniques. Finally, a practical application with a microcontroller is presented, and the disparity between the results of national and international databases, caused by the synchronization lag of their information systems, is verified.
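A minimal sketch of the API side of such an exercise, written in Python rather than App Inventor for illustration; the Open-Meteo endpoint and the example location are assumptions, not taken from the article:

```python
# Minimal sketch of querying a climate API, assuming the free Open-Meteo
# forecast endpoint; the article's classroom proposal uses App Inventor
# blocks, so this only illustrates the same request/parse pattern.
import requests

URL = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 40.42,   # Madrid, as an example location (assumption)
    "longitude": -3.70,
    "hourly": "temperature_2m",
}

resp = requests.get(URL, params=params, timeout=10)
resp.raise_for_status()
data = resp.json()

# Pair each timestamp with its forecast temperature.
for t, temp in zip(data["hourly"]["time"][:5],
                   data["hourly"]["temperature_2m"][:5]):
    print(t, temp, "°C")
```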


2021 ◽  
Author(s):  
Alex Luscombe ◽  
Jamie Duncan ◽  
Kevin Walby

Computational methods are increasingly popular in criminal justice research. As more criminal justice data becomes available in big data and other digital formats, new means of embracing the computational turn are needed. In this article, we propose a framework for data collection and case sampling using computational methods, allowing researchers to conduct thick qualitative research – analyses concerned with the particularities of a social context or phenomenon – starting from big data, which is typically associated with thinner quantitative methods and the pursuit of generalizable findings. The approach begins by using open-source web scraping algorithms to collect content from a target website, online database, or comparable online source. Next, researchers use computational techniques from the field of natural language processing to explore themes and patterns in the larger data set. Based on these initial explorations, researchers algorithmically generate a subset of data for in-depth qualitative analysis. In this computationally driven process of data collection and case sampling, the larger corpus and subset are never entirely divorced, a feature we argue has implications for traditional qualitative research techniques and tenets. To illustrate this approach, we collect, subset, and analyze three years of news releases from the Royal Canadian Mounted Police website (N = 13,637) using a mix of web scraping, natural language processing, and visual discourse analysis. To enhance the pedagogical value of our intervention and facilitate replication and secondary analysis, we make all data and code available online in the form of a detailed, step-by-step tutorial.
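A hedged sketch of that scrape-explore-subset pipeline in Python; the index URL, CSS selectors, and theme term below are placeholders, not the authors' actual code (which they publish separately as a tutorial):

```python
# Sketch of the scrape -> explore -> subset workflow described above.
# URL and selectors are hypothetical; adapt them to the target site.
import requests
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import TfidfVectorizer

def scrape_releases(index_url):
    """Collect news-release texts linked from an index page (assumed layout)."""
    soup = BeautifulSoup(requests.get(index_url, timeout=10).text, "html.parser")
    texts = []
    for link in soup.select("a.news-release"):          # assumed selector
        page = requests.get(link["href"], timeout=10)
        body = BeautifulSoup(page.text, "html.parser").select_one("div.body")
        if body:
            texts.append(body.get_text(" ", strip=True))
    return texts

corpus = scrape_releases("https://example.org/news")    # placeholder URL

# Explore the corpus: weight documents by a theme-related term,
# then algorithmically sample top cases for close qualitative reading.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)
term_idx = vectorizer.vocabulary_.get("seizure")        # example theme term
if term_idx is not None:
    scores = tfidf[:, term_idx].toarray().ravel()
    subset = [corpus[i] for i in scores.argsort()[::-1][:20]]
```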


Author(s):  
Piotr Śpiewanowski ◽  
Oleksandr Talavera ◽  
Linh Vi

The 21st-century economy is increasingly built around data. Firms and individuals upload and store enormous amounts of data. Most of the data produced is stored on private servers, but a considerable part is made publicly available across the 1.83 billion websites online. Researchers can access these data using web-scraping techniques. Web scraping refers to the process of collecting data from web pages, either manually or using automation tools or specialized software. Web scraping is possible, and relatively simple, thanks to the regular structure of the code used for websites designed to be displayed in web browsers. Websites built with HTML can be scraped using standard text-mining tools: either scripts in popular (statistical) programming languages such as Python, Stata, or R, or stand-alone dedicated web-scraping tools, some of which require no prior programming skills. Since about 2010, with the omnipresence of social and economic activity on the Internet, web scraping has become increasingly popular among academic researchers. In contrast to proprietary data, which may be out of reach due to substantial costs, web scraping can make interesting data sources accessible to everyone. Thanks to web scraping, data are now available in real time and in significantly more detail than what statistical offices or commercial data vendors have traditionally offered. In fact, many statistical offices have started using web-scraped data, for example for calculating price indices. Data collected through web scraping have been used in numerous economics and finance projects and can easily complement traditional data sources.
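A minimal example of the standard pattern described here, using Python's requests and BeautifulSoup; the URL and tag structure are generic placeholders:

```python
# Fetch an HTML page and extract structured elements with a parser.
# The URL and class names are illustrative, not a real site's markup.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.org/prices", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
rows = []
for item in soup.find_all("li", class_="product"):   # assumed page structure
    name = item.find("span", class_="name")
    price = item.find("span", class_="price")
    if name and price:
        rows.append((name.get_text(strip=True), price.get_text(strip=True)))

print(rows)  # e.g., feed into a price-index calculation
```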


2021 ◽  
pp. bmjebm-2021-111834
Author(s):  
Bethan Swift ◽  
Carl Heneghan ◽  
Jeffrey Aronson ◽  
David Howard ◽  
Georgia C Richards

Objectives: To examine coroners' Prevention of Future Deaths (PFDs) reports to identify deaths involving SARS-CoV-2 that coroners deemed preventable.
Design: Consecutive case series.
Setting: England and Wales.
Participants: Patients reported in 510 PFDs dated between 1 January 2020 and 28 June 2021, collected from the UK's Courts and Tribunals Judiciary website using web scraping to create an openly available database: https://preventabledeathstracker.net/.
Main outcome measures: Concerns reported by coroners.
Results: SARS-CoV-2 was involved in 23 deaths reported by coroners in PFDs. Twelve deaths were indirectly related to the COVID-19 pandemic, defined as those that were not medically caused by SARS-CoV-2 but were associated with mitigation measures. In 11 cases, the coroner explicitly reported that COVID-19 had directly caused the death. There was geographical variation in the reporting of PFDs; most (39%) were written by coroners in the North West of England. The coroners raised 56 concerns, problems in communication being the most common (30%), followed by failure to follow protocols (23%). Organisations in the National Health Service were sent the most PFDs (51%), followed by the government (26%), but responses to PFDs by these organisations were poor.
Conclusions: PFDs are a rich source of information on preventable deaths that has previously been difficult to examine systematically. Our openly available tool (https://preventabledeathstracker.net/) streamlines this process and has identified many concerns raised by coroners that should be addressed during the government's inquiry into the handling of the COVID-19 pandemic, so that mistakes made are less likely to be repeated.
Study protocol preregistration: https://osf.io/bfypc/.
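As a hypothetical illustration of the tabulations behind these results, assuming the scraped reports were stored in a flat table; the file name and column names below are assumptions, not the study's actual schema:

```python
# Hedged sketch: tally geography and concerns from scraped PFD reports,
# assuming a CSV with 'date', 'area', and 'concern' columns (hypothetical).
import pandas as pd

pfds = pd.read_csv("pfd_reports.csv", parse_dates=["date"])

# Restrict to the study window.
window = pfds[(pfds["date"] >= "2020-01-01") & (pfds["date"] <= "2021-06-28")]

# Geographical variation in reporting.
print(window["area"].value_counts(normalize=True).head())

# Most common concerns raised by coroners.
print(window["concern"].value_counts(normalize=True).head())
```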


2021 ◽  
Vol 9 ◽  
Author(s):  
Jianqiang Sun ◽  
Ryo Futahashi ◽  
Takehiko Yamanaka

Citizen science is essential for nationwide ecological surveys of species distribution. Because the accuracy of the information collected by beginner participants is not guaranteed, it is important to develop automated systems to assist species identification. Deep learning techniques for image recognition have been successfully applied in many fields and may contribute to species identification. However, they have not been utilized in citizen-science ecological surveys, because they require the collection of a large number of images, which is time-consuming and labor-intensive. To address these issues, we propose a simple and effective strategy for constructing species identification systems from fewer images. As an example, we collected 4,571 images of 204 species of Japanese dragonflies and damselflies from open-access websites (i.e., web scraping) and scanned 4,005 images from books and specimens for species identification. In addition, we obtained field occurrence records (i.e., ranges of distribution) for all species of dragonflies and damselflies from the National Biodiversity Center, Japan. Using these images and records, we developed a species identification system for Japanese dragonflies and damselflies. We validated that the accuracy of the system improved when web-scraped and scanned images were combined: the top-1 accuracy was 0.324 when trained on web-scraped images alone, but rose to 0.546 when trained on both web-scraped and scanned images. Combining the images with field occurrence records further improved the top-1 accuracy to 0.668. The corresponding top-3 accuracies under the three conditions were 0.565, 0.768, and 0.873. Thus, combining images with field occurrence records markedly improved the accuracy of the species identification system. The strategy proposed in this study can be applied to any group of organisms, and it has the potential to strike a balance between continuously recruiting beginner participants and maintaining the data accuracy of citizen science.
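One plausible way to combine classifier scores with occurrence records, sketched in Python; the abstract does not specify the exact combination rule, so the masking approach and penalty value below are assumptions:

```python
# Down-weight species not recorded in the observation's region, then
# renormalize the classifier's probabilities. Shapes are assumptions.
import numpy as np

def rerank(softmax_probs, occurs_in_region, penalty=1e-3):
    """softmax_probs: (n_species,) classifier output for one photo.
    occurs_in_region: (n_species,) boolean from field occurrence records."""
    weights = np.where(occurs_in_region, 1.0, penalty)
    adjusted = softmax_probs * weights
    return adjusted / adjusted.sum()   # renormalize to probabilities

probs = np.array([0.40, 0.35, 0.25])      # toy 3-species example
present = np.array([True, False, True])   # species 2 never recorded locally
print(rerank(probs, present).round(3))    # top-1 shifts toward local species
```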


2021 ◽  
pp. 105065192110646
Author(s):  
John R. Gallagher ◽  
Aaron Beveridge

This article advocates for web scraping as an effective method to augment and enhance technical and professional communication (TPC) research practices. Web scraping is used to create consistently structured and well-sampled data sets about domains, communities, demographics, and topics of interest to TPC scholars. After providing an extended description of web scraping, the authors identify technical considerations of the method and provide practitioner narratives. They then give an overview of project-oriented web scraping. Finally, they discuss its implications as a sustainable approach to developing web-scraping methods for TPC research.


2021 ◽  
Vol 30 (44) ◽  
pp. 14
Author(s):  
Jose Luis Argiñano ◽  
Udane Goikoetxea-Bilbao

This article investigates the proliferation of fake news on social networks and the growing public concern about food and nutrition. The objective is to analyze the activity of leading Spanish nutritionists as news verifiers on Instagram. After selecting 9 nutritionist instagrammers, 2,100 comments posted between 1 January and 30 June 2019 were extracted using web scraping. These were then classified using unsupervised semantic analysis, and the posts related to falsehoods (3.9%) were subsequently extracted. The results show that these food influencers carry out a low-profile fact-checking role on Instagram, without exploiting the potential of audiovisual resources. Nevertheless, this verification work allows them to project an image of independence from commercial brands and, at the same time, to build community with their followers.
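A hedged illustration of the unsupervised semantic classification step, here using TF-IDF and k-means in Python; the article does not name its exact method, and the sample comments are invented placeholders:

```python
# Group scraped comments into semantic clusters, then inspect top terms
# per cluster to spot a "falsehood/debunking" theme. Purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "el gluten no es veneno para la población general",  # toy placeholders,
    "los detox de zumos no eliminan toxinas",            # not real scraped data
    "gracias por desmentir ese bulo sobre la leche",
]

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(comments)

# n_clusters=2 for this toy corpus; a 2,100-comment corpus would support more.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:8]
    print(c, [terms[i] for i in top])
```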

