Inconsistent XML as a barrier to reuse of Open Access Content

Author(s):  
Daniel Mietchen ◽  
Chris Maloney ◽  
Nils Dagsson Moskopp

In this paper, we will describe the current state of some of the tagging of articles within the PMC Open Access subset. As a case study, we will use our experiences developing the Open Access Media Importer, a tool to harvest content from the OA subset for automated upload to Wikimedia Commons. Tagging inconsistencies stretch across several aspects of the articles, ranging from licensing to keywords to the media types of supplementary materials. While all of these complicate large-scale reuse, the unclear licensing statements had the greatest impact, requiring us to implement text mining-like algorithms in order to accurately determine whether or not specific content was compatible with reuse on Wikimedia Commons. Besides presenting examples of incorrectly tagged XML from a range of publishers, we will also explore past and current efforts towards standardization of license tagging, and we will describe a set of recommendations related to tagging practices of certain data, to ensure that it is both compatible with existing standards, and consistent and machine-readable.
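The license-detection problem described above can be illustrated with a minimal sketch. It is not the Open Access Media Importer's actual code. It assumes a JATS-style `<license>` element that may or may not carry a machine-readable `xlink:href`, and it falls back to scanning the free-text license statement for a Creative Commons URL. The function names, the sample XML, and the set of license codes treated as reusable are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute as ElementTree exposes it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Captures the license code from a Creative Commons URL, e.g. "by" or "by-nc".
CC_URL = re.compile(
    r"creativecommons\.org/(?:licenses|publicdomain)/([a-z0-9-]+)", re.I
)

# Assumption for illustration: only these codes count as reusable.
REUSABLE = {"by", "by-sa", "zero"}


def find_license_code(article_xml):
    """Return the lowercase CC license code found in the XML, or None."""
    root = ET.fromstring(article_xml)
    for lic in root.iter("license"):
        # Check xlink:href on <license> itself and on any nested element
        # first (lic.iter() yields the element itself, then descendants).
        candidates = [el.get(XLINK_HREF, "") for el in lic.iter()]
        # Fall back to the free text of the license statement.
        candidates.append(" ".join(lic.itertext()))
        for candidate in candidates:
            match = CC_URL.search(candidate)
            if match:
                return match.group(1).lower()
    return None


def is_reusable(code):
    """True if the license code is in the (assumed) reusable set."""
    return code in REUSABLE


# Hypothetical article fragment: the license URL appears only in prose.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta><permissions>
    <license license-type="open-access">
      <license-p>This is an open access article distributed under the
      Creative Commons Attribution License
      (http://creativecommons.org/licenses/by/2.0), which permits
      unrestricted use.</license-p>
    </license>
  </permissions></article-meta></front>
</article>"""

code = find_license_code(SAMPLE)
print(code, is_reusable(code))
```

A statement that names a license only in words, with no URL anywhere, returns `None` here, which is the kind of case that pushes a harvester toward heavier text matching.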

Author(s):  
Adam Łajczak

Changes in flood risk impacted by river training: a case study of the piedmont section of the Vistula river. The main problems concerning flood risk in the piedmont section of the Vistula, Southern Poland, are discussed. This stretch of the river has been channelized since the middle of the 19th century and is part of the mainstream discussion of the effectiveness of existing river channelization methods. The following problems are analysed: (1) the current state of flood risk, (2) the rate of river flow, (3) changes in flood risk since the start of channelization efforts with respect to changing channel geometry and changing rates of river flow reflecting the effects of channelization work. Substantially increased bankfull discharge in a channelized river may be considered a stable hydrologic feature of the river stretch analysed. This means that the river is effectively reducing the quantity of water available for flooding the inter-embankment zone. This statement is the basis for the analysis of changes in flood risk in the river studied. An assessment of changes in flood risk for the piedmont section of the Vistula cannot be categorical. Some changes in discharge help reduce flood risk, while others increase it. The paper is based mainly on State Hydrological Survey data covering more than the last 100 years, large-scale maps covering the last 230 years, and fieldwork conducted by the author.


Author(s):  
Zhuang Liu ◽  
Degen Huang ◽  
Kaiyu Huang ◽  
Zhuang Li ◽  
Jun Zhao

There is growing interest in financial text mining. Over the past few years, Natural Language Processing (NLP) based on deep learning has advanced rapidly, and deep learning has shown promising results in financial text mining models. However, because NLP models require large amounts of labeled training data, applying deep learning to financial text mining is often unsuccessful due to the lack of labeled training data in financial fields. To address this issue, we present FinBERT (BERT for Financial Text Mining), a domain-specific language model pre-trained on large-scale financial corpora. Unlike BERT, FinBERT uses six pre-training tasks covering more knowledge, trained simultaneously on general corpora and financial domain corpora, which enables the model to better capture language knowledge and semantic information. The results show that our FinBERT outperforms all current state-of-the-art models. Extensive experimental results demonstrate the effectiveness and robustness of FinBERT. The source code and pre-trained models of FinBERT are available online.


2018 ◽  
Vol 6 ◽  
pp. 162-169
Author(s):  
Kalina Grzesiuk ◽  
Monika Wawer

This article concerns employer branding strategies implemented by selected Polish companies. The main purpose of this paper is to present the current state of network tool utilization among the largest Polish private firms listed in 2017 by Forbes magazine. The media taken into consideration included the company’s website career page and the firm’s presence on job-related network sites such as LinkedIn, GoldenLine, GoWork.pl and Pracuj.pl. The research described in this paper was based on a case study method. The results show that the company’s website and the social network sites are effective tools for building a firm’s profile as a part of employer branding strategies. However, with such a wide choice of job-related services available, a company must choose those that allow it to reach the right target audience among active and passive job seekers.


2019 ◽  
Vol 10 ◽  
pp. 36-42
Author(s):  
Deimante Pankauskyte ◽  
Jolanta Valciukiene ◽  
Indrius Kuklys ◽  
Lina Kukliene

An analysis of the condition of the Agila dune is presented in this article. The analysis is based on data collected during accurate geodetic measurements using LIDAR technology. The current state of the Agila dune was compared to the previous year's LIDAR point data in order to ensure the reliability and value of the research. In the course of the study, eleven cross sections were compared by height differences with the previous year's measurements. The condition of the Agila dune was found to be the worst in three cross sections: erosion measured 13.98 meters in the first cross section, 9.90 meters in the fifth, and 11.34 meters in the eighth. The main reasons for the deterioration of the natural values of the Kursiu Nerija National Park are climate, wind, high visitor flows and the persistent failure to carry out comprehensive research. Therefore, in order to preserve these unique natural values, it is important to collect large-scale and high-precision data on their status, and to systematize and analyze these data and take appropriate protective measures.


2019 ◽  
Vol 67 (4) ◽  
pp. 594-610 ◽  
Author(s):  
Toru Takahashi

How can society deal with a severe disaster that entails wide-ranging societal problems? This article aims to elaborate a sociocybernetic approach to describe and improve collective efforts which are tackling such problems. First, a concept of ‘governing’ will be introduced to describe efforts of not only public actors but also private actors. A case study of recovery efforts which were made in response to the large-scale disasters in 1995 and 2011 in Japan shows that governing can and needs to be empowered by the media. Focusing on one disaster-hit area in Japan, this article examines in what way and to what extent the media (the mass media and the Internet) supported local non-profit organizations (NPOs) working for their communities. The supportive use of the media will be defined as ‘societal media’. Making use of these key terms (governing and societal media), this article concludes that combining a variety of efforts with the supportive use of media is an essential component of building resilience in contemporary society.


2017 ◽  
Vol 3 (1) ◽  
pp. 51
Author(s):  
Madoka Chosokabe ◽  
Maiko Sakamoto ◽  
Mikiyasu Nakayama

This case study examined public participation regarding reconstruction of the disaster-affected areas in the Fukushima Prefecture of Japan following the Great East Japan Earthquake in March 2011. The study aimed at (i) identifying the topics discussed and shared by the residents, and (ii) revealing the contribution of the media to ensuring information transparency in the regional planning process. We applied text mining (in particular, correspondence analysis) to (a) the text data from dialogue sessions with local residents and (b) articles that appeared in a nation-wide newspaper, in order to identify the similarities and differences between the topics discussed by session participants and those that appeared in newspaper articles. It turned out that the newspaper articles did not adequately address some important topics discussed in the dialogues. This implies that the coverage by the mass media has much room for improvement, both in the transparency of the information it provides and in supporting the development and implementation of administrative agendas involving citizens and municipalities.


2019 ◽  
Vol 7 ◽  
Author(s):  
Gabriel Muñoz ◽  
W. Daniel Kissling ◽  
E. Emiel van Loon

A considerable portion of primary biodiversity data is digitally locked inside published literature, which is often stored as PDF files. Large-scale approaches to biodiversity science could benefit from retrieving this information and making it digitally accessible and machine-readable. Nonetheless, the amount and diversity of digitally published literature pose many challenges for knowledge discovery and retrieval. Text mining has been extensively used for data discovery tasks in large quantities of documents. However, text mining approaches for knowledge discovery and retrieval have been limited in biodiversity science compared to other disciplines. Here, we present a novel, open source text mining tool, the Biodiversity Observations Miner (BOM). This web application, written in R, allows the semi-automated discovery of punctual biodiversity observations (e.g. biotic interactions, functional or behavioural traits and natural history descriptions) associated with the scientific names present inside a corpus of scientific literature. Furthermore, BOM enables users to rapidly screen large quantities of literature based on word co-occurrences that match custom biodiversity dictionaries. This tool aims to increase the digital mobilisation of primary biodiversity data and is freely accessible via GitHub or through a web server.
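The dictionary-based screening described above can be sketched in a few lines. This is not BOM's code (BOM is written in R) and it makes no claim about how BOM works internally; it only illustrates co-occurrence matching in general. The function name, the naive sentence splitter, the species names and the dictionary terms are all hypothetical.

```python
import re


def screen_sentences(text, names, dictionary):
    """Return sentences in which a scientific name and a dictionary term co-occur.

    Each hit is a tuple: (sentence, matched_names, matched_terms).
    """
    # Naive sentence split on whitespace following '.', '!' or '?'.
    # Abbreviations such as "e.g." would be split incorrectly.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        found_names = [n for n in names if n.lower() in lowered]
        found_terms = [
            t for t in dictionary
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)
        ]
        # Keep the sentence only when both kinds of match are present.
        if found_names and found_terms:
            hits.append((sentence, found_names, found_terms))
    return hits


# Hypothetical inputs for illustration only.
TEXT = (
    "Ara macao feeds on the fruits of Attalea palms. "
    "The study site lies in lowland forest. "
    "Cebus capucinus was observed nearby."
)
NAMES = ["Ara macao", "Cebus capucinus"]
INTERACTION_TERMS = ["feeds on", "pollinates", "disperses"]

for sentence, found_names, found_terms in screen_sentences(
    TEXT, NAMES, INTERACTION_TERMS
):
    print(found_names, found_terms, "->", sentence)
```

Matching at sentence level keeps each candidate observation short enough for a person to confirm or reject, which suits a semi-automated workflow.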


2017 ◽  
Vol 21 (2) ◽  
pp. 196-222 ◽  
Author(s):  
Angela S. Lee ◽  
Ronald Weitzer ◽  
Daniel E. Martínez

Recent police killings of citizens in the United States have attracted massive coverage in the media, large-scale public protests, and demands for reform of police departments throughout the country. This study is based on a content analysis of newspaper coverage of recent high-profile incidents that resulted in a citizen’s death in Ferguson, North Charleston, and Baltimore. We identify both incident-specific content as well as more general patterns that transcend the three cases. News media coverage of similar incidents in past decades tended to be episodic and favored the police perspective. Our findings point to some important departures from this paradigm. Reporting in our three cases was more likely to draw connections between discrete incidents, to attach blame to the police, and to raise questions about the systemic causes of police misconduct. These findings may be corroborated in future studies of news media representations of high-profile policing incidents elsewhere.


1996 ◽  
Vol 5 (1) ◽  
pp. 23-32 ◽  
Author(s):  
Chris Halpin ◽  
Barbara Herrmann ◽  
Margaret Whearty

The family described in this article provides an unusual opportunity to relate findings from genetic, histological, electrophysiological, psychophysical, and rehabilitative investigation. Although the total number evaluated is large (49), the known, living affected population is smaller (14), and these are spread from age 20 to age 59. As a result, the findings described above are those of a large-scale case study. Clearly, more data will be available through longitudinal study of the individuals documented in the course of this investigation but, given the slow nature of the progression in this disease, such studies will be undertaken after an interval of several years. The general picture presented to the audiologist who must rehabilitate these cases is that of a progressive cochlear degeneration that affects only thresholds at first, and then rapidly diminishes speech intelligibility. The expected result is that, after normal language development, the patient may accept hearing aids well, encouraged by the support of the family. Performance and satisfaction with the hearing aids is good, until the onset of the speech intelligibility loss, at which time the patient will encounter serious difficulties and may reject hearing aids as unhelpful. As the histological and electrophysiological results indicate, however, the eighth nerve remains viable, especially in the younger affected members, and success with cochlear implantation may be expected. Audiologic counseling efforts are aided by the presence of role models and support from the other affected members of the family. Speech-language pathology services were not considered important by the members of this family since their speech production developed normally and has remained very good. Self-correction of speech was supported by hearing aids and cochlear implants (Case 5’s speech production was documented in Perkell, Lane, Svirsky, & Webster, 1992). 
These patients received genetic counseling and, due to the high penetrance of the disease, exhibited serious concerns regarding future generations and the hope of a cure.


2008 ◽  
Author(s):  
D. L. McMullin ◽  
A. R. Jacobsen ◽  
D. C. Carvan ◽  
R. J. Gardner ◽  
J. A. Goegan ◽  
...  
