Report from the Field: PubMed Central, an XML-based Archive of Life Sciences Journal Articles

Author(s):  
Jeff Beck

PubMed Central (PMC) is an XML-based archive of life sciences journal literature at the U.S. National Institutes of Health that allows public access to full-text journal articles. The archive was created in 2000 and has grown steadily to over 2 million records. The project has been successful in part because of the strict XML control and the flexibility that PMC gives its submitters. This paper gives an overview of the PMC data evaluation process; the XML processing model; the PMC philosophy toward XML use, including use of the NLM DTD, XML Tagging Style, usability or reusability of the XML, public XML tools, and our people; and some challenges we continue to face maintaining the archive.

Author(s):  
Jeffrey D. Beck

PubMed Central (PMC) is a free full-text XML-based archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine. Publishers submit XML, images, and supplemental files for their articles, the text converts to a common JATS XML, and they load to the database cleanly. The power of XML compels it! But that is not the whole story (or even a true story). Policies, miscommunications, and technical misunderstandings conspire against our Utopian XML workflow. We will share the details of how we get 30,000 new articles into the archive each month.


Author(s):  
Martin Latterner ◽  
Dax Bamberger ◽  
Kelly Peters ◽  
Jeffrey D. Beck

PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine. PMC receives about 70,000 XML articles every month and uses XSLT to convert them into its preferred format. In 2021, PMC started to explore options to modernize its extensive conversion codebase leveraging XSLT 3.0. This paper describes XML conversion and its challenges at PMC. It then outlines the first approach that PMC is evaluating: breaking a single conversion operation into multiple, dynamic transformations using fn:transform, one of the powerful new tools available with XSLT 3.0.


2009 ◽  
Author(s):  
Martin Fenner

PubMed Central was launched in February 2000 by the U.S. National Institutes of Health (NIH) as a free digital archive of journal articles. Like PubMed, PubMed Central covers research in the life sciences, but not other areas of research, e.g. ...


2020 ◽  
Vol 35 (2) ◽  
pp. 151-152
Author(s):  
Elizabeth Paz-Pacheco

Amid the uncertainties and challenges brought on by the COVID-19 pandemic, we celebrate another major milestone in the continuing journey of the JAFES. We formally announce here our acceptance to PubMed Central after being included in Scopus and Clarivate Analytics Emerging Sources Citation Index in the last 2 years. Launched in 2000, PubMed Central is a free archive of full-text biomedical and life sciences journal articles, serving as a digital counterpart to the print journal collection of the US National Library of Medicine. As a participating journal, JAFES shall be depositing full-text articles starting from 2017, and these shall be available 100% open access and also searchable in MEDLINE.


2014 ◽  
Vol 33 (3) ◽  
pp. 5 ◽  
Author(s):  
Leslie A. Williams ◽  
Lynne M Fox ◽  
Christophe Roeder ◽  
Lawrence Hunter

This case study examines strategies used to leverage the library’s existing journal licenses to obtain a large collection of full-text journal articles in extensible markup language (XML) format; the right to text mine the collection; and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers attempted to obtain content directly from PubMed Central (PMC). This attempt failed due to limits on use of content in PMC. Next, researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry. This resulted in an incomplete research data set. Then researchers, the library liaison, and the acquisitions librarian collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to successfully create a method for obtaining XML content as an extension of the library’s typical acquisition process for electronic resources. Our experience led us to realize that text mining rights of full-text articles in XML format should routinely be included in the negotiation of the library’s licenses.


2016 ◽  
Vol 72 (3) ◽  
pp. 454-489 ◽  
Author(s):  
Scott Hamilton Dewey

Purpose – The purpose of this paper is to provide a close, detailed analysis of the frequency, nature, and depth of visible use of two of Foucault’s classic early works, The Archaeology of Knowledge and The Order of Things, by library and information science/studies (LIS) scholars. Design/methodology/approach – The study involved conducting extensive full-text searches in a large number of electronically available LIS journal databases to find citations of Foucault’s works, then examining each citing article and each individual citation to evaluate the nature and depth of each use. Findings – Contrary to initial expectations, the works in question are relatively little used by LIS scholars in journal articles, and where they are used, such use is often only vague, brief, or in passing. In short, works traditionally seen as central and foundational to discourse analysis appear relatively little in discussions of discourse. Research limitations/implications – The study was limited to a certain batch of LIS journal articles that are electronically available in full text at UCLA, where the study was conducted. The results potentially could change by focussing on a fuller or different collection of journals or on non-journal literature. More sophisticated bibliometric techniques could reveal different relative performance among journals. Other research approaches, such as discourse analysis, social network analysis, or scholar interviews, might reveal patterns of use and influence that are not visible in the journal literature. Originality/value – This study’s intensive, in-depth examination of the quality as well as the quantity of citations challenges some existing assumptions regarding citation analysis and the sociology of citation practices, and also illuminates Foucault scholarship.


2015 ◽  
Vol 6 (1) ◽  
pp. 1 ◽  
Author(s):  
Şenay Kafkas ◽  
Jee-Hyub Kim ◽  
Xingjun Pi ◽  
Johanna R McEntyre

2010 ◽  
Vol 10 (3) ◽  
pp. 187-190 ◽  
Author(s):  
Claire Duffield ◽  
Sarah Fallon ◽  
Jean Stopford

The team responsible for Legal Journals Index explain how journal articles are selected, indexed and loaded to this online legal information service provided by Sweet & Maxwell. They outline the history of LJI and discuss the criteria for determining which journals are included in the service; how the Articles team decides which articles will be indexed; the content of an LJI index entry; how an abstract is written; the use of the taxonomy; the full text journals service on Westlaw; and the work of the Document Delivery team.


2017 ◽  
Vol 7 (1) ◽  
pp. 131
Author(s):  
Deny Arnos Kwary ◽  
Dewantoro Ratri ◽  
Almira F. Artha

This study focuses on the use of lexical bundles (LBs), their structural forms, and their functional classifications in journal articles of four academic disciplines: Health sciences, Life sciences, Physical sciences, and Social sciences. The corpus comprises 2,937,431 words derived from 400 journal articles which were equally distributed in the four disciplines. The results show that Physical sciences feature the largest number of lexical bundles, while Health sciences feature the fewest. When we paired up the disciplines, we found that Physical sciences and Social sciences shared the largest number of LBs. We also found that there were no LBs shared between Health sciences and Physical sciences, nor between Health sciences and Social sciences. For the distribution of the structural forms, we found that the prepositional-based and the verb-based bundles were the most frequent forms (each of them accounts for 37.1% of the LBs, making a total of 74.2%). Within the verb-based bundles, the passive form can be found in 12 out of 23 LB types. Finally, for the functional classifications, the number of referential expressions (40 LBs) is much higher than that of discourse organizers (12 LBs) and stance expressions (10 LBs). The high frequency of LBs in the referential expressions can be related to the need to refer to theories, concepts, data and findings of the study.

