Working with batches of PDF files

2020 ◽  
Author(s):  
Moritz Mähr

Learn how to perform OCR and text extraction with free command line tools like Tesseract and Poppler and how to get an overview of large numbers of PDF documents using topic modeling.
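The batch text-extraction step described above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: it assumes Poppler's `pdftotext` is installed and on the PATH, and the function names `pdftotext_command` and `extract_batch` are hypothetical. It covers only PDFs that already contain a text layer; scanned pages would need an OCR pass (for example with Tesseract) first.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, out_dir):
    """Build the Poppler `pdftotext` call for one PDF (hypothetical helper).

    `-layout` asks pdftotext to preserve the physical page layout.
    """
    txt_path = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def extract_batch(pdf_dir, out_dir):
    """Run pdftotext on every *.pdf in pdf_dir and return the text files written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = pdftotext_command(pdf_path, out_dir)
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written
```

Calling `extract_batch("pdfs", "texts")` would write one `.txt` file per PDF; the resulting plain-text files are the kind of input a topic-modeling tool expects.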

2019 ◽  
Author(s):  
Charlotte A. Darby ◽  
Ravi Gaddipati ◽  
Michael C. Schatz ◽  
Ben Langmead

Abstract: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these “gold standard” Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM, and vg to align more reads correctly. Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license.
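To make "heuristic-free" and "cell updates" concrete, here is a minimal sketch of local alignment scoring with affine gap penalties by full dynamic programming over a linear reference. It is not Vargas's implementation: it is plain Python with no SIMD, no graph genome, and no quality-scaled mismatches. The function name `local_affine_score` and the default penalty values are assumptions chosen for illustration. Each iteration of the inner loop is one cell update.

```python
def local_affine_score(read, ref, match=2, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Return the best local alignment score of `read` against `ref`.

    A gap of length k is scored gap_open + k * gap_extend (affine penalty).
    Every cell of the (len(read) x len(ref)) matrix is filled, so the optimum
    under this scoring scheme is guaranteed; no seeds or heuristics are used.
    """
    neg_inf = float("-inf")
    m = len(ref)
    h_prev = [0] * (m + 1)        # best score ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # best score ending at (i-1, j) in a gap in ref
    best = 0
    for i in range(1, len(read) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending at (i, j) in a gap in read
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one, in each direction.
            e = max(e + gap_extend, h_cur[j - 1] + gap_open + gap_extend)
            f_cur[j] = max(f_prev[j] + gap_extend, h_prev[j] + gap_open + gap_extend)
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            # Local alignment: a score may restart from 0 at any cell.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

The work grows with the product of read length and reference length, which is why the abstract reports throughput in cell updates per second and relies on parallelization and vectorization to make exhaustive alignment practical.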


2013 ◽  
Vol 99 (1) ◽  
pp. 39-56 ◽  
Author(s):  
Ondřej Bojar ◽  
Aleš Tamchyna

Abstract: We present eman, a tool for managing large numbers of computational experiments. Over the years of our research in machine translation (MT), we have collected a number of ideas for efficient experimenting. We believe these ideas are generally applicable in (computational) research in any field. We incorporated them into eman in order to make them available in a command-line Unix environment. The aim of this article is to highlight the core of these ideas. We hope the text can serve as a collection of experiment management tips and tricks for anyone, regardless of their field of study or the computer platform they use. The specific examples we provide in eman’s current syntax are less important, but they allow us to use concrete terms. The article thus also fills a gap in the eman documentation by providing a high-level overview.


2020 ◽  
Vol 36 (12) ◽  
pp. 3712-3718
Author(s):  
Charlotte A Darby ◽  
Ravi Gaddipati ◽  
Michael C Schatz ◽  
Ben Langmead

Abstract. Motivation: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Results: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these ‘gold standard’ Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM and vg to align more reads correctly. Availability and implementation: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Jose Sergio Hleap ◽  
Melania E. Cristescu ◽  
Dirk Steinke

Abstract. Summary: Amplicons to Global Gene (A2G2) is a Python wrapper that uses MAFFT and an “Amplicon to Gene” strategy to align very large numbers of sequences while improving alignment accuracy. It is specially developed to deal with conserved genes, where traditional aligners introduce a significant number of gaps. A2G2 leverages the add sequences option of MAFFT to align the sequences to a global reference gene and a local reference region. Both of these references can be consensus sequences of trusted sources. Efficient parallelization of these tasks allows A2G2 to align a very large number of sequences (> 500K) in a reasonable amount of time. A2G2 can be imported in Python for easier integration with other software, or can be run via the command line. Availability: A2G2 is implemented in Python 3 (3.6) and depends on MAFFT availability. Other package requirements can be found in the requirements.txt file at https://github.com/jshleap/A2G. A2G2 is also available via PyPI (https://pypi.org/project/A2G). It is licensed under the LGPLv3. Supplementary information: Supplementary material is available on GitHub as a Jupyter notebook.


Author(s):  
T. G. Merrill ◽  
B. J. Payne ◽  
A. J. Tousimis

Rats given SK&F 14336-D (9-[3-Dimethylamino propyl]-2-chloroacridane), a tranquilizing drug, developed an increased number of vacuolated lymphocytes as observed by light microscopy. Vacuoles in peripheral blood of rats and humans apparently are rare and are not usually reported in differential counts. Transforming agents such as phytohemagglutinin and pokeweed mitogen induce similar vacuoles in in vitro cultures of lymphocytes. These vacuoles have also been reported in some of the lipid-storage diseases of humans such as amaurotic familial idiocy, familial neurovisceral lipidosis, lipomucopolysaccharidosis and sphingomyelinosis. Electron microscopic studies of Tay-Sachs' disease and of chloroquine treated swine have demonstrated large numbers of “membranous cytoplasmic granules” in the cytoplasm of neurons, in addition to lymphocytes. The present study was undertaken with the purpose of characterizing the membranous inclusions and developing an experimental animal model which may be used for the study of lipid storage diseases.


Author(s):  
Robert Corbett ◽  
Delbert E. Philpott ◽  
Sam Black

Observation of subtle or early signs of change in spaceflight-induced alterations in living systems requires precise methods of sampling. In-flight analysis would be preferable, but constraints of time, equipment, personnel and cost dictate the necessity for prolonged storage before retrieval. Because of this, various tissues have been stored in fixatives and combinations of fixatives and observed at various time intervals. High pressure and the effect of buffer alone have also been tried. Of the various tissues embedded (muscle, cartilage and liver), liver has been the most extensively studied because it contains large numbers of organelles common to all tissues (Fig. 1).


Author(s):  
Roy Skidmore

The long-necked secretory cells in Onchidoris muricata are distributed in the anterior sole of the foot. These cells are interspersed among ciliated columnar and conical cells as well as short-necked secretory gland cells. The long-necked cells contribute a significant amount of mucoid materials to the slime on which the nudibranch travels. The body of these cells is found in the subepidermal tissues. A long process extends across the basal lamina and in between cells of the epidermis to the surface of the foot. The secretory granules travel along the process and their contents are expelled by exocytosis at the foot surface. The contents of the cell body include the nucleus, some endoplasmic reticulum, and an extensive Golgi body with large numbers of secretory vesicles (Fig. 1). The secretory vesicles are membrane bound and contain a fibrillar matrix. At high magnification the similarity of the contents in the Golgi saccules and the secretory vesicles becomes apparent (Fig. 2).


Author(s):  
C. C. Clawson ◽  
L. W. Anderson ◽  
R. A. Good

Investigations which require electron microscope examination of a few specific areas of non-homogeneous tissues make random sampling of small blocks an inefficient and unrewarding procedure. Therefore, several investigators have devised methods which allow obtaining sample blocks for electron microscopy from regions of tissue previously identified by light microscopy. We present here techniques which make possible: 1) sampling tissue for electron microscopy from selected areas previously identified by light microscopy of relatively large pieces of tissue; 2) dehydration and embedding of large numbers of individually identified blocks while keeping each one separate; 3) a new method of maintaining specific orientation of blocks during embedding; 4) special light microscopic staining or fluorescent procedures and electron microscopy on immediately adjacent small areas of tissue.


Author(s):  
J.M. Titchmarsh

The advances in recent years in the microanalytical capabilities of conventional TEMs fitted with probe-forming lenses allow much more detailed investigations to be made of the microstructures of complex alloys, such as ferritic steels, than have been possible previously. In particular, the identification of individual precipitate particles with dimensions of a few tens of nanometers in alloys containing high densities of several chemically and crystallographically different precipitate types is feasible. The aim of the investigation described in this paper was to establish a method which allowed individual particle identification to be made in a few seconds so that large numbers of particles could be examined in a few hours. A Philips EM400 microscope, fitted with the scanning transmission (STEM) objective lens pole-pieces and an EDAX energy dispersive X-ray analyser, was used at 120 kV with a thermal W hairpin filament. The precipitates examined were extracted using a standard C replica technique from specimens of a 2¼Cr-1Mo ferritic steel in a quenched and tempered condition.


Author(s):  
H. J. Arnott ◽  
M. A. Webb ◽  
L. E. Lopez

Many papers have been published on the structure of calcium oxalate crystals in plants; however, few deal with the early development of crystals. Large numbers of idioblastic calcium oxalate crystal cells are found in the leaves of Vitis mustangensis, V. labrusca and V. vulpina. A crystal idioblast, or raphide cell, will produce 150-300 needle-like calcium oxalate crystals within a central vacuole. Each raphide crystal is autonomous, having been produced in a separate membrane-defined crystal chamber; the idioblast's crystal complement is collectively embedded in a water-soluble glycoprotein matrix which fills the vacuole. The crystals are twins, each having a pointed and a bidentate end (Fig. 1); when mature they are about 0.5-1.2 μm in diameter and 30-70 μm in length. Crystal bundles, i.e., crystals and their matrix, can be isolated from leaves using 100% ETOH. If the bundles are treated with H2O the matrix surrounding the crystals rapidly disperses.

