EMBL2checklists: A Python package to facilitate the user-friendly submission of plant DNA barcoding sequences to ENA

2018 ◽  
Author(s):  
Michael Gruenstaeudl ◽  
Yannick Hartmaring

Abstract Background: The submission of DNA sequences to public sequence databases is an essential, but insufficiently automated step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is limited, especially for sequences generated via plant DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant DNA barcoding. Methods: A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called “checklists”) for a subsequent upload to the public sequence database of the European Nucleotide Archive (ENA). The software tool, titled “EMBL2checklists”, automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and, thus, generates output that can be uploaded via the interactive Webin submission system of ENA. Results: EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common plant DNA barcoding sequences into easily editable spreadsheets that require no further processing other than their upload to ENA via the interactive Webin submission system. The software is equipped with an intuitive graphical interface as well as an efficient command-line interface. The utility of the software is illustrated by its application in the submission of DNA sequences of two recent plant phylogenetic investigations and one fungal metagenomic study. Discussion: EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant biologists without bioinformatics expertise to generate submission-ready checklists from common plant DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process and solves a critical bottleneck in the effort to increase participation in public data sharing.
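The conversion described in this abstract, from annotated sequence records to a tab-delimited checklist, might be sketched roughly as follows. This is only an illustration of the general idea and not the EMBL2checklists implementation: the column names, the `records_to_checklist` helper, and the example records are all hypothetical, and the real ENA checklist columns are marker-specific and not given in the abstract.

```python
import csv
import io

# Hypothetical column names for illustration only; real ENA checklists
# define their own marker-specific columns.
COLUMNS = ["entry_number", "organism", "marker", "sequence"]


def records_to_checklist(records):
    """Return tab-delimited text: one header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for number, record in enumerate(records, start=1):
        writer.writerow(
            [number, record["organism"], record["marker"], record["sequence"].lower()]
        )
    return buffer.getvalue()


# Made-up records standing in for parsed EMBL or GenBank flat-file entries.
example_records = [
    {"organism": "Example species A", "marker": "markerX", "sequence": "ATGGCC"},
    {"organism": "Example species B", "marker": "markerX", "sequence": "ATGGCA"},
]

checklist_text = records_to_checklist(example_records)
print(checklist_text)
```

Because the output is plain tab-delimited text, it can be opened and edited in any spreadsheet program before upload, which is the property the abstract emphasizes.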

Genetics ◽  
1993 ◽  
Vol 134 (4) ◽  
pp. 1195-1204
Author(s):  
S Tarès ◽  
J M Cornuet ◽  
P Abad

Abstract An AluI family of highly reiterated nontranscribed sequences has been found in the genome of the honeybee Apis mellifera. This repeated sequence is shown to be present at approximately 23,000 copies per haploid genome, constituting about 2% of the total genomic DNA. The nucleotide sequence of 10 monomers was determined. The consensus sequence is 176 nucleotides long and has an A + T content of 58%. There are clusters of both direct and inverted repeats. Internal subrepeating units ranging from 11 to 17 nucleotides are observed, suggesting that it could have evolved from a shorter sequence. DNA sequence data reveal that this repeat class is unusually homogeneous compared to the other class of invertebrate highly reiterated DNA sequences. The average pairwise sequence divergence between the repeats is 2.5%. In spite of this unusual homogeneity, divergence has been found in the repeated sequence hybridization ladder between four different honeybee subspecies. Therefore, the AluI highly reiterated sequences provide a new probe for fingerprinting in A. m. mellifera.
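An average pairwise sequence divergence such as the 2.5% quoted above can be computed, in its simplest form, as the mean proportion of differing sites over all pairs of aligned sequences. The sketch below assumes equal-length, gap-free alignments and uses made-up toy sequences; the abstract does not say how the authors handled gaps or whether a distance correction was applied.

```python
from itertools import combinations


def pairwise_divergence(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def average_pairwise_divergence(sequences):
    """Mean divergence over every unordered pair of sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(pairwise_divergence(a, b) for a, b in pairs) / len(pairs)


# Toy aligned monomers (hypothetical, not the honeybee repeats).
toy_monomers = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
average = average_pairwise_divergence(toy_monomers)
print(f"{average:.4f}")
```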


Zootaxa ◽  
2012 ◽  
Vol 3361 (1) ◽  
pp. 56-62 ◽  
Author(s):  
JOSEFINA CURIEL ◽  
JUAN J. MORRONE

Insect life stages are known imperfectly in many cases, and classifications are usually based on adult morphology. This is unfortunate as information on other life stages may be useful for biomonitoring. The major impediment to using elmid (Coleoptera) larvae for freshwater biomonitoring is the lack of larval descriptions and illustrations. Reliable molecular protocols may be used to associate larvae and adults. After adults of seven species of Mexican Macrelmis were identified morphologically, seven larval specimens were associated to them based on two gene fragments: Cox1 and Cob. The phylogenetic analysis allowed identifying the larval specimens as Macrelmis leonilae, M. scutellaris, M. species 7, M. species 10, and M. species 11. Two species based on adults associated uncertainly with one larva, and one larva did not match with any adult. Adult/larval association in elmids using DNA sequence data seems to be promising in terms of speed and reliability.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ying Zhang ◽  
Yupei Zhou ◽  
Wei Sun ◽  
Lili Zhao ◽  
D. Pavlic-Zupanc ◽  
...  

The genus Botryosphaeria includes more than 200 epithets, but only the type species, Botryosphaeria dothidea, and a dozen or more other species have been identified based on DNA sequence data. The taxonomic status of the other species remains unconfirmed because they lack either morphological information or DNA sequence data. In this study, types or authentic specimens of 16 “Botryosphaeria” species are reassessed to clarify their identity and phylogenetic position. nuDNA sequences of four regions, ITS, LSU, tef1-α and tub2, are analyzed and considered in combination with morphological characteristics. Based on the multigene phylogeny and morphological characters, Botryosphaeria cruenta and Botryosphaeria hamamelidis are transferred to Neofusicoccum. The generic status of Botryosphaeria aterrima and Botryosphaeria mirabile is confirmed in Botryosphaeria. Botryosphaeria berengeriana var. weigeliae and B. berengeriana var. acerina are treated as synonyms of B. dothidea. Botryosphaeria mucosa is transferred to Neodeightonia as Neodeightonia mucosa, and Botryosphaeria ferruginea to Nothophoma as Nothophoma ferruginea. Botryosphaeria foliicola is reduced to synonymy with Phyllachorella micheliae. Botryosphaeria abuensis, Botryosphaeria aesculi, Botryosphaeria dasylirii, and Botryosphaeria wisteriae are tentatively kept in Botryosphaeria sensu stricto until further phylogenetic analysis is carried out on verified specimens. The ordinal status of Botryosphaeria apocyni, Botryosphaeria gaubae, and Botryosphaeria smilacinina cannot be determined, and these species are tentatively accommodated in Dothideomycetes incertae sedis. The study demonstrates the significance of a polyphasic approach in characterizing type specimens, including the importance of using DNA sequence data.


2016 ◽  
Vol 36 (1) ◽  
Author(s):  
Paola Berchialla

We introduce a Bayesian hierarchical model for mitochondrial DNA sequence data, which is fitted via acceptance-rejection algorithms. The model incorporates parametric models of population history explicitly as well as a mutational process allowing for a simultaneous parameter estimation whose importance has become increasingly clear in many recent studies. The model is applied to a sample of DNA sequences from the Italian population.


2021 ◽  
Vol 22 (3) ◽  
pp. 505
Author(s):  
SONIA GIULIETTI ◽  
TIZIANA ROMAGNOLI ◽  
ALESSANDRA CAMPANELLI ◽  
CECILIA TOTTI ◽  
STEFANO ACCORONI

The ecology and seasonality of Pseudo-nitzschia species and their contribution to phytoplankton community were analysed for the first time at the coastal station of the LTER-Senigallia-Susak transect (north-western Adriatic Sea) from 1988 to 2020. Species composition was addressed using DNA sequence data obtained from 106 monoclonal strains isolated from January 2018 to January 2020. The mean annual cycle of total phytoplankton in the study period (Feb 1988–Jan 2020) showed maximum abundances in winter followed by other peaks in spring and autumn. Diatoms were the main contributors in terms of abundance during the winter and the spring blooms. The autumn peak was due to phytoflagellates and diatoms. In summer phytoflagellates dominated the community, followed by diatoms and dinoflagellates, which in this season reached their annual maximum. Pseudo-nitzschia spp. represented on average 0.4–17.6% of diatom community, but during their blooms they could reach up to 90% of the total diatom abundances with 10⁶ cells l⁻¹. By LM, six different taxa were recognized: Pseudo-nitzschia cf. delicatissima and P. cf. pseudodelicatissima were the most abundant, followed by P. cf. fraudulenta, P. pungens, P. multistriata and P. cf. galaxiae. P. cf. fraudulenta and P. pungens were indicator taxa of winter. P. cf. delicatissima and P. cf. pseudodelicatissima were spring and summer taxa, respectively. P. galaxiae showed maximum abundances in autumn. DNA sequences revealed the presence of two species belonging to the ‘P. seriata group’ (i.e. P. fraudulenta and P. pungens) and four species belonging to the ‘P. delicatissima group’ (P. calliantha and P. mannii within the P. pseudodelicatissima species complex, and P. delicatissima and P. cf. arenysensis within the P. delicatissima species complex). The presence of several cryptic and pseudo-cryptic species highlights the need to combine LM observations with DNA sequence data when the ecology of Pseudo-nitzschia is investigated.


2019 ◽  
pp. 145-172
Author(s):  
Glenn-Peter Sætre ◽  
Mark Ravinet

The allelic evolutionary genetic models explored so far are applicable to genetic markers. However, DNA sequences harbor a lot of information about the evolutionary past that would be missed if different sequences were simply treated as different alleles. This chapter introduces some important methods and concepts applicable to the analysis of DNA-sequence data. The null models for analyzing sequence data are derived from the neutral theory of molecular evolution. Historically, however, the neutral theory has made a large impact on evolutionary genetics. Therefore, this chapter starts by reviewing its important contribution. Then, important parameters and statistics for analyzing sequence variation are introduced, including a plethora of neutrality tests. The chapter ends with a cautious focus on the powerful tool of genome scan analysis and its utility for identifying regions of the genomes potentially under selection. This includes a section on more recently derived statistics which incorporate information on haplotype structure.
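Two of the standard statistics for sequence variation that a chapter of this kind typically covers are the average number of pairwise differences (π) and Watterson's estimator (θ_W), which is based on the number of segregating sites. Both estimate the same quantity under neutrality, and neutrality tests such as Tajima's D are built on the difference between them. The sketch below assumes a gap-free alignment and uses a made-up toy sample; it is not taken from the chapter, and the function names are illustrative.

```python
from itertools import combinations


def pairwise_differences(sequences):
    """Average number of differing sites per pair of sequences (pi)."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def segregating_sites(sequences):
    """Number of alignment columns with more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*sequences))


def watterson_theta(sequences):
    """Watterson's estimator: S divided by the harmonic sum a1."""
    n = len(sequences)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(sequences) / a1


# Toy sample of four aligned sequences (hypothetical data).
sample = ["AAAA", "AAAT", "AATT", "AAAA"]
pi = pairwise_differences(sample)
theta_w = watterson_theta(sample)
print(f"pi = {pi:.4f}, theta_W = {theta_w:.4f}")
```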


1999 ◽  
Vol 37 (12) ◽  
pp. 3957-3964 ◽  
Author(s):  
Kerstin Voigt ◽  
Elizabeth Cigelnik ◽  
Kerry O'Donnell

A molecular database for all clinically important Zygomycetes was constructed from nucleotide sequences from the nuclear small-subunit (18S) ribosomal DNA and domains D1 and D2 of the nuclear large-subunit (28S) ribosomal DNA. Parsimony analysis of the aligned 18S and 28S DNA sequences was used to investigate phylogenetic relationships among 42 isolates representing species of Zygomycetes reported to cause infections in humans and other animals, together with commonly cultured contaminants, with emphasis on members of the Mucorales. The molecular phylogeny provided strong support for the monophyly of the Mucorales, exclusive of Echinosporangium transversale and Mortierella spp., which are currently misclassified within the Mucorales. Micromucor ramannianus, traditionally classified within Mortierella, and Syncephalastrum racemosum represent the basal divergences within the Mucorales. Based on the 18S gene tree topology, Absidia corymbifera and Rhizomucor variabilis appear to be misplaced taxonomically. A. corymbifera is strongly supported as a sister group of the Rhizomucor miehei-Rhizomucor pusillus clade, while R. variabilis is nested within Mucor. The aligned 28S sequences were used to design 13 taxon-specific PCR primer pairs for those taxa most commonly implicated in infections. All of the primers specifically amplified DNA of the size predicted based on the DNA sequence data from the target taxa; however, they did not cross-react with phylogenetically related species. These primers have the potential to be used in a PCR assay for the rapid and accurate identification of the etiological agents of mucormycoses and entomophthoromycoses.


2019 ◽  
Vol 86 (1) ◽  
pp. 42-55 ◽  
Author(s):  
Fabio Crocetta ◽  
Luigi Caputi ◽  
Sofia Paz-Sedano ◽  
Valentina Tanduo ◽  
Angelo Vazzana ◽  
...  

Abstract Genetic connectivity plays a crucial role in shaping the geographic structure of species. Our aim in this study was to explore the pattern of genetic connectivity in Bursa scrobilator, an iconic marine caenogastropod with long-lived pelagic larvae. Our study was based on the analysis of DNA sequence data for the 658-bp barcoding fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene. This is the largest DNA sequence dataset assembled to date for B. scrobilator. These data confirm that the two recently described subspecies B. scrobilator scrobilator (Linnaeus, 1758), from the Mediterranean and Macaronesia, and B. s. coriacea (Reeve, 1844), from West Africa, constitute two evolutionarily significant units (ESUs). We found that for the nominal subspecies, the variation in morphology (shell, radula and gross anatomy) and DNA sequences was not geographically structured, and this agrees with what we would expect in a species with high connectivity at the larval stage. The divergence between the two subspecies cannot be easily explained by isolation by distance, and we would argue that one or more extrinsic factors may have played a role in isolating the two ESUs and maintaining that isolation.


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Maleeha Najam ◽  
Raihan Ur Rasool ◽  
Hafiz Farooq Ahmad ◽  
Usman Ashraf ◽  
Asad Waqar Malik

Storing and processing large DNA sequences has always been a major problem due to the increasing volume of DNA sequence data. A number of solutions have been proposed, but they require significant computation and memory. Therefore, an efficient storage and pattern matching solution is required for DNA sequencing data. Bloom filters (BFs) represent an efficient data structure, which is mostly used in the domain of bioinformatics for classification of DNA sequences. In this paper, we explore more dimensions where BFs can be used other than classification. The proposed solution is based on Multiple Bloom Filters (MBFs) and finds all the locations and the number of repetitions of a specified pattern inside a DNA sequence. Both of these factors are extremely important in determining the type and intensity of any disease. This paper serves as a first effort towards optimizing the search for location and frequency of substrings in DNA sequences using MBFs. We expect that further optimizations in the proposed solution can bring remarkable results, as this paper presents a proof-of-concept implementation for a given set of data using the proposed MBF technique. Performance evaluation shows improved accuracy and time efficiency of the proposed approach.
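The general idea of using a Bloom filter to locate and count pattern occurrences in a DNA sequence can be illustrated with a small sketch. The abstract does not describe the MBF design, so the code below is not the authors' method: it is a single, simple Bloom filter used as a prefilter over sliding windows, with an exact comparison to discard false positives. The class, the function names, and the filter parameters are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; uses one byte per bit for readability."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)

    def _indexes(self, item):
        # Derive several hash positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for index in self._indexes(item):
            self.bits[index] = 1

    def might_contain(self, item):
        return all(self.bits[index] for index in self._indexes(item))


def find_patterns(sequence, patterns):
    """Map each equal-length pattern to the start positions where it occurs."""
    k = len(patterns[0])
    if any(len(p) != k for p in patterns):
        raise ValueError("all patterns must have the same length")
    bloom = BloomFilter()
    for pattern in patterns:
        bloom.add(pattern)
    hits = {pattern: [] for pattern in patterns}
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # The filter screens windows; the dict lookup confirms a true match.
        if bloom.might_contain(window) and window in hits:
            hits[window].append(start)
    return hits


hits = find_patterns("ACGTACGTTACG", ["ACG", "GTT"])
for pattern, positions in hits.items():
    print(pattern, positions, len(positions))
```

The number of repetitions of each pattern is simply the length of its position list. A Bloom filter can report false positives but never false negatives, which is why the exact check after the filter is needed.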


2021 ◽  
Vol 11 (2) ◽  
pp. 3542-3548

Identification is a very important part of taxonomy. Since a species represents the basic unit of biological classification, identifying species is important for understanding the systematics and the precise phylogenetic position of particular species. In recent years, species identification and delimitation have seen major improvements because of the incorporation of DNA sequence data. This review provides a comprehensive list of commonly employed nuclear and chloroplast regions used for the barcoding of plants.

