Depositing annotated sequences in GenBank: there needs to be a better way

2020 ◽  
Vol 19 (5-6) ◽  
pp. 337-338
Author(s):  
David Roy Smith

Abstract Submitting sequences to the National Center for Biotechnology Information (NCBI) is an integral part of research and the publication process for many disciplines within the life sciences, and it will only become more important as sequencing technologies continue to improve. Here, I argue that the available infrastructure and resources for uploading data to NCBI—especially the associated annotations of eukaryotic genomes—are inefficient, hard to use and sometimes just plain bad. This, in turn, is causing some researchers to forgo annotations entirely in their submissions. The time is overdue for the development of sophisticated, user-friendly software for depositing annotated sequences in GenBank.

Author(s):  
Roman Martin ◽  
Thomas Hackl ◽  
Georges Hattab ◽  
Matthias G Fischer ◽  
Dominik Heider

Abstract Motivation: The generation of high-quality assemblies, even for large eukaryotic genomes, has become a routine task for many biologists thanks to recent advances in sequencing technologies. However, the annotation of these assemblies, a crucial step toward unlocking the biology of the organism of interest, has remained a complex challenge that often requires advanced bioinformatics expertise. Results: Here, we present MOSGA (Modular Open-Source Genome Annotator), a genome annotation framework for eukaryotic genomes with a user-friendly web interface that generates and integrates annotations from various tools. The aggregated results can be analyzed with a fully integrated genome browser and are provided in a format ready for submission to NCBI. MOSGA is built on a portable, customizable and easily extendible Snakemake backend and thus can be tailored to a wide range of users and projects. Availability and implementation: We provide MOSGA as a web service at https://mosga.mathematik.uni-marburg.de and as a docker container at registry.gitlab.com/mosga/mosga:latest. Source code can be found at https://gitlab.com/mosga/mosga. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Veli-Matti Karhulahti ◽  
Hans-Joachim Backe

Abstract Background: Open peer review practices are increasing in medicine and life sciences, but in social sciences and humanities (SSH) they are still rare. We aimed to map out how editors of respected SSH journals perceive open peer review, how they balance policy, ethics, and pragmatism in the review processes they oversee, and how they view their own power in the process. Methods: We conducted 12 pre-registered semi-structured interviews with editors of respected SSH journals. Interviews consisted of 21 questions and lasted an average of 67 min. Interviews were transcribed, descriptively coded, and organized into code families. Results: SSH editors saw the benefits of anonymized peer review as outweighing those of open peer review. They considered anonymized peer review the “gold standard” that authors and editors are expected to follow to respect institutional policies; moreover, anonymized review was also perceived as ethically superior due to the protection it provides, and more pragmatic because it makes reviewers easier to find. Finally, editors acknowledged their power in the publication process and reported strategies for keeping their work as unbiased as possible. Conclusions: Editors of SSH journals preferred the benefits of anonymized peer review over those of open peer review and acknowledged the power they hold in the publication process, during which authors are almost completely disclosed to editorial bodies. We recommend that journals communicate the transparency elements of their manuscript review processes by listing all bodies who contributed to the decision at every review stage.


2019 ◽  
Vol 36 (5) ◽  
pp. 1647-1648 ◽  
Author(s):  
Bilal Wajid ◽  
Hasan Iqbal ◽  
Momina Jamil ◽  
Hafsa Rafique ◽  
Faria Anwar

Abstract Motivation: Metabolomics is a data analysis and interpretation field that aims to study the functions of small molecules within an organism. Consequently, metabolomics requires life-science researchers to be comfortable downloading, installing and scripting software that is mostly not user friendly and lacks basic GUIs. As researchers struggle with these skills, there is a dire need for software packages that can automatically install software pipelines, shortening the learning curve for building software workstations. Therefore, this paper presents MetumpX, a software package that eases the installation of 103 software tools by automatically resolving their individual dependencies while allowing users to choose which software works best for them. Results: MetumpX is an Ubuntu-based software package that facilitates easy download and installation of 103 tools spread across the standard metabolomics pipeline. As far as the authors know, MetumpX is the only solution of its kind where the focus lies on automating the development of software workstations. Availability and implementation: https://github.com/hasaniqbal777/MetumpX-bin. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Stéphane Deschamps ◽  
Yun Zhang ◽  
Victor Llaca ◽  
Liang Ye ◽  
Gregory May ◽  
...  

The advent of long-read sequencing technologies has greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 value of 33.28 Mbp and covers >90% of the expected Sorghum bicolor genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.


2020 ◽  
Author(s):  
Marius Welzel ◽  
Anja Lange ◽  
Dominik Heider ◽  
Michael Schwarz ◽  
Bernd Freisleben ◽  
...  

Abstract Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix).


2020 ◽  
Vol 16 (4) ◽  
Author(s):  
Marcel Friedrichs ◽  
Alban Shoshi ◽  
Piotr Jaroslaw Chmura ◽  
Jon Ison ◽  
Veit Schwämmle ◽  
...  

Abstract JIB.tools 2.0 is a new approach to more closely embed the curation process in the publication process. This website hosts the tools, software applications, databases and workflow systems published in the Journal of Integrative Bioinformatics (JIB). As soon as a new tool-related publication is published in JIB, the tool is posted to JIB.tools and can afterwards be easily transferred to bio.tools, a large information repository of software tools, databases and services for bioinformatics and the life sciences. In this way, an easily accessible list of the tools published in JIB is provided, as well as status information regarding the underlying service. With newer registries like bio.tools providing this information on a bigger scale, JIB.tools 2.0 closes the gap between journal publications and registry publication. (Reference: https://jib.tools).


2019 ◽  
Vol 81 (7) ◽  
pp. 520-523 ◽  
Author(s):  
Jesper Haglund ◽  
Konrad J. Schönborn

Thermal imagery provides new opportunities to study concepts and processes in biology. Examples include using infrared (IR) cameras in educational activities to explore energy transfer and transformation in human physiology, animal thermoregulation, and plant metabolism. The user-friendly and visually intuitive nature of IR technology is well suited to the study of rapidly changing temperatures on biological surfaces, due to such energy transfers. IR cameras are therefore potentially helpful pedagogical tools for approaching the Energy and Matter crosscutting concept in the Life Sciences discipline of the Next Generation Science Standards.


2021 ◽  
Author(s):  
Pavel Vanacek ◽  
Michal Vasina ◽  
Jiri Hon ◽  
David Kovar ◽  
Hana Faldynova ◽  
...  

Next-generation sequencing technologies enable doubling of the genomic databases every 2.5 years. Collected sequences represent a rich source of novel biocatalysts. However, the rate of accumulation of sequence data exceeds the rate of functional studies, calling for acceleration and miniaturization of biochemical assays. Here, we present an integrated platform employing bioinformatics, microanalytics, and microfluidics and its application for exploration of unmapped sequence space, using haloalkane dehalogenases as model enzymes. First, we employed bioinformatic analysis for identification of 2,905 putative dehalogenases and rational selection of 45 representative enzymes. Second, we expressed and experimentally characterized 24 enzymes showing sufficient solubility for microanalytical and microfluidic testing. Miniaturization increased the throughput to 20,000 reactions per day with 1000-fold lower protein consumption compared to conventional assays. A single run of the platform doubled dehalogenation toolbox of family members characterized over three decades. Importantly, the dehalogenase activities of nearly one-third of these novel biocatalysts far exceed that of most published HLDs. Two enzymes showed unusually narrow substrate specificity, never before reported for this enzyme family. The strategy is generally applicable to other enzyme families, paving the way towards the acceleration of the process of identification of novel biocatalysts for industrial applications but also for the collection of homogenous data for machine learning. The automated in silico workflow has been released as a user-friendly web-tool EnzymeMiner: https://loschmidt.chemi.muni.cz/enzymeminer/.


2021 ◽  
Vol 1 ◽  
Author(s):  
Xi Zhang ◽  
Yining Hu ◽  
David Roy Smith

Gene duplication is an important evolutionary mechanism capable of providing new genetic material for adaptive and nonadaptive evolution. However, bioinformatics tools for identifying duplicate genes are often limited to the detection of paralogs in multiple species or to specific types of gene duplicates, such as retrocopies. Here, we present a user-friendly, BLAST-based web tool, called HSDFinder, which can identify, annotate, categorize, and visualize highly similar duplicate genes (HSDs) in eukaryotic nuclear genomes. HSDFinder includes an online heatmap plotting option, allowing users to compare HSDs among different species and visualize the results in different Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway functional categories. The external software requirements are BLAST, InterProScan, and KEGG. The utility of HSDFinder was tested on various model eukaryotic species, including Chlamydomonas reinhardtii, Arabidopsis thaliana, Oryza sativa, and Zea mays as well as the psychrophilic green alga Chlamydomonas sp. UWO241, and was proven to be a practical and accurate tool for gene duplication analyses. The web tool is free to use at http://hsdfinder.com. Documentation and tutorials can be found on GitHub: https://github.com/zx0223winner/HSDFinder.


2017 ◽  
Author(s):  
Julian Garneau ◽  
Florence Depardieu ◽  
Louis-Charles Fortier ◽  
David Bikard ◽  
Marc Monot

Abstract Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double stranded DNA (dsDNA). In this study we investigated how the information gathered by high throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.

