Increasing the power of interpretation for soil metaproteomics data

Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Virginie Jouffret ◽  
Guylaine Miotello ◽  
Karen Culotta ◽  
Sophie Ayrault ◽  
Olivier Pible ◽  
...  

Abstract Background Soil and sediment microorganisms are highly phylogenetically diverse but are currently largely under-represented in public molecular databases. Their functional characterization by means of metaproteomics is usually performed using metagenomic sequences acquired for the same sample. However, such hugely diverse metagenomic datasets are difficult to assemble; in parallel, theoretical proteomes from isolates available in generic databases are of high quality. Both these factors advocate for the use of theoretical proteomes in metaproteomics interpretation pipelines. Here, we examined a number of database construction strategies with a view to increasing the outputs of metaproteomics studies performed on soil samples. Results The number of peptide-spectrum matches was found to be of comparable magnitude when using public or sample-specific metagenomics-derived databases. However, numbers were significantly increased when a combination of both types of information was used in a two-step cascaded search. Our data also indicate that the functional annotation of the metaproteomics dataset can be maximized by using a combination of both types of databases. Conclusions A two-step strategy combining sample-specific metagenome database and public databases such as the non-redundant NCBI database and a massive soil gene catalog allows maximizing the metaproteomic interpretation both in terms of ratio of assigned spectra and retrieval of function-derived information.
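The two-step cascaded search described in this abstract can be illustrated with a toy sketch. It is not the authors' pipeline: the function name `cascaded_search`, the exact-string peptide lookup that stands in for a real search engine, and the order in which the databases are queried are all assumptions made for illustration. The idea shown is only that spectra left unassigned by one database are passed on to the next.

```python
def cascaded_search(spectra, databases):
    """Toy cascaded search (hypothetical helper, not a real search engine).

    spectra:   dict mapping spectrum id -> candidate peptide sequence
    databases: ordered list of (database name, set of peptide sequences);
               the order is the caller's choice and is an assumption here
    Returns (assigned, unassigned): assigned maps spectrum id -> name of the
    first database that matched; unassigned is a sorted list of ids.
    """
    assigned = {}
    remaining = dict(spectra)
    for name, peptides in databases:
        # Only spectra that no earlier database explained are searched again.
        matched = [sid for sid, pep in remaining.items() if pep in peptides]
        for sid in matched:
            assigned[sid] = name
            del remaining[sid]
    return assigned, sorted(remaining)


# Made-up example data: three spectra and two small peptide databases.
spectra = {"s1": "PEPTIDEA", "s2": "PEPTIDEB", "s3": "PEPTIDEC"}
databases = [
    ("sample_metagenome", {"PEPTIDEA"}),
    ("public", {"PEPTIDEA", "PEPTIDEB"}),
]
assigned, unassigned = cascaded_search(spectra, databases)
```

In this toy run, `s1` is assigned in the first step, `s2` only in the second, and `s3` stays unassigned, which mirrors how combining both database types can raise the share of assigned spectra.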

2021 ◽  
Vol 26 (6) ◽  
Author(s):  
Pooja Rani ◽  
Sebastiano Panichella ◽  
Manuel Leuenberger ◽  
Mohammad Ghafari ◽  
Oscar Nierstrasz

Abstract Context Previous studies have characterized code comments in various programming languages, showing how high quality of code comments is crucial to support program comprehension activities, and to improve the effectiveness of maintenance tasks. However, very few studies have focused on understanding developer practices to write comments. None of them has compared such developer practices to the standard comment guidelines to study the extent to which developers follow the guidelines. Objective Therefore, our goal is to investigate developer commenting practices and compare them to the comment guidelines. Method This paper reports the first empirical study investigating commenting practices in Pharo Smalltalk. First, we analyze class comment evolution over seven Pharo versions. Then, we quantitatively and qualitatively investigate the information types embedded in class comments. Finally, we study the adherence of developer commenting practices to the official class comment template over Pharo versions. Results Our results show that there is a rapid increase in class comments in the initial three Pharo versions, while in subsequent versions developers added comments to both new and old classes, thus maintaining a similar code to comment ratio. We furthermore found three times as many information types in class comments as those suggested by the template. However, the information types suggested by the template tend to be present more often than other types of information. Additionally, we find that a substantial proportion of comments follow the writing style of the template in writing these information types, but they are written and formatted in a non-uniform way. Conclusion The results suggest the need to standardize the commenting guidelines for formatting the text, and to provide headers for the different information types to ensure a consistent style and to identify the information easily. 
Given the importance of high-quality code comments, we draw numerous implications for developers and researchers to improve the support for comment quality assessment tools.


2021 ◽  
Author(s):  
Saumya Agrawal ◽  
Tanvir Alam ◽  
Masaru Koido ◽  
Ivan V. Kulakovskiy ◽  
Jessica Severin ◽  
...  

Abstract Transcription of the human genome yields mostly long non-coding RNAs (lncRNAs). Systematic functional annotation of lncRNAs is challenging due to their low expression level, cell type-specific occurrence, poor sequence conservation between orthologs, and lack of information about RNA domains. Currently, 95% of human lncRNAs have no functional characterization. Using chromatin conformation and Cap Analysis of Gene Expression (CAGE) data in 18 human cell types, we systematically located genomic regions in spatial proximity to lncRNA genes and identified functional clusters of interacting protein-coding genes, lncRNAs and enhancers. Using these clusters we provide a cell type-specific functional annotation for 7,651 out of 14,198 (53.88%) lncRNAs. LncRNAs tend to have specialized roles in the cell type in which they are first expressed, and to incorporate more general functions as their expression is acquired by multiple cell types during evolution. By analyzing RNA-binding protein and RNA-chromatin interaction data in the context of the spatial genomic interaction map, we explored mechanisms by which these lncRNAs can act.


2015 ◽  
Vol 3 (5) ◽  
Author(s):  
Beibei Ge ◽  
Yan Liu ◽  
Binghua Liu ◽  
Kecheng Zhang

We report the first high-quality draft genome sequence of an antibiotic (wuyiencin)-producing strain, Streptomyces ahygroscopicus subsp. wuyiensis CK-15, isolated from soil samples collected from Fujian Province, China. The 9.41-Mb genome comprises 8,311 protein-coding sequences, encodes 89 structural RNAs, and shows a G+C content of 72.25%.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1196
Author(s):  
Karel Sedlar ◽  
Marketa Nykrynova ◽  
Matej Bezdicek ◽  
Barbora Branska ◽  
Martina Lengerova ◽  
...  

Clostridium beijerinckii is a relatively widely studied, yet non-model, bacterium. While 246 genome assemblies of its various strains are available currently, the diversity of the whole species has not been studied, and it has only been analyzed in part for a missing genome of the type strain. Here, we sequenced and assembled the complete genome of the type strain Clostridium beijerinckii DSM 791T, composed of a circular chromosome and a circular megaplasmid, and used it for a comparison with other genomes to evaluate diversity and capture the evolution of the whole species. We found that strains WB53 and HUN142 were misidentified and did not belong to the Clostridium beijerinckii species. Additionally, we filtered possibly misassembled genomes, and we used the remaining 237 high-quality genomes to define the pangenome of the whole species. By its functional annotation, we showed that the core genome contains genes responsible for basic metabolism, while the accessory genome has genes affecting final phenotype that may vary among different strains. We used the core genome to reconstruct the phylogeny of the species and showed its great diversity, which complicates the identification of particular strains, yet hides possibilities to reveal hitherto unreported phenotypic features and processes utilizable in biotechnology.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 23-24
Author(s):  
Kimberly M Davenport ◽  
Derek M Bickhart ◽  
Kim Worley ◽  
Shwetha C Murali ◽  
Noelle Cockett ◽  
...  

Abstract Sheep are an important agricultural species used for both food and fiber in the United States and globally. A high-quality reference genome enhances the ability to discover genetic and biological mechanisms influencing important traits, such as meat and wool quality. The rapid advances in genome assembly algorithms and emergence of increasingly long sequence read length provide the opportunity for an improved de novo assembly of the sheep reference genome. Tissue was collected postmortem from an adult Rambouillet ewe selected by USDA-ARS for the Ovine Functional Annotation of Animal Genomes project. Short-read (55x coverage), long-read PacBio (75x coverage), and Hi-C data from this ewe were retrieved from public databases. We generated an additional 50x coverage of Oxford Nanopore data and assembled the combined long-read data with canu v1.9. The assembled contigs were polished with Nanopolish v0.12.5 and scaffolded using Hi-C data with Salsa v2.2. Gaps were filled with PBsuite v15.8.24 and polished with Nanopolish v0.12.5 followed by removal of duplicate contigs with PurgeDups v1.0.1. Chromosomes were oriented by identifying centromeres and telomeres with RepeatMasker v4.1.1, indicating a need to reverse the orientation of chromosome 11 relative to Oar_rambouillet_v1.0. Final polishing was performed with two rounds of a pipeline which consisted of freebayes v1.3.1 to call variants, Merfin to validate them, and BCFtools to generate the consensus fasta. The ARS-UI_Ramb_v2.0 assembly has improved continuity (contig N50 of 43.19 Mb) with a 19-fold and 38-fold decrease in the number of scaffolds compared with Oar_rambouillet_v1.0 and Oar_v4.0. ARS-UI_Ramb_v2.0 has greater per-base accuracy and fewer insertions and deletions identified from mapped RNA sequence than previous assemblies. 
This significantly improved reference assembly, publicly available at NCBI GenBank under accession number GCA_016772045, will optimize the functional annotation of the sheep genome and facilitate improved mapping accuracy of genetic variant and expression data for traits relevant to the sheep industry.
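The contig N50 quoted in this abstract is a standard continuity metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Made-up contig lengths (arbitrary units) for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Here the total length is 32, and the two longest contigs (8 + 8) already reach half of it, so the N50 of the example is 8.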


2008 ◽  
Vol 45 (7) ◽  
pp. 1018-1024 ◽  
Author(s):  
Han-Eng Low ◽  
Kok-Kwang Phoon

A series of one-dimensional consolidation tests were performed under varying pretreatments on high quality soil samples collected from a Singapore upper marine clay layer in an attempt to evaluate the effect of cementation by amorphous materials on its compressibility. The findings from this study seem to suggest that cementation by ethylene-diamine tetraacetic acid (EDTA) removable amorphous materials may only partially contribute to the development of soil microstructure and overconsolidation in Singapore upper marine clay.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 148 ◽  
Author(s):  
Victoria Dominguez Del Angel ◽  
Erik Hjerde ◽  
Lieven Sterck ◽  
Salvadors Capella-Gutierrez ◽  
Cederic Notredame ◽  
...  

As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).


2020 ◽  
Vol 4 (9) ◽  
Author(s):  
Kun Dang ◽  
Xiaolong Jiang

In the context of the current rapid innovation in electronic information technology, many schools have set up majors related to electronic information in order to cultivate the high-quality talent required by the job market. Because electronic information courses require students to learn by building on their knowledge, students can quickly grasp the integration and processing of electronic systems and various types of information; with stronger professional skills, they can also take part in vocational skill competitions and achieve better results. This article therefore discusses how the electronic information major can innovate against the background of the current rapid growth of vocational skill competitions, with the aim of keeping teaching activities efficient while encouraging students to take part in competitions and develop further.

