structural annotation
Recently Published Documents


Total documents: 91 (five years: 47)
H-index: 15 (five years: 5)

2022 ◽  
pp. 132134
Author(s):  
Wenjing Liu ◽  
Wei Li ◽  
Peijie Zhang ◽  
Xingcheng Gong ◽  
Pengfei Tu ◽  
...  

Author(s):  
David B Neale ◽  
Aleksey V Zimin ◽  
Sumaira Zaman ◽  
Alison D Scott ◽  
Bikash Shrestha ◽  
...  

Sequencing, assembly, and annotation of the 26.5 Gbp hexaploid genome of coast redwood (Sequoia sempervirens) were completed, leading toward discovery of genes related to climate adaptation and investigation of the origin of the hexaploid genome. Deep-coverage short-read Illumina sequencing data from haploid tissue from a single seed were combined with long-read Oxford Nanopore Technologies sequencing data from diploid needle tissue to create an initial assembly, which was then scaffolded using proximity ligation data to produce a highly contiguous final assembly, SESE 2.1, with a scaffold N50 size of 44.9 Mbp. The assembly included several scaffolds that span entire chromosome arms, confirmed by the presence of telomere and centromere sequences on the ends of the scaffolds. The structural annotation produced 118,906 genes, with 113 containing introns that exceed 500 kbp in length and one reaching 2 Mbp. Nearly 19 Gbp of the genome represented repetitive content, with the vast majority characterized as long terminal repeats, with a 2.9:1 ratio of Copia to Gypsy elements that may aid in gene expression control. Comparison of coast redwood to other conifers revealed species-specific expansions for a plethora of abiotic and biotic stress response genes, including those involved in fungal disease resistance, detoxification, and physical injury/structural remodeling, and others supporting flavonoid biosynthesis. Analysis of multiple genes that exist in triplicate in coast redwood but only once in its diploid relative, giant sequoia, supports a previous hypothesis that the hexaploidy is the result of autopolyploidy rather than any hybridizations with separate but closely related conifer species.
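The scaffold N50 quoted above is a contiguity statistic: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. As a minimal sketch of that definition (the function name `n50` and the toy lengths are illustrative assumptions, not taken from the SESE 2.1 assembly or its tooling):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract also reports telomere and centromere evidence.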


2021 ◽  
Author(s):  
MaKayla Foster ◽  
Markace Rainey ◽  
Chandler Watson ◽  
James N Dodds ◽  
Facundo Fernandez ◽  
...  

The identification of xenobiotics in nontargeted metabolomic analyses is a vital step in understanding human exposure. Xenobiotic metabolism, excretion, and co-existence with other endogenous molecules, however, greatly complicate nontargeted studies. While mass spectrometry (MS)-based platforms are commonly used in metabolomic measurements, deconvoluting endogenous metabolites and xenobiotics is often challenged by the lack of xenobiotic parent and metabolite standards as well as the numerous isomers possible for each small molecule m/z feature. Here, we evaluate the use of ion mobility spectrometry coupled with MS (IMS-MS) and mass defect filtering in a xenobiotic structural annotation workflow to reduce large metabolomic feature lists and uncover potential xenobiotic classes and species detected in the metabolomic studies. To evaluate the workflow, xenobiotics having known high toxicities, including per- and polyfluoroalkyl substances (PFAS), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs), were examined. Initially, to address the lack of available IMS collision cross section (CCS) values for PFAS, 88 PFAS standards were evaluated with IMS-MS both to develop a targeted PFAS CCS library and for use in machine learning predictions. The CCS values for biomolecules and xenobiotics were then plotted versus m/z, clearly distinguishing the biomolecules and halogenated xenobiotics. The xenobiotic structural annotation workflow was then used to annotate potential PFAS features in NIST human serum. The workflow reduced the 2,423 detected LC-IMS-MS features to 80 possible PFAS, with 17 confidently identified through targeted analyses and 48 additional features correlating with possible CompTox entries.
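Mass defect filtering, mentioned above, exploits the fact that heavily halogenated molecules tend to have an exact mass slightly below the nearest integer, whereas hydrogen-rich biomolecules tend to sit above it. The sketch below illustrates the idea only; it is not the authors' workflow. The acceptance window (`low`, `high`) and the feature list are hypothetical, and the m/z values are approximate deprotonated-ion values chosen for illustration.

```python
def mass_defect(mz):
    """Mass defect relative to the nearest integer (nominal) mass."""
    return mz - round(mz)


def filter_by_mass_defect(features, low=-0.11, high=0.02):
    """Keep feature names whose mass defect falls inside [low, high].

    The default window is a hypothetical choice for this example,
    not a value reported in the abstract.
    """
    return [name for name, mz in features if low <= mass_defect(mz) <= high]


# Hypothetical feature list: (label, approximate [M-H]- m/z).
FEATURES = [
    ("PFOA", 412.9664),       # fluorinated: negative mass defect
    ("PFOS", 498.9302),       # fluorinated: negative mass defect
    ("palmitate", 255.2330),  # hydrogen-rich lipid: large positive defect
    ("glucose", 179.0561),    # small positive defect, outside the window
]

print(filter_by_mass_defect(FEATURES))
```

In practice such a filter only narrows the candidate list; the abstract combines it with CCS values from IMS-MS and targeted analyses before any feature is called a PFAS.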


2021 ◽  
Author(s):  
Ronald Nieuwenhuis ◽  
Thamara Hesselink ◽  
Hetty C. van den Broeck ◽  
Jan Cordewener ◽  
Elio Schijlen ◽  
...  

We present the first annotated genome assembly of the allopolyploid okra (Abelmoschus esculentus). Analysis of telomeric repeats and gene-rich regions suggested we obtained whole chromosome and chromosomal arm scaffolds. Besides long distal blocks, we also detected short interstitial TTTAGGG telomeric repeats, possibly representing hallmarks of chromosomal speciation upon polyploidization of okra. Ribosomal RNA genes are organized in 5S clusters separated from the 18S-5.8S-28S units, clearly indicating an S-type rRNA gene arrangement. The assembly is consistent with cytogenetic and cytometry observations, identifying 65 chromosomes and 1.45 Gb of expected genome size in a haploid sibling. Approximately 57% of the genome consists of repetitive sequence. BUSCO scores and A50 plot statistics indicated a nearly complete genome. K-mer distribution analysis suggests that approximately 75% has a diploid nature, and at least 15% of the genome is heterozygous. We did not observe aberrant meiotic configurations, suggesting there is no recombination among the sub-genomes. BUSCO configurations pointed to the presence of at least 3 sub-genomes. These observations are indicative of an allopolyploid nature of the okra genome. Structural annotation using gene models derived from mapped transcriptome data generated over 130,000 putative genes. The discovered genes appeared to be located predominantly at the distal ends of scaffolds, gradually decreasing in abundance toward more centrally positioned scaffold domains. In contrast, LTR retrotransposons were more abundant in centrally located scaffold domains, while less frequently represented in the distal ends. This gene and LTR-retrotransposon distribution is consistent with the observed heterochromatin organization of pericentromeric heterochromatin and distal euchromatin. The derived amino acid queries of putative genes were subsequently used for phenol biosynthesis pathway annotation in okra. Comparison against manually curated reference KEGG pathways from related Malvaceae species revealed the genetic basis for putative enzyme coding genes that likely enable metabolic reactions involved in the biosynthesis of dietary and therapeutic compounds in okra.


2021 ◽  
Author(s):  
Daniela Strenkert ◽  
Matthew Mingay ◽  
Stefan Schmollinger ◽  
Cindy Chen ◽  
Ronan C O'Malley ◽  
...  

The eukaryotic green alga Chromochloris zofingiensis is a reference organism for studying carbon partitioning and a promising candidate for the production of biofuel precursors. Recent transcriptome profiling transformed our understanding of its biology and of algal biology generally, but epigenetic regulation remains understudied and represents a fundamental gap in our understanding of algal gene expression. Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a powerful tool for the discovery of such mechanisms, identifying genome-wide histone modification patterns and transcription factor-binding sites alike. Here, we established a ChIP-Seq framework for Chr. zofingiensis yielding over 20 million high-quality reads per sample. The most critical steps in a ChIP experiment were optimized, including DNA shearing to obtain an average DNA fragment size of 250 bp and assessment of the recommended formaldehyde concentration for optimal DNA-protein crosslinking. We used this ChIP-Seq framework to generate a genome-wide map of the H3K4me3 distribution pattern and to integrate these data with matching RNA-Seq data. In line with observations from other organisms, H3K4me3 predominantly marks transcription start sites of genes. Our H3K4me3 ChIP-Seq data will pave the way for improved genome structural annotation in the emerging reference alga Chr. zofingiensis.


Genes ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1645
Author(s):  
Anna Vlasova ◽  
Toni Hermoso Pulido ◽  
Francisco Camara ◽  
Julia Ponomarenko ◽  
Roderic Guigó

Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs and final annotation reports. The pipeline can be broken easily into smaller processes for the purpose of parallelization and easily deployed in a Linux computational environment, thanks to software containerization, thus helping to ensure full reproducibility.
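The optional structural annotation input described above is a GFF file, a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). As a rough illustration of what such a file contains, and not a reproduction of FA-nf's own code, the following sketch counts feature types and splits the attributes column; the function names and the example records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff_text):
    """Count GFF features by the value of the third (type) column."""
    counts = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # ignore malformed records in this simple sketch
        counts[columns[2]] += 1
    return counts


def parse_attributes(field):
    """Split a GFF3 attributes column such as 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in field.split(";") if "=" in item)
    return {key: value for key, value in pairs}


# Hypothetical GFF3 records, for illustration only.
EXAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\tCDS\t150\t400\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\texample\tCDS\t600\t850\t.\t+\t0\tID=cds1;Parent=mrna1",
])

print(count_feature_types(EXAMPLE_GFF))
print(parse_attributes("ID=mrna1;Parent=gene1"))
```

The `Parent` attribute is what ties each protein-coding segment back to its transcript and gene, which is how functional labels assigned to a protein sequence can be attached to the corresponding structural features.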


Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.


2021 ◽  
Author(s):  
Daniela Strenkert ◽  
Asli Yildirim ◽  
Juying Yan ◽  
Yuko Yoshinaga ◽  
Matteo Pellegrini ◽  
...  

Chromatin modifications are key epigenetic regulatory features with roles in various cellular events, yet histone mark identification, genome-wide distribution and relationship to gene expression remain understudied in green algae. Histone lysine methylation is regarded as an active chromatin mark in many organisms, and is implicated in mediating active euchromatin. We interrogated the genome-wide distribution pattern of mono- and trimethylated H3K4 using chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) during key phases of the Chlamydomonas cell cycle: early G1 phase (ZT1), when cells initiate biomass accumulation; S/M phase (ZT13), when cells are undergoing DNA replication and mitosis; and late G0 phase (ZT23), when they are quiescent. Trimethylated H3K4 was predominantly enriched at TSSs of the majority of protein-coding genes (85%). The likelihood of a gene being marked by H3K4me3 correlated with it being transcribed at one or more time points during the cell cycle, but not necessarily with continuous active transcription. This finding even applied to early zygotic genes whose expression may be dormant for hundreds or thousands of generations between sexual cycles, but core meiotic genes were completely missing H3K4me3 peaks at their TSS. In addition, bi-directional promoters regulating expression of replication-dependent histone genes had transient H3K4me3 peaks that were present only during S/M phase, when their expression peaked. In agreement with biochemical studies, monomethylated H3K4 was the default state for the vast majority of histones that were outside of TSS and terminator regions of genes. A small fraction of the genome that was depleted of any H3 lysine methylation was enriched for DNA cytosine methylation, and the genes within these DNA methylation islands were poorly expressed. Genome-wide H3K4me3 ChIP-Seq data will be a valuable resource, facilitating gene structural annotation, as exemplified by our validation of hundreds of long non-coding RNA genes.


2021 ◽  
Author(s):  
Igor Filipović ◽  
Gordana Rašić ◽  
James Hereward ◽  
Maria Gharuka ◽  
Gregor J Devine ◽  
...  

Background: An optimal starting point for relating genome function to organismal biology is a high-quality nuclear genome assembly, and long-read sequencing is revolutionizing the production of this genomic resource in insects. Despite this, nuclear genome assemblies have been under-represented for agricultural insect pests, particularly from the order Coleoptera. Here we present a de novo genome assembly and structural annotation for the coconut rhinoceros beetle, Oryctes rhinoceros (Coleoptera: Scarabaeidae), based on Oxford Nanopore Technologies (ONT) long-read data generated from a wild-caught female, as well as the assembly process that also led to the recovery of the complete circular genome assemblies of the beetle's mitochondrial genome and that of the biocontrol agent, Oryctes rhinoceros nudivirus (OrNV). As an invasive pest of palm trees, O. rhinoceros is undergoing an expansion in its range across the Pacific Islands, requiring new approaches to management that may include strategies facilitated by genome assembly and annotation. Results: High-quality DNA isolated from an adult female was used to create four ONT libraries that were sequenced using four MinION flow cells, producing a total of 27.2 Gb of high-quality long-read sequences. We employed an iterative assembly process and polishing with one lane of high-accuracy Illumina reads, obtaining a final size of the assembly of 377.36 Mb that had high contiguity (fragment N50 length = 12 Mb) and accuracy, as evidenced by the exceptionally high completeness of the benchmarked set of conserved single-copy orthologous genes (BUSCO completeness = 99.11%). These quality metrics place our assembly as the most complete of the published Coleopteran genomes. The structural annotation of the nuclear genome assembly contained a highly accurate set of 16,371 protein-coding genes showing BUSCO completeness of 92.09%, as well as the expected number of non-coding RNAs and the number and structure of paralogous genes in a gene family like Sigma GST. Conclusions: The genomic resources produced in this study form a foundation for further functional genetic research and management programs that may inform the control and surveillance of O. rhinoceros populations, and we demonstrate the efficacy of de novo genome assembly using long-read ONT data from a single field-caught insect.

