genome annotation
Recently Published Documents


Total documents: 599 (five years: 155)
H-index: 65 (five years: 8)

2022 ◽  
Vol 18 (1) ◽  
pp. 1-13
Author(s):  
Ramanathan Sowdhamini ◽  

Saffron (Crocus sativus L.) is a low-yielding plant of medicinal and economic importance. Therefore, it is of interest to report the draft genome sequence of C. sativus. The draft genome of C. sativus has been assembled using Illumina sequencing and is 3.01 Gb long, covering 84.24% of the genome. Annotation of the C. sativus genome identified 53,546 functional genes (including 5,726 transcription factors), 862,275 repeats and 964,231 SSR markers. The genes involved in the apocarotenoid biosynthesis pathway (crocin, crocetin, picrocrocin, and safranal) were found in the draft genome analysis.


2022 ◽  
Author(s):  
Caroline M. Weisman ◽  
Andrew M. Murray ◽  
Sean R Eddy

Comparisons of genomes of different species are used to identify lineage-specific genes, those genes that appear unique to one species or clade. Lineage-specific genes are often thought to represent genetic novelty that underlies unique adaptations. Identification of these genes depends not only on genome sequences, but also on inferred gene annotations. Comparative analyses typically use available genomes that have been annotated using different methods, increasing the risk that orthologous DNA sequences may be erroneously annotated as a gene in one species but not another, appearing lineage-specific as a result. To evaluate the impact of such 'annotation heterogeneity', we identified four clades of species with sequenced genomes with more than one publicly available gene annotation, allowing us to compare the number of lineage-specific genes inferred when differing annotation methods are used to those resulting when annotation method is uniform across the clade. In these case studies, annotation heterogeneity increases the apparent number of lineage-specific genes by up to 15-fold, suggesting that annotation heterogeneity is a substantial source of potential artifact.


2021 ◽  
Author(s):  
Enrique González-Tortuero ◽  
Revathy Krishnamurthi ◽  
Heather E. Allison ◽  
Ian B. Goodhead ◽  
Chloe E. James

The number of newly available viral genomes and metagenomes has increased exponentially since the development of high-throughput sequencing platforms and genome analysis tools. Bioinformatic annotation pipelines are largely based on open reading frame (ORF) calling software, which identifies genes independently of the taxonomic background of the sequence. Although ORF-calling programs provide rapid genome annotation, they can misidentify ORFs and start codons, errors that might be perpetuated and propagated over time. This study evaluated the performance of multiple ORF-calling programs for viral genome annotation against the complete RefSeq viral database. Program outputs varied when considering the viral nucleic acid type versus the viral host. According to the number of ORFs, Prodigal and Metaprodigal were the most accurate programs for DNA viruses, while FragGeneScan and Prodigal generated the most accurate outputs for RNA viruses. Similarly, Prodigal outperformed the benchmark for viruses infecting prokaryotes, and GLIMMER and GeneMarkS produced the most accurate annotations for viruses infecting eukaryotes. When the coordinates of the ORFs were considered, Prodigal scored high for all scenarios except for RNA viruses, where GeneMarkS generated the most reliable results. Overall, the quality of the coordinates predicted for RNA viruses was poorer than for DNA viruses, suggesting the need for improved ORF-calling programs to deal with RNA viruses. Moreover, none of the ORF-calling programs reached 90% accuracy for annotation of DNA viruses. Any automatic annotation can still be improved by manual curation, especially when the presence of ORFs is validated with wet-lab experiments. However, our evaluation of the current ORF-calling programs is expected to be useful for the improvement of viral genome annotation pipelines and highlights the need for more expression data to improve the rigor of reference genomes.
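To make the idea of ORF calling concrete, here is a minimal Python sketch of a naive forward-strand ORF scan. It is an illustration only and is not how Prodigal, GLIMMER, GeneMarkS or FragGeneScan work; those programs use statistical gene models. The function name `find_orfs` and the `min_codons` parameter are hypothetical names chosen for this example.

```python
# Naive ORF scan (illustrative sketch, not any published tool's algorithm).
# Assumptions: DNA alphabet, forward strand only, ATG as the sole start codon.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end) 0-based, half-open ORF coordinates.

    An ORF runs from the first in-frame ATG after the previous stop codon
    up to and including the next in-frame stop codon. min_codons counts
    codons from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next start
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

Because the scan always takes the first in-frame ATG, it can pick the wrong start when the true start lies downstream, which is one simple way the start-codon errors described in the abstract can arise.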


2021 ◽  
Vol 6 ◽  
pp. 334
Author(s):  
Liam Crowley ◽  
...  

We present a genome assembly from an individual female Chrysoperla carnea (a common green lacewing; Arthropoda; Insecta; Neuroptera; Chrysopidae). The genome sequence is 560 megabases in span. The majority of the assembly (95.70%) is scaffolded into six chromosomal pseudomolecules, with the X sex chromosome assembled. Gene annotation of this assembly by the NCBI Eukaryotic Genome Annotation Pipeline has identified 12,985 protein-coding genes.


2021 ◽  
Author(s):  
Loïc Meunier ◽  
Denis Baurain ◽  
Luc Cornet

Summary: To support small and large-scale genome annotation projects, we present AMAW (Automated MAKER2 Annotation Wrapper), a program devised to annotate non-model unicellular eukaryotic genomes by automating the acquisition of evidence data (transcripts and proteins) and facilitating the use of MAKER2, a widely adopted software suite for the annotation of eukaryotic genomes. Moreover, AMAW exists as a Singularity container recipe easy to deploy on a grid computer, thereby overcoming the tricky installation of MAKER2.
Availability: AMAW is released both as a Singularity container recipe and a standalone Perl script (https://bitbucket.org/phylogeno/amaw/).
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Tim Nicholson-Shaw ◽  
Jens Lykke-Andersen

Post-transcriptional trimming and tailing of RNA 3’ ends play key roles in the processing and quality control of non-coding RNAs (ncRNAs). However, bioinformatic tools to examine changes in the RNA 3’ “tailome” are sparse and not standardized. Here we present Tailer, a bioinformatic pipeline in two parts that allows for robust quantification and analysis of tail information from next generation sequencing experiments that preserve RNA 3’ end information. The first part of Tailer, Tailer-Processing, uses genome annotation or reference FASTA gene sequences to quantify RNA 3’ ends from SAM-formatted alignment files or FASTQ sequence read files produced from sequencing experiments. The second part, Tailer-Analysis, uses the output of Tailer-Processing to identify statistically significant RNA targets of trimming and tailing and create graphs for data exploration. We apply Tailer to RNA 3’ end sequencing experiments from three published studies and find that it accurately and reproducibly recapitulates key findings. Thus, Tailer should be a useful and easily accessible tool to globally investigate tailing dynamics of non-polyadenylated RNAs and conditions that perturb them.


PHAGE ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 183-193
Author(s):  
Anastasiya Shen ◽  
Andrew Millard

2021 ◽  
Author(s):  
Zhicheng Zhang ◽  
Jing Guo ◽  
Xu Cai ◽  
Yufang Li ◽  
Xi Xi ◽  
...  

The species Brassica rapa includes several important vegetable crops. The draft reference genome of B. rapa ssp. pekinensis was completed in 2011, and it has since been updated twice. The pangenome with structural variations of 18 B. rapa accessions was published in 2021. Although extensive genomic analysis has been conducted on B. rapa, a comprehensive genome annotation including gene structure, alternative splicing events, and non-coding genes is still lacking. Therefore, we used the Pacific Biosciences (PacBio) single-molecule long-read technology to improve gene models and produced the annotated genome version 3.5. In total, we obtained 753,041 full-length non-chimeric (FLNC) reads and collapsed these into 92,810 non-redundant consensus isoforms, capturing 48% of the genes annotated in the B. rapa reference genome annotation v3.1. Based on the isoform data, we identified 830 novel protein-coding genes that were missed in previous genome annotations, defined the UTR regions of 20,340 annotated genes and corrected 886 wrongly spliced genes. We also identified 28,564 alternative splicing (AS) events and 1,480 long non-coding RNAs (lncRNAs). We produced a relatively complete and high-quality reference transcriptome for B. rapa that can facilitate further functional genomic research.


2021 ◽  
pp. 7-30
Author(s):  
Dinesh Gupta ◽  
Rahila Sardar

2021 ◽  
Author(s):  
Yanhua Shi ◽  
Weiping Lin ◽  
Guohui Wang ◽  
Punan Zhao ◽  
Guo-hua Huang ◽  
...  

Analysis of orthology is important for understanding protein conservation, function and phylogenomics. This study performed a comprehensive identification of Ascoviridae orthology based on identification of 366 Ascoviridae protein homologue groups and phylogenetic analysis of 34 non-single-copy proteins. Our findings revealed 90 newly annotated proteins, five newly identified Ascoviridae core proteins and 14 Ascovirus core proteins. Moreover, a phylogenomic tree of 11 Ascoviridae species was inferred based on the concatenation of 35 of 45 Ascoviridae ortholog groups. In combination with phosphoproteomic results and conservation estimations, 30 conserved phosphorylation sites on 17 phosphoproteins were identified from a total of 176 phosphosites on 57 phosphoproteins from Heliothis virescens ascovirus 3h (HvAV-3h), supplying potential research targets for exploration of the detailed role of these proteins in the regulation of viral infection mechanisms. This study will facilitate further Ascoviridae genome annotation and comparison and other functional genomic investigations.

