sequence annotation Latest Research Papers

Corrigendum to: Decombinator V4: an improved AIRR-C compliant software package for T-cell receptor sequence annotation

Bioinformatics ◽

10.1093/bioinformatics/btab550 ◽

2021 ◽

Author(s):

Thomas Peacock ◽

James M Heather ◽

Tahel Ronel ◽

Benny Chain

Keyword(s):

T Cell ◽

T Cell Receptor ◽

Software Package ◽

Cell Receptor ◽

Sequence Annotation

Draft genome sequence, annotation and SSR mining data of Oryctes rhinoceros Linn. (Coleoptera: Scarabaeidae), the coconut rhinoceros beetle

Data in Brief ◽

10.1016/j.dib.2021.107424 ◽

2021 ◽

Vol 38 ◽

pp. 107424

Author(s):

Rajesh M. K ◽

Ginny Antony ◽

Arvind Kumar ◽

Jeffrey Godwin ◽

Gangaraj K. P ◽

...

Keyword(s):

Genome Sequence ◽

Draft Genome ◽

Draft Genome Sequence ◽

Sequence Annotation ◽

Oryctes Rhinoceros ◽

Rhinoceros Beetle

A general framework to learn tertiary structure for protein sequence annotation

10.1101/2021.04.01.438098 ◽

2021 ◽

Author(s):

Mu Gao ◽

Jeffrey Skolnick

Keyword(s):

Deep Learning ◽

Protein Sequence ◽

General Framework ◽

Tertiary Structure ◽

Learning Algorithms ◽

Protein Structures ◽

Protein Sequences ◽

Input Sequence ◽

Sequence Annotation ◽

Deep Learning Neural Network

During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. Although at present the accuracy of the distogram predicted by SAdLSA self-alignment is not as good as that of deep-learning algorithms specifically trained for distogram prediction, it is remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA not only predicts protein folds for individual sequences but also detects subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. The former reduces to a special case in this general framework for protein sequence annotation.
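To make the term "distogram" concrete: it assigns every residue pair a distribution over distance bins. The sketch below builds only the degenerate one-hot version from known 3D coordinates. The function name, the bin edges, and the toy coordinates are assumptions for illustration; nothing here is taken from SAdLSA, which predicts such distributions from sequence rather than reading them off a structure.

```python
import math


def one_hot_distogram(coords, bin_edges):
    """Return an L x L x B nested list with a single 1 per residue pair.

    coords    -- list of (x, y, z) tuples, one per residue (hypothetical input)
    bin_edges -- ascending distance cut-offs; B = len(bin_edges) + 1 bins,
                 the last bin catching every distance beyond the final edge
    """
    n_bins = len(bin_edges) + 1
    n = len(coords)
    disto = [[[0] * n_bins for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            # Bin index = number of edges the distance has reached or passed.
            k = sum(1 for edge in bin_edges if d >= edge)
            disto[i][j][k] = 1
    return disto


# Toy example: three points on a line, 3.8 units apart (assumed values).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
edges = [4.0, 8.0]  # bins: [0, 4), [4, 8), [8, inf)
disto = one_hot_distogram(coords, edges)
```

A predicted distogram would replace each one-hot vector with a probability vector over the same bins.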

Draft genome sequence, annotation, and SSR mining data of Elaeidobius kamerunicus Faust., an essential oil palm pollinating weevil

Data in Brief ◽

10.1016/j.dib.2021.106745 ◽

2021 ◽

Vol 34 ◽

pp. 106745

Author(s):

Ardha Apriyanto ◽

Van Basten Tambunan

Keyword(s):

Essential Oil ◽

Genome Sequence ◽

Oil Palm ◽

Draft Genome ◽

Draft Genome Sequence ◽

Sequence Annotation ◽

Elaeidobius Kamerunicus

Development and Evaluation of an Intravenous Infusion Sequence Annotation System

Applied Clinical Informatics ◽

10.1055/s-0041-1722871 ◽

2021 ◽

Vol 12 (01) ◽

pp. 065-072

Author(s):

Ji Qiu ◽

Tingting Deng ◽

Zhuo Wang ◽

Zhangwei Yang ◽

Ting Liu ◽

...

Keyword(s):

Intravenous Infusion ◽

Phase 1 ◽

Sequence Annotation ◽

Manual Annotation ◽

Phase 2 ◽

Phase 3 ◽

Chi Square ◽

Annotation System ◽

Intravenous Infusions ◽

Significant Difference

Abstract Objectives The sequence of intravenous infusions may impact the efficacy, safety, and cost of intravenous medications. The study describes and assesses a computerized clinical decision support annotation system capable of analyzing the sequence of intravenous infusions. Methods All intravenous medications on the hospital formulary were analyzed based on factors that impact intravenous infusion sequence. Eight pharmacy infusion knowledge databases were constructed based on Hospital Infusion Standards. These databases were incorporated into the computerized sequence annotation module within the electronic health record system. The annotation process was changed from pharmacists' manual annotation (phase 1) to computer-aided pharmacist manual annotation (phase 2) to automated computer annotation (phase 3). Results Comparing phase 2 to phase 1, there were significant differences in sequence annotation with regard to the percentage of hospital wards annotated (100% vs. 4.65%, chi-square = 180.95, p < 0.001), percentage of patients annotated (64.18% vs. 0.52%, chi-square = 90.46, p < 0.001), percentage of intravenous orders annotated (75.67% vs. 0.77%, chi-square = 118.78, p < 0.001), and the number of tubing flushes per ward per day (118.51 vs. 2,115.00, p < 0.001). Compared with phase 1, there were significant cost savings in tubing flushes in phase 2 and phase 3. Compared with phase 1, there was a significant difference in the time nurses spent on tubing flushes in phase 2 and phase 3 (1,244.94 vs. 21,684.8 minutes, p < 0.001; 1,369.51 vs. 21,684.8 minutes, p < 0.001). Compared with phase 1, significantly less time was required for pharmacist annotation in phase 2 and phase 3 (90.6 vs. 4,753.57 minutes, p < 0.001; 0.05 vs. 4,753.57 minutes, p < 0.001). Conclusion A computerized infusion annotation system is efficient in sequence annotation, and significant savings in tubing flushes can be achieved as a result.
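The phase comparisons above are reported as chi-square statistics on proportions. As a rough illustration of how such a statistic is computed for a 2x2 table, here is a minimal sketch of the Pearson chi-square without continuity correction. The abstract does not give the underlying counts or say which variant of the test was used, so the counts below are invented and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

        annotated   not annotated
        a           b               <- one phase
        c           d               <- the other phase
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Invented counts, NOT from the study: 30 of 40 items annotated in one phase
# versus 10 of 40 in the other.
stat = chi_square_2x2(30, 10, 10, 30)
print(stat)  # 20.0
```

With one degree of freedom, a statistic above roughly 10.83 corresponds to p < 0.001, which is the threshold the abstract reports against.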

Automatic building of a large and complete dataset for image-based table structure recognition

VNU Journal of Science Computer Science and Communication Engineering ◽

10.25073/2588-1086/vnucsce.293 ◽

2021 ◽

Vol 37 (2) ◽

Author(s):

Tran Quang Vinh ◽

Nguyen Thi Ngoc Diep

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Experimental Models ◽

Structured Data ◽

Sequence Annotation ◽

Learning Models ◽

Bounding Box ◽

Structure Recognition ◽

Complete Dataset ◽

Real World Applications

Tables are one of the most common ways to represent structured data in documents. Existing research on image-based table structure recognition often relies on limited datasets; the largest, the ICDAR 19 Track B dataset, contains only 3,789 human-labeled tables. The recent TableBank dataset for table structures contains 145K tables; however, its tables are labeled in an HTML tag sequence format, which impedes the development of image-based recognition methods. In this paper, we propose several processing methods that automatically convert an HTML tag sequence annotation into bounding box annotations for the table cells in a table image. By ensembling these methods, we could convert 42,028 tables with high correctness, which is 11 times larger than the largest existing dataset (ICDAR 19). We then demonstrate that with these bounding box annotations, a straightforward representation of objects in images, we can achieve much higher F1-scores for table structure recognition at many high IoU thresholds using only off-the-shelf deep learning models: an F1-score of 0.66 compared to the state-of-the-art 0.44 on the ICDAR19 dataset. A further experiment shows that explicit bounding box annotation for image-based table structure recognition yields higher accuracy (70.6%) than implicit text sequence annotation (only 33.8%). The experiments show the effectiveness of our largest-to-date dataset in opening up opportunities to generalize to real-world applications. Our dataset and experimental models are publicly available at shorturl.at/hwHY3
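The evaluation above reports F1-scores at IoU (intersection-over-union) thresholds for predicted cell bounding boxes. The following is a minimal sketch of that kind of metric, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) and a simple greedy one-to-one matching. The function names and the matching rule are assumptions for illustration; the paper's exact evaluation protocol is not described in the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_at_threshold(predicted, truth, threshold):
    """F1 where a prediction counts as correct if it overlaps a not-yet-matched
    ground-truth box with IoU >= threshold (greedy matching, an assumption)."""
    unmatched = list(truth)
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(predicted) - tp
    fn = len(truth) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy cells (invented coordinates): one exact match, one slightly shifted.
pred = [(0, 0, 10, 10), (20, 20, 30, 30)]
gold = [(0, 0, 10, 10), (21, 21, 31, 31)]
print(f1_at_threshold(pred, gold, 0.6))  # 1.0
print(f1_at_threshold(pred, gold, 0.8))  # 0.5
```

The shifted cell has an IoU of 81/119 (about 0.68), so it counts as correct at a 0.6 threshold but not at 0.8, which is why scores at high IoU thresholds are the harder test.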

MultiPhATE2: Code for Functional Annotation and Comparison of Bacteriophage Genomes

10.1101/2020.10.05.324566 ◽

2020 ◽

Author(s):

Carol L. Ecale Zhou ◽

Jeffrey Kimbrel ◽

Robert Edwards ◽

Katlyn McNair ◽

Brian A. Souza ◽

...

Keyword(s):

Comparative Genomics ◽

Functional Annotation ◽

Input Data ◽

Search Space ◽

Search Algorithms ◽

Third Party ◽

Data Sets ◽

Sequence Annotation ◽

Command Line ◽

Link Type

Abstract To address the need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of the multiPhATE code, multiPhATE2 performs gene finding and functional sequence annotation of predicted gene and protein sequences, and additional search algorithms and databases extend the search space of the original functional annotation subsystem. MultiPhATE2 includes comparative genomics codes for gene matching among sets of input bacteriophage genomes, and scales well to large input data sets with the incorporation of multiprocessing in the functional annotation and comparative genomics subsystems. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or macOS. MultiPhATE2 is freely available under an open-source GPL-3 license at https://github.com/carolzhou/multiPhATE2. Instructions for acquiring the databases and third-party codes used by multiPhATE2 are found in the README file included with the distribution. Users may report bugs by submitting issues to the project GitHub repository webpage. Contact: [email protected] or [email protected]. Supplementary materials, which demonstrate the outputs of multiPhATE2, are available in a GitHub repository, at https://github.com/carolzhou/multiPhATE2_supplementaryData/.

Draft genome of Meyerozyma guilliermondii strain vka1: a yeast strain with composting potential

Journal of Genetic Engineering and Biotechnology ◽

10.1186/s43141-020-00074-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Ravisankar Valsalan ◽

Deepu Mathew

Keyword(s):

De Novo ◽

Draft Genome ◽

Hybrid Approach ◽

Sequence Annotation ◽

Illumina Hiseq ◽

Meyerozyma Guilliermondii ◽

Genome Wide ◽

Number Of Genes ◽

Proteome Profile ◽

Organic Compost

Abstract Background Meyerozyma guilliermondii is a yeast that can be isolated from a variety of environments. The vka1 strain, isolated and purified from organic compost, was found to have composting potential. To better understand the genes assisting the composting potential in this yeast, whole genome sequencing and sequence annotation were performed. Results The genome of M. guilliermondii vka1 strain was sequenced using a hybrid approach, on the Illumina Hiseq-2500 platform at 100× coverage followed by the Nanopore platform at 20× coverage. De novo assembly using a dual-fold approach yielded a draft genome of 10.8 Mb. The genome was found to contain 5385 genes. The annotation of the genes was performed, and the enzymes identified to have roles in the degradation of macromolecules are discussed in relation to its composting potential. Annotation of the genome assemblies of the related strains revealed the unique biodegradation-related genes in this strain. Phylogenetic analysis using the rDNA region confirmed the position of this strain in the Ascomycota family. Raw reads are made public, and the genome-wide proteome profile is presented to facilitate further studies on this organism. Conclusions Meyerozyma guilliermondii vka1 strain was sequenced through a hybrid approach and the reads were de novo assembled. Draft genome size and the number of genes in the strain were assessed and discussed in relation to the related strains. Scientific insights into the composting potential of this strain are also presented in relation to the unique genes identified in this strain.

DNA Features Viewer: a sequence annotation formatting and plotting library for Python

Bioinformatics ◽

10.1093/bioinformatics/btaa213 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4350-4352 ◽

Cited By ~ 2

Author(s):

Valentin Zulkower ◽

Susan Rosser

Keyword(s):

Computational Biology ◽

Open Source ◽

Programming Language ◽

Source Code ◽

Supplementary Information ◽

Use Case ◽

Sequence Annotation ◽

Supplementary Data ◽

Python Programming Language ◽

Python Programming

Abstract Motivation Although the Python programming language counts many Bioinformatics and Computational Biology libraries, none offers customizable sequence annotation visualizations with layout optimization. Results DNA Features Viewer is a sequence annotation plotting library which optimizes plot readability while letting users tailor other visual aspects (colors, labels, highlights, etc.) to their particular use case. Availability and implementation Open-source code and documentation are available on GitHub under the MIT license (https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer). Supplementary information Supplementary data are available at Bioinformatics online.
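One common form of layout optimization for annotation plots is stacking overlapping features on separate rows so their glyphs and labels do not collide. The sketch below shows a simple greedy version of that idea in plain Python. It is an illustration only and is not DNA Features Viewer's actual algorithm or API; the function name, the (start, end, label) tuple format, and the example features are all assumptions.

```python
def assign_levels(features):
    """Greedily place features on the lowest row where they do not overlap.

    features -- iterable of (start, end, label) with end exclusive
    Returns a dict mapping each label to its row index (0 = baseline).
    """
    level_ends = []  # level_ends[k] = end of the last feature placed on row k
    levels = {}
    for start, end, label in sorted(features):
        for k, last_end in enumerate(level_ends):
            if start >= last_end:  # row k is free from this position onward
                level_ends[k] = end
                levels[label] = k
                break
        else:
            # Every existing row is still occupied: open a new one.
            level_ends.append(end)
            levels[label] = len(level_ends) - 1
    return levels


# Invented features for illustration.
features = [
    (0, 100, "geneA"),
    (50, 150, "geneB"),
    (120, 200, "geneC"),
    (160, 220, "geneD"),
]
levels = assign_levels(features)
print(levels)  # {'geneA': 0, 'geneB': 1, 'geneC': 0, 'geneD': 1}
```

Processing features in order of start position keeps the row count low for interval data; a real plotting library would also have to account for label widths, which this sketch ignores.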

Phigaro: high-throughput prophage sequence annotation

Bioinformatics ◽

10.1093/bioinformatics/btaa250 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3882-3884 ◽

Cited By ~ 2

Author(s):

Elizaveta V Starikova ◽

Polina O Tikhonova ◽

Nikita A Prianichnikov ◽

Chris M Rands ◽

Evgeny M Zdobnov ◽

...

Keyword(s):

Test Data ◽

High Throughput ◽

Source Code ◽

Supplementary Information ◽

Sequence Annotation ◽

Command Line ◽

Supplementary Data ◽

Genome Maps ◽

Transposon Insertion ◽

Prophage Sequence

Abstract Summary Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated ‘prophage genome maps’ and marks possible transposon insertion spots inside prophages. It is applicable for mining prophage regions from large metagenomic datasets. Availability and implementation Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python. Supplementary information Supplementary data are available at Bioinformatics online.
