sequence annotation
Recently Published Documents



Data in Brief ◽  
2021 ◽  
Vol 38 ◽  
pp. 107424
Author(s):  
Rajesh M. K ◽  
Ginny Antony ◽  
Arvind Kumar ◽  
Jeffrey Godwin ◽  
Gangaraj K. P ◽  
...  

2021 ◽  
Author(s):  
Mu Gao ◽  
Jeffrey Skolnick

During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. Although at present the accuracy of the distogram predicted by SAdLSA self-alignment is not as good as deep-learning algorithms specifically trained for distogram prediction, it is remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA can not only predict protein folds for individual sequences, but also detects subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. The former reduces to a special case in this general framework for protein sequence annotation.
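To make the term concrete: a distogram records, for every pair of residues, which distance bin the pair falls into. The sketch below is not SAdLSA and involves no learning. It only bins known pairwise distances, which is the kind of target a predicted distogram is compared against. The coordinates and bin edges are hypothetical values chosen for illustration.

```python
import math

def distogram(coords, bin_edges):
    """Return an L x L matrix of distance-bin indices for a list of 3D points.

    Bin k holds distances d with bin_edges[k-1] <= d < bin_edges[k];
    the last bin collects everything at or beyond the final edge.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            # Count how many edges the distance has reached or passed.
            matrix[i][j] = sum(d >= edge for edge in bin_edges)
    return matrix

# Hypothetical C-alpha positions spaced 3.8 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
bins = distogram(coords, bin_edges=[4.0, 8.0, 12.0])
```

A learned model would instead output a probability for each bin at each residue pair; the integer matrix here corresponds to the single most likely bin.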


2021 ◽  
Vol 12 (01) ◽  
pp. 065-072
Author(s):  
Ji Qiu ◽  
Tingting Deng ◽  
Zhuo Wang ◽  
Zhangwei Yang ◽  
Ting Liu ◽  
...  

Abstract. Objectives: The sequence of intravenous infusions may impact the efficacy, safety, and cost of intravenous medications. The study describes and assesses a computerized clinical decision support annotation system capable of analyzing the sequence of intravenous infusions. Methods: All intravenous medications on the hospital formulary were analyzed based on factors that impact intravenous infusion sequence. Eight pharmacy infusion knowledge databases were constructed based on Hospital Infusion Standards. These databases were incorporated into the computerized sequence annotation module within the electronic health record system. The annotation process was changed from pharmacists' manual annotation (phase 1) to computer-aided pharmacist manual annotation (phase 2) to automated computer annotation (phase 3). Results: Comparing phase 2 to phase 1, there were significant differences in sequence annotation with regard to the percentage of hospital wards annotated (100% vs. 4.65%, chi-square = 180.95, p < 0.001), percentage of patients annotated (64.18% vs. 0.52%, chi-square = 90.46, p < 0.001), percentage of intravenous orders annotated (75.67% vs. 0.77%, chi-square = 118.78, p < 0.001), and the number of tubing flushes per ward per day (118.51 vs. 2,115.00, p < 0.001). Compared with phase 1, there were significant cost savings in tubing flushes in phase 2 and phase 3. Compared with phase 1, there was a significant difference in the time nurses spent on tubing flushes in phase 2 and phase 3 (1,244.94 vs. 21,684.8 minutes, p < 0.001; 1,369.51 vs. 21,684.8 minutes, p < 0.001). Compared with phase 1, significantly less time was required for pharmacist annotation in phase 2 and phase 3 (90.6 vs. 4,753.57 minutes, p < 0.001; 0.05 vs. 4,753.57 minutes, p < 0.001). Conclusion: A computerized infusion annotation system is efficient in sequence annotation, and significant savings in tubing flushes can be achieved as a result.


Author(s):  
Tran Quang Vinh ◽  
Nguyen Thi Ngoc Diep ◽  

Tables are one of the most common ways to represent structured data in documents. Existing research on image-based table structure recognition often relies on limited datasets, the largest of which, the ICDAR 19 Track B dataset, contains 3,789 human-labeled tables. The recent TableBank dataset for table structures contains 145K tables; however, the tables are labeled in an HTML tag sequence format, which impedes the development of image-based recognition methods. In this paper, we propose several processing methods that automatically convert an HTML tag sequence annotation into bounding box annotations for the table cells in a table image. By ensembling these methods, we could convert 42,028 tables with high correctness, a dataset 11 times larger than the largest existing one (ICDAR 19). We then demonstrate that with these bounding box annotations, a straightforward representation of objects in images, we can achieve much higher F1-scores for table structure recognition at many high IoU thresholds using only off-the-shelf deep learning models: an F1-score of 0.66 compared with the state-of-the-art 0.44 on the ICDAR19 dataset. A further experiment shows that explicit bounding box annotation for image-based table structure recognition yields higher accuracy (70.6%) than implicit text sequence annotation (only 33.8%). The experiments show that our dataset, the largest to date, opens up opportunities to generalize to real-world applications. Our dataset and experimental models are publicly available at shorturl.at/hwHY3
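The difference between the two annotation styles can be illustrated with a small sketch. It is not the authors' conversion method: it assumes a simplified tag vocabulary (`<tr>`, `<td>`) without spanning cells, and it assumes the pixel positions of row and column boundaries are already known, which in practice is the hard part. All function names and the sample values are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels

def cells_from_tags(tags: List[str]) -> List[Tuple[int, int]]:
    """Return the (row, col) grid index of every <td> in a flat tag sequence."""
    cells = []
    row, col = -1, 0
    for tag in tags:
        if tag == "<tr>":      # a new row starts; reset the column counter
            row += 1
            col = 0
        elif tag == "<td>":    # each opening cell tag occupies the next column
            cells.append((row, col))
            col += 1
    return cells

def boxes_from_grid(cells, row_edges: List[int], col_edges: List[int]) -> List[Box]:
    """Turn grid indices into boxes, given assumed row/column boundary pixels."""
    return [(col_edges[c], row_edges[r], col_edges[c + 1], row_edges[r + 1])
            for r, c in cells]

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    width = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = width * height
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

# A 2 x 2 table written as a tag sequence (hypothetical example).
tags = ["<table>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>",
        "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</table>"]
cells = cells_from_tags(tags)
boxes = boxes_from_grid(cells, row_edges=[0, 20, 40], col_edges=[0, 50, 120])
```

The `iou` helper shows how a predicted box would be scored against such a ground-truth box at a given IoU threshold.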


2020 ◽  
Author(s):  
Carol L. Ecale Zhou ◽  
Jeffrey Kimbrel ◽  
Robert Edwards ◽  
Katlyn McNair ◽  
Brian A. Souza ◽  
...  

Abstract. To address the need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of the multiPhATE code, multiPhATE2 performs gene finding and functional sequence annotation of predicted gene and protein sequences, and additional search algorithms and databases extend the search space of the original functional annotation subsystem. MultiPhATE2 includes comparative genomics codes for gene matching among sets of input bacteriophage genomes, and scales well to large input data sets with the incorporation of multiprocessing in the functional annotation and comparative genomics subsystems. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or macOS. MultiPhATE2 is freely available under an open-source GPL-3 license at https://github.com/carolzhou/multiPhATE2. Instructions for acquiring the databases and third-party codes used by multiPhATE2 are found in the README file included with the distribution. Users may report bugs by submitting issues to the project GitHub repository webpage. Supplementary materials, which demonstrate the outputs of multiPhATE2, are available in a GitHub repository at https://github.com/carolzhou/multiPhATE2_supplementaryData/.


Author(s):  
Ravisankar Valsalan ◽  
Deepu Mathew

Abstract. Background: Meyerozyma guilliermondii is a yeast that can be isolated from a variety of environments. The vka1 strain, isolated and purified from organic compost, was found to have composting potential. To better understand the genes underlying the composting potential of this yeast, whole genome sequencing and sequence annotation were performed. Results: The genome of the M. guilliermondii vka1 strain was sequenced using a hybrid approach, on the Illumina Hiseq-2500 platform at 100× coverage followed by the Nanopore platform at 20× coverage. De novo assembly using a dual-fold approach yielded a draft genome of 10.8 Mb. The genome was found to contain 5385 genes. The genes were annotated, and the enzymes identified as having roles in the degradation of macromolecules are discussed in relation to the strain's composting potential. Annotation of the genome assemblies of related strains revealed the biodegradation-related genes unique to this strain. Phylogenetic analysis using the rDNA region has confirmed the position of this strain in the Ascomycota family. Raw reads are made public, and the genome-wide proteome profile is presented to facilitate further studies on this organism. Conclusions: The Meyerozyma guilliermondii vka1 strain was sequenced through a hybrid approach and the reads were de novo assembled. The draft genome size and the number of genes in the strain were assessed and discussed in relation to related strains. Scientific insights into the composting potential of this strain are also presented in relation to the unique genes identified in it.


2020 ◽  
Vol 36 (15) ◽  
pp. 4350-4352 ◽  
Author(s):  
Valentin Zulkower ◽  
Susan Rosser

Abstract. Motivation: Although the Python programming language has many Bioinformatics and Computational Biology libraries, none offers customizable sequence annotation visualizations with layout optimization. Results: DNA Features Viewer is a sequence annotation plotting library that optimizes plot readability while letting users tailor other visual aspects (colors, labels, highlights, etc.) to their particular use case. Availability and implementation: Open-source code and documentation are available on Github under the MIT license (https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer). Supplementary information: Supplementary data are available at Bioinformatics online.
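One common form of layout optimization in annotation plots is stacking overlapping features on separate rows so that they do not collide. The sketch below shows a generic greedy approach in plain Python. It is an illustration of the idea only, not the algorithm or API of DNA Features Viewer, and the function name and sample features are hypothetical. Features are treated as half-open intervals, so one ending at position 100 does not overlap one starting at 100.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, int, str]  # (start, end, label)

def assign_levels(features: List[Feature]) -> Dict[str, int]:
    """Place each feature on the lowest row where it overlaps nothing already placed."""
    level_ends: List[int] = []   # rightmost occupied position on each row
    levels: Dict[str, int] = {}
    for start, end, label in sorted(features):
        for level, last_end in enumerate(level_ends):
            if start >= last_end:            # fits after the last feature on this row
                level_ends[level] = end
                levels[label] = level
                break
        else:                                # no existing row has room; open a new one
            level_ends.append(end)
            levels[label] = len(level_ends) - 1
    return levels

# Hypothetical features on a 400 bp construct.
features = [(0, 100, "promoter"), (50, 300, "gene_a"),
            (120, 200, "rbs"), (310, 400, "terminator")]
levels = assign_levels(features)
```

Processing features in order of start position keeps the number of rows small, since each row only needs to remember where its last feature ends.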


2020 ◽  
Vol 36 (12) ◽  
pp. 3882-3884 ◽  
Author(s):  
Elizaveta V Starikova ◽  
Polina O Tikhonova ◽  
Nikita A Prianichnikov ◽  
Chris M Rands ◽  
Evgeny M Zdobnov ◽  
...  

Abstract. Summary: Phigaro is a standalone command-line application that detects prophage regions, taking raw genome and metagenome assemblies as input. It also produces dynamic annotated ‘prophage genome maps’ and marks possible transposon insertion spots inside prophages. It is suitable for mining prophage regions from large metagenomic datasets. Availability and implementation: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python. Supplementary information: Supplementary data are available at Bioinformatics online.

