PolyA: a tool for adjudicating competing annotations of biological sequences

2021 ◽  
Author(s):  
Kaitlin M. Carey ◽  
Robert Hubley ◽  
George T. Lesica ◽  
Daniel Olson ◽  
Jack W. Roddy ◽  
...  

Abstract: Annotation of a biological sequence is usually performed by aligning that sequence to a database of known sequence elements. When that database contains elements that are highly similar to each other, the proper annotation may be ambiguous, because several entries in the database produce high-scoring alignments. Typical annotation methods work by assigning a label based on the candidate annotation with the highest alignment score; this can overstate annotation certainty, mislabel boundaries, and fail to identify large-scale rearrangements or insertions within the annotated sequence. Here, we present a new software tool, PolyA, that adjudicates between competing alignment-based annotations by computing estimates of annotation confidence, identifying a trace with maximal confidence, and recursively splicing/stitching inserted elements. PolyA communicates annotation certainty, identifies large-scale rearrangements, and detects boundaries between neighboring elements.
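The idea of reporting a confidence for each competing annotation, rather than only the top-scoring label, can be illustrated with a minimal sketch. It assumes that alignment scores are expressed in bits and that confidence is each candidate's share of the summed 2^score values. The abstract does not state PolyA's actual formula, so the function name, the element names, and the scores below are all hypothetical.

```python
def annotation_confidence(scores_bits):
    """Convert competing alignment scores (assumed to be in bits) to confidences.

    Assumption for illustration only: confidence is proportional to 2**score.
    """
    # Subtract the top score before exponentiating; ratios are unchanged
    # and large scores no longer overflow.
    top = max(scores_bits.values())
    weights = {name: 2.0 ** (score - top) for name, score in scores_bits.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical case: three similar database entries align to the same region.
competing = {"ElementA": 100.0, "ElementB": 99.0, "ElementC": 90.0}
conf = annotation_confidence(competing)
best = max(conf, key=conf.get)

for name, value in conf.items():
    print(f"{name}: {value:.4f}")
print("highest-confidence label:", best)
```

Under these assumptions a one-bit score gap leaves the top label with only about two thirds of the confidence, which is the kind of ambiguity that a highest-score-wins rule hides.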

2020 ◽  
Author(s):  
Eli N. Weinstein ◽  
Debora S. Marks

Abstract: Large-scale sequencing has revealed extraordinary diversity among biological sequences, produced over the course of evolution and within the lifetime of individual organisms. Existing methods for building statistical models of sequences often pre-process the data using multiple sequence alignment, an unreliable approach for many genetic elements (antibodies, disordered proteins, etc.) that is subject to fundamental statistical pathologies. Here we introduce a structured emission distribution (the MuE distribution) that accounts for mutational variability (substitutions and indels) and use it to construct generative and predictive hierarchical Bayesian models (H-MuE models). Our framework enables the application of arbitrary continuous-space vector models (e.g. linear regression, factor models, image neural-networks) to unaligned sequence data. Theoretically, we show that the MuE generalizes classic probabilistic alignment models. Empirically, we show that H-MuE models can infer latent representations and features for immune repertoires, predict functional unobserved members of disordered protein families, and forecast the future evolution of pathogens.


2009 ◽  
Vol 25 (5) ◽  
pp. 662-663 ◽  
Author(s):  
Olivier Martin ◽  
Armand Valsesia ◽  
Amalio Telenti ◽  
Ioannis Xenarios ◽  
Brian J. Stevenson

Author(s):  
Guillermo Restrepo

The deluge of biological sequences, ranging from those of proteins, DNA and RNA to genomes, has increased the number of models for their representation, which are further used to contrast those sequences. Here we present a brief bibliometric description of the research area devoted to the representation of biological sequences and highlight the semiotic reaches of this process. Finally, we argue that this research area needs further research in line with the evolution of mathematical chemistry, and that its drawbacks need to be overcome.


2002 ◽  
Vol 184 (1) ◽  
pp. 171-176 ◽  
Author(s):  
Patrick Mavingui ◽  
Margarita Flores ◽  
Xianwu Guo ◽  
Guillermo Dávila ◽  
Xavier Perret ◽  
...  

ABSTRACT Bacterial genomes are usually partitioned in several replicons, which are dynamic structures prone to mutation and genomic rearrangements, thus contributing to genome evolution. Nevertheless, much remains to be learned about the origins and dynamics of the formation of bacterial alternative genomic states and their possible biological consequences. To address these issues, we have studied the dynamics of the genome architecture in Rhizobium sp. strain NGR234 and analyzed its biological significance. NGR234 genome consists of three replicons: the symbiotic plasmid pNGR234a (536,165 bp), the megaplasmid pNGR234b (>2,000 kb), and the chromosome (>3,700 kb). Here we report that genome analyses of cell siblings showed the occurrence of large-scale DNA rearrangements consisting of cointegrations and excisions between the three replicons. As a result, four new genomic architectures have emerged. Three consisted of the cointegrates between two replicons: chromosome-pNGR234a, chromosome-pNGR234b, and pNGR234a-pNGR234b. The other consisted of a cointegrate of the three replicons (chromosome-pNGR234a-pNGR234b). Cointegration and excision of pNGR234a with either the chromosome or pNGR234b were studied and found to proceed via a Campbell-type mechanism, mediated by insertion sequence elements. We provide evidence showing that changes in the genome architecture did not alter the growth and symbiotic proficiency of Rhizobium derivatives.


2021 ◽  
Author(s):  
Weiqian Cao ◽  
Siyuan Kong ◽  
Wenfeng Zeng ◽  
Pengyun Gong ◽  
Biyun Jiang ◽  
...  

Interpreting large-scale glycoproteomic data for intact glycopeptide identification has been tremendously advanced by software tools. However, software tools for quantitative analysis of intact glycopeptides remain lagging behind, which greatly hinders exploring the differential expression and functions of site-specific glycosylation in organisms. Here, we report pGlycoQuant, a generic software tool for accurate and convenient quantitative intact glycopeptide analysis, supporting both primary and tandem mass spectrometry quantitation for multiple quantitative strategies. pGlycoQuant enables intact glycopeptide quantitation with very low missing values via a deep residual network, thus greatly expanding the quantitative function of several powerful search engines, currently including pGlyco 2.0, pGlyco3, Byonic and MSFragger-Glyco. The pGlycoQuant-based site-specific N-glycoproteomic study conducted here quantifies 6435 intact N-glycopeptides in three hepatocellular carcinoma cell lines with different metastatic potentials and, together with in vitro molecular biology experiments, illustrates core fucosylation at site 979 of the L1 cell adhesion molecule (L1CAM) as a potential regulator of HCC metastasis. pGlycoQuant is freely available at https://github.com/expellir-arma/pGlycoQuant/releases/. We have demonstrated pGlycoQuant to be a powerful tool for the quantitative analysis of site-specific glycosylation and the exploration of potential glycosylation-related biomarker candidates, and we expect further applications in glycoproteomic studies.


2018 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Roland Erwin Suri ◽  
Mohamed El-Saad

Purpose: Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues. Design/methodology/approach: The authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-Heights™ Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.” Findings: Batch processing was sometimes hindered by stops that required manual interference. Depending on the software tool, three to four of these stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: three (Adobe Pro) up to seven (LuraTech and 3-Heights™) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced. The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content in 24 of the converted 213 files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, document part missing), updated fields (reflecting time and folder of conversion), vector graphics issues and spelling errors. Originality/value: These results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format.
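The totals reported in this abstract can be cross-checked with a short calculation. The sketch below rests on two readings of the text, not on a breakdown given by the authors: that the files Adobe Pro did not convert are the 10 MSG (e-mail) files, and that LuraTech and 3-Heights each failed on seven files. The dictionary keys are illustrative labels.

```python
FILES_PER_TOOL = 80  # 8 file types x 10 files each, as stated in the abstract

# Assumed per-tool breakdown (a reading of the abstract, not reported data):
# "unsupported" = files the tool does not convert at all,
# "failed" = supported files for which no PDF/A-1b output was produced.
tools = {
    "LuraTech PDF Compressor": {"unsupported": 0, "failed": 7},
    "Adobe Acrobat XI Pro": {"unsupported": 10, "failed": 3},
    "3-Heights Document Converter": {"unsupported": 0, "failed": 7},
}

produced = sum(
    FILES_PER_TOOL - counts["unsupported"] - counts["failed"]
    for counts in tools.values()
)
impaired = 24  # files with impaired information content, from the abstract
impaired_share = impaired / produced

print("PDF/A-1b files produced:", produced)
print(f"share with impaired content: {impaired_share:.1%}")
```

With this reading the sum is 73 + 67 + 73 = 213 files, matching the reported total, and 24 of 213 rounds to the reported 11 percent.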


Author(s):  
Jun Wang ◽  
Pu-Feng Du ◽  
Xin-Yu Xue ◽  
Guang-Ping Li ◽  
Yuan-Ke Zhou ◽  
...  

Summary: Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and to understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, which aims to be a helpful software tool that allows the users to intuitively visualize and analyze statistical features of all types of biological sequence, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignments and statistical feature generation functions. Availability and implementation: VisFeature is a desktop application that is implemented using JavaScript/Electron and R. The source codes of VisFeature are freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Energies ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 541 ◽  
Author(s):  
Sourav Khanna ◽  
Victor Becerra ◽  
Adib Allahham ◽  
Damian Giaouris ◽  
Jamie M. Foster ◽  
...  

Residential variable energy price schemes can be made more effective with the use of a demand response (DR) strategy along with smart appliances. Using DR, the electricity bill of participating customers/households can be minimised, while pursuing other aims such as demand-shifting and maximising consumption of locally generated renewable-electricity. In this article, a two-stage optimization method is used to implement a price-based implicit DR scheme. The model considers a range of novel smart devices/technologies/schemes, connected to smart-meters and a local DR-Controller. A case study with various decarbonisation scenarios is used to analyse the effects of deploying the proposed DR-scheme in households located in the west area of the Isle of Wight (Southern United Kingdom). There are approximately 15,000 households, of which 3000 are not connected to the gas-network. Using a distribution network model along with a load flow software-tool, the secondary voltages and apparent-power through transformers at the relevant substations are computed. The results show that in summer, participating households could export up to 6.4 MW of power, which is 10% of installed large-scale photovoltaics (PV) capacity on the island. Average carbon dioxide equivalent (CO2e) reductions of 7.1 ktons/annum and a reduction in combined energy/transport fuel-bills of 60%/annum could be achieved by participating households.


2011 ◽  
Vol 12 (1) ◽  
pp. 34 ◽  
Author(s):  
Tania Dottorini ◽  
Nicola Senin ◽  
Giorgio Mazzoleni ◽  
Kalle Magnusson ◽  
Andrea Crisanti
