structure annotation
Recently Published Documents


TOTAL DOCUMENTS: 62 (FIVE YEARS 11)

H-INDEX: 13 (FIVE YEARS 0)

2022 ◽  
Vol 1 ◽  
Author(s):  
Zhi-Hao Guo ◽  
Li Yuan ◽  
Ya-Lan Tan ◽  
Ben-Gong Zhang ◽  
Ya-Zhou Shi

The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component of successful RNA structure prediction or evaluation, few tools or web servers can be used directly to perform comprehensive statistical analysis of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for computing statistics on RNA 3D structures. For given RNA structures, RNAStat automatically calculates structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information on RNA secondary structure motifs, including canonical/non-canonical base pairs, stems, and various loops. In particular, RNAStat can calculate the geometry of base-pairing/stacking by constructing a local coordinate system for each base. In addition, RNAStat supplies the distribution of distances between any atoms to help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset and used it to perform a comprehensive statistical analysis of RNA structures, which could help guide RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and the corresponding statistical data files are freely available on GitHub (https://github.com/RNA-folding-lab/RNAStat).
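As a rough illustration of the atom-atom distance distribution mentioned in the abstract above, the following minimal sketch bins all pairwise distances of a coordinate list into a histogram. It is not taken from RNAStat: the function name `distance_histogram`, the parameters `bin_width` and `max_dist`, and the toy coordinates are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import ceil, dist


def distance_histogram(coords, bin_width=1.0, max_dist=20.0):
    """Count pairwise distances in fixed-width bins.

    coords    -- list of (x, y, z) tuples (hypothetical input format)
    bin_width -- width of each histogram bin, same unit as the coordinates
    max_dist  -- distances at or beyond this cutoff are ignored
    """
    counts = Counter()
    for a, b in combinations(coords, 2):
        d = dist(a, b)  # Euclidean distance between two points
        if d < max_dist:
            counts[int(d // bin_width)] += 1
    n_bins = ceil(max_dist / bin_width)
    # Return a dense list so that empty bins are reported as zero.
    return [counts[i] for i in range(n_bins)]


# Toy coordinates, not real RNA atoms.
toy_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0)]
histogram = distance_histogram(toy_atoms, bin_width=1.0, max_dist=4.0)
```

Normalising such counts against a reference state is what would turn the histogram into a statistical potential; that step depends on the chosen reference model and is left out here.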



Author(s):  
Jan Wira Gotama Putra ◽  
Kana Matsumura ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Discourse structure annotation aims at analysing how discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall discourse. Several annotation tools for discourse structure have been developed. However, they often support only specific annotation schemes, which limits their usefulness for new schemes. This article presents TIARA 2.0, an annotation tool for discourse structure and text improvement. Starting from our specific needs, we extend an existing tool to accommodate four levels of annotation: discourse structure, argumentative structure, sentence rearrangement and content alteration. The latter two in particular set it apart from existing tools. TIARA is implemented on standard web technologies and can be easily customised. It deals with the visual complexity of the annotation process by systematically simplifying the layout and by offering interactive visualisation, including clutter-reducing features and a dual-view display. TIARA’s text-view allows annotators to focus on the analysis of logical sequencing between sentences. The tree-view allows them to review their analysis in terms of the overall discourse structure. Apart from being an annotation tool, it is also designed to be useful for educational purposes in the teaching of argumentation; this gives it an edge over other existing tools.



Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Abstract Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and that of two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
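To make the phrase "hits at low false discovery rates" concrete, here is a minimal sketch of how the number of accepted annotations at a given false discovery rate could be counted on benchmark data where the correct answer is known. It does not reproduce COSMIC's confidence score (kernel density P value estimation plus a support vector machine); the function `hits_at_fdr`, its input format, and the toy scores are assumptions for illustration only.

```python
def hits_at_fdr(scored_hits, max_fdr):
    """Return how many top-ranked hits can be accepted at a given FDR.

    scored_hits -- list of (confidence, is_correct) pairs; the is_correct
                   flag is assumed to come from a labelled benchmark
    max_fdr     -- largest tolerated fraction of incorrect accepted hits
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    best = 0
    wrong = 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        if not is_correct:
            wrong += 1
        # FDR among the k most confident hits is wrong / k.
        if wrong / k <= max_fdr:
            best = k
    return best


# Toy confidence scores, not real annotation results.
toy_hits = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
accepted = hits_at_fdr(toy_hits, max_fdr=0.25)
```

A confidence score is useful in this sense when correct annotations concentrate at the top of the ranking, so that many hits can be accepted before the tolerated error fraction is exceeded.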



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Linyu Wang ◽  
Xiaodan Zhong ◽  
Shuo Wang ◽  
Yuanning Liu

Abstract Background: Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the family of an ncRNA helps in studying its function. Existing computational methods fall mainly into two categories: the first predicts the ncRNA family by learning features of the sequence or secondary structure, and the second predicts it by aligning homologous sequences. In the first category, some methods learn features of a predicted secondary structure, and inaccuracies in that prediction may lower their accuracy. In contrast, ncRFP learns the features of ncRNA sequences directly to predict the ncRNA family. Although ncRFP simplifies the prediction process and improves performance, its performance can still be improved because the features of its input data are incomplete. In the second category, the homologous sequence alignment method currently achieves the highest performance. However, its use is limited by the need for consensus secondary structure annotation of ncRNA sequences and by its inability to model pseudoknots. Results: In this paper, a novel method, “ncDLRES”, which learns sequence features, is proposed to predict the family of ncRNAs based on Dynamic LSTM (Long Short-term Memory) and ResNet (Residual Neural Network). Conclusions: ncDLRES extracts the features of ncRNA sequences with Dynamic LSTM and then classifies them with ResNet. Compared with the homologous sequence alignment method, ncDLRES reduces the data requirement and broadens the scope of application. Compared with methods of the first category, ncDLRES greatly improves performance.



2021 ◽  
pp. 1-27
Author(s):  
Jan Wira Gotama Putra ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Argument mining (AM) aims to explain how individual argumentative discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall argumentation. The automatic recognition of argumentative structure is attractive as it benefits various downstream tasks, such as text assessment, text generation, text improvement, and summarization. Existing studies focused on analyzing well-written texts provided by proficient authors. However, most English speakers in the world are non-native, and their texts are often poorly structured, particularly if they are still in the learning phase. Yet, there is no specific prior study on argumentative structure in non-native texts. In this article, we present the first corpus containing argumentative structure annotation for English-as-a-foreign-language (EFL) essays, together with a specially designed annotation scheme. The annotated corpus resulting from this work is called “ICNALE-AS” and contains 434 essays written by EFL learners from various Asian countries. The corpus presented here is particularly useful for the education domain. On the basis of the analysis of argumentation-related problems in EFL essays, educators can formulate ways to improve them so that they more closely resemble native-level productions. Our argument annotation scheme is demonstrably stable, achieving good inter-annotator agreement and near-perfect intra-annotator agreement. We also propose a set of novel document-level agreement metrics that are able to quantify structural agreement from various argumentation aspects, thus providing a more holistic analysis of the quality of the argumentative structure annotation. The metrics are evaluated in a crowd-sourced meta-evaluation experiment, achieving moderate to good correlation with human judgments.
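For readers unfamiliar with how annotator agreement is usually quantified, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure for two annotators. It is not one of the document-level metrics proposed in the article, and the relation labels in the toy data are hypothetical rather than taken from the ICNALE-AS scheme.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical relation labels assigned by two annotators to four sentences.
annotator_1 = ["support", "attack", "support", "support"]
annotator_2 = ["support", "attack", "attack", "support"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa treats each item independently, which is one reason structure-aware agreement metrics such as those described in the abstract can give a fuller picture for tree-shaped annotations.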



2021 ◽  
Vol 22 (16) ◽  
pp. 8553
Author(s):  
Reeki Emrizal ◽  
Hazrina Yusof Hamdani ◽  
Mohd Firdaus-Raih

The increasing number and complexity of structures containing RNA chains in the Protein Data Bank (PDB) have led to the need for automated structure annotation methods to replace or complement expert visual curation. This is especially true when searching for tertiary base motifs and substructures. Such base arrangements and motifs have diverse roles that range from contributions to structural stability to more direct involvement in the molecule’s functions, such as the sites for ligand binding and catalytic activity. We review the utility of computational approaches in annotating RNA tertiary base motifs in a dataset of PDB structures, particularly the use of graph theoretical algorithms that can search for such base motifs and annotate them or find and annotate clusters of hydrogen-bond-connected bases. We also demonstrate how such graph theoretical algorithms can be integrated into a workflow that allows for functional analysis and comparisons of base arrangements and sub-structures, such as those involved in ligand binding. The capacity to carry out such automatic curations has led to the discovery of novel motifs and can give new context to known motifs as well as enable the rapid compilation of RNA 3D motifs into a database.
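The idea of finding clusters of hydrogen-bond-connected bases can be sketched as a connected-components search on a graph whose nodes are bases and whose edges are hydrogen-bonded pairs. This is a minimal illustration and not the algorithm of any tool covered by the review; the function `hbond_clusters`, the input format, and the residue labels are assumptions.

```python
from collections import defaultdict, deque


def hbond_clusters(hbond_pairs):
    """Group bases into clusters connected through hydrogen-bonded pairs.

    hbond_pairs -- iterable of (base_a, base_b) labels, one per
                   hydrogen-bonded pair (hypothetical input format)
    """
    graph = defaultdict(set)
    for a, b in hbond_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every base reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Toy residue labels, not taken from a real PDB entry.
toy_pairs = [("A1", "U10"), ("U10", "G22"), ("C5", "G30")]
clusters = hbond_clusters(toy_pairs)
```

In this toy input, A1, U10 and G22 form one cluster of three bases and C5 and G30 form an ordinary pair; searching for a specific motif would additionally require matching base identities and bond geometry.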





2021 ◽  
Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. As biological interpretation relies on accurate structure annotations, the ability to assign confidence to such annotations is a key outstanding problem. We introduce the COSMIC workflow that combines structure database generation, in silico annotation, and a confidence score consisting of kernel density p-value estimation and a Support Vector Machine with enforced directionality of features. In evaluation, COSMIC annotates a substantial number of hits at small false discovery rates, and outperforms spectral library search for this purpose. To demonstrate that COSMIC can annotate structures never reported before, we annotated twelve novel bile acid conjugates; nine structures were confirmed by manual evaluation and two structures using synthetic standards. Second, we annotated and manually evaluated 315 molecular structures in human samples currently absent from the Human Metabolome Database. Third, we applied COSMIC to 17,400 experimental runs and annotated 1,715 structures with high confidence that were absent from spectral libraries.



2021 ◽  
Author(s):  
Romain MAGNY ◽  
Anne Regazzetti ◽  
Karima Kessal ◽  
Christophe Baudouin ◽  
Stéphane Mélik-Parsadaniantz ◽  
...  

In-depth knowledge of the biological functions of lipids calls for comprehensive lipid structure annotation, which requires a method to locate fatty acid unsaturations. To address this challenge, we have combined Grubbs' cross-metathesis reaction with liquid chromatography coupled to tandem mass spectrometry. Pretreating lipid-containing samples with Grubbs' catalyst and an appropriate alkene generates substituted lipids through a cross-metathesis reaction under mild, chemoselective and highly reproducible conditions. A systematic LC-MS/MS analysis of the reaction mixture allows the double bonds in fatty acid side chains to be located unambiguously. This method has been successfully applied at the nanomole scale to commercial standard mixtures as well as to lipid extracts from an in vitro model of corneal toxicity.





