sequence feature
Recently Published Documents


TOTAL DOCUMENTS

66
(FIVE YEARS 25)

H-INDEX

12
(FIVE YEARS 4)

2021 ◽  
Author(s):  
Wen Wang ◽  
Yang Cao ◽  
Jing Zhang ◽  
Fengxiang He ◽  
Zheng-Jun Zha ◽  
...  

Genomics ◽  
2021 ◽  
Author(s):  
Qiao-Ying Ji ◽  
Xiu-Jun Gong ◽  
Hao-Min Li ◽  
Pu-Feng Du

2021 ◽  
Author(s):  
Yuka Yoshimura ◽  
Akifumi Hamada ◽  
Yohann Augey ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

Motivation: Biological sequence classification is the most fundamental task in bioinformatics analysis. For example, in metagenome analysis, binning is a typical type of DNA sequence classification. To classify sequences, it is necessary to define sequence features; the k-mer frequency, base composition, and alignment-based metrics are commonly used. In contrast, in the field of image recognition using machine learning, image classification approaches are broadly divided into those based on shape and those based on style. A style matrix was introduced as a method of expressing the style of an image (e.g., color usage and texture). Results: We propose a novel sequence feature, called genomic style, inspired by image classification approaches, for classifying and clustering DNA sequences. As with the style of images, a DNA sequence is considered to have a genomic style unique to the bacterial species, and the style matrix concept is applied to the DNA sequence. Our main aim is to introduce the genomic style as another basic sequence feature for the metagenome binning problem, in place of the most commonly used sequence feature, the k-mer frequency. Performance evaluations show that our method using the style matrix achieves higher accuracy than state-of-the-art binning tools based on k-mer frequency.
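As a rough illustration of the two feature types contrasted in the abstract above, the following sketch computes a normalized k-mer frequency vector and a Gram matrix over a generic feature map. It is not the authors' implementation: the function names `kmer_frequencies` and `gram_matrix` are hypothetical, the feature map is a plain list of channel rows standing in for whatever learned features a real model would produce, and the division by the number of positions is an assumed normalization.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies keyed by k-mer, in lexicographic order."""
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only windows made entirely of alphabet symbols contribute to the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def gram_matrix(feature_map):
    """Gram matrix of a feature map given as a list of channel rows.

    Entry (i, j) is the inner product of channel i and channel j,
    divided by the number of positions (an assumed normalization).
    """
    n = len(feature_map[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in feature_map]
        for row_i in feature_map
    ]


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTAC", k=2)
    print(freqs["AC"], freqs["CG"])
    print(gram_matrix([[1.0, 0.0], [0.0, 1.0]]))
```

The k-mer vector records which short words occur and how often, while the Gram matrix records how strongly pairs of feature channels co-activate along the sequence, which is the sense in which it captures "style" in image models.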


Author(s):  
Xianfang Wang ◽  
Yifeng Liu ◽  
Zhiyong Du ◽  
Mingdong Zhu ◽  
Aman Chandra Kaushik ◽  
...  

2021 ◽  
Vol 22 ◽  
Author(s):  
Ying Liang ◽  
Niannian Liu ◽  
Le Yang ◽  
Jianjun Tang ◽  
Yinglong Wang ◽  
...  

Circular RNA (circRNA) is a non-coding molecule produced through alternative splicing of one or more exons of a gene in the presence of RNA-induced silencing complex (RISC). Its formation depends on complementary intron sequences on both sides of the circularized sequence. CircRNA functions as a sponge for miRNA, playing the role of transcriptional regulator or potential biomarker. It has an impact on fetal growth and on synaptic facilitation in the brain. In this review, we illustrate the biogenesis mechanisms, characteristics, and functions of circRNAs. We also summarize methods that use sequence features and RNA next-generation sequencing data for circRNA prediction. Finally, we discuss the state of research on circRNA in diseases, which may contribute to future disease treatments.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248861
Author(s):  
Xiaogeng Wan ◽  
Xinying Tan

In this paper, we use network approaches to analyze the relations between protein sequence features for the top hierarchical classes of CATH and SCOP. We use fundamental connectivity measures such as correlation (CR), normalized mutual information rate (nMIR), and transfer entropy (TE) to analyze the pairwise relationships between the protein sequence features, and use centrality measures to analyze weighted networks constructed from the relationship matrices. In the centrality analysis, we find both commonalities and differences between the different protein 3D structural classes. Results show that all top hierarchical classes of CATH and SCOP present strong non-deterministic interactions for the composition and arrangement features of Cysteine (C), Methionine (M), and Tryptophan (W), and also for the arrangement features of Histidine (H). The different protein 3D structural classes present different preferences in terms of their centrality distributions and significant features.
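The general pipeline described in the abstract above (pairwise relationship measures, then centrality on the resulting weighted network) can be sketched as follows. This is a simplified stand-in under stated assumptions: it uses Pearson correlation and plain mutual information on discrete values rather than the paper's nMIR or transfer entropy, and it uses node strength (weighted degree) as the centrality measure. All function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_information(x, y):
    """Mutual information in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def strength_centrality(weights):
    """Node strength: sum of absolute edge weights per node, ignoring self-loops."""
    return [
        sum(abs(w) for j, w in enumerate(row) if j != i)
        for i, row in enumerate(weights)
    ]


if __name__ == "__main__":
    # Three toy feature vectors; the values are illustrative only.
    features = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
    matrix = [[pearson(a, b) for b in features] for a in features]
    print(strength_centrality(matrix))
```

Features with high strength are those most strongly related to the others, which is one simple way to rank nodes in a relationship network.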


2020 ◽  
Vol 15 (6) ◽  
pp. 574-580
Author(s):  
Tianjiao Zhang ◽  
Rongjie Wang ◽  
Qinghua Jiang ◽  
Yadong Wang

Background: Enhancers are cis-regulatory DNA elements that enhance gene expression. Since most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers contain a variety of features that can help in enhancer recognition. Objective: The classification power of features differs significantly, and the performance of existing methods that use one or a few features to identify enhancers varies greatly. Therefore, evaluating the classification power of each feature can improve the performance of enhancer prediction. Methods: We present an evaluation method based on Information Gain (IG) that captures the entropy change of enhancer recognition according to features. To validate the performance of our method, experiments using the Single Feature Prediction Accuracy (SFPA) were conducted on each feature. Results: The average IG values of the sequence feature, transcriptional feature, and epigenetic feature are 0.068, 0.213, and 0.299, respectively. Through SFPA, the average AUC values of the sequence feature, transcriptional feature, and epigenetic feature are 0.534, 0.605, and 0.647, respectively. The verification results are consistent with our evaluation results. Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
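Information gain, as used in the abstract above, is the reduction in label entropy obtained by conditioning on a feature. The sketch below shows the standard textbook computation for a discrete feature and discrete labels; it is not the authors' code, the function names are hypothetical, and continuous features would first need to be binned, which is assumed to have been done already.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # toy labels: 1 = enhancer, 0 = non-enhancer
    print(information_gain(["hi", "hi", "lo", "lo"], labels))  # fully informative
    print(information_gain(["hi", "lo", "hi", "lo"], labels))  # uninformative
```

A feature that separates the classes perfectly recovers the full label entropy, while a feature independent of the labels yields zero gain, so ranking features by this value orders them by classification power.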

