iPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features

Abstract Background: Pseudouridine modification is the most common of the RNA modifications that occur in both prokaryotes and eukaryotes. It has been shown to occur in multiple types of RNA, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, a holistic understanding of pseudouridine modification can contribute to drug discovery and gene therapy. Although some laboratory techniques identify pseudouridine sites with moderately good results, they are costly and require skilled work. We propose iPseU-NCP, an efficient computational framework that predicts pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) generated from RNA sequences. The benchmark dataset collected from Chen et al. (2016) was used to develop iPseU-NCP and to compare its performance fairly with other methods. Results: Under the same experimental settings, compared with three state-of-the-art methods (iPseU-CNN, PseUI, and iRNA-PseU), the Matthews correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0%, respectively, when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0%, respectively, when tested on the S. cerevisiae (S_200) dataset. This substantial increase in MCC is important because it reflects the stability and performance of our model. On those two independent test datasets, our model also achieved higher accuracy, with the success rate boosted by 7.0%, 13.0%, and 20.0% and by 2.0%, 9.5%, and 25.0% when compared to iPseU-CNN, PseUI, and iRNA-PseU, respectively. For the majority of the other evaluation metrics, iPseU-NCP demonstrated superior performance as well. Conclusions: iPseU-NCP, which combines RF with NCP-encoded features, performed better than existing state-of-the-art methods in the identification of pseudouridine sites. This also offers an optimistic outlook for addressing biological issues related to human diseases.
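
The abstract names NCP encoding and MCC without spelling them out, so a minimal sketch may help. It assumes the commonly used NCP scheme, in which each nucleotide is described by three binary properties (ring structure, functional group, hydrogen bonding); the abstract itself does not list the mapping, and the function names `encode_ncp` and `mcc` are illustrative only. The resulting feature vectors would then be passed to a random forest classifier, which is not shown here.

```python
import math

# Assumed NCP mapping: (ring structure, functional group, hydrogen bonding).
# Purines (A, G) get 1 for ring structure; amino bases (A, C) get 1 for
# functional group; weakly bonding bases (A, U) get 1 for hydrogen bonding.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp(seq):
    """Flatten an RNA sequence into three binary features per nucleotide."""
    features = []
    for base in seq.upper().replace("T", "U"):
        features.extend(NCP[base])  # raises KeyError on unknown symbols
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(encode_ncp("ACGU"))
    print(round(mcc(40, 30, 20, 10), 3))
```

Note that the percentage gains quoted in the abstract are relative changes in MCC between methods, not absolute MCC values.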

iPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features

10.26686/wgtn.14315870 ◽

2021 ◽

Author(s):

TH Nguyen-Vo ◽

QH Nguyen ◽

TTT Do ◽

TN Nguyen ◽

S Rahardja ◽

...

Keyword(s):

Random Forest ◽

Work Experience ◽

State Of The Art ◽

Chemical Properties ◽

Superior Performance ◽

RNA Modification ◽

RNA Sequences ◽

Holistic Understanding ◽

Art Methods ◽

Matthews Correlation Coefficient

© 2019 Nguyen-Vo et al.

iPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features

10.26686/wgtn.14315870.v1 ◽

2021 ◽

Author(s):

TH Nguyen-Vo ◽

QH Nguyen ◽

TTT Do ◽

TN Nguyen ◽

S Rahardja ◽

...

Keyword(s):

Random Forest ◽

Work Experience ◽

State Of The Art ◽

Chemical Properties ◽

Superior Performance ◽

RNA Modification ◽

RNA Sequences ◽

Holistic Understanding ◽

Art Methods ◽

Matthews Correlation Coefficient

Control of DC Motors to Guide Unmanned Underwater Vehicles

Applied Sciences ◽

10.3390/app11052144 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2144

Author(s):

Timothy Sands

Keyword(s):

Artificial Intelligence ◽

State Of The Art ◽

Underwater Vehicles ◽

Modeling Method ◽

Superior Performance ◽

Alternative Methods ◽

Self Awareness ◽

Square Wave ◽

Unmanned Underwater Vehicles ◽

Art Methods

Many research manuscripts propose new methodologies, while others compare several state-of-the-art methods to ascertain the best method for a given application. This manuscript does both by introducing deterministic artificial intelligence (D.A.I.) to control direct current motors used by unmanned underwater vehicles (amongst other applications), and directly comparing the performance of three state-of-the-art nonlinear adaptive control techniques. D.A.I. involves the assertion of self-awareness statements and uses optimal (in a 2-norm sense) learning to compensate for the deleterious effects of error sources. This research reveals that deterministic artificial intelligence yields 4.8% lower mean and 211% lower standard deviation of tracking errors as compared to the best modeling method investigated (indirect self-tuner without process zero cancellation and minimum phase plant). The improved performance cannot be attributed to superior estimation. Coefficient estimation was merely on par with the best alternative methods; some coefficients were estimated more accurately, others less. Instead, the superior performance seems to be attributable to the modeling method. One noteworthy feature is that D.A.I. very closely followed a challenging square wave without overshoot—successfully settling at each switch of the square wave—while all of the other state-of-the-art methods were unable to do so.

HIGH QUALITY FACADE SEGMENTATION BASED ON STRUCTURED RANDOM FOREST, REGION PROPOSAL NETWORK AND RECTANGULAR FITTING

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-2-223-2018 ◽

2018 ◽

Vol IV-2 ◽

pp. 223-230 ◽

Cited By ~ 2

Author(s):

K. Rahmani ◽

H. Mayer

Keyword(s):

Neural Network ◽

Random Forest ◽

Convolutional Neural Network ◽

State Of The Art ◽

Semantic Segmentation ◽

High Quality ◽

Current State ◽

Forest Region ◽

Building Facades ◽

Art Methods

In this paper we present a pipeline for high-quality semantic segmentation of building facades using a Structured Random Forest (SRF), a Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN), and rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We show empirically that this is very effective, especially for doors and windows. Our pipeline is evaluated on two datasets, on which we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and of the rectangular fitting optimization to the accuracy of the result.

Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences

Briefings in Bioinformatics ◽

10.1093/bib/bbz112 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1676-1696 ◽

Cited By ~ 15

Author(s):

Zhen Chen ◽

Pei Zhao ◽

Fuyi Li ◽

Yanan Wang ◽

A Ian Smith ◽

...

Keyword(s):

Feature Selection ◽

Computational Methods ◽

State Of The Art ◽

Model Performance ◽

Experimental Methods ◽

Receiver Operating Curve ◽

RNA Modification ◽

Computational Techniques ◽

RNA Sequences ◽

Site Prediction

Abstract RNA post-transcriptional modifications play a crucial role in a myriad of biological processes and cellular functions. To date, more than 160 RNA modifications have been discovered; therefore, accurate identification of RNA-modification sites is fundamental for a better understanding of RNA-mediated biological functions and mechanisms. However, due to limitations in experimental methods, systematic identification of different types of RNA-modification sites remains a major challenge. Recently, more than 20 computational methods have been developed to identify RNA-modification sites in tandem with high-throughput experimental methods, with most of these capable of predicting only single types of RNA-modification sites. These methods show high diversity in their dataset size, data quality, core algorithms, features extracted and feature selection techniques and evaluation strategies. Therefore, there is an urgent need to revisit these methods and summarize their methodologies, in order to improve and further develop computational techniques to identify and characterize RNA-modification sites from the large amounts of sequence data. With this goal in mind, first, we provide a comprehensive survey on a large collection of 27 state-of-the-art approaches for predicting N1-methyladenosine and N6-methyladenosine sites. We cover a variety of important aspects that are crucial for the development of successful predictors, including the dataset quality, operating algorithms, sequence and genomic features, feature selection, model performance evaluation and software utility. In addition, we also provide our thoughts on potential strategies to improve the model performance. Second, we propose a computational approach called DeepPromise based on deep learning techniques for simultaneous prediction of N1-methyladenosine and N6-methyladenosine. To extract the sequence context surrounding the modification sites, three feature encodings, including enhanced nucleic acid composition, one-hot encoding, and RNA embedding, were used as the input to seven consecutive layers of convolutional neural networks (CNNs), respectively. Moreover, DeepPromise further combined the prediction score of the CNN-based models and achieved around 43% higher area under receiver-operating curve (AUROC) for m1A site prediction and 2–6% higher AUROC for m6A site prediction, respectively, when compared with several existing state-of-the-art approaches on the independent test. In-depth analyses of characteristic sequence motifs identified from the convolution-layer filters indicated that nucleotide presentation at proximal positions surrounding the modification sites contributed most to the classification, whereas those at distal positions also affected classification but to different extents. To maximize user convenience, a web server was developed as an implementation of DeepPromise and made publicly available at http://DeepPromise.erc.monash.edu/, with the server accepting both RNA sequences and genomic sequences to allow prediction of two types of putative RNA-modification sites.
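
Since the comparison above is reported in AUROC, a small sketch of how that metric can be computed may be useful. It uses the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is a generic illustration, not code from DeepPromise, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

The quadratic loop is fine for small test sets; a sort-based rank computation would be the usual choice for large ones.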

MSqRob takes the missing hurdle: uniting intensity- and count-based proteomics

10.1101/782466 ◽

2019 ◽

Author(s):

Ludger J.E. Goeminne ◽

Adriaan Sticker ◽

Lennart Martens ◽

Kris Gevaert ◽

Lieven Clement

Keyword(s):

Mass Spectrometry ◽

Quantitative Data ◽

Missing Values ◽

State Of The Art ◽

Superior Performance ◽

Hurdle Model ◽

Model Component ◽

Art Methods ◽

Innovative Solution ◽

Dependent Mass

Abstract Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.

Multi-hop assortativities for network classification

Journal of Complex Networks ◽

10.1093/comnet/cny034 ◽

2018 ◽

Vol 7 (4) ◽

pp. 603-622 ◽

Cited By ~ 1

Author(s):

Leonardo Gutiérrez-Gómez ◽

Jean-Charles Delvenne

Keyword(s):

Machine Learning ◽

Scientific Collaboration ◽

State Of The Art ◽

Medical Engineering ◽

Research Field ◽

Classification Task ◽

Collaboration Network ◽

Structural Patterns ◽

Art Methods

Abstract Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on its structure and atomic types; in another setting, one might want to discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints in turn make it possible to recover the functionalities of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity-based features are competitive, providing highly accurate results and often outperforming state-of-the-art methods for the network classification task.
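
To make the idea of comparing nodes at the two ends of a random path concrete, here is a rough sketch under stated assumptions. It computes the probability that a stationary random walk of a given number of hops starts and ends on nodes with the same categorical label. This is an unnormalized illustration of the concept, not the paper's exact definition of multi-hop assortativity; the function name, the dense adjacency-matrix input, and the requirement that every node has at least one edge are all assumptions.

```python
def same_label_probability(adj, labels, hops):
    """Probability that both ends of a stationary random walk of `hops` steps
    carry the same label. `adj` is a symmetric 0/1 adjacency matrix with no
    isolated nodes (assumption)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    total = sum(degree)
    # Random-walk transition matrix P = D^-1 A.
    P = [[adj[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # Start from the identity and multiply by P once per hop to get P^hops.
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(hops):
        walk = [
            [sum(walk[i][k] * P[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    # Weight each start node by its stationary probability degree / total.
    return sum(
        (degree[i] / total) * walk[i][j]
        for i in range(n)
        for j in range(n)
        if labels[i] == labels[j]
    )


if __name__ == "__main__":
    # Path graph 0-1-2-3 with two labelled groups.
    path = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    groups = ["a", "a", "b", "b"]
    print([same_label_probability(path, groups, h) for h in range(4)])
```

Evaluating the function for several hop counts yields a vector that could serve as one such fingerprint for a downstream classifier.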

Automatic Detection of Discrimination Actions from Social Images

Electronics ◽

10.3390/electronics10030325 ◽

2021 ◽

Vol 10 (3) ◽

pp. 325

Author(s):

Zhihao Wu ◽

Baopeng Zhang ◽

Tianchen Zhou ◽

Yan Li ◽

Jianping Fan

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Automatic Detection ◽

Experimental Results ◽

Practical Approach ◽

Detection And Identification ◽

Art Methods ◽

Image Set ◽

Social Images ◽

Relationship Identification

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.

Download Full-text

A Deep Learning Approach to Predict Autism Spectrum Disorder Using Multisite Resting-State fMRI

Applied Sciences ◽

10.3390/app11083636 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3636

Author(s):

Faria Zarin Subah ◽

Kaushik Deb ◽

Pranab Kumar Dhar ◽

Takeshi Koshiba

Keyword(s):

Autism Spectrum Disorder ◽

Resting State ◽

State Of The Art ◽

Resting State fMRI ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Bootstrap Analysis ◽

Proposed Model ◽

Art Methods ◽

The Mean

Autism spectrum disorder (ASD) is a complex and degenerative neuro-developmental disorder. Most of the existing methods utilize functional magnetic resonance imaging (fMRI) to detect ASD with a very limited dataset, which provides high accuracy but results in poor generalization. To overcome this limitation and to enhance the performance of the automated autism diagnosis model, in this paper, we propose an ASD detection model using functional connectivity features of resting-state fMRI data. Our proposed model utilizes two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases, Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier is used to perform the classification task. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy. The mean accuracy of the proposed model was 88%, whereas the mean accuracy of the state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under receiver operating characteristic curve (AUC) score of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis of various scoring strategies shows the superiority of the BASC atlas over the other aforementioned atlases in classifying ASD and control.
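
Functional connectivity features are commonly built by correlating the time series of every pair of atlas regions and keeping the upper triangle of the resulting matrix. The sketch below shows that construction with Pearson correlation; whether the paper uses exactly this measure is an assumption, as are the function names. It also assumes that no region has a constant time series, since the correlation would then be undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def connectivity_features(roi_series):
    """Upper-triangle correlations for a list of per-region time series."""
    n = len(roi_series)
    return [
        pearson(roi_series[i], roi_series[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    print(connectivity_features(toy))
```

With n regions this gives n(n-1)/2 features per subject, so an atlas with 200 regions would yield 19,900 inputs to the classifier.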

A contour property based approach to segment nuclei in cervical cytology images

BMC Medical Imaging ◽

10.1186/s12880-020-00533-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Iram Tazim Hoque ◽

Nabil Ibtehaz ◽

Saumitra Chakravarty ◽

M. Saifur Rahman ◽

M. Sohel Rahman

Keyword(s):

Pap Smear ◽

State Of The Art ◽

Cervical Cytology ◽

Average Intensity ◽

Nucleus Size ◽

Real Dataset ◽

Art Methods ◽

Convolution Filter ◽

Cervical Cells ◽

Nucleus Segmentation

Abstract Background: Segmentation of nuclei in cervical cytology pap smear images is a crucial stage in automated cervical cancer screening. The task itself is challenging due to the presence of cervical cells with spurious edges, overlapping cells, neutrophils, and artifacts. Methods: After the initial preprocessing steps of adaptive thresholding, in our approach, the image passes through a convolution filter to filter out some noise. Then, contours from the resultant image are filtered by their distinctive contour properties followed by a nucleus size recovery procedure based on contour average intensity value. Results: We evaluate our method on a public (benchmark) dataset collected from ISBI and also a private real dataset. The results show that our algorithm outperforms other state-of-the-art methods in nucleus segmentation on the ISBI dataset with a precision of 0.978 and recall of 0.933. A promising precision of 0.770 and a formidable recall of 0.886 on the private real dataset indicate that our algorithm can effectively detect and segment nuclei on real cervical cytology images. Tuning various parameters, the precision could be increased to as high as 0.949 with an acceptable decrease of recall to 0.759. Our method also managed an Aggregated Jaccard Index of 0.681 outperforming other state-of-the-art methods on the real dataset. Conclusion: We have proposed a contour property-based approach for segmentation of nuclei. Our algorithm has several tunable parameters and is flexible enough to adapt to real practical scenarios and requirements.
