DNA4mC-LIP: a linear integration method to identify N4-methylcytosine site in multiple species

Abstract Motivation: DNA N4-methylcytosine (4mC) is a crucial epigenetic modification, but knowledge of its biological functions is limited. Effective and accurate identification of 4mC sites will help reveal its biological functions and mechanisms. Because experimental methods are costly and inefficient, a number of machine learning-based approaches have been proposed to detect 4mC sites. Although these methods yield acceptable accuracy, there is still room to improve their prediction performance and stability in practical applications. Results: In this work, we first systematically assessed the existing methods on an independent dataset. We then proposed DNA4mC-LIP, a linear integration method that combines existing predictors to identify 4mC sites in multiple species. Results on the independent dataset demonstrated that DNA4mC-LIP outperformed existing methods for identifying 4mC sites. To serve the scientific community, a web server for DNA4mC-LIP was developed. We anticipate that DNA4mC-LIP could serve as a powerful computational technique for identifying 4mC sites and facilitate the interpretation of 4mC mechanisms. Availability and implementation: http://i.uestc.edu.cn/DNA4mC-LIP/. Supplementary information: Supplementary data are available at Bioinformatics online.
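The abstract describes a linear integration of existing predictors but gives neither the combination formula nor the weights. The sketch below shows only the general idea, under the assumption that each base predictor returns a probability in [0, 1] and that the integrated score is a normalised weighted sum. The function names, the weights, and the 0.5 threshold are hypothetical and are not taken from DNA4mC-LIP.

```python
from typing import Sequence


def linear_integration(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-predictor probabilities into one score by a weighted average.

    Assumption: weights are non-negative and are normalised here to sum to 1.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def predict_4mc(scores: Sequence[float], weights: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Call a site 4mC when the integrated score reaches the (hypothetical) threshold."""
    return linear_integration(scores, weights) >= threshold


if __name__ == "__main__":
    # Three hypothetical base predictors; the first is trusted twice as much.
    print(linear_integration([0.9, 0.6, 0.3], [2, 1, 1]))
```

In practice the weights would have to be fitted on training data; the abstract does not say how, so that step is left out here.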


Evaluation of different computational methods on 5-methylcytosine sites identification

Briefings in Bioinformatics ◽

10.1093/bib/bbz048 ◽

2019 ◽

Vol 21 (3) ◽

pp. 982-995 ◽

Cited By ~ 34

Author(s):

Hao Lv ◽

Zi-Mei Zhang ◽

Shi-Hao Li ◽

Jiu-Xin Tan ◽

Wei Chen ◽

...

Keyword(s):

Large Scale ◽

Homo Sapiens ◽

Great Increase ◽

Computational Prediction ◽

Extraction Methods ◽

Biological Functions ◽

Accurate Identification ◽

The Past ◽

Reliable Model ◽

Multiple Species

Abstract 5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Although the number of identified m5C sites in a wide variety of organisms has increased greatly, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, more attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized the current progress in computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that the proposed iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.


MM-6mAPred: identifying DNA N6-methyladenine sites based on Markov model

Bioinformatics ◽

10.1093/bioinformatics/btz556 ◽

2019 ◽

Cited By ~ 7

Author(s):

Cong Pian ◽

Guangle Zhang ◽

Fei Li ◽

Xiaodan Fan

Keyword(s):

Markov Model ◽

Epigenetic Modification ◽

Transition Probability ◽

Chemical Properties ◽

Previous Method ◽

Supplementary Information ◽

Sequence Information ◽

Biological Functions ◽

New Classification ◽

Eukaryotic Organisms

Abstract Motivation: Recent studies have shown that DNA N6-methyladenine (6mA) plays an important role in the epigenetic modification of eukaryotic organisms. 6mA has been found to be closely related to processes such as embryonic development and stress response. Developing a new algorithm to quickly and accurately identify 6mA sites in genomes is important for exploring their biological functions. Results: In this paper, we proposed a new classification method called MM-6mAPred, based on a Markov model, which uses the transition probability between adjacent nucleotides to identify 6mA sites. The sensitivity and specificity of our method are 89.32% and 90.11%, respectively. The overall accuracy of our method is 89.72%, which is 6.59% higher than that of the previous method i6mA-Pred. This indicates that, compared with the 41 nucleotide chemical properties used by i6mA-Pred, the transition probability between adjacent nucleotides can capture more discriminant sequence information. Availability and implementation: The web server of MM-6mAPred is freely accessible at http://www.insect-genome.com/MM-6mAPred/. Supplementary information: Supplementary data are available at Bioinformatics online.
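To make the idea of classifying by transition probabilities between adjacent nucleotides concrete, here is a minimal sketch of a first-order Markov classifier. It is an illustration under stated assumptions, not the MM-6mAPred implementation: it uses one position-independent transition matrix per class, add-one pseudocounts, no initial-state term, and a plain log-likelihood comparison. All function names are hypothetical, and sequences are assumed to contain only uppercase A, C, G and T.

```python
import math
from typing import Dict, Iterable

ALPHABET = "ACGT"
Transitions = Dict[str, Dict[str, float]]


def fit_transitions(seqs: Iterable[str], pseudocount: float = 1.0) -> Transitions:
    """Estimate P(next base | current base) from adjacent pairs in the sequences."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    # Normalise each row so that it sums to 1.
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_likelihood(seq: str, trans: Transitions) -> float:
    """Sum of log transition probabilities along the sequence (initial state ignored)."""
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def classify(seq: str, pos_trans: Transitions, neg_trans: Transitions) -> bool:
    """Return True when the positive-class model explains the sequence better."""
    return log_likelihood(seq, pos_trans) > log_likelihood(seq, neg_trans)


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    pos = fit_transitions(["ACACAC", "ACACAC"])
    neg = fit_transitions(["GTGTGT", "GTGTGT"])
    print(classify("ACAC", pos, neg), classify("GTGT", pos, neg))
```

Comparing two class-specific likelihoods keeps the decision rule free of tunable thresholds; a real predictor might instead use position-specific transition matrices, which this sketch does not attempt.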


DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bioinformatics ◽

10.1093/bioinformatics/btab083 ◽

2021 ◽

Author(s):

Yanrong Ji ◽

Zhihan Zhou ◽

Han Liu ◽

Ramana V Davuluri

Keyword(s):

Dna Sequences ◽

Regulatory Elements ◽

Ease Of Use ◽

Fine Tuning ◽

Supplementary Information ◽

Sequence Motifs ◽

Semantic Relationship ◽

Accurate Identification ◽

Conserved Sequence ◽

Genome Wide

Abstract Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on the prediction of promoters, splice sites and transcription factor binding sites after easy fine-tuning on small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


4mCPred-CNN—Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network

Genes ◽

10.3390/genes12020296 ◽

2021 ◽

Vol 12 (2) ◽

pp. 296

Author(s):

Zeeshan Abbas ◽

Hilal Tayara ◽

Kil To Chong

Keyword(s):

Mouse Genome ◽

Matthews Correlation Coefficient ◽

Biological Functions ◽

Independent Dataset ◽

Validation Test ◽

Single Feature ◽

Proposed Model ◽

Dna Modifications ◽

Feature Encoding ◽

Feature Encoding Scheme

Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. Understanding its various biological functions requires accurate detection of 4mC sites. Although several techniques based on machine learning (ML) and convolutional neural networks (CNNs) exist for predicting 4mC sites in different genomes, there is no CNN-based tool for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models existed for this purpose; they used several feature encoding schemes and still left considerable room to improve prediction accuracy. Using only a single feature encoding scheme (one-hot encoding), we outperformed both of the previous ML-based techniques. In a ten-fold validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, the achieved accuracy was 87.50% with an MCC of 0.750. These results indicate that the proposed model can be of great use to researchers in biology and bioinformatics.
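As a small illustration of the two technical terms in this abstract, the sketch below one-hot encodes a DNA sequence and computes the Matthews correlation coefficient from confusion-matrix counts. The channel order (A, C, G, T), the all-zero vector for unknown bases, and the function names are assumptions made for this example, and the counts in the usage line are hypothetical rather than taken from the paper.

```python
import math
from typing import List

# Assumed channel order: A, C, G, T.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Map each base to a 4-element indicator vector; unknown bases become all zeros."""
    return [list(ONE_HOT.get(base, [0, 0, 0, 0])) for base in seq.upper()]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
    # Hypothetical balanced counts: 14 of 16 correct, i.e. 87.5% accuracy.
    print(mcc(tp=7, tn=7, fp=1, fn=1))
```

With balanced classes and equal sensitivity and specificity, the MCC reduces to 2 × accuracy − 1, which is why an accuracy of 87.5% goes together with an MCC of 0.75 in the hypothetical counts above.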


A Bayesian approach for analysis of whole-genome bisulphite sequencing data identifies disease-associated changes in DNA methylation

10.1101/041715 ◽

2016 ◽

Author(s):

Owen J.L. Rackham ◽

Sarah R. Langley ◽

Thomas Oates ◽

Eleni Vradi ◽

Nathan Harmston ◽

...

Keyword(s):

Dna Methylation ◽

Epigenetic Modification ◽

Promoter Hypermethylation ◽

Whole Genome ◽

Sequencing Data ◽

Accurate Identification ◽

Bisulphite Sequencing ◽

Consistent Change ◽

Rat Strain ◽

Differential Transcription

Abstract DNA methylation is a key epigenetic modification involved in gene regulation whose contribution to disease susceptibility remains to be fully understood. Here, we present a novel Bayesian smoothing approach (called ABBA) to detect differentially methylated regions (DMRs) from whole-genome bisulphite sequencing (WGBS). We also show how this approach can be leveraged to identify disease-associated changes in DNA methylation, suggesting mechanisms through which these alterations might affect disease. From a data modeling perspective, ABBA has the distinctive feature of automatically adapting to different correlation structures in CpG methylation levels across the genome whilst taking into account the distance between CpG sites as a covariate. Our simulation study shows that ABBA has greater power to detect DMRs than existing methods, providing an accurate identification of DMRs in the large majority of simulated cases. To empirically demonstrate the method’s efficacy in generating biological hypotheses, we performed WGBS of primary macrophages derived from an experimental rat system of glomerulonephritis and used ABBA to identify >1,000 disease-associated DMRs. Investigation of these DMRs revealed differential DNA methylation localized to a 600bp region in the promoter of the Ifitm3 gene. This was confirmed by ChIP-seq and RNA-seq analyses, showing differential transcription factor binding at the Ifitm3 promoter by JunD (an established determinant of glomerulonephritis) and a consistent change in Ifitm3 expression. Our ABBA analysis allowed us to propose a new role for Ifitm3 in the pathogenesis of glomerulonephritis via a mechanism involving promoter hypermethylation that is associated with Ifitm3 repression in the rat strain susceptible to glomerulonephritis.


m5CPred-SVM: A Novel Method for Predicting m5C Sites of RNA

10.21203/rs.3.rs-39526/v2 ◽

2020 ◽

Author(s):

Xiao Chen ◽

Yi Xiong ◽

Yinbo Liu ◽

Yuqing Chen ◽

Shoudong Bi ◽

...

Keyword(s):

Cell Fate ◽

Prediction Accuracy ◽

Cytosine Methylation ◽

Low Cost ◽

Computational Method ◽

Selection Strategy ◽

Support Vector ◽

Feature Subset ◽

Biological Functions ◽

Accurate Identification

Abstract Background: As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, both the accuracy and the efficiency of these methods are not yet satisfactory and need further improvement. Results: In this work, we have developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performances of models based on different learning algorithms were compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the results showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that m5CPred-SVM can become a useful tool for accurate identification of m5C sites. Conclusion: In this study, by introducing position-specific propensity-related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available to users through a web server at http://zhulab.ahu.edu.cn/m5CPred-SVM.
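The sequential forward feature selection strategy mentioned above can be sketched generically as a greedy loop: start from an empty set and repeatedly add the single feature that improves a scoring function the most, stopping when no addition helps. The code below is an illustration only. The `score` callback stands in for whatever evaluation the authors used (for example, cross-validated accuracy of an SVM), which the abstract does not specify, and the function name is hypothetical.

```python
from typing import Callable, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
) -> List[int]:
    """Greedily grow a feature subset while the score strictly improves.

    `score` takes a list of feature indices and returns a value to maximise.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        candidate, candidate_score = None, best_score
        # Try adding each unused feature and remember the best single addition.
        for feature in sorted(remaining):
            trial_score = score(selected + [feature])
            if trial_score > candidate_score:
                candidate, candidate_score = feature, trial_score
        if candidate is None:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


if __name__ == "__main__":
    # Toy score: features 0 and 2 are useful, every feature costs 0.01.
    gains = {0: 0.3, 2: 0.2}
    toy = lambda subset: sum(gains.get(f, 0.0) for f in subset) - 0.01 * len(subset)
    print(sequential_forward_selection(4, toy))
```

The greedy search evaluates on the order of n² subsets rather than all 2ⁿ, at the price of possibly missing features that are only useful in combination.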


Role of Main RNA Methylation in Hepatocellular Carcinoma: N6-Methyladenosine, 5-Methylcytosine, and N1-Methyladenosine

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.767668 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yating Xu ◽

Menggang Zhang ◽

Qiyao Zhang ◽

Xiao Yu ◽

Zongzong Sun ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Cellular Differentiation ◽

Molecular Mechanisms ◽

Gene Sequence ◽

Epigenetic Modification ◽

Biological Processes ◽

Biological Functions ◽

Rna Detection ◽

Rna Methylation

RNA methylation is considered a significant epigenetic modification, a process that does not alter gene sequence but may play a necessary role in multiple biological processes, such as gene expression, genome editing, and cellular differentiation. With advances in RNA detection, various forms of RNA methylation have been found, including N6-methyladenosine (m6A), N1-methyladenosine (m1A), and 5-methylcytosine (m5C). Emerging reports confirm that dysregulation of RNA methylation gives rise to a variety of human diseases, particularly hepatocellular carcinoma. We summarize the essential regulators of RNA methylation and the biological functions of these modifications in coding and noncoding RNAs. In conclusion, we highlight the complex molecular mechanisms of m6A, m5C, and m1A associated with hepatocellular carcinoma and hope this review conveys the therapeutic potential of RNA methylation for clinical research.


III/V-on-Si MQW lasers by using a novel photonic integration method of regrowth on a bonding template

Light Science & Applications ◽

10.1038/s41377-019-0202-6 ◽

2019 ◽

Vol 8 (1) ◽

Cited By ~ 9

Author(s):

Yingtao Hu ◽

Di Liang ◽

Kunal Mukherjee ◽

Youli Li ◽

Chong Zhang ◽

...

Keyword(s):

Dislocation Density ◽

High Performance ◽

Flip Chip ◽

Continuous Wave ◽

Integration Method ◽

Light Emission ◽

Threshold Current ◽

Photonic Integration ◽

Practical Applications ◽

Low Dislocation Density

Abstract Silicon photonics is becoming a mainstream data-transmission solution for next-generation data centers, high-performance computers, and many emerging applications. The inefficiency of light emission in silicon still requires the integration of a III/V laser chip or optical gain materials onto a silicon substrate. A number of integration approaches, including flip-chip bonding, molecule or polymer wafer bonding, and monolithic III/V epitaxy, have been extensively explored in the past decade. Here, we demonstrate a novel photonic integration method of epitaxial regrowth of III/V on a III/V-on-SOI bonding template to realize heterogeneous lasers on silicon. This method decouples the correlated root causes, i.e., lattice, thermal, and domain mismatches, which are all responsible for a large number of detrimental dislocations in the heteroepitaxy process. The grown multi-quantum well vertical p–i–n diode laser structure shows a significantly low dislocation density of 9.5 × 10⁴ cm⁻², two orders of magnitude lower than the state-of-the-art conventional monolithic growth on Si. This low dislocation density would eliminate defect-induced laser lifetime concerns for practical applications. The fabricated lasers show room-temperature pulsed and continuous-wave lasing at 1.31 μm, with a minimal threshold current density of 813 A/cm². This generic concept can be applied to other material systems to provide higher integration density, more functionalities and lower total cost for photonics as well as microelectronics, MEMS, and many other applications.


Accurate and efficient gene function prediction using a multi-bacterial network

Bioinformatics ◽

10.1093/bioinformatics/btaa885 ◽

2020 ◽

Author(s):

Jeffrey N Law ◽

Shiv D Kale ◽

T M Murali

Keyword(s):

Gene Function ◽

Bacterial Species ◽

Heterogeneous Data ◽

Function Prediction ◽

Label Propagation ◽

Supplementary Information ◽

Gene Function Prediction ◽

Functional Annotations ◽

A Genome ◽

Multiple Species

Abstract Motivation: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. Results: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. Availability and implementation: An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. Supplementary information: Supplementary data are available at Bioinformatics online.
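For readers unfamiliar with label propagation on networks, the following is a generic sketch of the technique, not the FastSinkSource algorithm itself, whose update rule and convergence bound are not given in the abstract. It assumes annotated genes keep fixed scores (1.0 for positive, 0.0 for negative) and each unannotated gene repeatedly takes the weighted average of its neighbours' scores for a fixed number of iterations. The data layout and function name are hypothetical.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]


def propagate_labels(adjacency: Graph, labels: Dict[Hashable, float],
                     n_iter: int = 50) -> Dict[Hashable, float]:
    """Iteratively spread known labels over a weighted, undirected network.

    adjacency: node -> {neighbour: edge weight}; every neighbour must also be a key.
    labels:    node -> fixed score for annotated nodes (1.0 positive, 0.0 negative).
    """
    scores = {node: labels.get(node, 0.0) for node in adjacency}
    for _ in range(n_iter):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in labels:
                updated[node] = labels[node]  # annotated nodes stay clamped
                continue
            total = sum(neighbours.values())
            updated[node] = (
                sum(w * scores[v] for v, w in neighbours.items()) / total
                if total else 0.0
            )
        scores = updated
    return scores


if __name__ == "__main__":
    # Toy chain a - b - c - d with a annotated positive and d annotated negative.
    chain = {
        "a": {"b": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"b": 1.0, "d": 1.0},
        "d": {"c": 1.0},
    }
    print(propagate_labels(chain, {"a": 1.0, "d": 0.0}))
```

On the toy chain the scores of the unannotated nodes settle near 2/3 and 1/3, reflecting their distance from the positive and negative ends. Running a fixed number of iterations is the simplest stopping rule; the abstract indicates the authors instead bound the rate of progress, which this sketch does not reproduce.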


Advances in the role of m6A RNA modification in cancer metabolic reprogramming

Cell & Bioscience ◽

10.1186/s13578-020-00479-z ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Xiu Han ◽

Lin Wang ◽

Qingzhen Han

Keyword(s):

Cancer Cells ◽

Epigenetic Modification ◽

Metabolic Reprogramming ◽

Regulatory Mechanisms ◽

Rna Modification ◽

Biological Functions ◽

Rna Transcription ◽

Cellular Processes ◽

Metabolic Genes

Abstract N6-methyladenosine (m6A) modification is the most common internal modification of eukaryotic mRNA and is widely involved in many cellular processes, such as RNA transcription, splicing, nuclear transport, degradation, and translation. m6A has been shown to play important roles in the initiation and progression of various cancers. The altered metabolic programming of cancer cells promotes their cell-autonomous proliferation and survival and is an indispensable hallmark of cancer. Accumulating evidence has demonstrated that this epigenetic modification exerts extensive effects on the cancer metabolic network, either by directly regulating the expression of metabolic genes or by modulating metabolism-associated signaling pathways. In this review, we summarize the regulatory mechanisms and biological functions of m6A and its role in cancer metabolic reprogramming.
