MM-6mAPred: identifying DNA N6-methyladenine sites based on Markov model

Author(s):  
Cong Pian ◽  
Guangle Zhang ◽  
Fei Li ◽  
Xiaodan Fan

Abstract Motivation Recent studies have shown that DNA N6-methyladenine (6mA) plays an important role in the epigenetic modification of eukaryotic organisms. 6mA has been found to be closely related to embryonic development, stress response and other processes. Developing a new algorithm that quickly and accurately identifies 6mA sites in genomes is important for exploring their biological functions. Results In this paper, we propose a new classification method called MM-6mAPred, based on a Markov model that uses the transition probabilities between adjacent nucleotides to identify 6mA sites. The sensitivity and specificity of our method are 89.32% and 90.11%, respectively. The overall accuracy of our method is 89.72%, which is 6.59% higher than that of the previous method i6mA-Pred. This indicates that, compared with the 41 nucleotide chemical properties used by i6mA-Pred, the transition probabilities between adjacent nucleotides capture more discriminant sequence information. Availability and implementation The web server of MM-6mAPred is freely accessible at http://www.insect-genome.com/MM-6mAPred/ Supplementary information Supplementary data are available at Bioinformatics online.
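To make the transition-probability idea concrete, here is a minimal Python sketch of a first-order Markov-chain classifier: one chain is trained on 6mA-positive sequences, another on negative sequences, and a candidate is scored by the log-likelihood ratio of the two chains. It assumes position-independent transition probabilities, a pseudocount of 1 and a decision threshold of 0; the function names and toy sequences are hypothetical, and this is not the published MM-6mAPred implementation.

```python
import math

ALPHABET = "ACGT"


def train_first_order(seqs, pseudocount=1.0):
    """Estimate P(next | current) from training sequences with add-pseudocount smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_likelihood(seq, model):
    """Sum of log transition probabilities between adjacent nucleotides."""
    return sum(math.log(model[cur][nxt]) for cur, nxt in zip(seq, seq[1:]))


def llr_score(seq, pos_model, neg_model):
    """Log-likelihood ratio; a positive value favours the 6mA (positive) model."""
    return log_likelihood(seq, pos_model) - log_likelihood(seq, neg_model)


# Hypothetical toy training data, not taken from any benchmark.
positives = ["GAGAGAGA", "GAGAGAGG"]
negatives = ["CTCTCTCT", "CCTCTCTC"]
pos_model = train_first_order(positives)
neg_model = train_first_order(negatives)
print(llr_score("GAGAGA", pos_model, neg_model) > 0)  # prints True
```

The initial-nucleotide probability is deliberately left out here on the assumption that it contributes little to the ratio; a real model would likely also use position-specific transitions across the fixed-length window.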

2020 ◽  
Vol 36 (14) ◽  
pp. 4103-4105 ◽  
Author(s):  
Jiali Yang ◽  
Kun Lang ◽  
Guangle Zhang ◽  
Xiaodan Fan ◽  
Yuanyuan Chen ◽  
...  

Abstract Motivation DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA due to its role in regulating DNA replication and protecting the host DNA against degradation. An efficient algorithm to identify 4mC sites is needed for downstream analyses. Results In this study, we propose a new prediction method named SOMM4mC, based on a second-order Markov model, which makes use of the transition probabilities between adjacent nucleotides to identify 4mC sites. The results show that both the first-order and second-order Markov models are superior to the three existing algorithms in all six species (Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii) for which benchmark datasets are available. Moreover, the classification performance of SOMM4mC is better than that of the first-order Markov model. In particular, for E. coli and C. elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher than those of the latest method 4mcPred-SVM, respectively. This shows that SOMM4mC captures more discriminant sequence information through the dependency between adjacent nucleotides. Availability and implementation The web server of SOMM4mC is freely accessible at www.insect-genome.com/SOMM4mC.
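The difference between the first- and second-order models is how much context each transition conditions on. The Python sketch below, under the same hypothetical assumptions as a generic first-order classifier (position-independent transitions, pseudocount 1, compare log-likelihoods), conditions each nucleotide on the two preceding ones, giving 16 dinucleotide contexts. Names and toy data are illustrative only and not taken from SOMM4mC.

```python
import math

ALPHABET = "ACGT"
CONTEXTS = [a + b for a in ALPHABET for b in ALPHABET]  # 16 dinucleotide contexts


def train_second_order(seqs, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2} x_{i-1}) with add-pseudocount smoothing."""
    counts = {c: {n: pseudocount for n in ALPHABET} for c in CONTEXTS}
    for seq in seqs:
        for i in range(2, len(seq)):
            counts[seq[i - 2:i]][seq[i]] += 1
    return {
        c: {n: counts[c][n] / sum(counts[c].values()) for n in ALPHABET}
        for c in CONTEXTS
    }


def log_likelihood(seq, model):
    """Log probability of seq given its first two nucleotides."""
    return sum(math.log(model[seq[i - 2:i]][seq[i]]) for i in range(2, len(seq)))


def classify(seq, pos_model, neg_model):
    """Return 1 (predicted 4mC) if the positive model explains seq better, else 0."""
    return int(log_likelihood(seq, pos_model) > log_likelihood(seq, neg_model))


# Hypothetical toy training data.
pos_model = train_second_order(["ACGACGACG"])
neg_model = train_second_order(["TTATTATTA"])
print(classify("ACGACG", pos_model, neg_model), classify("TTATTA", pos_model, neg_model))
```

A second-order model has 16 × 4 parameters instead of 4 × 4, so it can capture trinucleotide dependencies at the cost of needing more training data per context.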


2020 ◽  
Vol 36 (11) ◽  
pp. 3327-3335 ◽  
Author(s):  
Qiang Tang ◽  
Juanjuan Kang ◽  
Jiaqing Yuan ◽  
Hua Tang ◽  
Xianhai Li ◽  
...  

Abstract Motivation DNA N4-methylcytosine (4mC) is a crucial epigenetic modification. However, knowledge about its biological functions is limited. Effective and accurate identification of 4mC sites will help reveal its biological functions and mechanisms. Since experimental methods are costly and inefficient, a number of machine learning-based approaches have been proposed to detect 4mC sites. Although these methods yield acceptable accuracy, there is still room to improve the prediction performance and the stability of existing methods in practical applications. Results In this work, we first systematically assessed the existing methods on an independent dataset. We then proposed DNA4mC-LIP, a linear integration method that combines existing predictors to identify 4mC sites in multiple species. The results obtained on the independent dataset demonstrated that DNA4mC-LIP outperformed existing methods for identifying 4mC sites. To facilitate the scientific community, a web server for DNA4mC-LIP was developed. We anticipate that DNA4mC-LIP could serve as a powerful computational technique for identifying 4mC sites and facilitate the interpretation of 4mC mechanisms. Availability and implementation http://i.uestc.edu.cn/DNA4mC-LIP/. Supplementary information Supplementary data are available at Bioinformatics online.
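The abstract does not specify how DNA4mC-LIP weights its component predictors. As a hedged illustration of linear integration in general, the Python sketch below combines per-predictor probability scores with non-negative weights and chooses the weights by a coarse grid search that maximizes accuracy on labelled samples. The grid step, the 0.5 threshold, the accuracy criterion and all names are assumptions for illustration.

```python
from itertools import product


def integrate(scores, weights):
    """Weighted linear combination of per-predictor probabilities, normalized by total weight."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def accuracy(samples, labels, weights, threshold=0.5):
    """Fraction of samples whose integrated score falls on the correct side of the threshold."""
    hits = sum(
        int((integrate(s, weights) >= threshold) == bool(y))
        for s, y in zip(samples, labels)
    )
    return hits / len(labels)


def grid_search_weights(samples, labels, step=0.25):
    """Try every non-negative weight vector on a coarse grid; return (best accuracy, weights)."""
    n_predictors = len(samples[0])
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    best = None
    for weights in product(grid, repeat=n_predictors):
        if sum(weights) == 0:
            continue  # all-zero weights give no combined score
        acc = accuracy(samples, labels, weights)
        if best is None or acc > best[0]:
            best = (acc, weights)
    return best


# Hypothetical scores from two predictors on four labelled sites (1 = 4mC).
samples = [[0.9, 0.2], [0.8, 0.4], [0.3, 0.6], [0.1, 0.7]]
labels = [1, 1, 0, 0]
print(grid_search_weights(samples, labels))
```

Exhaustive grid search grows exponentially with the number of predictors, so for more than a handful of predictors a regression-style fit of the weights would likely be preferable.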


Author(s):  
Juan Xiong ◽  
Qiyu Fang ◽  
Jialing Chen ◽  
Yingxin Li ◽  
Huiyi Li ◽  
...  

Background: Postpartum depression (PPD) has been recognized as a severe public health problem worldwide due to its high incidence and its detrimental consequences not only for the mother but also for the infant and the family. However, the natural transition trajectories of PPD have rarely been explored. Methods: In this research, a quantitative longitudinal study was conducted to explore the progression of PPD, providing information on the transition probabilities, hazard ratios, and mean sojourn times in three postnatal mental states, namely the normal state, mild PPD, and severe PPD. A multi-state Markov model was built from 912 depression status assessments of 304 Chinese primiparous women at three time points: six weeks, three months, and six months postpartum. Results: Among the 608 PPD status transitions from one visit to the next, 6.2% (38/608) showed deterioration of mental status from the level at the previous visit, while 40.0% (243/608) showed improvement at the next visit. A subject in the normal state who does transition has a probability of 49.8% of worsening to mild PPD and 50.2% of worsening to severe PPD. A subject with mild PPD who does transition has a 20.0% chance of worsening to severe PPD. A subject with severe PPD is more likely to improve to mild PPD than to recover to the normal state. On average, the sojourn times in the normal state, mild PPD, and severe PPD were 64.12, 6.29, and 9.37 weeks, respectively. Women in the normal state had 6.0%, 8.5%, 8.7%, and 8.8% chances of progressing to severe PPD within three months, nine months, one year, and three years, respectively. Increases in all kinds of support were associated with a decreased risk of deterioration from the normal state to severe PPD (hazard ratio, HR: 0.42–0.65), and increased informational support, evaluation of support, and maternal age were associated with alleviation from severe PPD to the normal state (HR: 1.46–2.27).
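A continuous-time multi-state Markov model of this kind is usually specified by a transition intensity matrix Q: the probability of being in each state t weeks later is given by P(t) = exp(Qt), and the mean sojourn time in state i is -1/q_ii. The pure-Python sketch below builds Q from the mean sojourn times and jump proportions reported above; the 30%/70% split for leaving severe PPD is a made-up placeholder, and covariates are ignored, so its outputs are not expected to reproduce the published estimates. The matrix exponential uses scaling and squaring with a truncated Taylor series to stay dependency-free.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=25):
    """exp(M) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = 0
    while norm > 0.5:  # halve until the series converges quickly
        norm /= 2
        squarings += 1
    scale = 2 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def build_generator(sojourn, jump):
    """q_ii = -1 / mean sojourn time; q_ij = -q_ii * P(jump to j | leaving i)."""
    n = len(sojourn)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rate = 1.0 / sojourn[i]
        q[i][i] = -rate
        for j in range(n):
            if j != i:
                q[i][j] = rate * jump[i][j]
    return q


def transition_probabilities(q, t):
    """P(t) = exp(Q t); row i gives state probabilities t time units after being in state i."""
    return mat_exp([[x * t for x in row] for row in q])


def mean_sojourn(q):
    return [-1.0 / q[i][i] for i in range(len(q))]


# States: 0 = normal, 1 = mild PPD, 2 = severe PPD; time unit = weeks.
sojourn_weeks = [64.12, 6.29, 9.37]  # reported mean sojourn times
jump = [
    [0.0, 0.498, 0.502],  # reported: normal -> mild / severe
    [0.8, 0.0, 0.2],      # reported 20% to severe; remainder to normal
    [0.3, 0.7, 0.0],      # HYPOTHETICAL split for leaving severe PPD
]
q = build_generator(sojourn_weeks, jump)
print([round(p, 3) for p in transition_probabilities(q, 13)[0]])
```

Because every row of Q sums to zero, every row of P(t) sums to one, which is a quick sanity check on any fitted intensity matrix.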
Conclusions: The PPD state transition probabilities call for more attention to regular PPD screening of postnatal women and timely intervention for women with mild or severe PPD. The preventive actions on PPD should be conducted at the early stages, and three yearly; at least one yearly screening is strongly recommended. Emotional support, material support, informational support, and evaluation of support had significant positive associations with the prevention of PPD progression transitions. The derived transition probabilities and sojourn times can serve as an important reference for health professionals to make proactive plans and targeted interventions for PPD.


2015 ◽  
Vol 32 (6) ◽  
pp. 835-842 ◽  
Author(s):  
Filippo Utro ◽  
Valeria Di Benedetto ◽  
Davide F.V. Corona ◽  
Raffaele Giancarlo

Abstract Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy-to-compute, closed-form mathematical formulas that make no assumptions about the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. Results: We help close this important methodological gap between the two models by providing three very simple formulas for the sequence-specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated with the level of DNA-encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA-encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter ‘encoding’. Supplementary information: Supplementary data are available at Bioinformatics online.
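The abstract does not name the three formulas. As a hedged illustration of what a sequence-complexity measure can look like, the Python sketch below computes two generic quantities: the Shannon entropy of the k-mer distribution (an information-theoretic measure) and the fraction of possible distinct k-mers actually present (a simple combinatorial measure). Both functions and their default k values are assumptions for illustration and are not necessarily the measures used in the paper.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (bits) of the k-mer frequency distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def distinct_kmer_fraction(seq, k=3):
    """Distinct k-mers observed, divided by the maximum number that could occur."""
    n_windows = len(seq) - k + 1
    max_distinct = min(4 ** k, n_windows)
    return len({seq[i:i + k] for i in range(n_windows)}) / max_distinct


print(kmer_entropy("ACGTACGTACGT", k=1))  # uniform bases: 2.0 bits
print(distinct_kmer_fraction("AAAAAA", k=3))  # a homopolymer scores low
```

Low-complexity stretches such as homopolymers score near zero on both measures, which is the kind of signal one would correlate with nucleosome occupancy along a genome.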


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259670
Author(s):  
Albertas Dvirnas ◽  
Callum Stewart ◽  
Vilhelm Müller ◽  
Santosh Kumar Bikkarolla ◽  
Karolin Frykholm ◽  
...  

Large-scale genomic alterations play an important role in disease, gene expression, and chromosome evolution. Optical DNA mapping (ODM), commonly categorized into sparsely-labelled ODM and densely-labelled ODM, provides sequence-specific continuous intensity profiles (DNA barcodes) along single DNA molecules and is a technique well-suited for detecting such alterations. For sparsely-labelled barcodes, the possibility of detecting large genomic alterations has been investigated extensively, while densely-labelled barcodes have not received as much attention. In this work, we introduce HMMSV, a hidden Markov model (HMM) based algorithm for detecting structural variations (SVs) directly in densely-labelled barcodes without access to sequence information. We evaluate our approach using simulated data-sets with 5 different types of SVs, and combinations thereof, and demonstrate that the method reaches a true positive rate greater than 80% for randomly generated barcodes with single variations of size 25 kilobases (kb). Increasing the length of the SV further leads to higher true positive rates. For a real data-set with experimental barcodes on bacterial plasmids, we successfully detect matching barcode pairs and SVs without any particular assumption about the types of SVs present. Instead, our method effectively goes through all possible combinations of SVs. Since ODM works on length scales typically not reachable with other techniques, our methodology is a promising tool for identifying arbitrary combinations of genomic alterations.
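HMM-based segmentation of this kind typically relies on Viterbi decoding to find the most likely sequence of hidden states. The Python sketch below shows generic log-space Viterbi decoding with Gaussian emissions on a toy two-state model ("matching" vs "variant") whose observations are hypothetical per-position intensity mismatches between a query and a reference barcode. HMMSV's actual state space, emission model and parameters are not described in the abstract, so everything here is an illustrative assumption.

```python
import math


def gaussian_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, log_start, log_trans, emit_params):
    """Most likely hidden-state path for continuous observations with Gaussian emissions."""
    n_states = len(log_start)
    score = [log_start[s] + gaussian_logpdf(obs[0], *emit_params[s]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + gaussian_logpdf(x, *emit_params[s]))
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the best final state
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: state 0 = barcodes match, state 1 = structural variant.
obs = [0.05, 0.1, 0.02, 0.9, 1.1, 0.95, 0.08, 0.03]  # per-position intensity mismatch
log_start = [math.log(0.9), math.log(0.1)]
log_trans = [[math.log(0.95), math.log(0.05)], [math.log(0.05), math.log(0.95)]]
emit_params = [(0.0, 0.2), (1.0, 0.2)]  # (mean, sd) of the mismatch in each state
print(viterbi(obs, log_start, log_trans, emit_params))
```

Working in log space avoids numerical underflow on long barcodes, and the sticky transition matrix discourages spurious one-position switches between states.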


2020 ◽  
Author(s):  
Sandra Rizk ◽  
Petra Henke ◽  
Carlos Santana-Molina ◽  
Gesa Martens ◽  
Marén Gnädig ◽  
...  

Abstract Hopanoids and carotenoids are two of the major isoprenoid-derived lipid classes in prokaryotes and have been proposed to have membrane-ordering properties similar to those of sterols. Methylobacterium extorquens contains hopanoids and carotenoids in its outer membrane, making it an ideal system for investigating whether isoprenoid lipids play complementary roles in outer membrane ordering and cellular fitness. By genetically knocking out hpnE and crtB in Methylobacterium extorquens PA1, we disrupted the production of squalene and phytoene, the presumed precursors of hopanoids and carotenoids, respectively. Deletion of hpnE unexpectedly revealed that carotenoid biosynthesis uses squalene as a precursor, producing pigments with a C30 backbone, rather than following the previously predicted C40 phytoene-derived pathway. We demonstrate that hopanoids, but not carotenoids, are essential for growth at high temperature. However, disruption of either carotenoid or hopanoid synthesis leads to opposing effects on outer membrane lipid packing. These observations show that hopanoids and carotenoids may serve complementary biophysical roles in the outer membrane. Phylogenetic analysis suggests that M. extorquens may have acquired the C30 pathway through lateral gene transfer with Planctomycetes. This suggests that the C30 carotenoid pathway may have provided an evolutionary advantage to M. extorquens. Importance All cells have a membrane that delineates the boundary between life and its environment. To function properly, membranes must maintain a delicate balance of physical and chemical properties. Lipids play a crucial role in tuning membrane properties. In eukaryotic organisms from yeast to mammals, sterols are essential for assembling a cell surface membrane that can support life. However, bacteria generally do not make sterols, so how do they solve this problem? Hopanoids and carotenoids are two major bacterial lipids that have been proposed as sterol surrogates.
In this study, we use the bacterium M. extorquens to study the role of hopanoids and carotenoids in surface membrane properties and cellular growth. Our findings suggest that hopanoids and carotenoids may serve complementary roles in balancing outer membrane properties, and they provide a foundation for elucidating the principles of surface membrane adaptation.


Author(s):  
Yating Xu ◽  
Menggang Zhang ◽  
Qiyao Zhang ◽  
Xiao Yu ◽  
Zongzong Sun ◽  
...  

RNA methylation is considered a significant epigenetic modification, a process that does not alter the gene sequence but may play a necessary role in multiple biological processes, such as gene expression, genome editing, and cellular differentiation. With advances in RNA detection, various forms of RNA methylation have been identified, including N6-methyladenosine (m6A), N1-methyladenosine (m1A), and 5-methylcytosine (m5C). Emerging reports confirm that dysregulation of RNA methylation gives rise to a variety of human diseases, particularly hepatocellular carcinoma. We summarize the essential regulators of RNA methylation and the biological functions of these modifications in coding and noncoding RNAs. In conclusion, we highlight the complex molecular mechanisms of m6A, m5C, and m1A associated with hepatocellular carcinoma, and we hope this review will draw attention to the therapeutic potential of RNA methylation in clinical research.


2005 ◽  
Vol 69 (2) ◽  
pp. 306-325 ◽  
Author(s):  
Elvira Khalikova ◽  
Petri Susi ◽  
Timo Korpela

SUMMARY Dextran is a chemically and physically complex polymer whose breakdown is carried out by a variety of endo- and exodextranases. Enzymes in many groups can be classified as dextranases according to function: such enzymes include dextranhydrolases, glucodextranases, exoisomaltohydrolases, exoisomaltotriohydrolases, and branched-dextran exo-1,2-α-glucosidases. Cycloisomalto-oligosaccharide glucanotransferase does not formally belong to the dextranases even though its side reaction produces hydrolyzed dextrans. A new classification system for glycosylhydrolases and glycosyltransferases, which is based on amino acid sequence similarities, divides the dextranases into five families. However, this classification is still incomplete since sequence information is missing for many of the enzymes that have been biochemically characterized as dextranases. Dextran-degrading enzymes have been isolated from a wide range of microorganisms. The major characteristics of these enzymes, the methods for analyzing their activities and biological roles, the analysis of primary sequence data, and the three-dimensional structures of dextranases are discussed in this review. Dextranases are promising for future use in various scientific and biotechnological applications.


2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract Motivation Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods, such as support vector machines (SVM), random forests (RF) and neural networks (NN), exist and are applied to these problems. All of these methods require that protein sequences first be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, given the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work for protein sequences as well, resulting in the n-flakes representation, an image with several icosagons. Results We show that all applied machine learning techniques (RF, SVM and DNN) yield promising results compared with state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and that FCGR is a promising new encoding method for protein sequences. Availability and implementation https://cran.r-project.org/. Supplementary information Supplementary data are available at Bioinformatics online.
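For readers unfamiliar with FCGR, the Python sketch below shows the original DNA version that the protein n-flakes variant generalizes: each k-mer is mapped to the cell of a 2^k × 2^k grid where the chaos game walk over that k-mer lands, and the cell holds the k-mer's count. The corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)) and the row/column orientation are assumptions, since conventions vary between implementations; the protein n-flakes encoding from the paper is not reproduced here.

```python
from collections import Counter

# Assumed CGR corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Grid cell (x, y) reached by the CGR walk over kmer; later bases set higher bits."""
    x = y = 0
    for i, base in enumerate(kmer):
        cx, cy = CORNERS[base]
        x += cx << i
        y += cy << i
    return x, y


def fcgr(seq, k):
    """2^k x 2^k matrix of k-mer counts laid out by chaos game representation."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    for kmer, count in counts.items():
        if set(kmer) <= set(CORNERS):  # skip k-mers containing N or other symbols
            x, y = kmer_cell(kmer)
            grid[y][x] += count
    return grid


for row in fcgr("ACGTTGCA", 2):
    print(row)
```

Because the last base of each k-mer selects the top-level quadrant, k-mers sharing a suffix cluster together in the image, which is the spatial structure an image classifier such as a DNN can exploit.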

