Accurate Prediction of B-form/A-form DNA Conformation Propensity from Primary Sequence: A Machine Learning and Free Energy Handshake

Author(s):  
Abhijit Gupta ◽  
Mandar Kulkarni ◽  
Arnab Mukherjee

DNA carries the genetic code of life. Different conformations of DNA are associated with various biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. Although a few efforts were made in this regard, currently there exists no method that can accurately predict the conformation of right-handed DNA solely from the sequence. In this study, we present a novel approach based on machine learning that predicts A-DNA and B-DNA conformational propensities of a sequence with high accuracy (~93%). In addition, we show that the impact of the dinucleotide steps in determining the conformation agrees qualitatively with the free energy cost for A-DNA formation in water. We are hopeful that our methodology can be employed on segments of the genomic sequence to understand the prospective biological roles played by the A-form of DNA.
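The abstract attributes conformational propensity to dinucleotide steps but does not say how sequences are encoded for the model. As a minimal sketch, assuming a simple encoding by overlapping dinucleotide step fractions (the function name `dinucleotide_step_fractions` and the example sequence are illustrative, not taken from the paper), a feature vector could be built like this:

```python
from collections import Counter
from itertools import product
from typing import Dict

# The 16 possible dinucleotide steps over the DNA alphabet.
STEPS = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_step_fractions(sequence: str) -> Dict[str, float]:
    """Return the fraction of each overlapping dinucleotide step in `sequence`.

    Assumption: this encoding is illustrative only; the abstract does not
    specify the features the authors actually used.
    """
    seq = sequence.upper()
    # Overlapping windows of length 2: positions (0,1), (1,2), ...
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Only count steps made of A, C, G, T (ignores ambiguity codes such as N).
    total = sum(counts[step] for step in STEPS)
    if total == 0:
        return {step: 0.0 for step in STEPS}
    return {step: counts[step] / total for step in STEPS}


# Hypothetical example sequence.
features = dinucleotide_step_fractions("CCGGGCCCGG")
```

A vector of this kind could then be passed to any standard classifier; which classifier and which features the authors chose is not stated in the abstract.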

2020 ◽  
Author(s):  
Abhijit Gupta ◽  
Mandar Kulkarni ◽  
Arnab Mukherjee

DNA carries the genetic code of life. Different conformations of DNA are associated with various biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. Although a few efforts were made in this regard, currently there exists no method that can accurately predict the conformation of right-handed DNA solely from the sequence. In this study, we present a novel approach based on machine learning that predicts A-DNA and B-DNA conformational propensities of a sequence with high accuracy (~95%). In addition, we show that the impact of the dinucleotide steps in determining the conformation agrees qualitatively with the free energy cost for A-DNA formation in water. This method enables us to examine the genomic sequence to understand the prospective biological roles played by the A-form of DNA.


2019 ◽  
Vol 2019 (2) ◽  
pp. 47-65
Author(s):  
Balázs Pejó ◽  
Qiang Tang ◽  
Gergely Biczók

Abstract Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in reality, a small or medium-sized organization often does not have the necessary data to train a reasonably accurate model. For such organizations, a realistic solution is to train their machine learning models on their joint dataset (the union of the individual ones). Unfortunately, privacy concerns prevent them from straightforwardly doing so. While a number of privacy-preserving solutions exist for collaborating organizations to securely aggregate the parameters in the process of training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration. In this paper, by focusing on a two-player setting, we model the collaborative training process as a two-player game where each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on the accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find or prove the existence of a Nash Equilibrium with regard to the strength of privacy protection for each player. Using recommendation systems as our main use case, we demonstrate how two players can make practical use of the proposed theoretical framework, including setting up the parameters and approximating the non-trivial Nash Equilibrium.


2019 ◽  
Author(s):  
Charles Curnin ◽  
Rachel L. Goldfeder ◽  
Shruti Marwaha ◽  
Devon Bonner ◽  
Daryl Waggott ◽  
...  

Abstract Insertions and deletions (indels) make a critical contribution to human genetic variation. While indel calling has improved significantly, it lags dramatically in performance relative to single-nucleotide variant calling, something of particular concern for clinical genomics where larger scale disruption of the open reading frame can commonly cause disease. Here, we present a machine learning-based approach to the detection of indel breakpoints called Scotch. This novel approach improves sensitivity to larger variants dramatically by leveraging sequencing metrics and signatures of poor read alignment. We also introduce a meta-analytic indel caller, called Metal, that performs a “smart intersection” of Scotch and currently available tools to be maximally sensitive to large variants. We use new benchmark datasets and Sanger sequencing to compare Scotch and Metal to current gold standard indel callers, achieving unprecedented levels of precision and recall. We demonstrate the impact of these improvements by applying this tool to a cohort of patients with undiagnosed disease, generating plausible novel candidates in 21 out of 26 undiagnosed cases. We highlight the diagnosis of one patient with a 498-bp deletion in HNRNPA1 missed by traditional indel-detection tools.


Author(s):  
Dennis Collaris ◽  
Jarke J. van Wijk

Abstract The field of explainable artificial intelligence aims to help experts understand complex machine learning models. One key approach is to show the impact of a feature on the model prediction. This helps experts to verify and validate the predictions the model provides. However, many challenges remain open. For example, due to the subjective nature of interpretability, a strict definition of concepts such as the contribution of a feature remains elusive. Different techniques have varying underlying assumptions, which can cause inconsistent and conflicting views. In this work, we introduce local and global contribution-value plots as a novel approach to visualize feature impact on predictions and the relationship with feature value. We discuss design decisions and show an exemplary visual analytics implementation that provides new insights into the model. We conducted a user study and found the visualizations aid model interpretation by increasing correctness and confidence and reducing the time taken to obtain an insight.


Author(s):  
Munsif Ali Jatoi ◽  
Fayaz Ali Dharejo ◽  
Sadam Hussain Teevino

Background: The brain is the most complex organ of the human body, with millions of connections and activations. Electromagnetic signals are generated inside the brain whenever a mental or physical task is performed. These signals excite a group of neurons within a particular lobe, depending on the nature of the task performed. To localize this activity, certain machine learning (ML) techniques have been developed for use in conjunction with a neuroimaging technique (M/EEG, fMRI, PET). Different ML techniques for brain source localization are described in the literature. Among them, the most common are minimum norm estimation (MNE), low resolution brain electromagnetic tomography (LORETA) and Bayesian framework based multiple sparse priors (MSP). Aims: In this research work, EEG is used as the neuroimaging technique. Methods: EEG data is synthetically generated at SNR = 5 dB. Afterwards, ML techniques are applied to estimate the active sources. Each dataset is run for multiple trials (>40). The performance is analyzed using free energy and localization error as performance indicators. Furthermore, MSP is applied with a varying number of patches to observe the impact of patches on source localization. Results: It is observed that with an increased number of patches, the sources are localized with more precision and accuracy, as expressed in terms of free energy and localization error respectively. Conclusion: Patch optimization within the Bayesian framework produces improved results in terms of free energy and localization error.
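The Methods statement can be made concrete with a small sketch. An SNR in decibels fixes the noise power as the signal power divided by 10^(SNR/10), so SNR = 5 dB means the noise carries roughly 0.32 times the signal power. The code below assumes additive white Gaussian noise and takes localization error to be the Euclidean distance between the true and estimated source positions; both are common conventions, but neither is confirmed by the abstract, and the function names are illustrative.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise to `signal` at the requested SNR (in dB).

    Assumption: additive white Gaussian noise; the abstract does not state
    the noise model used to generate the synthetic EEG data.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def localization_error(true_position, estimated_position):
    """Euclidean distance between true and estimated source positions (assumed metric)."""
    return math.dist(true_position, estimated_position)


# Hypothetical 10 Hz test signal sampled at 250 Hz, corrupted at SNR = 5 dB.
clean = [math.sin(2 * math.pi * 10 * t / 250) for t in range(20000)]
noisy = add_noise_at_snr(clean, snr_db=5.0)
```

The free-energy indicator mentioned in the abstract depends on the specific Bayesian inversion scheme and is not sketched here.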


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, the independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forest) are employed, and the impact of each variable group is assessed by comparing the performance of the resulting models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.
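The discretization step can be illustrated with a short sketch. The abstract does not say how the class boundaries were chosen, so the code below assumes equal-frequency (quantile) bins; the function name `discretize` and the attendance figures are made up for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def discretize(values, n_classes=3):
    """Map each value to a class index in 0..n_classes-1.

    Assumption: equal-frequency cut points; the study's actual class
    boundaries are not given in the abstract.
    """
    # quantiles() returns the n_classes - 1 cut points between the classes.
    cuts = quantiles(values, n=n_classes)
    # bisect_right counts how many cut points lie at or below the value.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical attendance figures, not data from the study.
attendance = [100, 200, 300, 400, 500, 600, 700, 800, 900]
classes = discretize(attendance)
```

Calling `discretize(attendance, n_classes=5)` or `n_classes=8` gives the finer target granularities mentioned at the end of the abstract, provided the dataset has enough distinct values.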

