Mapping Local Conformational Landscapes of Proteins in Solution

Mapping Intimacies ◽

10.1101/273607 ◽

2018 ◽

Author(s):

M. ElGamacy ◽

M. Riss ◽

H. Zhu ◽

V. Truffault ◽

M. Coles

Keyword(s):

Model Building ◽

De Novo ◽

Experimental Spectrum ◽

Signal Decomposition ◽

Conformational Space ◽

Sufficient Information ◽

Initial Model ◽

Physiological Conditions ◽

Noesy Experiment ◽

Noesy Spectra

Summary: The ability of proteins to adopt multiple conformational states is essential to their function, and elucidating the details of such diversity under physiological conditions has been a major challenge. Here we present a generalized method for mapping protein population landscapes by NMR spectroscopy. Experimental NOESY spectra are directly compared to a set of expectation spectra back-calculated across an arbitrary conformational space. Signal decomposition of the experimental spectrum then directly yields the relative populations of local conformational microstates. In this way, averaged descriptions of conformation can be eliminated. As the method quantitatively compares experimental and expectation spectra, it inherently delivers an R-factor expressing how well structural models explain the input data. We demonstrate that our method extracts sufficient information from a single 3D NOESY experiment to perform initial model building, refinement and validation, thus offering a complete de novo structure determination protocol.
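The decomposition idea can be illustrated with a minimal sketch. It assumes that every back-calculated expectation spectrum and the experimental spectrum have been flattened onto a common grid of intensity points, and it uses non-negative least squares (solved by projected gradient descent) as a stand-in for the authors' signal-decomposition procedure. The helper name `decompose_spectrum` and the R-factor definition (sum of absolute residuals over sum of absolute observed intensities, by analogy with crystallography) are assumptions for illustration, not the published implementation.

```python
import numpy as np


def decompose_spectrum(expected, observed, n_iter=5000):
    """Fit observed ~ sum_k w_k * expected[k] with all w_k >= 0.

    Hypothetical helper, not the authors' code.
    expected: (n_states, n_points) array, one expectation spectrum per microstate.
    observed: (n_points,) array, the experimental spectrum on the same grid.
    Returns (populations, r_factor).
    """
    A = np.asarray(expected, dtype=float).T  # shape (n_points, n_states)
    b = np.asarray(observed, dtype=float)

    # Step size 1/L, where L (largest singular value of A, squared) is the
    # Lipschitz constant of the gradient of 0.5 * ||A w - b||^2.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0

    # Relative populations are the fitted weights normalized to sum to 1.
    total = w.sum()
    populations = w / total if total > 0 else w

    # Assumed crystallographic-style R-factor: sum|obs - calc| / sum|obs|.
    calc = A @ w
    r_factor = np.abs(b - calc).sum() / np.abs(b).sum()
    return populations, r_factor


# Toy data: three microstates; the "experiment" is a 70:30 mix of states 0 and 1.
expected = np.array([
    [1.0, 0.0, 0.0, 0.2],
    [0.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.2],
])
observed = 0.7 * expected[0] + 0.3 * expected[1]
populations, r_factor = decompose_spectrum(expected, observed)
print(np.round(populations, 3), round(float(r_factor), 4))
```

On this noise-free toy input the fit should recover populations of about 0.7, 0.3 and 0 with an R-factor near zero; with real spectra the residual R-factor is what indicates how well the chosen set of models explains the data.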

An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome

Current Bioinformatics ◽

10.2174/1574893615999200724145835 ◽

2020 ◽

Vol 15 ◽

Author(s):

Dicle Yalcin ◽

Hasan H. Otu

Keyword(s):

Model Building ◽

De Novo ◽

Cpg Islands ◽

Treatment Strategies ◽

Area Under The Curve ◽

Global Methylation ◽

Sequence Features ◽

A Genome ◽

Combined Features ◽

Epigenetic Repression

Background: Epigenetic repression mechanisms play an important role in gene regulation, particularly in cancer development. In many cases, the susceptibility or resistance of a CpG island (CGI) to methylation has been shown to depend in part on local DNA sequence features. Objective: To develop unbiased machine learning models, using different biological features both individually and in combination, that predict the methylation propensity of a CGI. Methods: We built our model from CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested the model on two independent datasets, one chromosome-specific (132 sequences) and one disease-specific (70 sequences). Results: We provide improvements in prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and ~0.88 for the model using selected sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better at separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes by their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
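The AUC values quoted above summarize how well classifier scores rank prone CGIs above resistant ones. As a minimal sketch (not the authors' MATLAB/Python scripts), the AUC can be computed directly as the Mann-Whitney statistic: the fraction of (prone, resistant) pairs ranked correctly, with ties counted as half. The label coding (1 = prone, 0 = resistant) and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for methylation-prone, 0 for methylation-resistant (assumed coding).
    scores: classifier outputs; higher means "more likely prone".
    Returns the fraction of (prone, resistant) pairs in which the prone CGI
    scores higher, counting ties as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one prone and one resistant example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical scores for two prone and two resistant CGIs.
print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic, which is unproblematic at the dataset sizes described here (tens to low hundreds of sequences); a rank-based formula would be preferable for much larger sets.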

nanotatoR: a tool for enhanced annotation of genomic structural variants

BMC Genomics ◽

10.1186/s12864-020-07182-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Surajit Bhattacharya ◽

Hayk Barseghyan ◽

Emmanuèle C. Délot ◽

Eric Vilain

Keyword(s):

De Novo ◽

Genome Mapping ◽

Gene List ◽

Sufficient Information ◽

Rna Seq ◽

Structural Variants ◽

De Novo Genome Assembly ◽

Pathogenic Variants ◽

Increased Sensitivity

Background: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release.
We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne muscular dystrophy for which we had previously demonstrated the diagnostic ability of OGM. Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
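The overlap-percentage and nearest-gene calculations described above can be sketched in a few lines. This is a hypothetical Python illustration, not nanotatoR's R API: it assumes half-open, BED-style coordinates on a single chromosome, defines overlap as the percentage of the gene covered by the SV (nanotatoR may define it differently), and ignores strand, so it does not distinguish upstream from downstream genes. Gene names and coordinates are invented.

```python
def overlap_percent(sv_start, sv_end, gene_start, gene_end):
    """Percentage of the gene interval covered by the SV (half-open intervals)."""
    if gene_end <= gene_start:
        raise ValueError("gene interval must have positive length")
    overlap = max(0, min(sv_end, gene_end) - max(sv_start, gene_start))
    return 100.0 * overlap / (gene_end - gene_start)


def gap_distance(sv_start, sv_end, gene_start, gene_end):
    """Base pairs separating the SV from the gene; 0 if they overlap."""
    if sv_end <= gene_start:
        return gene_start - sv_end
    if gene_end <= sv_start:
        return sv_start - gene_end
    return 0


def nearest_genes(sv, genes, k=2):
    """Return the k genes closest to the SV as (distance, name) pairs."""
    sv_start, sv_end = sv
    ranked = sorted(
        (gap_distance(sv_start, sv_end, start, end), name)
        for name, start, end in genes
    )
    return ranked[:k]


# Hypothetical genes and SV on one chromosome.
genes = [("GENE_A", 1_000, 5_000), ("GENE_B", 12_000, 20_000), ("GENE_C", 30_000, 31_000)]
sv = (4_000, 10_000)
print(overlap_percent(*sv, 1_000, 5_000))  # 25.0
print(nearest_genes(sv, genes))  # [(0, 'GENE_A'), (2000, 'GENE_B')]
```

Values like these are what a user-defined threshold (for example, a minimum overlap or a maximum distance) would then filter on.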

S-SAD phasing on SOLEIL Beamline PROXIMA 1

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314093905 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C609-C609

Author(s):

Patrick Gourhant ◽

Beatriz Guimaraes ◽

Tatiana Isabet ◽

Sebastian Klinke ◽

Pierre Legrand ◽

...

Keyword(s):

Model Building ◽

De Novo ◽

Signal To Noise Ratio ◽

Beam Line ◽

Synchrotron Source ◽

Combining Data ◽

Low Energies ◽

Sad Phasing ◽

Multiple Sample ◽

Data Collections

PROXIMA 1, a beamline for macromolecular crystallography at the third-generation synchrotron source SOLEIL, is equipped with a multi-circle goniometer (α = 50°) as well as a PILATUS 6M detector. These features, along with the beamline's extended energy range towards low energies (down to 5.5 keV) and the possibility of adapting the source size to the sample to optimize the signal-to-noise ratio, have made the beamline very attractive for S-SAD phasing, with more than seven examples of successful de novo phasing achieved over the last two years. The use of low energies has also proved a significant aid to model building. The technical capabilities of the beamline for low-energy data collection will be presented, along with a number of examples of the successful use of long wavelengths on the beamline. The importance of combining data from multiple sample orientations to achieve "true multiplicity" will be highlighted, as well as the importance of combining data from multiple crystals to achieve high multiplicity.
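Beamline energies and wavelengths are related by λ = hc/E, so lowering the energy lengthens the wavelength, moving it closer to the sulfur K absorption edge (about 2.47 keV) where the anomalous signal used in S-SAD grows. A minimal conversion sketch, using hc ≈ 12.39842 keV·Å; the function names are illustrative only.

```python
# h*c expressed in keV·Å (approximately 12.39842).
HC_KEV_ANGSTROM = 12.39842


def wavelength_angstrom(energy_kev):
    """Photon wavelength in Å for an energy in keV: λ = hc / E."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


def energy_kev(wavelength_a):
    """Inverse conversion: photon energy in keV for a wavelength in Å."""
    if wavelength_a <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_a


# The 5.5 keV lower limit of the beamline's energy range.
print(f"{wavelength_angstrom(5.5):.3f} Å")  # 2.254 Å
```

So the 5.5 keV limit corresponds to a wavelength of roughly 2.25 Å, about twice the ~1 Å wavelengths (~12.4 keV) commonly used for routine data collection.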

iBitter-Fuse: A Novel Sequence-Based Bitter Peptide Predictor by Fusing Multi-View Features

International Journal of Molecular Sciences ◽

10.3390/ijms22168958 ◽

2021 ◽

Vol 22 (16) ◽

pp. 8958

Author(s):

Phasit Charoenkwan ◽

Chanin Nantasenamat ◽

Md. Mehedi Hasan ◽

Mohammad Ali Moni ◽

Pietro Lio’ ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

De Novo ◽

Predictive Performance ◽

Support Vector ◽

Sufficient Information ◽

Self Assessment ◽

Accurate Identification ◽

Bitter Peptides ◽

Accurate Performance

Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. Machine learning-based methods have become an effective approach for identifying potential bitter peptides in large-scale protein datasets. Although a few machine learning-based predictors of peptide bitterness have been developed, their predictive performance leaves room for improvement. In this study, we developed a new predictor, named iBitter-Fuse, for more accurate identification of bitter peptides. iBitter-Fuse integrates a variety of feature encoding schemes that provide sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance predictive performance, a customized genetic algorithm utilizing a self-assessment report (GA-SAR) was employed to identify informative features, and the optimal ones were then fed into a support vector machine (SVM)-based classifier to build the final model (iBitter-Fuse). Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse achieves more accurate performance than state-of-the-art methods. To facilitate the high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. We anticipate that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides.
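"Compositional information" in sequence-based predictors usually means encodings such as amino acid composition (AAC) and dipeptide composition (DPC), and "fusing" views amounts to concatenating their vectors. The sketch below assumes these two encodings for illustration; the exact encodings used by iBitter-Fuse, its physicochemical features, the GA-SAR selection step and the SVM are not reproduced, and the peptide is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(peptide):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in peptide.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("peptide contains no standard residues")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dpc(peptide):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = peptide.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    total = len(pairs)
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def fused_features(peptide):
    """Concatenate ("fuse") the per-view encodings into one feature vector."""
    return aac(peptide) + dpc(peptide)


vector = fused_features("GPFPII")  # hypothetical short peptide
print(len(vector))  # 420
```

A feature-selection step such as the genetic algorithm described above would then choose a subset of these 420 columns before training the classifier.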

SUPERMAN attenuates positive INNER NO OUTER autoregulation to maintain polar development of Arabidopsis ovule outer integuments

Development ◽

10.1242/dev.129.18.4281 ◽

2002 ◽

Vol 129 (18) ◽

pp. 4281-4289

Author(s):

Robert J. Meister ◽

Louren M. Kotow ◽

Charles S. Gasser

Keyword(s):

De Novo ◽

Outer Integument ◽

Negative Regulator ◽

Sufficient Information ◽

Coding Region ◽

Adaxial Side ◽

Abaxial Side ◽

Protein Coding ◽

Plant Parts ◽

Reporter Gene Analysis

The outer integument of Arabidopsis ovules exhibits marked polarity in its development, growing extensively from the abaxial side, but only to a very limited extent from the adaxial side of the ovule. Mutations in two genes affect this asymmetric growth. In strong inner no outer (ino) mutants, outer integument growth is eliminated, whereas in superman (sup) mutants integument growth on the adaxial side is nearly equal to wild-type growth on the abaxial side. Through complementation and reporter gene analysis, a region of INO 5′-flanking sequences was identified that contains sufficient information for appropriate expression of INO. Using this INO promoter (P-INO) we show that INO acts as a positive regulator of transcription from P-INO, but is not sufficient for de novo initiation of transcription in other plant parts. Protein fusions demonstrate nuclear localization of INO, consistent with a proposed role as a transcription factor for this member of the YABBY protein family. Through its ability to inhibit expression of the endogenous INO gene and transgenes driven by P-INO, SUP is shown to be a negative regulator of INO transcription. Substitution of another YABBY protein coding region (CRABS CLAW) for INO overcomes this negative regulation, indicating that SUP suppresses INO transcription through attenuation of the INO positive autoregulatory loop.

Geological Structure-Guided Initial Model Building for Prestack AVO/AVA Inversion

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2020.2998044 ◽

2020 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Guangtan Huang ◽

Xiaohong Chen ◽

Cong Luo ◽

Yangkang Chen

Keyword(s):

Model Building ◽

Geological Structure ◽

Initial Model

Re-evaluation of low-resolution crystal structures via interactive molecular-dynamics flexible fitting (iMDFF): a case study in complement C4

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798316012201 ◽

2016 ◽

Vol 72 (9) ◽

pp. 1006-1016 ◽

Cited By ~ 14

Author(s):

Tristan Ian Croll ◽

Gregers Rom Andersen

Keyword(s):

Molecular Dynamics ◽

Model Building ◽

Current Model ◽

Data Bank ◽

Error Rates ◽

Conformational Space ◽

Low Resolution ◽

Acceptable Error ◽

Complement C4 ◽

Chain Conformations

While the rapid proliferation of high-resolution structures in the Protein Data Bank provides a rich set of templates for starting models, it remains the case that a great many structures both past and present are built at least in part by hand-threading through low-resolution and/or weak electron density. With current model-building tools this task can be challenging, and the de facto standard for acceptable error rates (in the form of atomic clashes and unfavourable backbone and side-chain conformations) in structures based on data with dmax not exceeding 3.5 Å reflects this. When combined with other factors such as model bias, these residual errors can conspire to make more serious errors in the protein fold difficult or impossible to detect. The three recently published 3.6–4.2 Å resolution structures of complement C4 (PDB entries 4fxg, 4fxk and 4xam) rank in the top quartile of structures of comparable resolution both in terms of Rfree and MolProbity score, yet, as shown here, contain register errors in six β-strands. By applying a molecular-dynamics force field that explicitly models interatomic forces and hence excludes most physically impossible conformations, the recently developed interactive molecular-dynamics flexible fitting (iMDFF) approach significantly reduces the complexity of the conformational space to be searched during manual rebuilding. This substantially improves the rate of detection and correction of register errors, and allows user-guided model building in maps with a resolution lower than 3.5 Å to converge to solutions with a stereochemical quality comparable to atomic resolution structures. Here, iMDFF has been used to individually correct and re-refine these three structures to MolProbity scores of <1.7, and strategies for working with such challenging data sets are suggested. Notably, the improved model allowed the resolution for complement C4b to be extended from 4.2 to 3.5 Å as demonstrated by paired refinement.

De Novo Protein Structure Prediction using Fragment Based Potential and Conformational Space Annealing

Biophysical Journal ◽

10.1016/j.bpj.2009.12.2503 ◽

2010 ◽

Vol 98 (3) ◽

pp. 461a ◽

Cited By ~ 1

Author(s):

Juyong Lee ◽

Masaki Sasai ◽

Chaok Seok ◽

Jooyoung Lee

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Conformational Space

An Acid-activated Nucleobase Transporter from Leishmania major

Journal of Biological Chemistry ◽

10.1074/jbc.m109.006718 ◽

2009 ◽

Vol 284 (24) ◽

pp. 16164-16169 ◽

Cited By ~ 24

Author(s):

Diana Ortiz ◽

Marco A. Sanchez ◽

Hans P. Koch ◽

H. Peter Larsson ◽

Scott M. Landfear

Keyword(s):

Leishmania Major ◽

De Novo ◽

Low Ph ◽

Acidic Ph ◽

Physiological Conditions ◽

Neutral Ph ◽

Amastigote Stage ◽

Electrode Voltage ◽

Ph Conditions ◽

Micromolar Range

Parasitic protozoa are unable to synthesize purines de novo and must import preformed purine nucleobases or nucleosides from their hosts. Leishmania major expresses two purine nucleobase transporters, LmaNT3 and LmaNT4. Previous studies revealed that at neutral pH, LmaNT3 is a broad-specificity, high-affinity nucleobase transporter, whereas LmaNT4 mediates the uptake of only adenine. Because LmaNT4 is required for optimal viability of the amastigote stage of the parasite that lives within acidified phagolysosomal vesicles of mammalian macrophages, the function of this permease was examined under acidic pH conditions. At acidic pH, LmaNT4 acquires the ability to transport adenine, hypoxanthine, guanine, and xanthine with Km values in the micromolar range, indicating that this transporter is activated at low pH. Thus, LmaNT4 is an acid-activated purine nucleobase transporter that functions optimally under the physiological conditions the parasite is exposed to in the macrophage phagolysosome. In contrast, LmaNT3 functions optimally at neutral pH. Two-electrode voltage clamp experiments performed on LmaNT3 and LmaNT4 expressed in Xenopus oocytes revealed substrate-induced inward directed currents at acidic pH, and application of substrates induced acidification of the oocyte cytosol. These observations imply that LmaNT3 and LmaNT4 are nucleobase/proton symporters.
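The Km values mentioned above come from the Michaelis-Menten relation v = Vmax[S]/(Km + [S]), in which Km is the substrate concentration giving half-maximal uptake. A minimal sketch of that general relation; the Km and Vmax used below are hypothetical and are not values reported for LmaNT3 or LmaNT4.

```python
def uptake_rate(substrate_um, vmax, km_um):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]).

    substrate_um and km_um must share units (µM here); v has the units of vmax.
    """
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("need [S] >= 0 and Km > 0")
    return vmax * substrate_um / (km_um + substrate_um)


# Hypothetical transporter with Km = 5 µM and Vmax = 100 (arbitrary units).
for s in (0.5, 5.0, 50.0):
    print(s, round(uptake_rate(s, vmax=100.0, km_um=5.0), 1))
```

At [S] = Km the rate is exactly half of Vmax, and a Km in the micromolar range means uptake is already substantial at micromolar nucleobase concentrations.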

nanotatoR: A tool for enhanced annotation of genomic structural variants

10.1101/2020.08.18.254680 ◽

2020 ◽

Author(s):

Surajit Bhattacharya ◽

Hayk Barseghyan ◽

Emmanuèle C. Délot ◽

Eric Vilain

Keyword(s):

De Novo ◽

Genome Mapping ◽

Gene List ◽

Sufficient Information ◽

Rna Seq ◽

Structural Variants ◽

De Novo Genome Assembly ◽

Pathogenic Variants ◽

Increased Sensitivity

Abstract: Whole genome sequencing is effective at identification of small variants but, because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity. We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release.
We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne muscular dystrophy for which we had previously demonstrated the diagnostic ability of OGM. The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.