Elucidating Common Structural Features of Human Pathogenic Variations Using Large-Scale Atomic-Resolution Protein Networks

2014 ◽  
Vol 35 (5) ◽  
pp. 585-593 ◽  
Author(s):  
Jishnu Das ◽  
Hao Ran Lee ◽  
Adithya Sagar ◽  
Robert Fragoza ◽  
Jin Liang ◽  
...  
2020 ◽  
Author(s):  
Mingchen Chen ◽  
Xun Chen ◽  
Shikai Jin ◽  
Wei Lu ◽  
Xingcheng Lin ◽  
...  

Recent advances in machine learning, bioinformatics, and our understanding of the folding problem have enabled efficient prediction of protein structures with moderate accuracy, even for targets for which little template information is available. All-atom molecular dynamics simulations offer a route to refine such predicted structures, but unguided atomistic simulations, even long ones, often fail to eliminate incorrect structural features whose removal would make the structure more energetically favorable, because doing so requires large-scale motions and crossing energy barriers for side-chain repacking. In this study, we show that localizing packing frustration at atomic resolution, by examining the statistics of the energy changes that occur when the local environment of a site is altered, makes it possible to identify the most likely locations of incorrect contacts. The global statistics of atomic-resolution frustration in structures predicted with various algorithms provide strong indicators of structural quality when tested on a database of 20 targets from previous CASP experiments. Residues that are more correctly positioned turn out to be more minimally frustrated than poorly positioned ones. These observations provide a diagnosis of both the global and the local quality of predicted structures and can therefore guide all-atom refinement simulations of the 20 targets. Refinement simulations guided by atomic packing frustration turn out to be quite efficient and significantly improve the quality of the structures.
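The frustration analysis described above rests on comparing a site's native energy with the energies of perturbed (decoy) local environments. The following is a minimal Python sketch, not the authors' implementation: it assumes decoy energies have already been produced by some energy function (not shown), and the cutoffs of 0.78 and -1.0 separating "minimal", "neutral", and "high" frustration are hypothetical parameters chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical classification cutoffs (assumptions, not taken from the abstract).
MINIMAL_CUTOFF = 0.78
HIGH_CUTOFF = -1.0


def frustration_index(native_energy, decoy_energies):
    """Z-score of the native energy relative to decoy energies.

    Positive values mean the native contact is more favorable (lower energy)
    than typical rearrangements of its local environment.
    """
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies must not all be identical")
    return (mean(decoy_energies) - native_energy) / spread


def classify(index):
    """Label a site as minimally, highly, or neutrally frustrated."""
    if index >= MINIMAL_CUTOFF:
        return "minimal"
    if index <= HIGH_CUTOFF:
        return "high"
    return "neutral"


def fraction_minimally_frustrated(sites):
    """Crude global quality indicator: share of minimally frustrated sites.

    `sites` is a sequence of (native_energy, decoy_energies) pairs.
    """
    labels = [classify(frustration_index(e, decoys)) for e, decoys in sites]
    return labels.count("minimal") / len(labels)
```

In this sketch, sites labeled `high` would be natural candidates for focused refinement, while the fraction of `minimal` sites acts as a single global score for comparing predicted models.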


2019 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optical or optoelectronic materials. Our study uses an RI prediction protocol, developed in previous work, that combines first-principles and data modeling; we apply it to a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures and determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI. The derived insights provide a foundation for rational, targeted design that goes beyond traditional trial-and-error searches.
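The abstract does not spell out the RI prediction protocol. One physically standard way to turn a computed polarizability into a refractive index is the Lorentz-Lorenz relation, (n² - 1)/(n² + 2) = (4π/3)·α/V, where α is the polarizability volume and V the volume per repeat unit. The sketch below assumes that relation and ranks a hypothetical candidate list; the candidate names and numbers are invented for illustration, and none of this reproduces the ChemLG, ChemHTPS, or ChemML interfaces.

```python
import math


def refractive_index(polarizability, volume):
    """Lorentz-Lorenz refractive index.

    `polarizability` (a polarizability volume) and `volume` (volume per repeat
    unit) must share the same units, e.g. cubic angstroms.
    """
    phi = 4.0 * math.pi / 3.0 * polarizability / volume
    if not 0.0 <= phi < 1.0:
        raise ValueError("4*pi*alpha/(3V) must lie in [0, 1)")
    # Solve (n^2 - 1) / (n^2 + 2) = phi for n.
    return math.sqrt((1.0 + 2.0 * phi) / (1.0 - phi))


def screen(candidates, top_k):
    """Rank (name, polarizability, volume) tuples by predicted RI, highest first."""
    scored = [(name, refractive_index(a, v)) for name, a, v in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical candidates: (name, polarizability in Å^3, volume in Å^3).
library = [("PI-A", 20.0, 200.0), ("PI-B", 30.0, 200.0), ("PI-C", 10.0, 200.0)]
for name, n in screen(library, top_k=2):
    print(f"{name}: n = {n:.3f}")
```

Because n increases monotonically with α/V, adding highly polarizable moieties without a proportional increase in volume raises the predicted RI, which is the intuition behind the backbone-modification strategies in the abstract.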


2020 ◽  
Author(s):  
Xinhao Li ◽  
Denis Fourches

Deep neural networks can learn directly from chemical structures, without extensive user-driven descriptor selection, to predict molecular properties and activities with high reliability. However, these approaches typically require large training sets to learn endpoint-specific structural features and achieve reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially for high-throughput screening and metabolomics, smaller datasets with challenging endpoints must also be modeled and forecast. It would therefore be highly valuable to make better use of the vast compendium of unlabeled compounds in publicly available datasets to improve model performance on a user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training plus task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained in a self-supervised manner on one million unlabeled molecules from ChEMBL and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. The method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood-brain barrier penetration). The results show that the method achieves strong performance on all four datasets compared with other state-of-the-art machine learning techniques reported in the literature to date.
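Self-supervised pre-training derives its training signal from unlabeled data alone. Assuming a SMILES-based language model (an assumption; the abstract only says "molecular structure prediction model"), the label at each position is simply the next token. A minimal Python sketch with a simplified, hypothetical tokenizer regex:

```python
import re

# Simplified SMILES tokenizer (hypothetical): bracket atoms such as [NH3+],
# two-letter halogens Br/Cl, two-digit ring closures (%10), else one character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def next_token_pairs(smiles, bos="<bos>", eos="<eos>"):
    """Build (inputs, targets) for next-token prediction from one unlabeled molecule."""
    tokens = [bos] + tokenize(smiles) + [eos]
    return tokens[:-1], tokens[1:]


inputs, targets = next_token_pairs("CC(=O)Cl")
for x, y in zip(inputs, targets):
    print(f"{x!r} -> {y!r}")
```

No property labels are needed to build these pairs, which is what lets pre-training use a million unlabeled molecules; fine-tuning would then reuse the pre-trained encoder with a task-specific head on the small labeled dataset (not shown).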


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
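To make structural metrics over RDF concrete, here is a minimal Python sketch over plain (subject, predicate, object) tuples. It computes subject out-degrees, groups subjects by the set of predicates they use (often called characteristic sets), and reports a simple, hypothetical redundancy ratio; these are illustrative metrics, not necessarily the ones defined in the paper.

```python
from collections import Counter, defaultdict


def subject_out_degrees(triples):
    """Number of triples in which each subject appears (duplicates counted as given)."""
    return Counter(subject for subject, _, _ in triples)


def characteristic_sets(triples):
    """Count how many subjects share each distinct set of predicates."""
    predicates_by_subject = defaultdict(set)
    for subject, predicate, _ in triples:
        predicates_by_subject[subject].add(predicate)
    return Counter(frozenset(preds) for preds in predicates_by_subject.values())


def structural_redundancy(triples):
    """Hypothetical ratio: 1 - (distinct predicate sets / subjects).

    Values near 1 mean many subjects repeat the same structural pattern.
    """
    sets = characteristic_sets(triples)
    subjects = sum(sets.values())
    if subjects == 0:
        raise ValueError("no triples given")
    return 1.0 - len(sets) / subjects


triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:doc1", "dc:title", '"Notes"'),
]
print(subject_out_degrees(triples))
print(structural_redundancy(triples))
```

Highly repetitive predicate sets are exactly what compact RDF structures and compressors can exploit, since each shared pattern needs to be stored only once.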


Metals ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 1604
Author(s):  
Mir Sayed Shah Danish ◽  
Arnab Bhattacharya ◽  
Diana Stepanova ◽  
Alexey Mikhaylov ◽  
Maria Luisa Grilli ◽  
...  

Energy is a fundamental requirement of all the physical, chemical, and biological processes that are harnessed to improve living standards. The toll that development takes on the environment and on economic activity is evident from growing concerns about the sustainability of the industrialization of recent centuries. The increase in carbon footprint and the large-scale pollution caused by industrialization have led researchers to seek new ways of sustaining development while minimizing harmful effects on the environment. Decarbonization strategies have therefore become an important factor in industrial expansion, along with the invention of new catalytic methods for carrying out non-thermal reactions, new energy storage methods, and environmental remediation through the removal or breakdown of harmful chemicals released during manufacturing. The present article discusses the structural features and photocatalytic applications of a variety of metal oxide-based materials. It also discusses the practical applicability of these materials and the transition of their production to an industrial scale. Finally, it offers a concise framework linking metal oxide application options to energy, environmental, and economic sustainability, including a footprint analysis.


2008 ◽  
Vol 375-376 ◽  
pp. 700-704
Author(s):  
Yu Gui Li ◽  
Li Feng Ma ◽  
Qing Xue Huang ◽  
Si Qin Pang

Based on a revised version of Cailikefu’s rolling shear force formula, the motion-path equation of the spatial seven-bar mechanism is derived, and a mechanical model with new structural features, namely an asymmetric layout and a negative offset, is established for the 2800 mm heavy shear of an iron and steel company. The shear force and bar forces during plate shearing, before and after adoption of the asymmetric, negative-offset structure, are analyzed, as are the horizontal force component of the mechanism, which influences pure rolling shear, and the back-wall push force, which maintains the blade clearance. The analysis shows that the back-wall push force remains sufficiently large at rolling start-up (i.e., when the maximum rolling shear force occurs) and that, with an offset of 60 mm to 100 mm, the back-wall push force most closely approximates the side forces. Both theoretical results and on-site shear quality indicate that the asymmetric, negative-offset structure plays an important role in ensuring pure rolling shear and keeping the blade clearance constant, providing an effective means of improving steel plate quality.


2019 ◽  
Vol 26 (1) ◽  
pp. 97-124 ◽  
Author(s):  
Anna Sverdlik ◽  
Nathan C Hall

Doctoral students’ well-being and motivation are important factors that both shape and are shaped by students’ academic experiences in their programs. Existing literature consistently highlights concerning levels of well-being and maladaptive motivational patterns among doctoral students, but no research to date has attempted to explore, in a large-scale study, the structural features associated with these wellness and achievement factors. The present study examined whether doctoral program phase (i.e., coursework, comprehensive examination, or dissertation phase) affected 3004 doctoral students’ well-being (stress, depression, program satisfaction, and illness symptoms) and motivation (self-determined motivation and self-efficacy). Doctoral students reported the highest well-being and internal motivation during the coursework phase, whereas the comprehensive examination phase proved the most challenging for most students, as indicated by the lowest wellness and motivation scores. The theoretical and practical implications of these results are discussed.


Molecules ◽  
2019 ◽  
Vol 24 (1) ◽  
pp. 179 ◽  
Author(s):  
Dariusz Mrozek ◽  
Tomasz Dąbek ◽  
Bożena Małysiak-Mrozek

Calculating structural features of proteins, nucleic acids, and nucleic acid-protein complexes from their geometries, and studying the various interactions within these macromolecules, whose high-resolution structures are stored in the Protein Data Bank (PDB), requires parsing and extracting suitable data from text files. To perform these operations at large scale in the face of the growing amount of macromolecular data in public repositories, we propose performing them in the distributed environment of Azure Data Lake and scaling the calculations in the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations over protein and nucleic acid structures in Azure Data Lake. Our tests show that the Cloud storage space occupied by macromolecular data can be successfully reduced by compressing PDB files without significant loss of data processing efficiency. Moreover, our experiments show that the calculations can be significantly accelerated by storing macromolecular data in large sequential files and by parallelizing both the calculations and the data extraction that precedes them. Finally, the paper shows how all the calculations can be performed declaratively in U-SQL scripts for Data Lake Analytics.
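PDB-format coordinate records use fixed columns (for example, x, y, and z occupy columns 31-38, 39-46, and 47-54). The following Python sketch is a local, single-machine analogue of such an extractor, not the paper's U-SQL extractor: it reads plain or gzip-compressed files and keeps only ATOM/HETATM records.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Atom:
    record: str
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line):
    """Parse one fixed-column ATOM/HETATM record (0-based slices of 1-based columns)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),  # empty string if the element column is absent
    )


def extract_atoms(lines):
    """Yield Atom objects for coordinate records, skipping all other record types."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield parse_atom_line(line)


def read_atoms(path):
    """Read a .pdb or .pdb.gz file; gzip.open in text mode handles compressed input."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(extract_atoms(handle))
```

Very large entries can overflow the five-character serial field, which this sketch does not handle; parallelizing `read_atoms` over many files (for example with `concurrent.futures`) is likewise omitted for brevity.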


2020 ◽  
Vol 498 (2) ◽  
pp. 2196-2218
Author(s):  
David Specht ◽  
Eamonn Kerins ◽  
Supachai Awiphan ◽  
Annie C Robin

Galactic microlensing datasets now comprise in excess of 10⁴ events and, with the advent of next-generation microlensing surveys that may be undertaken with facilities such as the Rubin Observatory (formerly LSST) and the Roman Space Telescope (formerly WFIRST), this number will increase significantly. So too will the fraction of events with measurable higher-order information, such as finite-source effects and lens–source relative proper motion. Analysing such data requires a more sophisticated Galactic microlens modelling approach. We present a new second-generation Manchester–Besançon Microlensing Simulator (MaBμlS-2), which uses a version of the Besançon population synthesis Galactic model that agrees well with stellar kinematics observed by the Hubble Space Telescope (HST) towards the bulge. MaBμlS-2 provides high-fidelity, signal-to-noise-limited maps of the microlensing optical depth, rate, and average time-scale towards a 400 deg² region of the Galactic bulge in several optical to near-infrared pass-bands. The maps take full account of the unresolved stellar background as well as limb-darkened source profiles. Compared with the efficiency-corrected OGLE-IV sample of 8000 events, MaBμlS-2 shows much better agreement than the previous version of MaBμlS and succeeds in matching even small-scale structural features in the OGLE-IV event rate map. However, evidence remains for a small underprediction of the event rate per source and overprediction of the time-scale. MaBμlS-2 is available online (www.mabuls.net, Specht & Kerins) to provide on-the-fly maps for user-supplied cuts in survey magnitude, event time-scale, and relative proper motion.
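The quantities in these maps are linked by standard microlensing relations that hold independently of MaBμlS-2's internals: θ_E = sqrt(κ M π_rel) with κ ≈ 8.144 mas per solar mass, the Einstein time-scale t_E = θ_E / μ_rel, and τ = (π/2) Γ ⟨t_E⟩ relating optical depth, event rate per source Γ, and mean time-scale. A minimal Python sketch of these textbook formulas, with hypothetical example inputs:

```python
import math

# 4G / (c^2 * 1 AU), in milliarcseconds per solar mass (standard value).
KAPPA_MAS_PER_MSUN = 8.144
DAYS_PER_YEAR = 365.25


def einstein_radius_mas(mass_msun, lens_distance_kpc, source_distance_kpc):
    """Angular Einstein radius theta_E = sqrt(kappa * M * pi_rel), in mas."""
    # Relative parallax in mas: 1 / D[kpc] is a parallax in mas.
    pi_rel = 1.0 / lens_distance_kpc - 1.0 / source_distance_kpc
    if pi_rel <= 0:
        raise ValueError("the lens must lie in front of the source")
    return math.sqrt(KAPPA_MAS_PER_MSUN * mass_msun * pi_rel)


def einstein_time_days(theta_e_mas, mu_rel_mas_per_yr):
    """Einstein-radius crossing time t_E = theta_E / mu_rel, in days."""
    return theta_e_mas / mu_rel_mas_per_yr * DAYS_PER_YEAR


def event_rate_per_source(optical_depth, mean_te_days):
    """Invert tau = (pi / 2) * Gamma * <t_E>; returns events per source per year."""
    return 2.0 * optical_depth / (math.pi * mean_te_days) * DAYS_PER_YEAR


# Hypothetical example: 0.3 solar-mass lens at 4 kpc, bulge source at 8 kpc.
theta_e = einstein_radius_mas(0.3, 4.0, 8.0)
t_e = einstein_time_days(theta_e, mu_rel_mas_per_yr=5.0)
print(f"theta_E = {theta_e:.3f} mas, t_E = {t_e:.1f} d")
print(f"Gamma = {event_rate_per_source(2e-6, 20.0):.2e} per source per year")
```

The last relation shows why an overpredicted mean time-scale and an underpredicted rate per source can coexist with a well-matched optical depth: τ constrains only their product.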


Author(s):  
Bibhuti Prasad Barik

Animal actin is a diverse and evolutionarily ancient protein. Actin genes and their corresponding protein sequences were used to infer phylogenetic affiliations. The analysis indicated that several species appear to be polyphyletic and that several unrelated species appear to share the same clade. Consensus actin RNA secondary structures showed that the structural features of all forms were quite distinct from one another. This observation supports the phylogenetic inference, in which similarly named species clustered together according to their lifestyles. Actin gene genealogy and consensus RNA secondary structures could therefore serve as a possible phylogenetic marker across diverse species of the animal kingdom in large-scale data analysis. The in silico study revealed variation among the groups. The percentage of long disordered regions in the proteins was very high in all forms. These findings suggest that species’ complexity and ability to adapt to diverse habitats may be due to a higher percentage of disordered proteins.
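The "percentage of long disordered regions" statistic can be computed from any per-residue disorder prediction. A minimal Python sketch, assuming per-residue disorder scores in [0, 1], a 0.5 score threshold, and a 30-residue minimum for a "long" region; all three are hypothetical choices, since the abstract names neither the predictor nor the cutoffs.

```python
def disorder_mask(scores, threshold=0.5):
    """Mark residues whose predicted disorder score meets the (assumed) threshold."""
    return [score >= threshold for score in scores]


def long_disorder_fraction(mask, min_length=30):
    """Fraction of residues lying in runs of at least `min_length` disordered residues."""
    if not mask:
        raise ValueError("empty sequence")
    in_long_regions = 0
    run = 0
    for disordered in list(mask) + [False]:  # sentinel closes a trailing run
        if disordered:
            run += 1
        else:
            if run >= min_length:
                in_long_regions += run
            run = 0
    return in_long_regions / len(mask)


# Hypothetical 100-residue protein: one 35-residue and one 10-residue disordered stretch.
scores = [0.9] * 35 + [0.1] * 5 + [0.8] * 10 + [0.2] * 50
print(f"{100 * long_disorder_fraction(disorder_mask(scores)):.0f}% in long disordered regions")
```

The short 10-residue stretch is ignored at the default cutoff, so the result depends strongly on `min_length`; comparisons across species should use one fixed cutoff.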

