IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds

2021 ◽  
Vol 22 (23) ◽  
pp. 13066
Author(s):  
Viviana Quevedo-Tumailli ◽  
Bernabe Ortega-Tenezaca ◽  
Humberto González-Díaz

Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of a pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. The difficulties for this kind of analysis include the dispersion of data across different datasets and the high heterogeneity of the data. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information - Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a large amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj).
These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), among others. We also created another partition (cprotj = cpj) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information (from the three databases) and train a predictive model. Shannon’s entropy measures Shk (numerical variables) were used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
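The moving-average perturbation operator mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest form of such an operator, the deviation of each descriptor value from the mean of its category. The function name, the category labels, and the sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def moving_average_deviation(values, categories):
    """Return, for each case, its value minus the mean value of its category.

    Assumed form of the operator: delta = value - <value>_category.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, category in zip(values, categories):
        sums[category] += value
        counts[category] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [v - means[c] for v, c in zip(values, categories)]


# Hypothetical entropy-like descriptor values for three assays,
# grouped by one categorical variable (here, the activity parameter).
shk = [1.0, 3.0, 10.0]
activity_parameter = ["IC50", "IC50", "Ki"]
deviations = moving_average_deviation(shk, activity_parameter)
```

Each deviation is relative to the expected value within its own category, which is what lets descriptors measured under different assay conditions be compared on one scale.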

Metals ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2000
Author(s):  
Marcelo Roldán ◽  
Fernando José Sánchez ◽  
Pilar Fernández ◽  
Christophe J. Ortiz ◽  
Adrián Gómez-Herrero ◽  
...  

In the present investigation, high-energy self-ion irradiation experiments (20 MeV Fe+4) were performed on two types of pure Fe samples to evaluate the formation of dislocation loops as a function of material volume. The choice of model material, namely EFDA pure Fe, was made to emulate experiments simulated with computational models that study defect evolution. The experimental conditions were ion fluences of 4.25 and 8.5 × 10¹⁵ ions/cm² and irradiation temperatures of 350 and 450 °C, respectively. In the first type of experiment, the ions pass through the samples, which are thin films of less than 100 nm. With this procedure, the formation of the accumulated damage zone, which is the peak where the ions stop, and the injection of interstitials are prevented. As a result, the effect of two free surfaces on defect formation can be studied. In the second type of experiment, the same irradiations were performed on bulk samples to compare the creation of defects in the first 100 nm of depth with the microstructure found in the whole thickness of the thin films. Apparent differences were found between the thin foil irradiation and the first 100 nm in bulk specimens in terms of dislocation loops, even with a similar primary knock-on atom (PKA) spectrum. In thin films, most of the loops identified in all four experimental conditions were of the b = ±a0<100>{200} type with sizes of hundreds of nm depending on the experimental conditions, similarly to bulk samples where practically no defects were detected. These important results would help validate computational simulations of the evolution of defects in alpha iron thin films irradiated with energetic ions at large doses, which would predict dislocation nucleation and growth.


2020 ◽  
Author(s):  
Gabriel Wright ◽  
Anabel Rodriguez ◽  
Jun Li ◽  
Patricia L. Clark ◽  
Tijana Milenković ◽  
...  

Abstract Improved computational modeling of protein translation rates, including better prediction of where translational slowdowns along an mRNA sequence may occur, is critical for understanding co-translational folding. Because codons within a synonymous codon group are translated at different rates, many computational translation models rely on analyzing synonymous codons. Some models rely on genome-wide codon usage bias (CUB), believing that globally rare and common codons are the most informative of slow and fast translation, respectively. Others use the CUB observed only in highly expressed genes, which should be under selective pressure to be translated efficiently (and whose CUB may therefore be more indicative of translation rates). No prior work has analyzed these models for their ability to predict translational slowdowns. Here, we evaluate five models for their association with slowly translated positions as denoted by two independent ribosome footprint (RFP) count experiments from S. cerevisiae, because RFP data are often considered a “ground truth” for translation rates across mRNA sequences. We show that all five considered models strongly associate with the RFP data and therefore have potential for estimating translational slowdowns. However, we also show that there is a weak correlation between RFP counts for the same genes originating from independent experiments, even when their experimental conditions are similar. This raises concerns about the efficacy of using current RFP experimental data for estimating translation rates and highlights a potential advantage of using computational models to understand translation rates instead.
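As a rough illustration of codon usage bias within a synonymous group, the sketch below computes each codon's frequency relative to the other codons that encode the same amino acid. The two-group codon table, the function name, and the example sequence are assumptions made for illustration; none of the five models evaluated in the abstract is reproduced here.

```python
from collections import Counter

# Hypothetical two-group excerpt of the codon table, for illustration only.
SYNONYMOUS_GROUPS = {
    "Lys": ("AAA", "AAG"),
    "Phe": ("TTT", "TTC"),
}


def relative_codon_usage(sequence):
    """Return each codon's frequency relative to its synonymous group."""
    # Split the sequence into complete, in-frame codons.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    counts = Counter(codons)
    usage = {}
    for group in SYNONYMOUS_GROUPS.values():
        total = sum(counts[c] for c in group)
        for codon in group:
            # Groups absent from the sequence get a usage of 0.0.
            usage[codon] = counts[codon] / total if total else 0.0
    return usage


usage = relative_codon_usage("AAAAAGAAATTT")
```

In this toy sequence, AAA is used twice as often as its synonym AAG, so a CUB-based model would treat AAG as the rarer codon of the pair.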


2008 ◽  
pp. 1643-1673
Author(s):  
Jilin Han ◽  
Le Gruenwald ◽  
Tyrrell Conway

The study of gene expression levels under defined experimental conditions is an important approach to understand how a living cell works. High-throughput microarray technology is a very powerful tool for simultaneously studying thousands of genes in a single experiment. This revolutionary technology results in an extensive amount of data, which raises an important question: how to extract meaningful biological information from these data? In this chapter, we survey data mining techniques that have been used for clustering, classification and association rules for gene expression data analysis. In addition, we provide a comprehensive list of currently available commercial and academic data mining software together with their features. Lastly, we suggest future research directions.


2019 ◽  
Vol 374 (1774) ◽  
pp. 20180370 ◽  
Author(s):  
Salva Duran-Nebreda ◽  
George W. Bassel

Information processing and storage underpin many biological processes of vital importance to organism survival. Like animals, plants also acquire, store and process environmental information relevant to their fitness, and this is particularly evident in their decision-making. The control of plant organ growth and timing of their developmental transitions are carefully orchestrated by the collective action of many connected computing agents, the cells, in what could be addressed as distributed computation. Here, we discuss some examples of biological information processing in plants, with special interest in the connection to formal computational models drawn from theoretical frameworks. Research into biological processes with a computational perspective may yield new insights and provide a general framework for information processing across different substrates. This article is part of the theme issue ‘Liquid brains, solid brains: How distributed cognitive architectures process information’.


2019 ◽  
Vol 47 (16) ◽  
pp. e95-e95 ◽  
Author(s):  
Jurrian K de Kanter ◽  
Philip Lijnzaad ◽  
Tito Candelli ◽  
Thanasis Margaritis ◽  
Frank C P Holstege

Abstract Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate cell type identification algorithm that is rapid and selective, including the possibility of intermediate or unassigned categories. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH’s accuracy is as good as existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between that of acinar and ductal cell types. Having the possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.


2017 ◽  
Vol 13 (4-2) ◽  
pp. 546-552 ◽  
Author(s):  
Hasan Basri ◽  
Jimmy Deswidawansyah Nasution ◽  
Ardiyansyah Syahrom ◽  
Mohd Ayub Sulong ◽  
Amir Putra Md. Saad ◽  
...  

This paper proposes an improved modeling approach for the biodegradation of bone scaffolds. In this study, numerical analysis and computer-based simulation were performed for bone scaffolds with varying porosities to determine the wall shear stresses and permeabilities, along with their influence on the scaffold biodegradation process as bio-fluids flow through the scaffolds at changing flow rates. Based on an experimental study by immersion testing over a period of 0 to 72 hours, specimens of commercial bone scaffolds with different morphologies were collected into three sample groups of 30%, 41%, and 55% porosity. As a representative of cancellous bone morphology, the morphological degradation was observed using 3-D CAD scaffold models based on microcomputed tomography images. By applying the boundary conditions to the computational fluid dynamics (CFD) and fluid-structure interaction (FSI) models, the wall shear stresses within the scaffolds due to variation in fluid flow rate were simulated and determined before and after degradation. An increase in fluid flow rate tends to raise the pressure drop for scaffold models with porosities lower than 50% before degradation. As the porosity increases, the pressure drop decreases, with an increase in permeability within the scaffold. The flow rates have significant effects on scaffolds with higher pressure drops by introducing the wall shear stresses with the highest values and lower permeability. These findings indicate the importance of using accurate computational models to estimate shear stress and determine experimental conditions in perfusion bioreactors for tissue engineering; more accurate results will then be achieved in indicating the natural distributions of fluid flow velocity, wall shear stress, and pressure.
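The inverse relation between pressure drop and permeability noted in this abstract can be sketched with Darcy's law, k = Q·μ·L / (A·ΔP). The abstract does not say which relation the authors used, so Darcy's law is an assumption here, and the function name and numerical values are hypothetical.

```python
def darcy_permeability(flow_rate, viscosity, length, area, pressure_drop):
    """Permeability k = Q * mu * L / (A * dP), all in SI units (k in m^2)."""
    if area <= 0 or pressure_drop <= 0:
        raise ValueError("area and pressure_drop must be positive")
    return flow_rate * viscosity * length / (area * pressure_drop)


# Hypothetical values: the same flow through two scaffolds of equal
# geometry that differ only in the measured pressure drop (Pa).
k_high_drop = darcy_permeability(1e-6, 1e-3, 0.01, 1e-4, 200.0)
k_low_drop = darcy_permeability(1e-6, 1e-3, 0.01, 1e-4, 100.0)
```

With everything else held fixed, halving the pressure drop doubles the computed permeability, which matches the trend the abstract reports for scaffolds of higher porosity.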


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Zhen-Hao Guo ◽  
Zhu-Hong You ◽  
Yan-Bin Wang ◽  
De-Shuang Huang ◽  
Hai-Cheng Yi ◽  
...  

Abstract Background The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. Results We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. Conclusions Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
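The area under the ROC curve reported in this abstract can be read as the probability that a randomly chosen positive pair is scored above a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The function name, labels, and scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two true and two false associations.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.85])
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; a rank-based formulation would be the usual choice for large datasets.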


2017 ◽  
Vol 14 (131) ◽  
pp. 20170150 ◽  
Author(s):  
Anna Konstorum ◽  
Anthony T. Vella ◽  
Adam J. Adler ◽  
Reinhard C. Laubenbacher

The goal of cancer immunotherapy is to boost a patient's immune response to a tumour. Yet, the design of an effective immunotherapy is complicated by various factors, including a potentially immunosuppressive tumour microenvironment, immune-modulating effects of conventional treatments and therapy-related toxicities. These complexities can be incorporated into mathematical and computational models of cancer immunotherapy that can then be used to aid in rational therapy design. In this review, we survey modelling approaches under the umbrella of the major challenges facing immunotherapy development, which encompass tumour classification, optimal treatment scheduling and combination therapy design. Although overlapping, each challenge has presented unique opportunities for modellers to make contributions using analytical and numerical analysis of model outcomes, as well as optimization algorithms. We discuss several examples of models that have grown in complexity as more biological information has become available, showcasing how model development is a dynamic process interlinked with the rapid advances in tumour–immune biology. We conclude the review with recommendations for modellers both with respect to methodology and biological direction that might help keep modellers at the forefront of cancer immunotherapy development.


2021 ◽  
Author(s):  
Fuyu Hu ◽  
Chunping Ouyang ◽  
Yongbin Liu ◽  
Zheng Gao ◽  
Yaping Wan

Abstract Background: Predicting interactions between drugs and target proteins is a key task in drug discovery. Although validation via wet-lab experiments is available, experimental methods for identifying drug-target interactions (DTIs) remain either time consuming or heavily dependent on domain expertise. Therefore, various computational models have been proposed to predict possible interactions between drugs and target proteins. Usually, a heterogeneous network of drugs and target proteins is constructed to calculate the relationship between them. However, most calculation methods do not consider the topological structure of the relationship between drugs and target proteins. Fortunately, network embedding learning provides new and powerful graph analytical approaches for predicting drug-target interactions that consider both the content and the topology of the network. Results: In this article, we propose a relational topology-based heterogeneous network embedding method to predict DTIs, abbreviated as RTHNE_DTI. We use the ideas of word embeddings to turn a heterogeneous network of drugs and target proteins into dense, low-dimensional real-valued vectors. Furthermore, according to two different topological structures of the relationship between the nodes, we represent them separately by training two different models. The resulting vectors for drugs and target proteins can then be used to calculate their interaction easily. Results show that by considering the topological structure and the different relationship types of drugs and target proteins, RTHNE_DTI outperforms other state-of-the-art methods on both labeled and unlabeled networks. Conclusions: This work proposes heterogeneous network representation learning for DTI prediction. To the best of our knowledge, this study is the first to introduce relation classification into heterogeneous network embedding to improve DTI prediction.
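Once drugs and target proteins are represented as dense vectors, an interaction score can be computed from a pair of embeddings. The abstract does not specify the scoring function, so the sketch below assumes cosine similarity as a stand-in. The function name and the two-dimensional vectors are hypothetical and do not come from RTHNE_DTI.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_u * norm_v)


# Hypothetical 2-D embeddings: one drug and two candidate targets.
drug = [1.0, 0.0]
target_a = [2.0, 0.0]   # same direction as the drug embedding
target_b = [0.0, 3.0]   # orthogonal to the drug embedding
score_a = cosine_similarity(drug, target_a)
score_b = cosine_similarity(drug, target_b)
```

Under this assumed scoring, target_a would be ranked as the more likely interaction partner for the drug.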

