Analysing omics data sets with weighted nodes networks (WNNets)

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gabriele Tosadori ◽  
Dario Di Silvestre ◽  
Fausto Spoto ◽  
Pierluigi Mauri ◽  
Carlo Laudanna ◽  
...  

Abstract: Current trends in biomedical research indicate data integration as a fundamental step towards precision medicine. In this context, network models allow complex biological processes to be represented and analysed. However, although effective in unveiling network properties, these models fail to consider the individual biochemical variations occurring at the molecular level. As a consequence, the analysis of these models partially loses its predictive power. To overcome these limitations, Weighted Nodes Networks (WNNets) were developed. WNNets allow nodes to be weighted easily and effectively using experimental information from multiple conditions. In this study, the characteristics of WNNets were described and a proteomics data set was modelled and analysed. Results suggested that degree, an established centrality index, may offer a novel perspective on the functional role of nodes in WNNets. Indeed, degree allowed significant differences between experimental conditions to be retrieved, highlighted relevant proteins, and provided a novel interpretation for degree itself, opening new perspectives in experimental data modelling and analysis. Overall, WNNets may be used to model any high-throughput experimental data set requiring weighted nodes. Finally, improving the power of the analysis by using centralities such as betweenness may provide further biological insights and unveil novel, interesting characteristics of WNNets.
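
As a rough illustration of the node-weighting idea, the Python sketch below (using networkx) attaches hypothetical per-condition protein abundances to a toy interaction network and computes a simple node-weighted variant of degree. The weighting scheme, network and values are invented for illustration and are not the WNNets definition.

```python
# Illustrative sketch (not the authors' implementation): attach per-condition
# experimental weights to nodes of a toy interaction network and compute a
# node-weighted variant of degree that sums the weights of a node and its
# neighbours.
import networkx as nx

# Toy interaction network; in practice this would come from a PPI database.
G = nx.Graph([("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")])

# Hypothetical protein abundances for two experimental conditions.
abundance = {
    "control":   {"P1": 1.0, "P2": 0.8, "P3": 1.2, "P4": 0.5},
    "treatment": {"P1": 2.1, "P2": 0.7, "P3": 0.4, "P4": 1.6},
}

def node_weighted_degree(graph, weights):
    """One simple way to make degree sensitive to node weights; the WNNets
    paper may define its index differently."""
    return {
        n: weights[n] + sum(weights[m] for m in graph.neighbors(n))
        for n in graph.nodes
    }

for condition, w in abundance.items():
    print(condition, node_weighted_degree(G, w))
```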

Polymers ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3811
Author(s):  
Iosif Sorin Fazakas-Anca ◽  
Arina Modrea ◽  
Sorin Vlase

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, the copolymerization of isoprene with glycidyl methacrylate for medium conversion and the copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The paper also shows how experimental errors can be estimated from a single experimental data set of n points.
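
For readers unfamiliar with the terminal model, the minimal Python sketch below fits reactivity ratios to invented low-conversion composition data using the Mayo–Lewis equation and ordinary least squares. The paper's actual method, which couples numerical integration with k-nearest-neighbour regression, is not reproduced here.

```python
# Minimal sketch: fit terminal-model reactivity ratios (r1, r2) to
# low-conversion copolymer composition data with the Mayo-Lewis equation.
# Feed fractions and compositions below are invented.
import numpy as np
from scipy.optimize import curve_fit

def mayo_lewis(f1, r1, r2):
    """Instantaneous copolymer composition F1 as a function of feed f1."""
    f2 = 1.0 - f1
    return (r1 * f1**2 + f1 * f2) / (r1 * f1**2 + 2 * f1 * f2 + r2 * f2**2)

# Hypothetical monomer feed fractions and measured copolymer compositions.
f1 = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
F1 = np.array([0.18, 0.38, 0.55, 0.70, 0.88])

(r1, r2), cov = curve_fit(mayo_lewis, f1, F1, p0=[1.0, 1.0], bounds=(0, 10))
print(f"r1 = {r1:.3f}, r2 = {r2:.3f}")
```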


2017 ◽  
Author(s):  
Alexander P. Browning ◽  
Scott W. McCue ◽  
Rachelle N. Binny ◽  
Michael J. Plank ◽  
Esha T. Shah ◽  
...  

Abstract: Collective cell spreading takes place in spatially continuous environments, yet it is often modelled using discrete lattice-based approaches. Here, we use data from a series of cell proliferation assays, with a prostate cancer cell line, to calibrate a spatially continuous individual-based model (IBM) of collective cell migration and proliferation. The IBM explicitly accounts for crowding effects by modifying the rate of movement, direction of movement, and rate of proliferation according to pairwise interactions. Taking a Bayesian approach, we estimate the free parameters in the IBM using rejection sampling on three separate, independent experimental data sets. Since the posterior distributions for each experiment are similar, we perform simulations with parameters sampled from a new posterior distribution generated by combining the three data sets. To explore the predictive power of the calibrated IBM, we forecast the evolution of a fourth experimental data set. Overall, we show how to calibrate a lattice-free IBM to experimental data, and our work highlights the importance of interactions between individuals. Despite great care taken to distribute cells as uniformly as possible experimentally, we find evidence of significant spatial clustering over short distances, suggesting that standard mean-field models could be inappropriate.
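
The rejection-sampling step described above is essentially approximate Bayesian computation. The hedged Python sketch below shows the generic loop with a placeholder simulator, invented priors and an invented summary statistic standing in for the actual IBM and experimental data.

```python
# Sketch of ABC rejection sampling for calibrating a stochastic simulator.
# `simulate_ibm` is a stand-in for the actual individual-based model; the
# prior ranges, summary statistic and tolerance are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ibm(motility, proliferation):
    """Placeholder simulator returning a summary statistic (e.g. cell count)."""
    return 100 * proliferation + 5 * motility + rng.normal(0, 5)

observed_summary = 160.0   # summary statistic computed from experimental data
tolerance = 10.0
accepted = []

for _ in range(10_000):
    motility = rng.uniform(0, 10)          # prior on movement rate
    proliferation = rng.uniform(0, 2)      # prior on proliferation rate
    if abs(simulate_ibm(motility, proliferation) - observed_summary) < tolerance:
        accepted.append((motility, proliferation))

posterior = np.array(accepted)
print("accepted samples:", len(posterior))
print("posterior means:", posterior.mean(axis=0))
```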


2019 ◽  
Vol 622 ◽  
pp. A172 ◽  
Author(s):  
F. Murgas ◽  
G. Chen ◽  
E. Pallé ◽  
L. Nortmann ◽  
G. Nowak

Context. Rayleigh scattering in a hydrogen-dominated exoplanet atmosphere can be detected using ground- or space-based telescopes. However, stellar activity in the form of spots can mimic Rayleigh scattering in the observed transmission spectrum. Quantifying this phenomenon is key to the correct interpretation of exoplanet atmospheric properties. Aims. We use the 10.4 m Gran Telescopio Canarias (GTC) to carry out a ground-based transmission spectroscopy survey of extrasolar planets and characterize their atmospheres. In this paper we investigate the exoplanet HAT-P-11b, a Neptune-sized planet orbiting an active K-type star. Methods. We obtained long-slit optical spectroscopy of two transits of HAT-P-11b with the Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy (OSIRIS) on August 30, 2016 and September 25, 2017. We integrated the spectra of HAT-P-11 and one reference star in several spectroscopic channels across the λ ~ 400–785 nm region, creating numerous light curves of the transits. We fit analytic transit curves to the data, taking into account the systematic effects and red noise present in the time series, in an effort to measure the change of the planet-to-star radius ratio (Rp/Rs) with wavelength. Results. By fitting both transits together, we find a slope in the transmission spectrum showing an increase of the planetary radius towards blue wavelengths. Closer inspection of the transmission spectra of the individual data sets reveals that the first transit presents this slope while the transmission spectrum of the second data set is flat. Additionally, we detect hints of Na absorption on the first night, but not on the second. We conclude that the transmission spectrum slope and Na absorption excess found in the first transit observation are caused by unocculted stellar spots. Modeling the contribution of unocculted spots to reproduce the results of the first night, we find a spot filling factor of δ = 0.62 (+0.20, −0.17) and a spot-to-photosphere temperature difference of ΔT = 429 (+184, −299) K.
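
The effect of unocculted spots on the measured depth can be illustrated with the commonly used contamination relation D_obs(λ) = D_true / (1 − δ (1 − F_spot(λ)/F_phot(λ))). The Python sketch below evaluates it with blackbody fluxes; the depth, temperature, filling factor and temperature contrast are illustrative values, not the fitted results quoted above.

```python
# Sketch of the commonly used unocculted-spot correction: cool spots make the
# apparent transit depth wavelength dependent even for a flat true spectrum.
# All numerical values below are illustrative.
import numpy as np

H = 6.626e-34; C = 2.998e8; KB = 1.381e-23  # SI constants

def planck(wavelength_m, temperature):
    """Blackbody spectral radiance B_lambda(T)."""
    x = H * C / (wavelength_m * KB * temperature)
    return (2 * H * C**2 / wavelength_m**5) / np.expm1(x)

def apparent_depth(true_depth, wavelength_nm, t_phot, delta_spot, d_temp):
    """Apparent depth for unocculted spots with filling factor delta_spot."""
    lam = wavelength_nm * 1e-9
    flux_ratio = planck(lam, t_phot - d_temp) / planck(lam, t_phot)
    return true_depth / (1.0 - delta_spot * (1.0 - flux_ratio))

wavelengths = np.linspace(400, 785, 5)            # GTC/OSIRIS range, nm
depths = apparent_depth(0.0034, wavelengths, t_phot=4780,
                        delta_spot=0.2, d_temp=400)
print(np.round(depths * 100, 4))  # apparent depth (%) rises toward the blue
```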


2014 ◽  
Vol 31 (8) ◽  
pp. 1778-1789
Author(s):  
Hongkang Lin

Purpose – The clustering/classification method proposed in this study, designated the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within the data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for solving the clustering/classification problem, designated the PFV-index method, combines a particle swarm optimization algorithm, the fuzzy C-means method, variable precision rough sets theory, and a new cluster validity index function. Findings – The method clusters the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by supervised classification methods, namely back-propagation neural networks (BPNN) and decision trees.
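
As a hedged illustration of the clustering-with-validity-index idea, the Python sketch below runs a small one-dimensional fuzzy C-means over several candidate cluster counts and scores each with the fuzzy partition coefficient. The paper's PFV index, particle swarm search and variable-precision rough-set step are not reproduced.

```python
# Sketch: cluster one attribute with fuzzy C-means for several candidate
# cluster counts and score each with the fuzzy partition coefficient (FPC).
import numpy as np

rng = np.random.default_rng(1)

def fuzzy_cmeans(x, c, m=2.0, iters=100):
    """Very small 1-D fuzzy C-means; returns membership matrix and centres."""
    u = rng.dirichlet(np.ones(c), size=len(x))          # random memberships
    for _ in range(iters):
        w = u ** m
        centres = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        d = np.abs(x[:, None] - centres) + 1e-12        # point-centre distances
        inv = 1.0 / d ** (2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)        # standard FCM update
    return u, centres

# Toy attribute values drawn from two well-separated groups.
x = np.concatenate([rng.normal(0, 0.3, 50), rng.normal(3, 0.3, 50)])
for c in (2, 3, 4):
    u, _ = fuzzy_cmeans(x, c)
    fpc = (u ** 2).sum() / len(x)                       # partition coefficient
    print(f"c = {c}: FPC = {fpc:.3f}")                  # higher is better
```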


2018 ◽  
Vol 17 ◽  
pp. 117693511877108 ◽  
Author(s):  
Min Wang ◽  
Steven M Kornblau ◽  
Kevin R Coombes

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
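
PCDimension itself is an R package; as a hedged Python illustration of one classical rule for choosing the number of significant PCs, the sketch below applies the broken-stick criterion to toy data. This is not the automated graphical Bayesian procedure that the package adds.

```python
# Hedged illustration: the classical broken-stick rule keeps a principal
# component only while its explained-variance fraction exceeds the
# corresponding broken-stick expectation.
import numpy as np

def broken_stick_dimension(X):
    """Return the number of PCs whose variance beats the broken-stick model."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2
    explained = eigvals / eigvals.sum()
    p = len(explained)
    expected = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                         for k in range(1, p + 1)])
    keep = 0
    while keep < p and explained[keep] > expected[keep]:
        keep += 1
    return keep

rng = np.random.default_rng(2)
# Toy data: 3 strong latent components plus noise, 50 samples x 20 variables.
latent = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
X = latent + 0.1 * rng.normal(size=(50, 20))
print("estimated dimension:", broken_stick_dimension(X))
```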


1989 ◽  
Vol 264 (1) ◽  
pp. 175-184 ◽  
Author(s):  
L Garfinkel ◽  
D M Cohen ◽  
V W Soo ◽  
D Garfinkel ◽  
C A Kulikowski

We have developed a computer method based on artificial-intelligence techniques for qualitatively analysing steady-state initial-velocity enzyme kinetic data. We have applied our system to experiments on hexokinase from a variety of sources: yeast, ascites and muscle. Our system accepts qualitative stylized descriptions of experimental data, infers constraints from the observed data behaviour and then compares the experimentally inferred constraints with the corresponding theoretical, model-based constraints. It is desirable to have large data sets which include the results of a variety of experiments. Human intervention is needed to interpret non-kinetic information, differences in conditions, etc. The several experimenters whose data were studied used different strategies to formulate mechanisms for their enzyme preparations, including different methods (product inhibitors or alternate substrates), different experimental protocols (monitoring enzyme activity differently) and different experimental conditions (temperature, pH or ionic strength). The different ordered and rapid-equilibrium mechanisms proposed by these experimenters were generally consistent with their data. On comparing the constraints derived from the several experimental data sets, we find them to be in much less disagreement than the published mechanisms, and some of the disagreement can be ascribed to different experimental conditions (especially ionic strength).
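
A toy Python sketch of the constraint-comparison idea is given below; the qualitative facts and candidate mechanisms are invented for illustration and do not come from the hexokinase data sets analysed in the paper.

```python
# Toy sketch: represent qualitative kinetic behaviour as (observation, value)
# facts and check whether each candidate mechanism's predictions cover the
# experimentally inferred constraints. All facts below are invented.
experimental_constraints = {
    ("ATP_inhibition", "competitive"),
    ("glucose_saturation", "hyperbolic"),
}

mechanism_predictions = {
    "ordered_bi_bi": {
        ("ATP_inhibition", "competitive"),
        ("glucose_saturation", "hyperbolic"),
    },
    "rapid_equilibrium_random": {
        ("ATP_inhibition", "noncompetitive"),
        ("glucose_saturation", "hyperbolic"),
    },
}

for mechanism, predicted in mechanism_predictions.items():
    missing = experimental_constraints - predicted   # facts not predicted
    status = "consistent" if not missing else f"unexplained: {sorted(missing)}"
    print(f"{mechanism}: {status}")
```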


2019 ◽  
Vol 16 (2) ◽  
pp. 445-452
Author(s):  
Kishore S. Verma ◽  
A. Rajesh ◽  
Adeline J. S. Johnsana

K-anonymization is one of the most widely used approaches for protecting individual records from privacy leakage attacks in the Privacy Preserving Data Mining (PPDM) arena. Anonymization typically degrades the effectiveness of data mining results, so PPDM researchers focus on finding the optimum trade-off between privacy and utility. This work identifies the best-performing classifier, from a set of established data mining classifiers, for producing accurate classification results on utility-aware k-anonymized data sets. The analysis is performed on data sets anonymized with respect to utility factors such as null-value count and transformation pattern loss. The experiments use three widely used classifiers, HNB, PART and J48, evaluated with accuracy, F-measure and ROC-AUC, which are well-established measures of classification performance. Our experimental analysis reveals the best classifiers on utility-aware data sets anonymized by Cell-oriented Anonymization (CoA), Attribute-oriented Anonymization (AoA) and Record-oriented Anonymization (RoA).
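
The evaluation loop can be sketched in Python with scikit-learn, with a decision tree standing in for J48 (HNB and PART are Weka classifiers with no direct scikit-learn equivalent) and a synthetic table standing in for the anonymized data sets.

```python
# Sketch of the evaluation loop: train a classifier on an anonymized data set
# and report accuracy, F-measure and ROC-AUC. DecisionTreeClassifier stands in
# for J48; the data set is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Stand-in for a CoA/AoA/RoA-anonymized data set (features already encoded).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print("accuracy :", accuracy_score(y_test, y_pred))
print("F-measure:", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
```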


2021 ◽  
Vol 12 ◽  
Author(s):  
Haoyang Li ◽  
Juexiao Zhou ◽  
Yi Zhou ◽  
Qiang Chen ◽  
Yangyang She ◽  
...  

Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, affecting about 20–50% of the global population. A tool for automatically diagnosing periodontitis is in high demand to screen at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so developing interpretable models is crucial for disease diagnosis. Based on these considerations, we propose an interpretable method called Deetal-Perio to predict the severity degree of periodontitis in dental panoramic radiographs. In our method, alveolar bone loss (ABL), the clinical hallmark for periodontitis diagnosis, is interpreted as the key feature. To calculate ABL, we also propose a method for teeth numbering and segmentation. First, Deetal-Perio segments and indexes each individual tooth via Mask R-CNN combined with a novel calibration method. Next, Deetal-Perio segments the contour of the alveolar bone and calculates a ratio for each individual tooth to represent ABL. Finally, Deetal-Perio predicts the severity degree of periodontitis given the ratios of all the teeth. The macro F1-score and accuracy of the periodontitis prediction task reach 0.894 and 0.896, respectively, on the Suzhou data set, and 0.820 and 0.824, respectively, on the Zhongshan data set. The entire architecture not only outperforms state-of-the-art methods and shows robustness on both data sets in the periodontitis prediction and teeth numbering and segmentation tasks, but is also interpretable, allowing doctors to understand why Deetal-Perio works so well.
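
Assuming per-tooth ABL ratios are already available from the segmentation stages, the toy Python sketch below shows how a severity label could be derived from them; the thresholds and ratios are invented and are not those used by Deetal-Perio.

```python
# Toy sketch of the final stage of a Deetal-Perio-style pipeline, assuming
# per-tooth alveolar-bone-loss (ABL) ratios have already been computed from
# the Mask R-CNN segmentations. Thresholds below are invented.
from statistics import mean

def severity_from_abl(abl_ratios, mild=0.15, moderate=0.33):
    """Map the mean per-tooth ABL ratio to a severity label."""
    score = mean(abl_ratios)
    if score < mild:
        return "none/mild"
    if score < moderate:
        return "moderate"
    return "severe"

# Hypothetical ABL ratios for the teeth detected in one panoramic radiograph.
ratios = [0.12, 0.20, 0.35, 0.28, 0.18]
print(severity_from_abl(ratios))   # -> "moderate"
```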


2010 ◽  
Vol 75 (4) ◽  
pp. 483-495 ◽  
Author(s):  
Slavica Eric ◽  
Marko Kalinic ◽  
Aleksandar Popovic ◽  
Halid Makic ◽  
Elvisa Civic ◽  
...  

Aqueous solubility is an important factor influencing several aspects of the pharmacokinetic profile of a drug. Numerous publications present different methodologies for the development of reliable computational models for the prediction of solubility from structure. The quality of such models can be significantly affected by the accuracy of the employed experimental solubility data. In this work, the importance of the accuracy of the experimental solubility data used for model training was investigated. Three data sets were used as training sets: Data Set 1, containing solubility data collected from various literature sources using a few criteria (n = 319); Data Set 2, created by substituting 28 values from Data Set 1 with uniformly determined experimental data from one laboratory (n = 319); and Data Set 3, created by adding 56 further compounds, for which solubility was also determined under uniform conditions in the same laboratory, to Data Set 2 (n = 375). The selection of the most significant descriptors was performed by the heuristic method, using one-parameter and multi-parameter analysis. The correlations between the most significant descriptors and solubility were established using multi-linear regression analysis (MLR) for all three investigated data sets. Notable differences were observed between the equations corresponding to different data sets, suggesting that models updated with new experimental data need to be additionally optimized. It was shown that the inclusion of uniform experimental data consistently leads to an improvement in the correlation coefficients. These findings contribute to an emerging consensus that improving the reliability of solubility prediction requires data sets of many diverse compounds whose solubility was measured under standardized conditions.
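
The modelling step can be illustrated with a small Python sketch that fits a multi-linear regression of logS on a few descriptors and reports the correlation coefficient; descriptor names and values are invented and do not correspond to the paper's data sets.

```python
# Sketch of the MLR modelling step: regress solubility (logS) on a handful of
# selected descriptors and report the correlation coefficient used to compare
# training sets. All data below are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 319                                   # size of Data Set 1 / 2 in the paper
descriptors = rng.normal(size=(n, 4))     # e.g. logP, PSA, MW, H-bond donors
true_coefs = np.array([-1.1, -0.02, -0.004, 0.3])
logS = descriptors @ true_coefs + rng.normal(0, 0.4, size=n)

model = LinearRegression().fit(descriptors, logS)
r = np.corrcoef(model.predict(descriptors), logS)[0, 1]
print(f"coefficients: {np.round(model.coef_, 3)}, r = {r:.3f}")
```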


Author(s):  
Özlem Türkşen ◽  
Suna Ertunç

Beta-glucan (BG) has positive health effects in mammals. However, natural BG sources have a limited content of it, and BG production involves stringent procedures with low productivity. Economical production of BG therefore requires improving the production steps. This study aims to improve the BG content during the first step of BG production, the microorganism growth step, by obtaining the optimal values of the additive materials (EDTA, CaCl2 and sorbitol). For this purpose, experimental data sets with replicated response measures (RRM) are obtained at specific levels of EDTA, CaCl2 and sorbitol. Fuzzy modeling, a flexible modeling approach, is applied to the experimental data set because of the small size of the data set and the difficulty of satisfying probabilistic modeling assumptions. The predicted fuzzy function is obtained according to the fuzzy least squares approach. To find the optimal values of EDTA, CaCl2 and sorbitol, the predicted fuzzy function is maximized using a multi-objective optimization (MOO) approach. Using these optimal values, the uncertainty of the predicted BG content is evaluated from an economic perspective.
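
As a hedged stand-in for the fuzzy least squares and MOO steps, the Python sketch below fits an ordinary quadratic response surface to invented replicated measurements and maximizes it over the coded factor ranges; the paper's fuzzy machinery is not reproduced.

```python
# Hedged stand-in for the modelling/optimization step: fit a quadratic
# response surface to invented beta-glucan measurements and maximize it over
# the additive ranges (coded levels in [-1, 1]).
import numpy as np
from scipy.optimize import minimize

def design_matrix(x):
    e, c, s = x.T                       # EDTA, CaCl2, sorbitol levels (coded)
    return np.column_stack([np.ones(len(x)), e, c, s, e*c, e*s, c*s,
                            e**2, c**2, s**2])

rng = np.random.default_rng(4)
levels = rng.uniform(-1, 1, size=(20, 3))                   # coded factor levels
bg = 5 - (levels**2).sum(axis=1) + rng.normal(0, 0.1, 20)   # invented responses

beta, *_ = np.linalg.lstsq(design_matrix(levels), bg, rcond=None)
predict = lambda x: design_matrix(np.atleast_2d(x)) @ beta

res = minimize(lambda x: -predict(x)[0], x0=np.zeros(3),
               bounds=[(-1, 1)] * 3)
print("optimal coded levels :", np.round(res.x, 3))
print("predicted BG content:", round(float(predict(res.x)[0]), 3))
```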

