Estimating chain lengths for time delays in dynamical systems using profile likelihood

Inferring cellular heterogeneity of associations from single cell genomics

Bioinformatics ◽

10.1093/bioinformatics/btaa151 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3466-3473

Author(s):

Maya Levy ◽

Amit Frishberg ◽

Irit Gat-Viks

Keyword(s):

Simulated Data ◽

R Package ◽

Biological Data ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Dynamic Changes ◽

Entire Cell ◽

Complete Set ◽

Cellular Phenotypes ◽

Cell Variation

Abstract Motivation Cell-to-cell variation has uncovered associations between cellular phenotypes. However, it remains challenging to address the cellular diversity of such associations. Results Here, we do not rely on the conventional assumption that the same association holds throughout the entire cell population. Instead, we assume that associations may exist in a certain subset of the cells. We developed CEllular Niche Association (CENA) to reliably predict pairwise associations together with the cell subsets in which the associations are detected. CENA does not rely on predefined subsets but only requires that the cells of each predicted subset share a certain characteristic state. CENA may therefore reveal dynamic modulation of dependencies along cellular trajectories of temporally evolving states. Using simulated data, we show the advantage of CENA over existing methods and its scalability to a large number of cells. Application of CENA to real biological data demonstrates dynamic changes in associations that would be otherwise masked. Availability and implementation CENA is available as an R package on GitHub: https://github.com/mayalevy/CENA and is accompanied by a complete set of documentation and instructions. Supplementary information Supplementary data are available at Bioinformatics online.

LiPLike: towards gene regulatory network predictions of high certainty

Bioinformatics ◽

10.1093/bioinformatics/btz950 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2522-2529

Author(s):

Rasmus Magnusson ◽

Mika Gustafsson

Keyword(s):

Gene Regulation ◽

Reverse Engineering ◽

False Positive ◽

Profile Likelihood ◽

Expression Patterns ◽

Regulatory Elements ◽

Biological Data ◽

Supplementary Information ◽

High Confidence ◽

Gene Regulatory

Abstract Motivation High correlation in expression between regulatory elements is a persistent obstacle for the reverse-engineering of gene regulatory networks. If two potential regulators have matching expression patterns, it becomes challenging to differentiate between them, thus increasing the risk of false positive identifications. Results To allow for gene regulation predictions of high confidence, we propose a novel method, the Linear Profile Likelihood (LiPLike), that assumes a regression model and iteratively searches for interactions that cannot be replaced by a linear combination of other predictors. To compare the performance of LiPLike with other available inference methods, we benchmarked LiPLike using three independent datasets from the Dialogue on Reverse Engineering Assessment and Methods 5 (DREAM5) network inference challenge. We found that LiPLike could be used to stratify predictions of other inference tools, and when applied to the predictions of DREAM5 participants, we observed an average improvement in accuracy of >140% compared to individual methods. Furthermore, LiPLike was able to independently predict networks better than all DREAM5 participants when applied to biological data. When predicting the Escherichia coli network, LiPLike had an accuracy of 0.38 for the top-ranked 100 interactions, whereas the corresponding DREAM5 consensus model yielded an accuracy of 0.11. Availability and implementation We made LiPLike available to the community as a Python toolbox, available at https://gitlab.com/Gustafsson-lab/liplike. We believe that LiPLike will be used for high confidence predictions in studies where individual model interactions are of high importance, and to remove false positive predictions made by other state-of-the-art gene–gene regulation prediction tools. Supplementary information Supplementary data are available at Bioinformatics online.
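The idea of searching for interactions that "cannot be replaced by a linear combination of other predictors" can be illustrated with a small regression experiment. The following is a minimal sketch under our own assumptions, not the LiPLike toolbox or its API: the function names `rss` and `exclusion_cost`, the synthetic regulators `r1`, `r2`, `r3` and the noise levels are all hypothetical. It compares the least-squares fit with and without each regressor; a regressor whose removal barely changes the residual sum of squares is replaceable by the others.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the ordinary least-squares fit y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def exclusion_cost(X, y):
    """For each column j, the increase in RSS when j is dropped from the model.

    A value near zero means the remaining regressors can stand in for j;
    a large value means j cannot be replaced by a linear combination of them.
    """
    full = rss(X, y)
    return np.array([rss(np.delete(X, j, axis=1), y) - full
                     for j in range(X.shape[1])])

# Hypothetical data: r1 and r2 have almost matching expression patterns,
# r3 is an independent regulator.
rng = np.random.default_rng(0)
n = 200
r1 = rng.normal(size=n)
r2 = r1 + 1e-3 * rng.normal(size=n)
r3 = rng.normal(size=n)
X = np.column_stack([r1, r2, r3])
y = 2.0 * r1 + 1.5 * r3 + 0.1 * rng.normal(size=n)

cost = exclusion_cost(X, y)
print(cost)  # the cost for r3 is far larger than the costs for r1 and r2
```

In this toy setting only `r3` would be reported with confidence, because either of the two correlated regulators can be dropped with almost no loss of fit. This mirrors the conservative behaviour the abstract describes.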

Joint detection of germline and somatic copy number events in matched tumor–normal sample pairs

Bioinformatics ◽

10.1093/bioinformatics/btz429 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4955-4961

Author(s):

Yongzhuang Liu ◽

Jian Liu ◽

Yadong Wang

Keyword(s):

Copy Number ◽

Simulated Data ◽

Real Data ◽

Copy Number Variations ◽

Superior Performance ◽

Supplementary Information ◽

Normal Sample ◽

Joint Detection ◽

Novel Approach ◽

Powerful Approach

Abstract Motivation Whole-genome sequencing (WGS) of tumor–normal sample pairs is a powerful approach for comprehensively characterizing germline copy number variations (CNVs) and somatic copy number alterations (SCNAs) in cancer research and clinical practice. Existing computational approaches for detecting copy number events cannot detect germline CNVs and SCNAs simultaneously, and yield low accuracy for SCNAs. Results In this study, we developed TumorCNV, a novel approach for jointly detecting germline CNVs and SCNAs from WGS data of the matched tumor–normal sample pair. We compared TumorCNV with existing copy number event detection approaches using simulated data and real data for the COLO-829 melanoma cell line. The experimental results showed that TumorCNV outperformed existing approaches. Availability and implementation The software TumorCNV is implemented using a combination of Java and R, and it is freely available from the website at https://github.com/yongzhuang/TumorCNV. Supplementary information Supplementary data are available at Bioinformatics online.

Estimation of dynamic SNP-heritability with Bayesian Gaussian process models

Bioinformatics ◽

10.1093/bioinformatics/btaa199 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3795-3802

Author(s):

Arttu Arjas ◽

Andreas Hauptmann ◽

Mikko J Sillanpää

Keyword(s):

Gaussian Process ◽

Variance Components ◽

Simulated Data ◽

Joint Estimation ◽

Process Models ◽

Random Regression ◽

Superior Performance ◽

Supplementary Information ◽

Time Points ◽

Heritability Estimation

Abstract Motivation Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. Results We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov Chain Monte Carlo method which allows full uncertainty quantification. Several datasets are analysed and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which ‘borrows strength’ from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using MTG2 and BLUPF90 software and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals. Availability and implementation The C++ implementation dynBGP and simulated data are available on GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in the QTL Archive: https://phenome.jax.org/centers/QTLA. Supplementary information Supplementary data are available at Bioinformatics online.
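For orientation, "heritability as a function of the variance components" is commonly written as the ratio below. This is the standard textbook form, given here as an assumption. The symbols σ_g²(t) (genetic variance at time t) and σ_e²(t) (residual variance at time t) are our own notation, and the parametrization used in the paper may differ.

```latex
h^2(t) = \frac{\sigma_g^2(t)}{\sigma_g^2(t) + \sigma_e^2(t)}
```

Under this form, uncertainty in h²(t) follows directly from the joint posterior of the two variance components at each time point.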

Mechanisms of Drug Solubilization by Polar Lipids in Biorelevant Media

10.26434/chemrxiv.12649145.v1 ◽

2020 ◽

Author(s):

Vladimir Katev ◽

Zahari Vinarov ◽

Slavka S. Tcholakova

Keyword(s):

Chain Length ◽

Acyl Chain ◽

Polar Lipids ◽

Superior Performance ◽

Head Group ◽

Formulation Development ◽

Biorelevant Media ◽

Drug Solubilization ◽

Colloidal Aggregates ◽

Class 1

Despite the widespread use of lipid excipients in both academic research and oral formulation development, rational selection guidelines are still missing. In the current study, we aimed to establish a link between the molecular structure of commonly used polar lipids and drug solubilization in biorelevant media. We studied the effect of 26 polar lipids of the fatty acid, phospholipid or monoglyceride type on the solubilization of fenofibrate in a two-stage in vitro GI tract model. The main trends were also checked with progesterone and danazol. Based on their fenofibrate solubilization efficiency, the polar lipids can be grouped in 3 main classes. Class 1 substances (n = 5) provide the biggest enhancement of drug solubilization (>10-fold) and are composed only of unsaturated compounds. Class 2 materials (n = 10) have an intermediate effect (3- to 10-fold increase) and are composed primarily (80 %) of saturated compounds. Class 3 materials (n = 11) have very low or no effect on drug solubilization and are entirely composed of saturated compounds. The observed behaviour of the polar lipids was rationalized by using two classical physicochemical parameters: the acyl chain phase transition temperature (Tm) and the critical micellar concentration (CMC). Hence, the superior performance of class 1 polar lipids was explained by the double bonds in their acyl chains, which: (1) significantly decrease Tm, allowing these C18 lipids to form colloidal aggregates and (2) prevent tight packing of the molecules in the aggregates, resulting in a bigger volume available for drug solubilization. Long-chain (C18) saturated polar lipids had no significant effect on drug solubilization because their Tm was much higher than the temperature of the experiment (T = 37 °C) and, therefore, their association in colloidal aggregates was limited. On the other end of the spectrum, the short chain octanoic acid manifested a high CMC (50 mM), which had to be exceeded in order to enhance drug solubilization.
When these two parameters were satisfied (C > CMC, Tm < Texp), the increase of the polar lipid chain length increased the drug solubilization capacity (similarly to classical surfactants), due to the decreased CMC and bigger volume available for solubilization. The hydrophilic head group also has a dramatic impact on the drug solubilization enhancement, with polar lipid performance decreasing in the order: choline phospholipids > monoglycerides > fatty acids. As both the acyl chain length and the head group type are structural features of the polar lipids, and not of the solubilized drugs, the impact of Tm and CMC on solubilization by polar lipids should hold true for a wide variety of hydrophobic molecules. The obtained mechanistic insights can guide rational drug formulation development and thus support modern drug discovery pipelines.

A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data

The International Journal of Biostatistics ◽

10.1515/ijb-2020-0039 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Yixin Kong ◽

Ariangela Kozik ◽

Cindy H. Nakatsu ◽

Yava L. Jones-Hall ◽

Hyonho Chun

Keyword(s):

Matrix Factorization ◽

Factor Model ◽

R Package ◽

Biological Data ◽

Superior Performance ◽

Sequencing Data ◽

Fecal Microbiome ◽

Brain Gene Expression ◽

Cell Transcriptome ◽

Non Negative Matrix Factorization

Abstract A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
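To make the notion of a "multiplicative parameter updating rule" concrete, here is a minimal sketch of the classical multiplicative updates for plain non-negative matrix factorization with a squared-error objective. It is not the zero-inflated rule derived in the paper and not the iNMF package: the function name `nmf_multiplicative`, the toy matrix `V` and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) @ H (rank x m).

    Uses the classical multiplicative updates for the Frobenius-norm
    objective. Each update multiplies by a non-negative ratio, so W and H
    stay non-negative without any explicit projection step.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update mixing proportions
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update signatures
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Hypothetical toy data: an exactly rank-4 non-negative matrix.
rng = np.random.default_rng(1)
V = rng.random((30, 4)) @ rng.random((4, 20))
W, H, errors = nmf_multiplicative(V, rank=4)
print(errors[0], errors[-1])  # reconstruction error after the first and last iteration
```

A zero-inflated variant would additionally have to model which zeros are structural rather than sampled, which this plain version ignores.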

UniBioDicts: Unified access to Biological Dictionaries

Bioinformatics ◽

10.1093/bioinformatics/btaa1065 ◽

2020 ◽

Author(s):

John Zobolas ◽

Vasundra Touré ◽

Martin Kuiper ◽

Steven Vercruysse

Keyword(s):

User Interface ◽

Life Science ◽

Biological Data ◽

Supplementary Information ◽

Supplementary Data ◽

Query Interface ◽

Controlled Vocabularies ◽

Search String ◽

Software Packages ◽

The Right

Abstract Summary We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query-interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms. Availability and implementation The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information Supplementary data are available at Bioinformatics online.

Phenylalkylammonium passivation enables perovskite light emitting diodes with record high-radiance operational lifetime: the chain length matters

Nature Communications ◽

10.1038/s41467-021-20970-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Yuwei Guo ◽

Sofia Apergi ◽

Nan Li ◽

Mengyu Chen ◽

Chunyang Yin ◽

...

Keyword(s):

Chain Length ◽

Quantum Efficiency ◽

External Quantum Efficiency ◽

Light Emitting Diodes ◽

Surface Defects ◽

Theoretical Modelling ◽

Ion Migration ◽

Light Emitting ◽

Operational Lifetime ◽

Chain Lengths

Abstract Perovskite light emitting diodes suffer from poor operational stability, exhibiting a rapid decay of external quantum efficiency within minutes to hours after turn-on. To address this issue, we explore surface treatment of perovskite films with phenylalkylammonium iodide molecules of varying alkyl chain lengths. Combining experimental characterization and theoretical modelling, we show that these molecules stabilize the perovskite through suppression of iodide ion migration. The stabilization effect is enhanced with increasing chain length due to the stronger binding of the molecules with the perovskite surface, as well as the increased steric hindrance to reconfiguration for accommodating ion migration. The passivation also reduces the surface defects, resulting in a high radiance and delayed roll-off of external quantum efficiency. Using the optimized passivation molecule, phenylpropylammonium iodide, we achieve devices with an efficiency of 17.5%, a radiance of 1282.8 W sr−1 m−2 and a record T50 half-lifetime of 130 h under 100 mA cm−2.

Mapping EQ-5D-3L from the Knee Injury and Osteoarthritis Outcome Score (KOOS)

Quality of Life Research ◽

10.1007/s11136-019-02303-9 ◽

2019 ◽

Vol 29 (1) ◽

pp. 265-274

Author(s):

Ali Kiadaliri ◽

Monica Hernández Alava ◽

Ewa M. Roos ◽

Martin Englund

Keyword(s):

Disease Severity ◽

Knee Injury ◽

Cruciate Ligament ◽

Simulated Data ◽

Probit Model ◽

Original Data ◽

Cost Utility ◽

Superior Performance ◽

Outcome Score ◽

Mapping Models

Abstract Purpose To develop a mapping model to estimate EQ-5D-3L from the Knee Injury and Osteoarthritis Outcome Score (KOOS). Methods The responses to EQ-5D-3L and KOOS questionnaires (n = 40,459 observations) were obtained from the Swedish National anterior cruciate ligament (ACL) Register for patients ≥ 18 years with a knee ACL injury. We used linear regression (LR) and beta-mixture (BM) for direct mapping and the generalized ordered probit model for response mapping (RM). We compared the distribution of the original data to the distributions of the data generated using the estimated models. Results Models with individual KOOS subscales performed better than those with the average of KOOS subscale scores (KOOS5, KOOS4). LR had the poorest performance overall and across the range of disease severity, particularly at the extremes of the distribution of severity. Compared with the RM, the BM performed better across the entire range of disease severity except the most severe range (KOOS5 < 25). Moving from the most to the least disease severity was associated with a gain of 0.785 in the observed EQ-5D-3L. The corresponding values were 0.743, 0.772 and 0.782 for LR, BM and RM, respectively. LR generated simulated EQ-5D-3L values outside the feasible range. The distribution of simulated data generated from the BM model was almost identical to the original data. Conclusions We developed mapping models to estimate EQ-5D-3L from KOOS, facilitating application of KOOS in cost-utility analyses. The BM showed superior performance for estimating EQ-5D-3L from KOOS. Further validation of the estimated models in different independent samples is warranted.
