multinomial sampling
Recently Published Documents


TOTAL DOCUMENTS: 42 (FIVE YEARS: 8)
H-INDEX: 11 (FIVE YEARS: 2)

2021
Author(s): Michael Moret, Francesca Grisoni, Paul Katzberger, Gisbert Schneider

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures as textual representations, such as simplified molecular-input line-entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare "greedy" (beam search) with "explorative" (multinomial sampling) methods for SMILES generation reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired model biases introduced during training, which enables a new ranking system that removes those biases.
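
Perplexity, as commonly defined for language models, is the exponential of the negative mean log-probability that the model assigns to the tokens of a sequence, so lower values mean the model finds the string more plausible. The Python sketch below computes it from per-token probabilities and contrasts a greedy next-token pick (equivalent to beam search with a beam width of 1) with multinomial sampling. The function names and the toy next-token distribution are hypothetical, and the exact perplexity normalization used in the study may differ.

```python
import math
import random


def perplexity(token_probs):
    """Perplexity of one generated sequence.

    token_probs holds the model's probability for each emitted token,
    p(x_t | x_<t). PP = exp(-(1/N) * sum_t log p(x_t | x_<t)).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def greedy_pick(next_token_dist):
    """'Greedy' decoding step: always take the most probable token."""
    return max(next_token_dist, key=next_token_dist.get)


def multinomial_pick(next_token_dist, rng):
    """'Explorative' decoding step: draw a token with probability p(token)."""
    tokens = list(next_token_dist)
    weights = [next_token_dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution over three SMILES characters.
    dist = {"C": 0.6, "O": 0.3, "N": 0.1}
    rng = random.Random(0)
    print(greedy_pick(dist))  # always "C"
    print([multinomial_pick(dist, rng) for _ in range(10)])  # a probability-weighted mix
    # A model that is uniform over 4 tokens at every step has perplexity of about 4.
    print(perplexity([0.25, 0.25, 0.25]))
```

Greedy decoding returns the same token every time, whereas sampling occasionally emits lower-probability tokens; scoring the resulting strings by perplexity is one way to check whether that extra exploration still yields sequences the model considers plausible.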


Author(s): Lorena Romero-Medrano, Pablo Moreno-Muñoz, Antonio Artés-Rodríguez

Bayesian change-point detection with latent variable models enables the segmentation of high-dimensional time series of heterogeneous statistical nature. We assume that change-points lie on a lower-dimensional manifold, where we aim to infer a discrete representation via subsets of latent variables. For this particular model, full inference is computationally infeasible, so pseudo-observations based on point estimates of the latent variables are used instead. However, if these estimates are too uncertain, change-point detection suffers. To circumvent this problem, we propose a multinomial sampling methodology that improves the detection rate and reduces the detection delay while keeping complexity stable and inference analytically tractable. In our experiments, the method outperforms the baseline, and we also provide an example oriented to a human behavioral study.
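
The contrast between a point-estimate pseudo-observation and a sampled one can be illustrated generically. In the Python sketch below, `q` stands for an assumed posterior distribution over K discrete latent states; the point estimate keeps only the most probable state as a one-hot vector, while drawing `m` states yields a Multinomial(m, q) count vector that retains the uncertainty. This illustrates the general idea only, not the authors' change-point detector; `q`, `m`, and both functions are hypothetical.

```python
import random


def point_estimate(q):
    """One-hot pseudo-observation: keep only the most probable latent state."""
    best = max(range(len(q)), key=q.__getitem__)
    return [1 if k == best else 0 for k in range(len(q))]


def multinomial_pseudo_obs(q, m, rng):
    """Count vector from m draws of the latent state, i.e. a Multinomial(m, q) sample."""
    counts = [0] * len(q)
    for k in rng.choices(range(len(q)), weights=q, k=m):
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, fairly uncertain posterior over three latent states.
    q = [0.40, 0.35, 0.25]
    rng = random.Random(42)
    print(point_estimate(q))  # [1, 0, 0]: states 1 and 2 are discarded entirely
    print(multinomial_pseudo_obs(q, 1000, rng))  # roughly proportional to q
```

When the posterior is confident (one entry close to 1), both pseudo-observations agree; they diverge precisely in the uncertain case the abstract describes.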


2021
Author(s): Jan Graffelman

The geometric series, or niche preemption, model is an elementary ecological model in biodiversity studies. Its preemption parameter is usually estimated by regression or iteratively by using May's equation. This article proposes a maximum likelihood estimator for the niche preemption model, assuming a known number of species and multinomial sampling. A simulation study shows that, in this context, the maximum likelihood estimator outperforms the classical estimators in terms of bias and precision. We obtain the distribution of the maximum likelihood estimator and use it to construct confidence intervals for the preemption parameter and to develop a preemption t test of the hypothesis of equal geometric decay in two samples. We illustrate the new estimator with several empirical data sets taken from the literature and provide software for its use.
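
A maximum likelihood fit of this kind can be sketched in Python. We assume May's normalized form of the geometric series, in which the species of rank i (i = 1..S) has expected proportion p_i(k) = k(1-k)^(i-1) / (1-(1-k)^S), and we treat the observed abundances n_i as one multinomial sample, so the log-likelihood is l(k) = sum_i n_i log p_i(k). The function names, the golden-section maximizer, and the search bounds are our own choices; the article's estimator and software may proceed differently (for example, by solving the score equation directly).

```python
import math


def preemption_loglik(k, counts):
    """Multinomial log-likelihood of rank-ordered counts under the geometric series.

    p_i(k) = k (1-k)^(i-1) / (1 - (1-k)^S), i = 1..S (the multinomial constant is dropped).
    """
    s = len(counts)
    n = sum(counts)
    # sum_i n_i * (i - 1), where i is the 1-based rank
    rank_weight = sum(i * c for i, c in enumerate(counts))
    return (n * math.log(k)
            + rank_weight * math.log1p(-k)
            - n * math.log1p(-((1.0 - k) ** s)))


def preemption_mle(counts, tol=1e-10):
    """Maximum likelihood estimate of the preemption parameter k in (0, 1)."""
    if len(counts) < 2:
        raise ValueError("need at least two species")
    ranked = sorted(counts, reverse=True)  # rank 1 = most abundant species

    def f(k):
        return preemption_loglik(k, ranked)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-9, 1.0 - 1e-9
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    # Golden-section search for the maximum of a unimodal function on [a, b].
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Counts exactly proportional to the k = 0.5 series with S = 4 (8:4:2:1).
    print(round(preemption_mle([8, 4, 2, 1]), 6))  # 0.5
```

Golden-section search is justified here because p_i(k) is proportional to (1-k)^(i-1), an exponential family in log(1-k); the log-likelihood is concave in that parameter and therefore unimodal in k.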


2019, Vol 20 (1)
Author(s): F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry

Single-cell RNA-Seq (scRNA-Seq) profiles the gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show that UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures, such as log of counts per million, and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
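
Deviance-based feature selection can be sketched in plain Python. The version below approximates each gene's multinomial model by a binomial one: with n_c the total UMI count of cell c and y_c the gene's count in that cell, the null model assumes y_c ~ Binomial(n_c, pi) with a single proportion pi = sum(y_c) / sum(n_c), and genes that deviate most from this constant-proportion model (highest deviance) are ranked first. The binomial approximation, the function names, and the toy matrix are our assumptions; the authors' released implementation may differ in detail.

```python
import math


def binomial_deviance(gene_counts, cell_totals):
    """Deviance of one gene against a constant-proportion binomial null model."""
    pi = sum(gene_counts) / sum(cell_totals)
    dev = 0.0
    for y, n in zip(gene_counts, cell_totals):
        mu = n * pi  # expected count for this cell under the null
        if y > 0:
            dev += y * math.log(y / mu)
        if n - y > 0:
            dev += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * dev


def rank_genes_by_deviance(matrix):
    """Rank gene (column) indices of a cells-by-genes UMI matrix, highest deviance first."""
    cell_totals = [sum(row) for row in matrix]
    n_genes = len(matrix[0])
    devs = [binomial_deviance([row[g] for row in matrix], cell_totals)
            for g in range(n_genes)]
    order = sorted(range(n_genes), key=lambda g: devs[g], reverse=True)
    return order, devs


if __name__ == "__main__":
    # Toy data: 3 cells x 3 genes, 100 UMIs per cell.
    umi = [
        [10, 0, 90],
        [10, 0, 90],
        [10, 30, 60],
    ]
    order, devs = rank_genes_by_deviance(umi)
    print(order)  # [1, 2, 0]: gene 0 is a constant 10% of every cell, so its deviance is 0
```

Because the score is computed on raw counts, no log-CPM transformation is needed before selection.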


2019
Author(s): F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry

Single-cell RNA-Seq (scRNA-Seq) profiles the gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show that UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures, such as log of counts per million, and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.


2015, Vol 86 (3), pp. 510-523
Author(s): Mary E. Haynes, Roy T. Sabo, N. Rao Chaganty
