An efficient extension of N-mixture models for multi-species abundance estimation

2016 ◽  
Author(s):  
Juan Pablo Gomez ◽  
Scott K. Robinson ◽  
Jason K. Blackburn ◽  
José Miguel Ponciano

Abstract
1. In this study we propose an extension of the N-mixture family of models that improves the statistical properties of rare-species abundance estimators when sample sizes are small, yet typical of tropical studies. The proposed method harnesses information from other species in an ecological community to correct each species' estimator. We provide guidance for determining the sample size required to estimate accurately the abundance of rare tropical species when attempting to estimate the abundance of a single species.
2. We evaluate the proposed methods assuming 50-m radius plots and perform simulations spanning a broad range of sample sizes, true abundances and detectability values, with a complex data-generating process. The extension of the N-mixture model is achieved by assuming that the detection probabilities of a set of species are all drawn at random from a beta distribution, in a multi-species fashion. This hierarchical model avoids having to specify a single detection-probability parameter per species in the targeted community. Parameter estimation is done via maximum likelihood.
3. We compared our multi-species approach with previously proposed multi-species N-mixture models, which we show are biased when the true densities of species in the community are below seven individuals per 100 ha. The beta N-mixture model proposed here outperforms the traditional multi-species N-mixture model by allowing the estimation of organisms at lower densities and controlling the bias of the estimates.
4. We illustrate how our methodology can be used to suggest the sample sizes required to estimate the abundance of organisms that are rare, common or abundant. When the interest is in full communities, we show how the multi-species approaches, and in particular our beta model and estimation methodology, can be used as a practical solution for estimating organism densities from rapid-inventory datasets. The statistical inferences made with our model via maximum likelihood can also be used to group species in a community according to their detectabilities.
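To make the hierarchical structure concrete, the marginal likelihood of one species' repeated counts under a beta N-mixture model can be sketched as below. This is an illustrative numerical sketch, not the authors' implementation: the quadrature scheme, truncation point `n_max`, and parameter names are assumptions, and in the full model the beta parameters `a`, `b` would be shared across all species in the community.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def species_loglik(counts, lam, a, b, n_max=200, n_grid=41):
    """Marginal log-likelihood of one species' repeated counts under a
    beta N-mixture model: N ~ Poisson(lam), detection p ~ Beta(a, b),
    counts[t] | N, p ~ Binomial(N, p).  Sketch only: p is integrated
    out on a grid and N is summed up to a truncation point n_max."""
    counts = np.asarray(counts)
    # Grid for numerically integrating over the beta-distributed p.
    p_grid, dp = np.linspace(1e-4, 1 - 1e-4, n_grid, retstep=True)
    w = stats.beta.pdf(p_grid, a, b) * dp          # quadrature weights
    terms = []
    for N in range(counts.max(), n_max):
        # log P(counts | N, p) summed over visits, for each grid p
        ll_p = stats.binom.logpmf(counts[:, None], N, p_grid[None, :]).sum(0)
        # integrate over p, then weight by the Poisson prior on N
        terms.append(logsumexp(ll_p, b=w) + stats.poisson.logpmf(N, lam))
    return logsumexp(terms)
```

In the multi-species fit, the total log-likelihood would sum this quantity over species, with species-specific `lam` but a single shared `(a, b)` pair, which is what lets information from common species correct the estimators of rare ones.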

2021 ◽  
Author(s):  
Samyajoy Pal ◽  
Christian Heumann

Abstract
A generalized way of building mixture models using different distributions is explored in this article. The EM algorithm is used with some modifications to accommodate different distributions within the same model. The model uses any point estimate available for the respective distributions to estimate the mixture components and model parameters. The study is focused on the application of mixture models in unsupervised learning problems, especially cluster analysis. The convenience of building mixture models using the generalized approach is further emphasised by appropriate examples, exploiting the well-known maximum likelihood and Bayesian estimates of the parameters of the parent distributions.
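In the spirit of the generalized approach described above, a minimal EM sketch for a mixture whose components come from *different* parent distributions (here Gaussian plus exponential) might look as follows. The M-step simply plugs in each component's own weighted maximum-likelihood point estimates; all names and the two-component choice are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy import stats

def fit_gauss_exp_mixture(x, n_iter=100):
    """EM for a two-component mixture of a Gaussian and an exponential.
    Each M-step uses the component's weighted ML point estimates, so a
    different parent distribution can be swapped in by changing only
    its density and its estimator.  Requires x > 0 for the exponential."""
    w = np.array([0.5, 0.5])                      # mixture weights
    mu, sd, rate = x.mean(), x.std(), 1.0 / x.mean()
    for _ in range(n_iter):
        # E-step: responsibilities under each component density
        dens = np.vstack([w[0] * stats.norm.pdf(x, mu, sd),
                          w[1] * stats.expon.pdf(x, scale=1.0 / rate)])
        r = dens / dens.sum(axis=0)
        # M-step: weighted ML estimates for each parent distribution
        w = r.mean(axis=1)
        mu = np.average(x, weights=r[0])
        sd = np.sqrt(np.average((x - mu) ** 2, weights=r[0]))
        rate = r[1].sum() / (r[1] * x).sum()      # exponential ML rate
    return w, mu, sd, rate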


2011 ◽  
Vol 58-60 ◽  
pp. 1847-1853 ◽  
Author(s):  
Yan Zhang ◽  
Cun Bao Chen ◽  
Li Zhao

In this paper, the Gaussian mixture model (GMM) is applied to noise classification. On this basis, a modified Gaussian mixture model with an embedded auto-associative neural network (AANN) is proposed, integrating the merits of the GMM and the AANN. The GMM and AANN are trained as a whole by maximum likelihood (ML); during training, the parameters of the GMM and the AANN are updated alternately. The AANN reshapes the distribution of the data and improves the similarity of the feature data within the same type of noise. Experiments show that the GMM with an embedded AANN improves the noise-classification accuracy over the baseline GMM.
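The baseline against which the paper compares is ML-trained GMM classification: one GMM per noise type, with a test segment assigned to the type whose model scores it highest. A minimal sketch of that baseline stage follows (the embedded-AANN variant, which alternates GMM and AANN updates, is not shown); the function names and the use of scikit-learn are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_noise_models(features_by_class, n_components=4, seed=0):
    """Baseline: fit one ML-trained GMM per noise type from a dict
    mapping class label -> (n_frames, n_features) feature matrix."""
    return {label: GaussianMixture(n_components, random_state=seed).fit(X)
            for label, X in features_by_class.items()}

def classify(models, X):
    """Assign a feature sequence to the noise type whose GMM gives
    the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].score(X))
```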


2020 ◽  
Vol 4 ◽  
Author(s):  
Lidia Garrido-Sanz ◽  
Miquel Àngel Senar ◽  
Josep Piñol

Amplicon metabarcoding is an established technique for analysing the taxonomic composition of communities of organisms using high-throughput DNA sequencing, but there are doubts about its ability to quantify the relative proportions of the species, as opposed to producing the species list. Here, we bypass the enrichment step and avoid PCR bias by directly sequencing the extracted DNA using shotgun metagenomics. This approach is common practice for prokaryotes, but not for eukaryotes, because of the low number of sequenced genomes of eukaryotic species. We tested the metagenomics approach using insect species whose genomes are already sequenced and assembled to an advanced degree. We shotgun-sequenced, at low coverage, 18 species of insects in 22 single-species and 6 mixed-species libraries and mapped the reads against 110 reference genomes of insects. We used the single-species libraries to calibrate the process of assigning reads to species, and the mixed-species libraries to evaluate the ability of the method to quantify relative species abundance. Our results showed that the shotgun metagenomic method easily sets apart closely related insect species, such as the four species of Drosophila included in the artificial libraries. However, to avoid counting rare misclassified reads in samples, it was necessary to use a rather stringent detection limit of 0.001, so species with a lower relative abundance are ignored. We also found that approximately half of the raw reads were informative for taxonomic purposes. Finally, using the mixed-species libraries, we showed that it is feasible to quantify with confidence the relative abundance of individual species in the mixtures.
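The detection-limit step described above can be sketched as a small helper that converts per-species mapped-read counts into relative abundances and discards species falling below the 0.001 threshold. This is a hypothetical illustration of the filtering rule, not the authors' pipeline.

```python
def relative_abundances(read_counts, detection_limit=1e-3):
    """Turn per-species mapped-read counts into relative abundances,
    dropping species below the detection limit (0.001 in the study)
    to avoid counting rare misclassified reads, then renormalizing
    over the retained species."""
    total = sum(read_counts.values())
    props = {sp: n / total for sp, n in read_counts.items()}
    kept = {sp: p for sp, p in props.items() if p >= detection_limit}
    kept_total = sum(kept.values())
    return {sp: p / kept_total for sp, p in kept.items()}
```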


2006 ◽  
Vol 36 (2) ◽  
pp. 573-588 ◽  
Author(s):  
John W. Lau ◽  
Tak Kuen Siu ◽  
Hailiang Yang

We employ a class of Bayesian infinite mixture models, first introduced by Lo (1984), to determine the credibility premium for a non-homogeneous insurance portfolio. The Bayesian infinite mixture models provide us with much flexibility in the specification of the claim distribution. We use a sampling scheme based on a weighted Chinese restaurant process, introduced in Lo et al. (1996), to estimate a Bayesian infinite mixture model from the claim data. The Bayesian sampling scheme also provides a systematic way to cluster the claim data, which can offer insight into the risk characteristics of the policyholders. The estimated credibility premium from the Bayesian infinite mixture model can be written as a linear combination of the prior estimate and the sample mean of the claim data. Estimation results for the Bayesian mixture credibility premiums are presented.
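The linear-combination structure mentioned above is the familiar credibility form; in standard notation (which we assume here, since the abstract does not fix symbols) it reads:

```latex
% Credibility premium as a weighted average of the sample mean of the
% claims and the prior estimate; Z \in [0, 1] is the credibility factor.
\[
  \text{Premium} \;=\; Z\,\bar{X} \;+\; (1 - Z)\,\mu_{\text{prior}},
  \qquad \bar{X} \;=\; \frac{1}{n}\sum_{i=1}^{n} X_i .
\]
```

The contribution of the Bayesian infinite mixture model is in how the posterior, and hence the weight given to the data, is computed, rather than in this outer form.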


2020 ◽  
Vol 34 (04) ◽  
pp. 4215-4222
Author(s):  
Binyuan Hui ◽  
Pengfei Zhu ◽  
Qinghua Hu

Graph convolutional networks (GCNs) have achieved promising performance in attributed graph clustering and semi-supervised node classification because they can model complex graph structure and jointly learn both the features and the relations of nodes. Inspired by the success of unsupervised learning in the training of deep models, we ask whether graph-based unsupervised learning can collaboratively boost the performance of semi-supervised learning. In this paper, we propose a multi-task graph learning model, called collaborative graph convolutional networks (CGCN). CGCN is composed of an attributed graph clustering network and a semi-supervised node classification network. As Gaussian mixture models can effectively discover inherent, complex data distributions, a new end-to-end attributed graph clustering network is designed by combining a variational graph auto-encoder with Gaussian mixture models (GMM-VGAE), rather than using the classic k-means. If the pseudo-label of an unlabeled sample assigned by GMM-VGAE is consistent with the prediction of the semi-supervised GCN, the sample is selected to further boost the performance of semi-supervised learning with the help of its pseudo-label. Extensive experiments on benchmark graph datasets validate the superiority of the proposed GMM-VGAE over state-of-the-art attributed graph clustering networks. Node-classification performance is greatly improved by the proposed CGCN, which verifies that graph-based unsupervised learning can be exploited to enhance semi-supervised learning.
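The pseudo-label selection rule described in the abstract, keeping only the unlabeled nodes on which the clustering network and the classifier agree, can be sketched as below. This is a minimal sketch of the selection step only, assuming cluster ids have already been matched to class ids; the function name is ours, not the paper's.

```python
import numpy as np

def select_agreeing_pseudolabels(cluster_labels, gcn_predictions):
    """Return the indices of unlabeled nodes whose GMM-VGAE cluster
    assignment agrees with the semi-supervised GCN's prediction, along
    with the agreed pseudo-labels.  These nodes would then be fed back
    as extra supervision to boost the GCN."""
    cluster_labels = np.asarray(cluster_labels)
    gcn_predictions = np.asarray(gcn_predictions)
    mask = cluster_labels == gcn_predictions
    return np.flatnonzero(mask), cluster_labels[mask]
```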


1994 ◽  
Vol 31 (1) ◽  
pp. 128-136 ◽  
Author(s):  
Sachin Gupta ◽  
Pradeep K. Chintagunta

The authors propose an extension of the logit-mixture model that defines prior segment-membership probabilities as functions of concomitant (demographic) variables. With this approach it is possible to describe how membership in each segment, where segments are characterized by a specific profile of brand preferences and marketing-variable sensitivities, is related to household demographic characteristics. An empirical application of the methodology is provided using A.C. Nielsen scanner panel data on catsup. The authors compare their results with those obtained using the extant methodology in estimation and validation samples of households.
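A common way to write such concomitant-variable membership probabilities, which we assume here since the abstract gives no notation, is a multinomial logit in the demographics:

```latex
% Prior probability that household h belongs to segment s, as a
% multinomial logit in the concomitant (demographic) variables z_h;
% the gamma_s are segment-specific coefficients (notation assumed).
\[
  \pi_{s}(z_h) \;=\;
  \frac{\exp\!\left(z_h^{\top}\gamma_s\right)}
       {\sum_{s'=1}^{S}\exp\!\left(z_h^{\top}\gamma_{s'}\right)},
  \qquad s = 1, \dots, S .
\]
```

Setting all coefficients except the intercepts to zero recovers the ordinary logit-mixture model with constant segment sizes, which is how the extension nests the extant methodology.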

