diversity measure
Recently Published Documents


2021 ◽  
Author(s):  
Suyash Sawant ◽  
Chiti Arvind ◽  
Viral Joshi ◽  
V.V. Robin

Birdsong plays an important role in mate attraction and territorial defense. Many birds, especially passerines, produce varying sequences of multiple notes, resulting in complex songs. Studying the diversity of notes within these songs can give insights into an individual's reproductive fitness. We first examined previously described and commonly used diversity measures to understand their possible case-specific limitations. We then developed a new diversity measure, the Song Richness Index (SRI). We compared SRI with three measures of diversity using all possible combinations of notes to understand the case-specific advantages and limitations of each approach. Simulating all possible combinations gave us insights into how each diversity measure works in a real scenario. SRI showed an advantage over conventional measures of diversity such as the Note Diversity Index (NDI), Shannon's Equitability (SH), and Simpson's Diversity (SI), especially in cases where songs are made up of only one type of repetitive note.
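To make the single-note case concrete, here is a minimal Python sketch of two of the conventional measures named above. The abstract gives no formulas, so the definitions are assumptions: Shannon's Equitability is taken as H / ln(S), and Simpson's Diversity as the Gini-Simpson form 1 - sum(p_i^2). SRI and NDI are not defined in the abstract and are deliberately not implemented. The function names are illustrative only.

```python
import math
from collections import Counter


def note_proportions(song):
    """Return the relative frequency of each note type in a song."""
    if not song:
        raise ValueError("song must contain at least one note")
    counts = Counter(song)
    total = len(song)
    return [count / total for count in counts.values()]


def shannon_equitability(song):
    """Shannon entropy divided by ln(S), where S is the number of note types.

    Assumed definition. Returns None when S == 1, because ln(1) == 0
    makes the ratio undefined.
    """
    props = note_proportions(song)
    if len(props) < 2:
        return None
    entropy = -sum(p * math.log(p) for p in props)
    return entropy / math.log(len(props))


def simpson_diversity(song):
    """Gini-Simpson index, 1 - sum(p_i^2) (assumed definition)."""
    return 1.0 - sum(p * p for p in note_proportions(song))


# Each character stands for one note type in a hypothetical song.
for song in ("AAAA", "AABB", "AAAB"):
    print(song, shannon_equitability(song), simpson_diversity(song))
```

Under these assumed definitions, a song built from one repeated note has an undefined equitability and a Simpson value of zero regardless of its length, which is the kind of case the abstract singles out.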


2021 ◽  
Author(s):  
M. Senthil Kumar ◽  
Eric V. Slud ◽  
Christine Hehnly ◽  
Lijun Zhang ◽  
James Broach ◽  
...  

Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. This makes deriving health indicators based on microbiome diversity measures essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa. When this artificial inflation in the observed number of taxa (i.e., richness, a diversity measure) is ignored, we find that changes in the abundance of detected taxa confound current methods for inferring differences in richness. Here we argue that evidence from our own experiments, theory-guided exploratory data analyses, and the existing literature supports the conclusion that most sub-genus discoveries are spurious artifacts of clustering 16S sequencing reads. Based on this finding, we model a 16S survey's systematic patterns of sub-genus taxa generation as a function of genus abundance to derive a robust control for false taxa accumulation. Such controls unlock classical regression approaches for highly flexible differential richness inference at various levels of the surveyed microbial assemblage, from sample groups to specific taxa collections. The proposed methodology for differential richness inference is available through an R package, Prokounter. Package availability: https://github.com/mskb01/prokounter


Author(s):  
Paul Hoffman ◽  
Matthew A. Lambon Ralph ◽  
Timothy T. Rogers

Semantic diversity refers to the degree of semantic variability in the contexts in which a particular word is used. We have previously proposed a method for measuring semantic diversity based on latent semantic analysis (LSA). In a recent paper, Cevoli et al. (2020) attempted to replicate our method and obtained different semantic diversity values. They suggested that this discrepancy occurred because they scaled their LSA vectors by their singular values, while we did not. Using their new results, they argued that semantic diversity is not related to ambiguity in word meaning, as we originally proposed. In this reply, we demonstrate that the use of unscaled vectors provides better fits to human semantic judgements than scaled ones. Thus we argue that our original semantic diversity measure should be preferred over the Cevoli et al. version. We replicate Cevoli et al.’s analysis using the original semantic diversity measure and find (a) our original measure is a better predictor of word recognition latencies than the Cevoli et al. equivalent and (b) that, unlike Cevoli et al.’s measure, our semantic diversity is reliably associated with a measure of polysemy based on dictionary definitions. We conclude that the Hoffman et al. semantic diversity measure is better-suited to capturing the contextual variability among words and that words appearing in a more diverse set of contexts have more variable semantic representations. However, we found that homonyms did not have higher semantic diversity values than non-homonyms, suggesting that the measure does not capture this special case of ambiguity.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
T Goolam Mahomed ◽  
RPH Peters ◽  
GHJ Pretorius ◽  
A Goolam Mahomed ◽  
V Ueckermann ◽  
...  

Background: Targeted metagenomics and the IS-Pro method are two of the many methods that have been used to study the microbiome. The two methods target different regions of the 16S rRNA gene. The aim of this study was to compare targeted metagenomics and the IS-Pro method in their ability to discern the microbial composition of the lung microbiome of COPD patients.

Methods: Spontaneously expectorated sputum specimens were collected from COPD patients. Bacterial DNA was extracted and used for targeted metagenomics and the IS-Pro method. The analysis was performed using QIIME2 (targeted metagenomics) and IS-Pro software (IS-Pro method). Additionally, a laboratory cost per isolate and time analysis was performed for each method.

Results: Statistically significant differences were observed in alpha diversity when data from targeted metagenomics and the IS-Pro method were compared using the Shannon diversity measure (p-value = 0.0006) but not with the Simpson diversity measure (p-value = 0.84). Distinct clusters with no overlap between the two technologies were observed for beta diversity. Targeted metagenomics had a lower relative abundance of phyla such as Proteobacteria and a higher relative abundance of phyla such as Firmicutes when compared to the IS-Pro method. Haemophilus, Prevotella and Streptococcus were the most prevalent genera across both methods. Targeted metagenomics classified 23% (144/631) of OTUs to species level, whereas the IS-Pro method classified 86% (55/64) of OTUs to species level. However, unclassified OTUs accounted for a higher relative abundance when using the IS-Pro method (35%) compared to targeted metagenomics (5%). The two methods performed comparably in terms of cost and time; however, the IS-Pro method was more user-friendly.

Conclusions: It is essential to understand the value of different methods for characterisation of the microbiome. Targeted metagenomics and the IS-Pro method differed in their ability to identify and characterise OTUs, diversity and microbial composition of the lung microbiome. The IS-Pro method might miss relevant species and could inflate the abundance of Proteobacteria. However, the IS-Pro kit identified most of the important lung pathogens, such as Burkholderia and Pseudomonas, and may work in a more diagnostics-orientated setting. Both methods were comparable in terms of cost and time; however, the IS-Pro method was easier to use.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jiangbo Zou ◽  
Xiaokang Fu ◽  
Lingling Guo ◽  
Chunhua Ju ◽  
Jingjing Chen

Ensemble classifiers improve classification accuracy by combining the decisions made by their component classifiers. There are two basic steps in creating an ensemble classifier: generating the base classifiers, and combining them so that the ensemble as a whole achieves maximum accuracy. A major problem in creating ensemble classifiers is balancing the classification accuracy and the diversity of the component classifiers. In this paper, we propose an ensemble classifier generating algorithm that improves the accuracy of ensemble classification and maximizes the diversity of its component classifiers. In this algorithm, information entropy is introduced to measure the diversity of component classifiers, and a cyclic iterative optimization selection tactic is applied to select component classifiers from the base classifiers, with the number of component classifiers dynamically adjusted to minimize system cost. We demonstrate that our method has a clearly lower memory cost and higher classification accuracy than existing classifier methods.
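As a rough illustration of using information entropy as a diversity measure for an ensemble, the sketch below averages the Shannon entropy of the component classifiers' votes over all samples. This is one common formulation and an assumption on our part. The abstract does not state the exact entropy definition or the selection procedure used in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in bits) of the labels predicted for one sample."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ensemble_diversity(predictions):
    """Mean vote entropy across samples.

    predictions[i][j] is the label that classifier i assigns to sample j.
    A value of 0 means every classifier agrees on every sample.
    """
    if not predictions or not predictions[0]:
        raise ValueError("need at least one classifier and one sample")
    per_sample_votes = list(zip(*predictions))
    return sum(vote_entropy(v) for v in per_sample_votes) / len(per_sample_votes)


# Hypothetical predictions from two classifiers on two samples.
agreeing = [[0, 1], [0, 1]]
disagreeing = [[0, 1], [0, 0]]
print(ensemble_diversity(agreeing), ensemble_diversity(disagreeing))
```

A selection step could then prefer subsets of base classifiers that keep this value high while accuracy stays acceptable, but how the paper trades the two off is not stated in the abstract.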


2021 ◽  
Author(s):  
Paul Hoffman ◽  
Matt Lambon Ralph ◽  
Timothy Thomas Rogers

Semantic diversity refers to the degree of semantic variability in the contexts in which a particular word is used. In 2013, we proposed a method for measuring semantic diversity based on latent semantic analysis (LSA) (Hoffman, Lambon Ralph, & Rogers, 2013). In a recent paper, Cevoli, Watkins and Rastle (2020) criticised our method, noting that we had failed to scale our LSA vectors by their singular values, which they considered to be a critical stage in the analysis. They presented new analyses using their own semantic diversity measure that included this step. In this reply, we demonstrate that the use of unscaled vectors provides better fits to human semantic judgements than scaled ones. Thus we argue that our original semantic diversity measure should be preferred over the Cevoli et al. version. We replicate Cevoli et al.’s analysis using the original semantic diversity measure and find (a) our original measure is a better predictor of word recognition latencies than the Cevoli et al. equivalent and (b) that, unlike Cevoli et al.’s measure, our semantic diversity is reliably associated with a measure of polysemy based on dictionary definitions. We conclude that the original Hoffman et al. semantic diversity measure is better-suited to capturing the contextual variability among words and that words appearing in a more diverse set of contexts have more variable semantic representations. However, we found that homonyms did not have higher semantic diversity values than non-homonyms, suggesting that the measure does not capture this special case of ambiguity.


Corpora ◽  
2020 ◽  
Vol 15 (3) ◽  
pp. 317-342
Author(s):  
Linlin Sun ◽  
David Correia Saavedra

This paper applies a quantitative model developed for measuring grammatical status, using data from the Lancaster Corpus of Mandarin Chinese (LCMC). The model takes into account four quantitative factors (token frequency, collocate diversity, colligate diversity and deviation of proportions) and uses them as predictors in a binary logistic regression in order to compute a score of grammatical status between ‘0’ (lexical/non-grammatical) and ‘1’ (highly grammatical) for each given element. The results of the LCMC model are then compared to those of a similar study of the British National Corpus (BNC). The comparison suggests that token frequency emerges as one of the most relevant parameters for quantifying degrees of grammatical status in both language models, together with the collocate diversity measure when using a broad window span. On the other hand, the colligational measures (left- or right-based) and the other collocate diversity measures using small spans (left- or right-based) contribute very differently in the two languages due to their typologically distinctive structures.
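The scoring step of such a model can be sketched as follows: a fitted binary logistic regression maps the four predictors to a value between 0 and 1 through the logistic function. The coefficients, intercept and feature values below are hypothetical placeholders chosen only to make the example run. They are not taken from the paper, and fitting the model is not shown.

```python
import math


def grammatical_score(features, weights, intercept):
    """Logistic-regression score in (0, 1) from named predictors."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients; real values would come from fitting the model.
weights = {
    "token_frequency": 1.2,
    "collocate_diversity": 0.8,
    "colligate_diversity": -0.5,
    "deviation_of_proportions": -0.3,
}
intercept = -1.0

# Hypothetical, already standardized predictor values for one element.
element = {
    "token_frequency": 2.0,
    "collocate_diversity": 1.5,
    "colligate_diversity": 0.2,
    "deviation_of_proportions": 0.1,
}
print(grammatical_score(element, weights, intercept))
```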


2020 ◽  
Vol 8 (10) ◽  
pp. 1612
Author(s):  
Dongyang Yang ◽  
Wei Xu

Modeling and analyzing the human microbiome allows the assessment of the microbial community and its impacts on human health. Microbiome composition can be quantified with 16S rRNA sequencing, which yields count data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine for identifying subgroups for patient stratification. However, there is currently a lack of standardized clustering methods for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that can address the presence-absence bias encountered in sparse count data and effectively measure sample distances for sample stratification. The distance measure used for clustering is derived from a parametric mixture model that produces sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method achieves a substantial clustering improvement over some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study on Parkinson’s disease to identify distinct microbiome states with biological interpretations.
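The presence-absence bias mentioned above can be illustrated with two widely used distances, Jaccard (presence-absence) and Bray-Curtis (abundance-based). This sketch does not implement the mixture-model distance proposed in the paper, whose details the abstract does not give. The OTU counts are hypothetical.

```python
def jaccard_distance(a, b):
    """Presence-absence distance: 1 - |shared taxa| / |taxa seen in either|."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Abundance-based distance: sum|a_i - b_i| / (sum a + sum b)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU counts: the samples differ by one read of a third taxon.
sample_a = [50, 30, 0, 0]
sample_b = [50, 30, 1, 0]
print(jaccard_distance(sample_a, sample_b))
print(bray_curtis_distance(sample_a, sample_b))
```

A single read that turns a zero into a non-zero count moves the presence-absence distance far more than the abundance-based one, which is why distinguishing true zeros from sampling zeros matters for sparse count data.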

