Performance Determinants of Unsupervised Clustering Methods for Microbiome Data

Author(s):  
Yushu Shi ◽  
Liangliang Zhang ◽  
Christine Peterson ◽  
Kim-Anh Do ◽  
Robert Jenq

Abstract Background: In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically compared beta diversity and clustering methods commonly used in microbiome analyses. We applied these to four published datasets where highly distinct microbiome profiles could be seen between sample groups, as well as a clinical dataset with less clear separation between groups. Results: Although no single method outperformed the others consistently, we did identify key scenarios where certain methods can underperform. Specifically, the Bray Curtis (BC) metric resulted in poor clustering in a dataset where high-abundance OTUs were relatively rare. In contrast, the unweighted UniFrac (UU) metric clustered poorly on a dataset with a high prevalence of low-abundance OTUs. To explore these hypotheses about BC and UU, we systematically modified properties of the poorly performing datasets and found that this approach resulted in improved BC and UU performance. Based on these observations, we rationally combined BC and UU to generate a novel metric. We tested its performance while varying the relative contributions of each metric and also compared it with another combined metric, the generalized UniFrac distance. The proposed metric showed high performance across all datasets. Conclusions: Our systematic evaluation of clustering performance in these five datasets demonstrates that there is no existing clustering method that universally performs best across all datasets. We propose a combined metric of BC and UU that capitalizes on the complementary strengths of the two metrics.
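
The idea of combining BC and UU while varying their relative contributions can be illustrated with a minimal sketch. The abstract does not give the functional form of the proposed metric, so the convex combination below, the function names, and the `weight_bc` parameter are all assumptions made for illustration. The unweighted UniFrac matrix is a made-up placeholder, since computing it requires a phylogenetic tree that is not part of this example.

```python
def relative_abundance(counts):
    """Convert raw OTU counts for one sample to proportions."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples, on relative abundances."""
    p, q = relative_abundance(u), relative_abundance(v)
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(a + b for a, b in zip(p, q))
    return numerator / denominator


def bray_curtis_matrix(table):
    """Pairwise Bray-Curtis matrix for an OTU table with samples in rows."""
    n = len(table)
    return [[bray_curtis(table[i], table[j]) for j in range(n)] for i in range(n)]


def combine_distances(d_bc, d_uu, weight_bc=0.5):
    """Hypothetical combination: weight_bc * BC + (1 - weight_bc) * UU.

    This is an assumed form, not necessarily the one used by the authors.
    """
    if not 0.0 <= weight_bc <= 1.0:
        raise ValueError("weight_bc must lie in [0, 1]")
    if len(d_bc) != len(d_uu):
        raise ValueError("distance matrices must have the same size")
    return [
        [weight_bc * bc + (1.0 - weight_bc) * uu for bc, uu in zip(row_bc, row_uu)]
        for row_bc, row_uu in zip(d_bc, d_uu)
    ]


# Toy OTU table: 3 samples x 4 OTUs (made-up counts).
otu_counts = [[10, 0, 5, 5], [8, 2, 5, 5], [0, 20, 0, 0]]
d_bc = bray_curtis_matrix(otu_counts)

# Placeholder unweighted UniFrac distances (made-up values, not computed from a tree).
d_uu = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.8], [0.9, 0.8, 0.0]]

d_combined = combine_distances(d_bc, d_uu, weight_bc=0.5)
```

Setting `weight_bc` to 1 or 0 recovers BC or UU alone, so sweeping it between the two is one simple way to vary the relative contribution of each metric before passing the combined matrix to a distance-based clustering method.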

2021 ◽  
Author(s):  
Yushu Shi ◽  
Liangliang Zhang ◽  
Christine Peterson ◽  
Kim-Anh Do ◽  
Robert Jenq

Abstract Background: In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically compared beta diversity and clustering methods commonly used in microbiome analyses. We applied these to four published datasets where highly distinct microbiome profiles could be seen between sample groups. Results: Although no single method outperformed the others consistently, we did identify key scenarios where certain methods can underperform. Specifically, the Bray Curtis metric resulted in poor clustering in a dataset where high-abundance OTUs were relatively rare. In contrast, the unweighted UniFrac metric clustered poorly when used on a dataset with a high prevalence of low-abundance OTUs. To test these hypotheses, we systematically modified properties of the poorly performing datasets and found that this approach resulted in improved Bray Curtis and unweighted UniFrac performance. Based on these observations, we rationally combined the Bray Curtis and unweighted UniFrac metrics and found that this new beta diversity metric showed high performance across all datasets. We also evaluated our findings by examining a clinical dataset where clusters are less separated. Conclusions: Our systematic evaluation of clustering performance in these five datasets demonstrates that there is no existing clustering method that universally performs best across all datasets. We propose a combined metric of Bray Curtis and unweighted UniFrac that capitalizes on the complementary strengths of the two metrics.


2020 ◽  
Author(s):  
Yushu Shi ◽  
Liangliang Zhang ◽  
Christine Peterson ◽  
Kim-Anh Do ◽  
Robert Jenq

Abstract Background: In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically compared beta diversity and clustering methods commonly used in microbiome analyses. We applied these to four published datasets where highly distinct microbiome profiles could be seen between sample groups. Results: Although no single method outperformed the others consistently, we did identify key scenarios where certain methods can underperform. Specifically, the Bray Curtis metric resulted in poor clustering in a dataset where high-abundance OTUs were relatively rare. In contrast, the unweighted UniFrac metric clustered poorly when used on a dataset with a high prevalence of low-abundance OTUs. To test these hypotheses, we systematically modified properties of the poorly performing datasets and found that this approach resulted in improved Bray Curtis and unweighted UniFrac performance. Conclusions: Based on these observations, we rationally combined the Bray Curtis and unweighted UniFrac metrics and found that this new beta diversity metric showed high performance across all datasets.


2015 ◽  
Vol 75 (6) ◽  
pp. 1016-1023 ◽  
Author(s):  
Anna Moltó ◽  
Adrien Etcheto ◽  
Désirée van der Heijde ◽  
Robert Landewé ◽  
Filip van den Bosch ◽  
...  

Background: Increased risk of some comorbidities has been reported in spondyloarthritis (SpA). Recommendations for detection/management of some of these comorbidities have been proposed, and it is known that a gap exists between these and their implementation in practice. Objective: To evaluate (1) the prevalence of comorbidities and risk factors in different countries worldwide, (2) the gap between available recommendations and daily practice for management of these comorbidities and (3) the prevalence of previously unknown risk factors detected as a result of the present initiative. Methods: Cross-sectional international study with 22 participating countries (from four continents), including 3984 patients with SpA according to the rheumatologist. Statistical analysis: The prevalence of comorbidities (cardiovascular, infection, cancer, osteoporosis and gastrointestinal) and risk factors; percentage of patients optimally monitored for comorbidities according to available recommendations and percentage of patients for whom a risk factor was detected due to this study. Results: The most frequent comorbidities were osteoporosis (13%) and gastroduodenal ulcer (11%). The most frequent risk factors were hypertension (34%), smoking (29%) and hypercholesterolaemia (27%). Substantial intercountry variability was observed for screening of comorbidities (eg, for LDL cholesterol measurement: from 8% (Taiwan) to 98% (Germany)). Systematic evaluation (eg, blood pressure (BP), cholesterol) during this study unveiled previously unknown risk factors (eg, elevated BP (14%)), emphasising the suboptimal monitoring of comorbidities. Conclusions: A high prevalence of comorbidities in SpA has been shown. Rigorous application of systematic evaluation of comorbidities may permit earlier detection, which may ultimately result in an improved outcome of patients with SpA.


2016 ◽  
Vol 106 (03) ◽  
pp. 125-130
Author(s):  
D. Hofbauer ◽  
J. Greitemann ◽  
M. Grammer ◽  
J. Kaufmann ◽  
G. Reinhart

The use of high-performance materials has so far been limited to special applications, because high material costs and the low maturity of manufacturing technologies have hindered their use in mass production. This paper presents a method for systematically evaluating these technologies to determine their fundamental suitability while taking product requirements into account. The method is illustrated using the large-scale series production of fiber-reinforced plastic (FRP) components.


Forecasting ◽  
2021 ◽  
Vol 3 (4) ◽  
pp. 663-681
Author(s):  
Alfredo Nespoli ◽  
Andrea Matteri ◽  
Silvia Pretto ◽  
Luca De Ciechi
Emanuele Ogliari

The increasing penetration of Renewable Energy Sources (RESs) in the energy mix is leading to an energy scenario characterized by decentralized power production. Among RES power generation technologies, solar PhotoVoltaic (PV) systems constitute a very promising option, but their production is not programmable due to the intermittent nature of solar energy. Coupling a PV facility with a Battery Energy Storage System (BESS) allows greater flexibility in power generation. However, the design phase of a PV+BESS hybrid plant is challenging due to the large number of possible configurations. The present paper proposes a preliminary procedure aimed at predicting a family of batteries suitable for coupling with a given PV plant configuration. The proposed procedure is applied to new hypothetical plants built to fulfill the energy requirements of a commercial and an industrial load. The energy produced by the PV system is estimated on the basis of a performance analysis carried out on similar real plants. The battery operations are established through two decision-tree-like structures regulating charge and discharge, respectively. Finally, unsupervised clustering is applied to all the possible PV+BESS configurations in order to identify the family of feasible solutions.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lifeng Zhu ◽  
Wei Zhu ◽  
Tian Zhao ◽  
Hua Chen ◽  
Chunlin Zhao ◽  
...  

An increasing number of studies have shown that warming also influences the animal gut microbiome (altering the community structure and decreasing its diversity), which might further impact host fitness. Here, based on an analysis of the stomach and gut (the entire intestine, from the anterior intestine to the cloaca) microbiome in laboratory larvae of giant salamanders (Andrias davidianus) kept at different water temperatures (5, 15, and 25°C) and sampled at two time points (80 and 330 days after acclimation), we investigated the potential effect of temperature on the gastrointestinal microbiome community. We found a significant interaction between sampling time and temperature, or sample type (stomach or gut), on the Shannon index of the gastrointestinal microbiome of the giant salamanders. We also found a significant difference in the Shannon index among temperature groups within the same sample type (stomach or gut) at each sampling time. Temperature alone explained 10% of the variation in the microbiome community across all samples. Both the stomach and gut microbiomes displayed the highest similarity in the microbiome community (significantly lowest pairwise unweighted UniFrac distance) between the two sampling times in the 25°C group compared with the 5°C and 15°C groups. Moreover, the salamanders in the 25°C treatment showed the highest food intake and body mass compared with those in the other temperature treatments. A significant increase in the abundance of Firmicutes in the gastrointestinal microbiome on day 330 with increasing temperatures might be caused by increased host metabolism and food consumption. Therefore, we speculate that high environmental temperature might indirectly affect both the alpha and beta diversity of the gastrointestinal microbiome.
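
For readers unfamiliar with the Shannon index used above as the alpha diversity measure, the following is a minimal sketch of its standard definition, H = -Σ p_i ln p_i over the taxa present in a sample. The natural logarithm and the function name `shannon_index` are assumptions; the abstract does not state which base or software was used.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    # Taxa with zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up example: an even community is more diverse than a dominated one.
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
h_even = shannon_index(even_sample)
h_dominated = shannon_index(dominated_sample)
```

For four equally abundant taxa the index equals ln 4, its maximum for that number of taxa, and it falls toward zero as one taxon comes to dominate the sample.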

