Multi-scale Adaptive Differential Abundance Analysis in Microbial Compositional Data

2021 ◽  
Author(s):  
Shulei Wang

Differential abundance analysis is an essential and commonly used tool to characterize the difference between microbial communities. However, identifying differentially abundant microbes remains a challenging problem because the observed microbiome data are inherently compositional, excessively sparse, and distorted by experimental bias. Besides these major challenges, the results of differential abundance analysis also depend largely on the choice of analysis unit, adding another practical complexity to this already complicated problem. In this work, we introduce a new differential abundance test called the MsRDB test, which embeds the sequences into a metric space and integrates a multi-scale adaptive strategy for utilizing spatial structure to identify differentially abundant microbes. Compared with existing methods, the MsRDB test can detect differentially abundant microbes at the finest resolution offered by the data and provide adequate detection power while being robust to zero counts, compositional effects, and experimental bias in the microbial compositional data set. Applications to both simulated and real microbial compositional data sets demonstrate the usefulness of the MsRDB test.

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Da Xu ◽  
Jialin Zhang ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
...  

Abstract. Background: The small number of samples and the curse of dimensionality hamper the application of deep learning techniques to disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from satisfactory because of their reliance on unsupervised learning methods. To enhance interpretability and overcome this problem, we developed a novel feature selection algorithm. Meanwhile, complex genomic data pose great challenges for the identification of biomarkers and therapeutic targets, and some current feature selection methods suffer from low sensitivity and specificity in this field. Results: In this article, we designed a multi-scale clustering-based feature selection algorithm named MCBFS, which simultaneously performs feature selection and model learning for genomic data analysis. The experimental results demonstrated that MCBFS is robust and effective by comparing it with seven benchmark and six state-of-the-art supervised methods on eight data sets. The visualization results and the statistical test showed that MCBFS can capture the informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework named McbfsNW that uses gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for the diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm, and a feature selection wrapper. McbfsNW has been applied to the lung adenocarcinoma (LUAD) data sets. The preliminary results demonstrated that the identified biomarkers yield better prediction results on the independent LUAD data set, and we also constructed a drug-target network which may be useful for LUAD therapy.
Conclusions: The proposed feature selection method is robust and effective for gene selection, classification, and visualization. The framework McbfsNW is practical and helpful for the identification of biomarkers and targets on genomic data. It is believed that the same methods and principles are extensible and applicable to other kinds of data sets.


Soil Research ◽  
1993 ◽  
Vol 31 (4) ◽  
pp. 407 ◽  
Author(s):  
GD Buchan ◽  
KS Grewal ◽  
JJ Claydon ◽  
RJ Mcpherson

The X-ray attenuation (Sedigraph) method for particle-size analysis is known to consistently estimate a finer size distribution than the pipette method. The objectives of this study were to compare the two methods, and to explore the reasons for their divergence. The methods are compared using two data sets from measurements made independently in two New Zealand laboratories, on two different sets of New Zealand soils, covering a range of textures and parent materials. The Sedigraph method gave systematically greater mass percentages at the four measurement diameters (20, 10, 5 and 2 µm). For one data set, the difference between clay (<2 µm) percentages from the two methods is shown to be positively correlated (R² = 0.625) with total iron content of the sample, for all but one of the soils. This supports a novel hypothesis that the typically greater concentration of Fe (a strong X-ray absorber) in smaller size fractions is the major factor causing the difference. Regression equations are presented for converting the Sedigraph data to their pipette equivalents.


2018 ◽  
Vol 18 (3) ◽  
pp. 1573-1592 ◽  
Author(s):  
Gerrit de Leeuw ◽  
Larisa Sogacheva ◽  
Edith Rodriguez ◽  
Konstantinos Kourtidis ◽  
Aristeidis K. Georgoulias ◽  
...  

Abstract. The retrieval of aerosol properties from satellite observations provides their spatial distribution over a wide area in cloud-free conditions. As such, they complement ground-based measurements by providing information over sparsely instrumented areas, albeit that significant differences may exist in both the type of information obtained and the temporal information from satellite and ground-based observations. In this paper, information from different types of satellite-based instruments is used to provide a 3-D climatology of aerosol properties over mainland China, i.e., vertical profiles of extinction coefficients from the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), a lidar flying aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite and the column-integrated extinction (aerosol optical depth – AOD) available from three radiometers: the European Space Agency (ESA)'s Along-Track Scanning Radiometer version 2 (ATSR-2), Advanced Along-Track Scanning Radiometer (AATSR) (together referred to as ATSR) and NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra satellite, together spanning the period 1995–2015. AOD data are retrieved from ATSR using the ATSR dual view (ADV) v2.31 algorithm, while for MODIS Collection 6 (C6) the AOD data set is used that was obtained from merging the AODs obtained from the dark target (DT) and deep blue (DB) algorithms, further referred to as the DTDB merged AOD product. These data sets are validated and differences are compared using Aerosol Robotic Network (AERONET) version 2 L2.0 AOD data as reference. The results show that, over China, ATSR slightly underestimates the AOD and MODIS slightly overestimates the AOD. Consequently, ATSR AOD is overall lower than that from MODIS, and the difference increases with increasing AOD. The comparison also shows that neither of the ATSR and MODIS AOD data sets is better than the other one everywhere. 
However, ATSR ADV has limitations over bright surfaces, for which the MODIS DB algorithm was designed. To allow for comparison of MODIS C6 results with previous analyses where MODIS Collection 5.1 (C5.1) data were used, the difference between the C6 and C5.1 merged DTDB data sets from MODIS/Terra over China is also briefly discussed. The AOD data sets show strong seasonal differences, and the seasonal features vary with latitude and longitude across China. Two-decadal AOD time series, averaged over all of mainland China, are presented and briefly discussed. Using the 17 years of ATSR data as the basis, and MODIS/Terra to follow the temporal evolution in the recent years after the environmental satellite Envisat was lost, requires a comparison of the data sets for the overlapping period to show their complementarity. ATSR precedes the MODIS time series between 1995 and 2000 and shows a distinct increase in the AOD over this period. The two data series show similar variations during the overlapping period between 2000 and 2011, with minima and maxima in the same years. MODIS extends this time series beyond the end of the Envisat period in 2012, showing decreasing AOD.


Geophysics ◽  
2020 ◽  
pp. 1-41 ◽  
Author(s):  
Jens Tronicke ◽  
Niklas Allroggen ◽  
Felix Biermann ◽  
Florian Fanselow ◽  
Julien Guillemoteau ◽  
...  

In near-surface geophysics, ground-based mapping surveys are routinely employed in a variety of applications including those from archaeology, civil engineering, hydrology, and soil science. The resulting geophysical anomaly maps of, for example, magnetic or electrical parameters are usually interpreted to laterally delineate subsurface structures such as those related to the remains of past human activities, subsurface utilities and other installations, hydrological properties, or different soil types. To ease the interpretation of such data sets, we propose a multi-scale processing, analysis, and visualization strategy. Our approach relies on a discrete redundant wavelet transform (RWT) implemented using cubic-spline filters and the à trous algorithm, which allows a multi-scale decomposition of 2D data to be computed efficiently using a series of 1D convolutions. The basic idea of the approach is presented using a synthetic test image, while our archaeo-geophysical case study from North-East Germany demonstrates its potential to analyze and process rather typical geophysical anomaly maps including magnetic and topographic data. Our vertical-gradient magnetic data show amplitude variations over several orders of magnitude, complex anomaly patterns at various spatial scales, and typical noise patterns, while our topographic data show a distinct hill structure superimposed by a microtopographic stripe pattern and random noise. Our results demonstrate that the RWT approach can successfully separate these components and that selected wavelet planes can be scaled and combined so that the reconstructed images allow for a detailed, multi-scale structural interpretation, including integrated visualizations of magnetic and topographic data.
Because our analysis approach is straightforward to implement without laborious parameter testing and tuning, computationally efficient, and easily adaptable to other geophysical data sets, we believe that it can help to rapidly analyze and interpret different geophysical mapping data collected to address a variety of near-surface applications from engineering practice and research.
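The decomposition described in this abstract can be illustrated with a minimal sketch of an à trous redundant wavelet transform. This is not the authors' implementation: the B3 cubic-spline kernel `[1, 4, 6, 4, 1] / 16`, the reflective boundary handling, and the function names `smooth_axis` and `atrous_decompose` are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed cubic B-spline (B3) scaling filter; the paper only says "cubic-spline filters".
B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def smooth_axis(img, axis, step):
    """1D convolution along one axis with the B3 kernel dilated by `step` (the "holes")."""
    pad = 2 * step
    widths = [(0, 0)] * img.ndim
    widths[axis] = (pad, pad)
    padded = np.pad(img, widths, mode="reflect")  # boundary handling is an assumption
    n = img.shape[axis]
    out = np.zeros(img.shape, dtype=float)
    for k, weight in enumerate(B3):
        # Tap k sits at offset (k - 2) * step from the centre sample.
        window = [slice(None)] * img.ndim
        window[axis] = slice(k * step, k * step + n)
        out += weight * padded[tuple(window)]
    return out


def atrous_decompose(img, n_scales):
    """Return (wavelet_planes, smooth_residual) for a 2D array."""
    current = np.asarray(img, dtype=float)
    planes = []
    for j in range(n_scales):
        step = 2 ** j
        # A separable 2D smoothing is built from two 1D convolutions.
        smooth = smooth_axis(smooth_axis(current, 0, step), 1, step)
        planes.append(current - smooth)  # detail at scale j
        current = smooth
    return planes, current
```

Because each wavelet plane is the difference between two successive smoothings, summing all planes and the final residual recovers the input exactly. This is what makes it possible to rescale and recombine selected planes when building a reconstructed image.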


mSphere ◽  
2017 ◽  
Vol 2 (6) ◽  
Author(s):  
Xiang Gao ◽  
Huaiying Lin ◽  
Qunfeng Dong

ABSTRACT Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes’ theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
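The classification rule described in this abstract can be sketched as follows. This is an illustrative sketch and not the DMBC package's code: the function names are hypothetical, the Dirichlet parameters are assumed to be already estimated (the paper fits them by maximum likelihood), and the cross-validated feature selection step is omitted.

```python
from math import exp, lgamma, log


def dm_log_likelihood(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial with parameters alpha."""
    n = sum(counts)
    a = sum(alpha)
    ll = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        ll += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return ll


def posterior(counts, class_alphas, priors):
    """Posterior class probabilities from Bayes' theorem.

    class_alphas and priors are dicts keyed by class label (hypothetical inputs).
    """
    log_joint = {
        label: log(priors[label]) + dm_log_likelihood(counts, alpha)
        for label, alpha in class_alphas.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint.values())
    weights = {label: exp(value - top) for label, value in log_joint.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

With made-up parameters such as `{"healthy": [8, 1, 1], "disease": [1, 1, 8]}` and equal priors, a sample with counts `[40, 5, 5]` receives most of its posterior mass on the healthy class. Replacing the equal priors with a published prevalence estimate shifts the posterior accordingly.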


2013 ◽  
Vol 31 (4) ◽  
pp. 231-252 ◽  
Author(s):  
Rajat Gupta ◽  
Matthew Gregg ◽  
Hu Du ◽  
Katie Williams

Purpose: To critically compare three future weather year (FWY) downscaling approaches, based on the 2009 UK Climate Projections, used for climate change impact and adaptation analysis in building simulation software. Design/methodology/approach: The validity of these FWYs is assessed through dynamic building simulation modelling to project future overheating risk in typical English homes in the 2050s and 2080s. Findings: The modelling results show that the variation in overheating projections is far too significant to consider the tested FWY data sets equally suitable for the task. Research and practical implications: It is recommended that future research should consider harmonisation of the downscaling approaches so as to generate a unified data set of FWYs to be used for a given location and climate projection. If FWYs are to be used in practice, live projects will need viable and reliable FWYs on which to base their adaptation decisions. The difference between the data sets tested could potentially lead to different adaptation priorities, specifically with regard to time series and adaptation phasing through the life of a building. Originality/value: The paper investigates the different results derived from FWY application to building simulation. The outcome and implications are important considerations for research and practice involved in FWY data use in building simulation intended for climate change adaptation modelling.


2012 ◽  
Vol 5 (2) ◽  
pp. 2887-2931 ◽  
Author(s):  
J. Heymann ◽  
O. Schneising ◽  
M. Reuter ◽  
M. Buchwitz ◽  
V. V. Rozanov ◽  
...  

Abstract. Carbon dioxide (CO2) is the most important greenhouse gas whose atmospheric loading has been significantly increased by anthropogenic activity leading to global warming. Accurate measurements and models are needed in order to reliably predict our future climate. This, however, has challenging requirements. Errors in measurements and models need to be identified and minimised. In this context, we present a comparison between satellite-derived column-averaged dry air mole fractions of CO2, denoted XCO2, retrieved from SCIAMACHY/ENVISAT using the WFM-DOAS algorithm, and output from NOAA's global CO2 modelling and assimilation system CarbonTracker. We investigate to what extent differences between these two data sets are influenced by systematic retrieval errors due to aerosols and unaccounted clouds. We analyse seven years of SCIAMACHY WFM-DOAS version 2.1 retrievals (WFMDv2.1) using the latest version of CarbonTracker (version 2010). We investigate to what extent the difference between SCIAMACHY and CarbonTracker XCO2 is temporally and spatially correlated with global aerosol and cloud data sets. For this purpose, we use a global aerosol data set generated within the European GEMS project, which is based on assimilated MODIS satellite data. For clouds, we use a data set derived from CALIOP/CALIPSO. We find significant correlations of the SCIAMACHY minus CarbonTracker XCO2 difference with thin clouds over the Southern Hemisphere. We find the maximum temporal correlation for Darwin, Australia (r² = 54%). Large temporal correlations with thin clouds are also observed over other regions of the Southern Hemisphere (e.g. 43% for South America and 31% for South Africa). Over the Northern Hemisphere the temporal correlations are typically much lower. An exception is India, where large temporal correlations with clouds and aerosols have also been found. For all other regions the temporal correlations with aerosol are typically low.
For the spatial correlations the picture is less clear. They are typically low for both aerosols and clouds, but dependent on region and season, they may exceed 30% (the maximum value of 46% has been found for Darwin during September to November). Overall we find that the presence of thin clouds can potentially explain a significant fraction of the difference between SCIAMACHY WFMDv2.1 XCO2 and CarbonTracker over the Southern Hemisphere. Aerosols appear to be less of a problem. Our study indicates that the quality of the satellite derived XCO2 will significantly benefit from a reduction of scattering related retrieval errors at least for the Southern Hemisphere.


2018 ◽  
Author(s):  
Farahnaz Khosrawi ◽  
Stefan Lossow ◽  
Gabriele P. Stiller ◽  
Karen H. Rosenlof ◽  
Joachim Urban ◽  
...  

Abstract. Time series of stratospheric and lower mesospheric water vapour using 33 data sets from 15 different satellite instruments were compared in the framework of the second SPARC (Stratosphere-troposphere Processes And their Role in Climate) water vapour assessment (WAVAS-II). This comparison aimed to provide a comprehensive overview of the typical uncertainties in the observational database that can be considered in the future in observational and modelling studies addressing e.g. stratospheric water vapour trends. The time series comparisons are presented for the three latitude bands, the Antarctic (80°–70° S), the tropics (15° S–15° N) and the northern hemisphere mid-latitudes (50° N–60° N) at four different altitudes (0.1, 3, 10 and 80 hPa) covering the stratosphere and lower mesosphere. The combined temporal coverage of observations from the 15 satellite instruments allowed considering the time period 1986–2014. In addition to the qualitative comparison of the time series, the agreement of the data sets is assessed quantitatively in the form of the spread (i.e. the difference between the maximum and minimum volume mixing ratio among the data sets), the (Pearson) correlation coefficient and the drift (i.e. linear changes of the difference between time series over time). Generally, good agreement between the time series was found in the middle stratosphere while larger differences were found in the lower mesosphere and near the tropopause. Concerning the latitude bands, the largest differences were found in the Antarctic while the best agreement was found for the tropics. From our assessment we find that all data sets can be considered in the future in observational and modelling studies addressing e.g. stratospheric and lower mesospheric water vapour variability and trends when data set specific characteristics (e.g. a drift) and restrictions (e.g. temporal and spatial coverage) are taken into account.
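The three agreement measures named in this abstract can be sketched in a few lines. This is a simplified illustration, not the assessment's processing chain: it assumes gap-free series on a common time grid, and it estimates the drift with an ordinary least-squares line, which may differ from the regression the authors used. The function names are hypothetical.

```python
import numpy as np


def spread(series):
    """Max minus min across data sets at each time step; series has shape (n_datasets, n_times)."""
    series = np.asarray(series, dtype=float)
    return series.max(axis=0) - series.min(axis=0)


def correlation(a, b):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(a, b)[0, 1])


def drift(times, a, b):
    """Slope of a least-squares line fitted to the difference a - b over time."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    slope, _intercept = np.polyfit(times, diff, 1)
    return float(slope)
```

For example, if one series equals another plus an offset that grows by 0.05 per time unit, `drift` returns about 0.05 and the spread grows linearly with time, while a pure constant offset gives zero drift and a constant spread.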


Radiocarbon ◽  
2010 ◽  
Vol 52 (3) ◽  
pp. 895-900 ◽  
Author(s):  
Yui Takahashi ◽  
Hirohisa Sakurai ◽  
Kayo Suzuki ◽  
Taiichi Sato ◽  
Shuichi Gunji ◽  
...  

Radiocarbon ages of Choukai Jindai cedar tree rings growing in the excess era of 14C concentrations during 2757–2437 cal BP were measured using 2 types of 14C measurement methods, i.e. liquid scintillation counting (LSC) and accelerator mass spectrometry (AMS). The difference between the 2 methods is 3.7 ± 5.2 14C yr on average for 61 single-year tree rings, indicating good agreement between the methods. The Choukai data sets show a small sharp bump with an average 14C age of 2497.1 ± 3.0 14C yr BP during 2650–2600 cal BP. Although the profile of the Choukai LSC data set compares well with that of IntCal04, having a 14C age difference of 4.6 ± 5.3 14C yr on average, the Choukai LSC 14C ages indicate variability against the smoothed profile of IntCal04.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Huang Lin ◽  
Shyamal Das Peddada

Abstract: Increasingly, researchers are discovering associations between the microbiome and a wide range of human diseases such as obesity, inflammatory bowel diseases, HIV, and so on. The first step towards microbiome-wide association studies is the characterization of the composition of the human microbiome under different conditions. Determination of differentially abundant microbes between two or more environments, known as differential abundance (DA) analysis, is a challenging and important problem that has received considerable interest during the past decade. It is well documented in the literature that the observed microbiome data (OTU/SV table) are relative abundances with an excess of zeros. Since relative abundances sum to a constant, these data are necessarily compositional. In this article we review some recent methods for DA analysis and describe their strengths and weaknesses.
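The compositional constraint mentioned in this abstract can be made concrete with a small sketch. The abundance values are made up for illustration and the function name is hypothetical. The point is only that when one taxon's absolute abundance changes, every relative abundance changes with it.

```python
def relative_abundance(counts):
    """Convert absolute abundances to proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances of three taxa in two conditions.
# Only the first taxon truly changes between the conditions.
before = relative_abundance([100, 100, 100])
after = relative_abundance([400, 100, 100])
```

Here the second and third taxa fall from one third to one sixth of the community even though their absolute abundances are unchanged. A naive per-taxon comparison of proportions would therefore flag all three taxa as differentially abundant.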

