The application of zeta diversity as a continuous measure of compositional change in ecology

2017 ◽  
Author(s):  
Melodie A. Mcgeoch ◽  
Guillaume Latombe ◽  
Nigel R. Andrew ◽  
Shinichi Nakagawa ◽  
David A. Nipperess ◽  
...  

Abstract
Zeta diversity provides the average number of shared species across n sites (or shared operational taxonomic units (OTUs) across n cases). It quantifies the variation in species composition of multiple assemblages in space and time to capture the contribution of the full suite of narrow, intermediate and wide-ranging species to biotic heterogeneity. Zeta diversity was proposed for measuring compositional turnover in plant and animal assemblages, but is equally relevant for application to any biological system that can be characterised by a row-by-column incidence matrix. Here we illustrate the application of zeta diversity to explore compositional change in empirical data, and how observed patterns may be interpreted. We use 10 datasets from a broad range of scales and levels of biological organisation – from DNA molecules to microbes, plants and birds – including one of the original data sets used by R.H. Whittaker in the 1960s to express compositional change and distance decay using beta diversity. The applications show (i) how different sampling schemes used during the calculation of zeta diversity may be appropriate for different data types and ecological questions, (ii) how higher orders of zeta may in some cases better detect shifts, transitions or periodicity, and importantly (iii) the relative roles of rare versus common species in driving patterns of compositional change. By exploring the application of zeta diversity across this broad range of contexts, our goal is to demonstrate its value as a tool for understanding continuous biodiversity turnover and as a metric for filling the empirical gap that exists on spatial or temporal change in compositional diversity.
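The core quantity is simple to state: for order n, zeta_n is the mean number of species shared by every combination of n sites in an incidence matrix. A minimal Python sketch (the toy incidence matrix is invented for illustration, and the exhaustive enumeration shown here is practical only for small matrices; the paper discusses sampling schemes for larger data):

```python
import numpy as np
from itertools import combinations

def zeta(incidence, order):
    """Mean number of species shared by every combination of `order` sites.

    `incidence` is a sites-by-species 0/1 matrix (rows = sites).
    """
    rows = [set(np.flatnonzero(r)) for r in incidence]
    combos = list(combinations(rows, order))
    return sum(len(set.intersection(*c)) for c in combos) / len(combos)

# Toy incidence matrix: 3 sites x 4 species.
sites = np.array([
    [1, 1, 1, 0],  # site A
    [1, 1, 0, 1],  # site B
    [1, 0, 1, 1],  # site C
])

print(zeta(sites, 1))  # zeta_1 = mean richness = 3.0
print(zeta(sites, 2))  # zeta_2 = mean species shared by site pairs = 2.0
print(zeta(sites, 3))  # zeta_3 = species shared by all three sites = 1.0
```

Note how zeta declines with order: wide-ranging species keep contributing at high orders, while rare species drop out quickly, which is what lets the zeta decline separate their roles in compositional change.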


2018 ◽  
Author(s):  
Guillaume Latombe ◽  
Melodie A. McGeoch ◽  
David A. Nipperess ◽  
Cang Hui

Abstract
Spatial variation in compositional diversity, or species turnover, is necessary for capturing the components of heterogeneity that constitute biodiversity. However, no incidence-based metric of pairwise species turnover can calculate all components of diversity partitioning. Zeta (ζ) diversity, the mean number of species shared by any given number of sites or assemblages, captures all diversity components produced by assemblage partitioning. zetadiv is an R package for analysing and measuring compositional change for occurrence data using zeta diversity. Four types of analyses are performed on bird composition data in Australia: (i) decline in zeta diversity; (ii) distance decay; (iii) multi-site generalised dissimilarity modelling; and (iv) hierarchical scaling. Some analyses, such as the zeta decline, are specific to zeta diversity, whereas others, such as distance decay, are commonly applied to beta diversity, and have been adapted using zeta diversity to differentiate the contribution of common and rare species to compositional change.

Highlights
- An R package to analyse compositional change using zeta diversity is presented.
- Zeta diversity is the mean number of species shared by any number of assemblages.
- Zeta diversity captures all diversity components produced by assemblage partitioning.
- Analyses relate zeta diversity to space, environment and spatial scale.
- Analyses differentiate the contribution of rare and common species to biodiversity.



Author(s):  
Ying Wang ◽  
Yiding Liu ◽  
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on the Hadoop and Spark big data platforms, a hybrid analysis system for forest fires is built in this study. The platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module provides various big data analysis methods, and visualization tools such as ECharts, ArcGIS and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the system and to provide guidance for follow-up research and for the establishment of a forest fire monitoring and visualized early-warning big data platform. However, the experiment has two shortcomings: more data types should be included, and compatibility would be better if the original data were converted to XML format. We expect these problems to be addressed in follow-up research.



Author(s):  
Dhamanpreet Kaur ◽  
Matthew Sobiesk ◽  
Shubham Patil ◽  
Jin Liu ◽  
Puran Bhagat ◽  
...  

Abstract
Objective: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize that the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.
Materials and Methods: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.
Results: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.
Discussion: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.
Conclusion: We conclude that the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
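As an illustration of the general idea (not the authors' implementation, which learns the graph structure from the data), the sketch below fits conditional probability tables for a fixed, hypothetical three-variable structure and samples synthetic records from it; the variable names and data are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary "patient" records: columns = (diabetes, hypertension, heart_disease).
# The chain diabetes -> hypertension -> heart_disease is assumed here purely
# for illustration; the paper learns the graph structure from the data.
real = rng.integers(0, 2, size=(500, 3))

def cpt(child, parent):
    """P(child=1 | parent) estimated from the real records (Laplace-smoothed)."""
    return {p: (np.sum((parent == p) & (child == 1)) + 1) /
               (np.sum(parent == p) + 2) for p in (0, 1)}

p_d = real[:, 0].mean()            # marginal P(diabetes=1)
p_h = cpt(real[:, 1], real[:, 0])  # P(hypertension=1 | diabetes)
p_c = cpt(real[:, 2], real[:, 1])  # P(heart_disease=1 | hypertension)

def sample(n):
    """Draw synthetic records by ancestral sampling along the chain."""
    out = np.empty((n, 3), dtype=int)
    for i in range(n):
        d = rng.random() < p_d
        h = rng.random() < p_h[int(d)]
        c = rng.random() < p_c[int(h)]
        out[i] = (d, h, c)
    return out

synthetic = sample(500)  # no one-to-one link back to any real patient
```

Because only the fitted probability tables, not individual records, feed the sampler, the synthetic rows preserve the learned dependencies while limiting disclosure risk.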



MycoKeys ◽  
2018 ◽  
Vol 39 ◽  
pp. 29-40 ◽  
Author(s):  
Sten Anslan ◽  
R. Henrik Nilsson ◽  
Christian Wurzbacher ◽  
Petr Baldrian ◽  
Leho Tedersoo ◽  
...  

Along with recent developments in high-throughput sequencing (HTS) technologies, and the consequent rapid accumulation of HTS data, there has been a growing need and interest in developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate operational taxonomic units (OTUs), although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.



2006 ◽  
Vol 11 (1) ◽  
pp. 114-129 ◽  
Author(s):  
Teemu Suna ◽  
Michael Hardey ◽  
Jouni Huhtinen ◽  
Yrjö Hiltunen ◽  
Kimmo Kaski ◽  
...  

A marked feature of recent developments in the networked society has been the growth in the number of people making use of Internet dating services. These services involve the accumulation of large amounts of personal information which individuals utilise to find others and potentially arrange offline meetings. The consequent data represent a challenge to conventional analysis; for example, the service that provided the data used in this paper had approximately 5,000 users, all of whom completed an extensive questionnaire, resulting in some 300 parameters. This creates an opportunity to apply innovative analytical techniques that may provide new sociological insights into complex data. In this paper we utilise the self-organising map (SOM), an unsupervised neural network methodology, to explore Internet dating data. The resulting visual maps are used to demonstrate the ability of SOMs to reveal interrelated parameters. The SOM process led to the emergence of correlations that were obscured in the original data and pointed to the role of what we call ‘cultural age’ in the profiles and partnership preferences of the individuals. Our results suggest that the SOM approach offers a well-established methodology that can be easily applied to complex sociological data sets. The SOM outcomes are discussed in relation to other research about identifying others and forming relationships in a network society.
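A self-organising map, in its simplest form, is a small grid of weight vectors trained by pulling the best-matching cell and its grid neighbours toward each input, so that nearby cells come to represent similar profiles. A minimal numpy sketch (the grid size, decay schedule and toy data are illustrative choices, not those of the study):

```python
import numpy as np

rng = np.random.default_rng(42)

def train_som(data, grid=(5, 5), epochs=20, lr=0.5, sigma=1.5):
    """Minimal self-organising map: each grid cell holds a weight vector
    pulled toward the inputs, with neighbouring cells pulled less strongly."""
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates, used to compute neighbourhood distances on the map.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    for _ in range(epochs):
        for x in data:
            # Best-matching unit: cell whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), (h, w))
            # Gaussian neighbourhood around the BMU on the grid.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            influence = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
            weights += lr * influence * (x - weights)
        lr *= 0.95      # decay learning rate
        sigma *= 0.95   # shrink neighbourhood
    return weights

# Two well-separated clusters of "questionnaire" vectors; after training,
# nearby grid cells map to the same cluster, making structure visible.
data = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(1, 0.1, (50, 3))])
som = train_som(data)
```

Plotting which grid cell each respondent maps to is what produces the visual maps the paper uses to surface interrelated parameters.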



Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, in which a set of nonlinear mappings of the original data is applied as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct a nonlinear classification boundary but also realize feature selection on the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings, respectively. We derive the variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability and comparable classification accuracy of our method compared with other existing classifiers.



F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.



2020 ◽  
Author(s):  
Camden Jansen ◽  
Kitt D. Paraiso ◽  
Jeff J. Zhou ◽  
Ira L. Blitz ◽  
Margaret B. Fish ◽  
...  

Summary
Mesendodermal specification is one of the earliest events in embryogenesis, in which cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low-throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data comprising more than two data types is challenging. Here, we use linked self-organizing maps to combine ChIP-seq/ATAC-seq with temporal, spatial and perturbation RNA-seq data from Xenopus tropicalis mesendoderm development to build a high-resolution, genome-scale mechanistic GRN. We recovered both known and previously unsuspected TF-DNA/TF-TF interactions and validated them through reporter assays. Our analysis provides new insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly dimensional multi-omic data sets.

Highlights
- Built a generally applicable pipeline for creating GRNs using highly dimensional multi-omic data sets.
- Predicted new TF-DNA/TF-TF interactions during mesendoderm development.
- Generated the first genome-scale GRN for vertebrate mesendoderm and expanded the core mesendodermal developmental network with high fidelity.
- Developed a resource to visualize hundreds of RNA-seq and ChIP-seq data sets using 2D SOM metaclusters.



2020 ◽  
Author(s):  
Annika Tjuka ◽  
Robert Forkel ◽  
Johann-Mattis List

Psychologists and linguists have collected a great diversity of data for word and concept properties. In psychology, many studies accumulate norms and ratings such as word frequencies or age-of-acquisition often for a large number of words. Linguistics, on the other hand, provides valuable insights into relations of word meanings. We present a collection of those data sets for norms, ratings, and relations that cover different languages: ‘NoRaRe.’ To enable a comparison between the diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org) which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes a cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied for the study of word properties across multiple languages. The data can be used by psychologists and linguists to benefit from the knowledge rooted in both research disciplines.



2021 ◽  
Vol 40 (5) ◽  
pp. 324-334
Author(s):  
Rongxin Huang ◽  
Zhigang Zhang ◽  
Zedong Wu ◽  
Zhiyuan Wei ◽  
Jiawei Mei ◽  
...  

Seismic imaging using full-wavefield data that includes primary reflections, transmitted waves, and their multiples has been the holy grail for generations of geophysicists. To be able to use the full-wavefield data effectively requires a forward-modeling process to generate full-wavefield data, an inversion scheme to minimize the difference between modeled and recorded data, and, more importantly, an accurate velocity model to correctly propagate and collapse energy of different wave modes. All of these elements have been embedded in the framework of full-waveform inversion (FWI) since it was proposed three decades ago. However, for a long time, the application of FWI did not find its way into the domain of full-wavefield imaging, mostly owing to the lack of data sets with good constraints to ensure the convergence of inversion, the required compute power to handle large data sets and extend the inversion frequency to the bandwidth needed for imaging, and, most significantly, stable FWI algorithms that could work with different data types in different geologic settings. Recently, with the advancement of high-performance computing and progress in FWI algorithms at tackling issues such as cycle skipping and amplitude mismatch, FWI has found success using different data types in a variety of geologic settings, providing some of the most accurate velocity models for generating significantly improved migration images. Here, we take a step further to modify the FWI workflow to output the subsurface image or reflectivity directly, potentially eliminating the need to go through the time-consuming conventional seismic imaging process that involves preprocessing, velocity model building, and migration. 
Compared with a conventional migration image, the reflectivity image directly output from FWI often provides additional structural information with better illumination and higher signal-to-noise ratio naturally as a result of many iterations of least-squares fitting of the full-wavefield data.
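The inversion loop at the heart of FWI (forward-model the data, measure the misfit against recordings, update the model to reduce it) can be illustrated with a deliberately tiny toy problem. The sketch below inverts a single layer velocity from straight-ray travel times using Gauss-Newton updates; the geometry and numbers are invented, and real FWI forward-models the full wave equation over millions of model parameters:

```python
import numpy as np

# Toy "inversion": recover one layer velocity from travel times.
# Forward model: travel time t = offset / v (hypothetical straight rays).
offsets = np.array([100.0, 200.0, 300.0])  # source-receiver offsets, metres
v_true = 1500.0                            # m/s
t_obs = offsets / v_true                   # "recorded" data

v = 1000.0  # initial velocity model
for _ in range(50):
    t_mod = offsets / v                    # forward modelling
    residual = t_mod - t_obs               # data misfit
    jac = -offsets / v ** 2                # derivative of travel time w.r.t. v
    v -= np.sum(residual * jac) / np.sum(jac ** 2)  # Gauss-Newton update
# v converges to ~1500 m/s
```

The same loop structure, with a wave-equation solver in place of the one-line forward model and an adjoint-state gradient in place of the explicit Jacobian, is what the modified workflow described above runs for many iterations before reading off the reflectivity image.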


