Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data

2019 ◽  
Vol 20 (S15) ◽  
Author(s):  
Neo Christopher Chung ◽  
Błażej Miasojedow ◽  
Michał Startek ◽  
Anna Gambin

Abstract Background Surveys of the presence and absence of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. Results We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome the computational burden due to high dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species in 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard). 
Conclusion We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures in the analysis of species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
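
The coefficient described in this abstract (intersection over union) and the idea of centering it by its expectation can be illustrated with a minimal Python sketch. This is not the authors' R implementation. The function names are hypothetical, and the expectation used here, p_x p_y / (p_x + p_y - p_x p_y), is assumed to be the large-sample value under independent occurrences, with the occurrence probabilities estimated by the observed frequencies.

```python
def jaccard(x, y):
    """Jaccard/Tanimoto coefficient of two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)    # size of the intersection
    either = sum(1 for a, b in zip(x, y) if a or b)   # size of the union
    return both / either if either else 0.0


def expected_jaccard(x, y):
    """Assumed large-sample expectation under independent occurrences."""
    px = sum(x) / len(x)  # observed occurrence frequency of species x
    py = sum(y) / len(y)  # observed occurrence frequency of species y
    denom = px + py - px * py
    return px * py / denom if denom else 0.0


def centered_jaccard(x, y):
    """Observed coefficient minus its assumed expectation under independence."""
    return jaccard(x, y) - expected_jaccard(x, y)


# Hypothetical presence-absence records for two species across 8 sites.
species_a = [1, 1, 0, 0, 1, 0, 1, 0]
species_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(jaccard(species_a, species_b))
print(centered_jaccard(species_a, species_b))
```

A centered value near zero is what independence would produce under these assumptions; judging whether a positive value is significant requires a test such as the exact, asymptotic, or bootstrap procedures the abstract mentions.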

2018 ◽  
Vol 56 (2C) ◽  
pp. 64-71
Author(s):  
Nguyen Hoang My Lan

Following the philosophy of simulating the way nature behaves under extreme weather conditions, the Sustainable Urban Drainage System (SUDS) has been internationally recognized as one of the most sustainable approaches to minimizing the impacts of flooding on urban development while achieving multiple environmental and social benefits. In this paper, the social aspect of SUDS is examined through the community’s acceptance of a wide range of SUDS techniques, including Green Roof (GR), Rainwater Harvesting (RWH), Pervious Pavement (PP), Green Open Space (GOP), and Pervious Parking Lot (PPL). Data were collected through a social survey of community responses to the above SUDS applications in the Nhieu Loc – Thi Nghe sub-basin from November 2016 to March 2017; SPSS software was then used to analyze the data and test statistical hypotheses. The results show that the most preferred SUDS technique is PP, followed by PPL, GOP, RWH, and GR, in that order. The statistical hypothesis tests indicate relationships between (1) the community’s acceptance of the proposed SUDS techniques and district as well as gender; (2) the community’s acceptance of and their knowledge of SUDS applications; and (3) the priority of SUDS’s benefits between the districts and acceptability as well as understanding of SUDS applications.


2009 ◽  
Vol 55 (6) ◽  
pp. 1203-1213 ◽  
Author(s):  
Matthew D Krasowski ◽  
Mohamed G Siam ◽  
Manisha Iyer ◽  
Anthony F Pizon ◽  
Spiros Giannoutsos ◽  
...  

Abstract Background: Immunoassays used for routine drug of abuse (DOA) and toxicology screening may be limited by cross-reacting compounds able to bind to the antibodies in a manner similar to the target molecule(s). To date, there has been little systematic investigation using computational tools to predict cross-reactive compounds. Methods: Commonly used molecular similarity methods enabled calculation of structural similarity for a wide range of compounds (prescription and over-the-counter medications, illicit drugs, and clinically significant metabolites) to the target molecules of DOA/toxicology screening assays. We used various molecular descriptors (MDL public keys, functional class fingerprints, and pharmacophore fingerprints) and the Tanimoto similarity coefficient. These data were then compared with cross-reactivity data in the package inserts of immunoassays marketed for in vitro diagnostic use. Previously untested compounds that were predicted to have a high probability of cross-reactivity were tested. Results: Molecular similarity calculated using MDL public keys and the Tanimoto similarity coefficient showed a strong and statistically significant separation between cross-reactive and non–cross-reactive compounds. This result was validated experimentally by discovery of additional cross-reactive compounds based on computational predictions. Conclusions: The computational methods employed are amenable toward rapid screening of databases of drugs, metabolites, and endogenous molecules and may be useful for identifying cross-reactive molecules that would be otherwise unsuspected. These methods may also have value in focusing cross-reactivity testing on compounds with high similarity to the target molecule(s) and limiting testing of compounds with low similarity and very low probability of cross-reacting with the assay.


2019 ◽  
Vol 1 (2) ◽  
pp. 653-683 ◽  
Author(s):  
Frank Emmert-Streib ◽  
Matthias Dehmer

A statistical hypothesis test is one of the most eminent methods in statistics. Its pivotal role comes from the wide range of practical problems it can be applied to and the sparsity of data requirements. Being an unsupervised method makes it very flexible in adapting to real-world situations. The availability of high-dimensional data makes it necessary to apply such statistical hypothesis tests simultaneously to the test statistics of the underlying covariates. However, if applied without correction this leads to an inevitable increase in Type 1 errors. To counteract this effect, multiple testing procedures have been introduced to control various types of errors, most notably the Type 1 error. In this paper, we review modern multiple testing procedures for controlling either the family-wise error (FWER) or the false-discovery rate (FDR). We emphasize their principal approach allowing categorization of them as (1) single-step vs. stepwise approaches, (2) adaptive vs. non-adaptive approaches, and (3) marginal vs. joint multiple testing procedures. We place a particular focus on procedures that can deal with data with a (strong) correlation structure because real-world data are rarely uncorrelated. Furthermore, we also provide background information making the often technically intricate methods accessible for interdisciplinary data scientists.
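
The distinction this abstract draws between FWER and FDR control can be illustrated with a short Python sketch of two standard procedures: the Bonferroni correction (single-step, FWER) and the Benjamini-Hochberg step-up procedure (stepwise, FDR). These are chosen here as familiar examples, not as the specific procedures the review covers. The function names and p-values are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Single-step FWER control: reject p-values at or below alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank whose p-value is at or below rank * alpha / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from eight simultaneous tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni(pvals)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(pvals)), "rejections with Benjamini-Hochberg")
```

On these values the FDR procedure rejects more hypotheses than the FWER procedure, which is the usual trade-off: FDR control tolerates a bounded proportion of false discoveries in exchange for more power.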


2019 ◽  
Vol 19 (2) ◽  
pp. 134-140
Author(s):  
Baek-Ju Sung ◽  
Sung-kyu Lee ◽  
Mu-Seong Chang ◽  
Do-Sik Kim

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Christos Katsaros ◽  
Sophie Le Panse ◽  
Gillian Milne ◽  
Carl J. Carrano ◽  
Frithjof Christian Küpper

Abstract The objective of the present study is to examine the fine structure of vegetative cells of Laminaria digitata using both chemical fixation and cryofixation. Laminaria digitata was chosen due to its importance as a model organism in a wide range of biological studies, as a keystone species on rocky shores of the North Atlantic, its use of iodide as a unique inorganic antioxidant, and its significance as a raw material for the production of alginate. Details of the fine structural features of vegetative cells are described, with particular emphasis on the differences between the two methods used, i.e. conventional chemical fixation and freeze-fixation. The general structure of the cells was similar to that already described, with minor differences between the different cell types. An intense activity of the Golgi system was found associated with the thick external cell wall, with large dictyosomes from which numerous vesicles and cisternae are released. An interesting type of cisternae was found in the cryofixed material, which was not visible with the chemical fixation. These are elongated structures, in sections appearing tubule-like, close to the external cell wall or to young internal walls. An increased number of these structures was observed near the plasmodesmata of the pit fields. They are similar to the “flat cisternae” found associated with the forming cytokinetic diaphragm of brown algae. Their possible role is discussed. The new findings of this work underline the importance of such combined studies which reveal new data not known until now using the old conventional methods. The main conclusion of the present study is that cryofixation is the method of choice for studying Laminaria cytology by transmission electron microscopy.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Tao Yue ◽  
Da Zhao ◽  
Duc T. T. Phan ◽  
Xiaolin Wang ◽  
Joshua Jonghyun Park ◽  
...  

Abstract The vascular network of the circulatory system plays a vital role in maintaining homeostasis in the human body. In this paper, a novel modular microfluidic system with a vertical two-layered configuration is developed to generate large-scale perfused microvascular networks in vitro. The two-layer polydimethylsiloxane (PDMS) configuration allows the tissue chambers and medium channels not only to be designed and fabricated independently but also to be aligned and bonded accordingly. This method can produce a modular microfluidic system that has high flexibility and scalability to design an integrated platform with multiple perfused vascularized tissues with high densities. The medium channel was designed with a rhombic shape and fabricated to be semiclosed to form a capillary burst valve in the vertical direction, serving as the interface between the medium channels and tissue chambers. Angiogenesis and anastomosis at the vertical interface were successfully achieved by using different combinations of tissue chambers and medium channels. Various large-scale microvascular networks were generated and quantified in terms of vessel length and density. Minimal leakage of the perfused 70-kDa FITC-dextran confirmed the lumenization of the microvascular networks and the formation of tight vertical interconnections between the microvascular networks and medium channels in different structural layers. This platform enables the culturing of interconnected, large-scale perfused vascularized tissue networks with high density and scalability for a wide range of multiorgan-on-a-chip applications, including basic biological studies and drug screening.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Nathanael Lapidus ◽  
Xianlong Zhou ◽  
Fabrice Carrat ◽  
Bruno Riou ◽  
Yan Zhao ◽  
...  

Abstract Background The average length of stay (LOS) in the intensive care unit (ICU_ALOS) is a helpful parameter summarizing critical bed occupancy. During the outbreak of a novel virus, obtaining a reliable early estimate of ICU_ALOS for infected patients is critical to accurately parameterize models examining mitigation and preparedness scenarios. Methods Two estimation methods of ICU_ALOS were compared: the average LOS of already discharged patients at the date of estimation (DPE), and a standard parametric method used for analyzing time-to-event data which fits a given distribution to observed data and includes the censored stays of patients still treated in the ICU at the date of estimation (CPE). Methods were compared on a series of all COVID-19 consecutive cases (n = 59) admitted in an ICU devoted to such patients. At the last follow-up date, 99 days after the first admission, all patients but one had been discharged. A simulation study investigated the generalizability of the methods' patterns. CPE and DPE estimates were also compared to COVID-19 estimates reported to date. Results LOS ≥ 30 days concerned 14 out of the 59 patients (24%), including 8 of the 21 deaths observed. Two months after the first admission, 38 (64%) patients had been discharged, with corresponding DPE and CPE estimates of ICU_ALOS (95% CI) at 13.0 days (10.4–15.6) and 23.1 days (18.1–29.7), respectively. The series' true ICU_ALOS was greater than 21 days, well above reported estimates to date. Conclusions Discharges of short stays are more likely observed earlier during the course of an outbreak. Cautious unbiased ICU_ALOS estimates suggest parameterizing a higher burden of ICU bed occupancy than that adopted to date in COVID-19 forecasting models. Funding Supported by the National Natural Science Foundation of China (81900097 to Dr. Zhou) and the Emergency Response Project of Hubei Science and Technology Department (2020FCA023 to Pr. Zhao).
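
The gap between the two estimators in this abstract can be illustrated with a small Python sketch. The abstract says only that CPE "fits a given distribution". The exponential distribution used below is an assumption made for simplicity, chosen because its censored maximum-likelihood mean has a closed form (total observed time divided by the number of completed stays). The function names and data are hypothetical.

```python
def dpe(stays):
    """Mean length of stay among patients already discharged.

    `stays` is a list of (days_observed, discharged) pairs.
    """
    done = [days for days, discharged in stays if discharged]
    if not done:
        raise ValueError("no discharged patients to average over")
    return sum(done) / len(done)


def cpe_exponential(stays):
    """Censored maximum-likelihood mean under an ASSUMED exponential model.

    Patients still in the ICU contribute their observed time to the
    numerator but do not count as completed stays in the denominator.
    """
    events = sum(1 for _, discharged in stays if discharged)
    if events == 0:
        raise ValueError("at least one completed stay is required")
    total_time = sum(days for days, _ in stays)
    return total_time / events


# Hypothetical early-outbreak snapshot: three short stays have ended,
# two long stays are still ongoing (censored).
snapshot = [(5, True), (8, True), (12, True), (30, False), (40, False)]
print(dpe(snapshot))
print(cpe_exponential(snapshot))
```

Because the ongoing long stays are ignored by the first estimator but counted by the second, the first comes out lower, which matches the direction of the difference reported in the abstract.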


1982 ◽  
Vol 14 (10) ◽  
pp. 1341-1354 ◽  
Author(s):  
K E Haynes ◽  
F Y Phillips

Mathematical programming and statistical inference are combined in a constrained minimum discrimination information (MDI) method to provide a basis for a wide range of spatial and individual choice behavior problems. This approach offers an alternative to linear and loglinear regression estimation methods as well as probabilistic models of the logit and probit variety. Some logical and computational difficulties inherent in these approaches are resolved. Further, the approach leads endogenously to alternative hypotheses if the null hypothesis is rejected, and hence has implications for the interaction between research that is oriented toward theory construction and applied research that is empirically oriented.


Author(s):  
Roxanne Albertha Charles

Abstract The sand tampan, Ornithodoros savignyi (Audouin, 1827), is an economically important soft tick of the Afrotropics parasitising a wide range of livestock and humans. These ticks are known to inflict painful bites which may be fatal in susceptible hosts. Historically thought to be a single species, Ornithodoros savignyi is now considered to be a complex of four tick subspecies based on molecular and morphological studies. They include Ornithodoros (Ornithodoros) kalahariensis, O. (O.) pavimentosus, O. (O.) noorsveldensis and O. (O.) savignyi. As such there may be significant implications for previous biological studies conducted on this tick. Therefore, for the purposes of this review, sand tampan toxicosis and potentially useful biological molecules have been discussed for O. (O.) savignyi sensu lato since most reported work was based on ticks collected from the Kalahari and Lake Chad region. An overview of the host range and vector biology for the O. (O.) savignyi species complex will also be examined.

