Testing for dependence on tree structures

2020 ◽  
Vol 117 (18) ◽  
pp. 9787-9792 ◽  
Author(s):  
Merle Behr ◽  
M. Azim Ansari ◽  
Axel Munk ◽  
Chris Holmes

Tree structures, showing hierarchical relationships and the latent structures between samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable measured on each sample and the latent group structure represented by some given tree. Currently, this is addressed on an ad hoc basis, usually requiring the user to decide on an appropriate number of clusters to prune out of the tree to be tested against the response variable. Here, we present a statistical method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy with high power while accounting for the overall false positive error rate. This enhances the robustness and reproducibility of such findings.

2019 ◽  
Author(s):  
Merle Behr ◽  
M. Azim Ansari ◽  
Axel Munk ◽  
Chris Holmes

Tree structures, showing hierarchical relationships and the latent structures between samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable measured on each sample and the latent group structure represented by some given tree. Currently, this is addressed on an ad hoc basis, usually requiring the user to decide on an appropriate number of clusters to prune out of the tree to be tested against the response variable. Here, we present a statistical method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy with high power, while accounting for the overall false positive error rate. This enhances the robustness and reproducibility of such findings.

Significance Statement: Tree-like structures are abundant in the empirical sciences as they can summarize high-dimensional data and show latent structure among many samples in a single framework. Prominent examples include phylogenetic trees or hierarchical clustering derived from genetic data. Currently, users employ ad hoc methods to test for association between a given tree and a response variable, which reduces reproducibility and robustness. In this paper, we introduce treeSeg, a simple-to-use and widely applicable methodology with high power for testing for association between all levels of the hierarchy of a given tree and the response while accounting for the overall false positive rate. Our method allows for precise uncertainty quantification and therefore increases interpretability and reproducibility of such studies across many fields of science.


2009 ◽  
Vol 102 (1) ◽  
pp. 636-643 ◽  
Author(s):  
Takuya Sasaki ◽  
Genki Minamisawa ◽  
Naoya Takahashi ◽  
Norio Matsuki ◽  
Yuji Ikegaya

We introduce a new method to unveil the network connectivity among dozens of neurons in brain slice preparations. While synaptic inputs were whole cell recorded from given postsynaptic neurons, the spatiotemporal firing patterns of presynaptic neuron candidates were monitored en masse with functional multineuron calcium imaging, an optical technique that records action potential–evoked somatic calcium transients with single-cell resolution. By statistically screening the neurons that exhibited calcium transients immediately before the postsynaptic inputs, we identified the presynaptic cells that made synaptic connections onto the patch-clamped neurons. To enhance the detection power, we devised the following points: 1) [K+]e was lowered and [Ca2+]e and [Mg2+]e were elevated, to reduce background synaptic activity and minimize the failure rate of synaptic transmission; and 2) a small fraction of presynaptic neurons was specifically activated by glutamate applied iontophoretically through a glass pipette that was moved to survey the presynaptic network of interest (“trawling”). Then we could theoretically detect 96% of presynaptic neurons activated in the imaged regions with a 1% false-positive error rate. This on-line probing technique would be a promising tool in the study of the wiring topography of neuronal circuits.


Author(s):  
Arijit Sengupta ◽  
V. Ramesh

This chapter presents DSQL, a conservative extension of SQL, as an ad-hoc query language for XML. The development of DSQL follows the theoretical foundations of first-order logic and uses common query semantics already accepted for SQL. DSQL represents a core subset of XQuery that lends itself well to query optimization techniques, while at the same time allowing easy integration into current databases and applications that use SQL. The intent of DSQL is not to replace XQuery, the current W3C recommended XML query language, but to serve as an ad-hoc querying frontend to XQuery. Further, the authors present proofs for important query language properties such as complexity and closure. An empirical study comparing DSQL and XQuery for the purpose of ad-hoc querying demonstrates that users perform better with DSQL for both flat and tree structures, in terms of both accuracy and efficiency.


2009 ◽  
Vol 20 (4) ◽  
pp. 26-53 ◽  
Author(s):  
Arijit Sengupta ◽  
V. Ramesh

This article presents DSQL, a conservative extension of SQL, as an ad-hoc query language for XML. The development of DSQL follows the theoretical foundations of first-order logic and uses common query semantics already accepted for SQL. DSQL represents a core subset of XQuery that lends itself well to optimization techniques, while at the same time allowing easy integration into current databases and applications that use SQL. The intent of DSQL is not to replace XQuery, the current W3C recommended XML query language, but to serve as an ad-hoc querying frontend to XQuery. Further, the authors present proofs for important query language properties such as complexity and closure. An empirical study comparing DSQL and XQuery for the purpose of ad-hoc querying demonstrates that users perform better with DSQL for both flat and tree structures, in terms of both accuracy and efficiency.


2015 ◽  
Vol 39 (2) ◽  
pp. 67-75 ◽  
Author(s):  
Barbara E. Goodman ◽  
Karen L. Koster ◽  
David L. Swanson

In response to the Howard Hughes Medical Institute/Association of American Medical Colleges Scientific Foundations for Future Physicians (SFFP) report and a concern for better preparing undergraduates for future doctoral programs in the health professions, the deans of the College of Arts and Sciences and Division of Basic Biomedical Sciences of Sanford School of Medicine of the University of South Dakota formed an ad hoc Premedical Curriculum Review Committee with representatives from the science departments and medical school. The Committee began by reviewing the university's suggested premedical curriculum and matching it to the proposed competencies from the SFFP to document duplications and deficiencies. The proposed changes in the Medical College Admission Test for 2015 were also evaluated. The Committee proposed a stronger premedical curriculum, with the development of some new courses, including an inquiry-based physiology course with team-based learning, to more fully address SFFP competencies. These analyses convinced the university that a new major would best help students achieve the competencies and prepare them for admission exams. Thus, a new Medical Biology major was proposed to the South Dakota Board of Regents and accepted for its initial offering in 2012. The new major has been broadly advertised to future students and is successful as a recruiting tool for the university. This article details the process of evaluating the curriculum and designing the new major, describes some of the difficulties in its implementation, and reviews outcomes from the new major to date.


Author(s):  
Yu Wang

The requirement that training data for supervised learning include a labeled response variable may not be satisfied in some situations, particularly in dynamic, short-term, and ad-hoc wireless network access environments. Being able to conduct classification without a labeled response variable is an essential challenge for modern network security and intrusion detection. In this chapter we will discuss some unsupervised learning techniques, including probability, similarity, and multidimensional models, that can be applied in network security. These methods also provide a different angle from which to analyze network traffic data. For comprehensive knowledge on unsupervised learning techniques please refer to the machine learning references listed in the previous chapter; for their applications in network security see Carmines, Edward & McIver (1981), Lane & Brodley (1997), Herrero, Corchado, Gastaldo, Leoncini, Picasso & Zunino (2007), and Dhanalakshmi & Babu (2008). Unlike in supervised learning, where for each vector $X = (x_1, x_2, \ldots, x_n)$ we have a corresponding observed response, Y, in unsupervised learning we only have X, and Y is not available either because we could not observe it or because its frequency is too low to be fitted with a supervised learning approach. Unsupervised learning is of great practical importance because in many circumstances the available network traffic data may not include any anomalous events or known anomalous events (e.g., traffic collected from a newly constructed network system). As high-speed mobile wireless and ad-hoc network systems have become popular, the need to develop new unsupervised learning methods that allow network traffic data to be modeled using anomaly-free training data has significantly increased.


2020 ◽  
pp. jclinpath-2020-206726
Author(s):  
Cornelia Margaret Szecsei ◽  
Jon D Oxley

Aim: To examine the effects of specialist reporting on error rates in prostate core biopsy diagnosis. Method: Biopsies were reported by eight specialist uropathologists over 3 years. New cancer diagnoses were double-reported and all biopsies were reviewed for the multidisciplinary team (MDT) meeting. Diagnostic alterations were recorded in supplementary reports and error rates were compared with a decade previously. Results: 2600 biopsies were reported. 64.1% contained adenocarcinoma, a 19.7% increase. The false-positive error rate had reduced from 0.4% to 0.06%. The false-negative error rate had increased from 1.5% to 1.8%, but represented fewer absolute errors due to increased cancer incidence. Conclusions: Specialisation and double-reporting have reduced false-positive errors. MDT review of negative cores continues to identify a very low number of false-negative errors. Our data represents a ‘gold standard’ for prostate biopsy diagnostic error rates. Increased use of MRI-targeted biopsies may alter error rates and their future clinical significance.


Author(s):  
Vijay Ram Ghorpade ◽  
Yashwant V. Joshi ◽  
Ramchandra R. Manthalkar

Ideally, a hash tree is a perfect binary tree whose number of leaves is a power of two. Each leaf node in this type of tree can represent a mobile node in an ad hoc network. Each leaf in the tree contains the hash value of the mobile node’s identification (ID) and public key (PK). Such a tree can be used for authenticating PKs in ad hoc networks. Most previous work based on hash trees assumed perfect hash tree structures, which can be used efficiently only in networks with a specific number of mobile nodes. In practice, the number of mobile nodes may not always be equal to a power of two, and the conventional algorithms may result in an inefficient tree structure. In this paper the issue of generating a hash tree is addressed by proposing an algorithm to generate an optimally-balanced structure for a complete hash tree. It is demonstrated through both mathematical analysis and simulation that such a tree is optimally balanced and can be used efficiently for public key authentication in ad hoc networks.
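As a rough illustration of the idea in this abstract, the following Python sketch builds a hash tree over an arbitrary (not necessarily power-of-two) number of leaves and checks a public key against the root. It is not the authors' algorithm, which the abstract does not specify. It assumes a plain recursive halving of the leaf list, SHA-256 as the hash, and hypothetical helper names (`leaf_hash`, `root`, `auth_path`, `verify`).

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function; the abstract names none)."""
    return hashlib.sha256(data).digest()


def leaf_hash(node_id: str, public_key: bytes) -> bytes:
    # Leaf value: hash of the mobile node's ID and public key.
    # The 0x00 prefix keeps leaf hashes distinct from inner-node hashes.
    return h(b"\x00" + node_id.encode() + b"|" + public_key)


def root(leaves: list) -> bytes:
    """Root hash of a tree built by splitting the leaf list in half."""
    if len(leaves) == 1:
        return leaves[0]
    mid = (len(leaves) + 1) // 2
    return h(b"\x01" + root(leaves[:mid]) + root(leaves[mid:]))


def auth_path(leaves: list, index: int) -> list:
    """Sibling hashes from the leaf up to the root, as (hash, sibling_is_left)."""
    if len(leaves) == 1:
        return []
    mid = (len(leaves) + 1) // 2
    if index < mid:
        return auth_path(leaves[:mid], index) + [(root(leaves[mid:]), False)]
    return auth_path(leaves[mid:], index - mid) + [(root(leaves[:mid]), True)]


def verify(leaf: bytes, path: list, expected_root: bytes) -> bool:
    """Recompute the root from a leaf and its path, then compare."""
    acc = leaf
    for sibling, sibling_is_left in path:
        if sibling_is_left:
            acc = h(b"\x01" + sibling + acc)
        else:
            acc = h(b"\x01" + acc + sibling)
    return acc == expected_root


# Example with five mobile nodes (illustrative IDs and keys only).
nodes = [(f"node{i}", bytes([i]) * 4) for i in range(5)]
leaves = [leaf_hash(node_id, pk) for node_id, pk in nodes]
tree_root = root(leaves)
print(verify(leaves[2], auth_path(leaves, 2), tree_root))
```

Because each split differs in size by at most one leaf, the authentication paths in this sketch differ in length by at most one hash, which is the practical benefit of a balanced structure when the node count is not a power of two.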


Author(s):  
Avani Ahuja

In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and the three techniques exhibited a significant divergence from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.


2019 ◽  
Vol 44 (3) ◽  
pp. 309-341 ◽  
Author(s):  
Jeffrey M. Patton ◽  
Ying Cheng ◽  
Maxwell Hong ◽  
Qi Diao

In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation studies, the iterative procedure leads to nearly perfect power in detecting extremely careless responders and much higher power than the noniterative procedure in detecting moderately careless responders. Meanwhile, the false-positive error rate is close to the nominal level. In addition, item parameter estimation is much improved by iteratively cleansing the calibration sample. The bias in item discrimination and location parameter estimates is substantially reduced. The standard error estimates, which are spuriously small in the presence of careless responses, are corrected by the iterative cleansing procedure. An empirical example is also presented to illustrate the proposed procedure. These results suggest that the proposed procedure is a promising way to improve item parameter estimation for tests of 20 items or longer when data are contaminated by careless responses.

