Exploratory Only: A Tool for Large-Scale Exploratory Analyses

2021 ◽  
Author(s):  
Jin Kim

This article presents Exploratory Only: an intuitive tool for conducting large-scale exploratory analyses easily and quickly. Available in three forms (as a web application, standalone program, and R Package) and launched as a point-and-click interface, Exploratory Only allows researchers to conduct all possible correlation, moderation, and mediation analyses among selected variables in their data set with minimal effort and time. Compared to a popular alternative, SPSS, Exploratory Only is shown to be orders of magnitude easier and faster at conducting exploratory analyses. The article demonstrates how to use Exploratory Only and discusses the caveat to using it. As long as researchers use Exploratory Only as intended—to discover novel hypotheses to investigate in follow-up studies, rather than to confirm nonexistent a priori hypotheses (i.e., p-hacking)—Exploratory Only can promote progress in behavioral science by encouraging more exploratory analyses and therefore more discoveries.
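To make "all possible correlation analyses among selected variables" concrete, here is a minimal Python sketch. It is not the Exploratory Only implementation: the function names and the toy data set are hypothetical, and it covers only pairwise Pearson correlations, not the moderation or mediation analyses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def all_pairwise_correlations(data):
    """Return {(name_a, name_b): r} for every unordered pair of variables."""
    return {
        (a, b): pearson(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


# Hypothetical toy data set; the variable names are made up for illustration.
toy = {
    "sleep": [6, 7, 8, 5, 9],
    "mood": [3, 4, 5, 2, 6],
    "stress": [8, 6, 5, 9, 3],
}
correlations = all_pairwise_correlations(toy)
```

With k variables there are k(k-1)/2 pairs, so the number of tests grows quadratically. This is one reason results of such a sweep are best treated as hypotheses for follow-up studies, as the abstract stresses.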

2021 ◽  
Author(s):  
Magnus Dehli Vigeland ◽  
Thore Egeland

Abstract We address computational and statistical aspects of DNA-based identification of victims in the aftermath of disasters. Current methods and software for such identification typically consider each victim individually, leading to suboptimal power of identification and potential inconsistencies in the statistical summary of the evidence. We resolve these problems by performing joint identification of all victims, using the complete genetic data set. Individual identification probabilities, conditional on all available information, are derived from the joint solution in the form of posterior pairing probabilities. A closed formula is obtained for the a priori number of possible joint solutions to a given DVI problem. This number increases quickly with the number of victims and missing persons, posing computational challenges for brute force approaches. We address this complexity with a preparatory sequential step aiming to reduce the search space. The examples show that realistic cases are handled efficiently. User-friendly implementations of all methods are provided in the R package dvir, freely available on all platforms.
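The growth in the number of joint solutions can be illustrated with a small Python sketch. The abstract does not state the paper's closed formula, so the count below rests on simplifying assumptions: each victim is paired with at most one missing person and vice versa, anyone may remain unpaired, and constraints such as sex are ignored. The function name is hypothetical and this is not the dvir API.

```python
from math import comb, factorial


def count_joint_assignments(n_victims, n_missing):
    """Count one-to-one pairings of victims with missing persons.

    Any victim or missing person may stay unpaired. For each k, choose k
    victims, choose k missing persons, and match them in k! ways.
    """
    return sum(
        comb(n_victims, k) * comb(n_missing, k) * factorial(k)
        for k in range(min(n_victims, n_missing) + 1)
    )


# The count grows quickly as the two groups get larger.
for size in (1, 2, 3, 5, 10):
    print(size, count_joint_assignments(size, size))
```

Under these assumptions, ten victims and ten missing persons already yield several million candidate assignments, which suggests why a step that shrinks the search space helps before any exhaustive evaluation.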


2001 ◽  
Vol 5 (2) ◽  
pp. 201-213 ◽  
Author(s):  
P. Fiorucci ◽  
P. La Barbera ◽  
L.G. Lanza ◽  
R. Minciardi

Abstract. A rain field reconstruction and downscaling methodology is presented, which allows suitable integration of large scale rainfall information and rain-gauge measurements at the ground. The former data set is assumed to provide probabilistic indicators that are used to infer the parameters of the probability density function of the stochastic rain process at each pixel site. Rain-gauge measurements are assumed as the ground truth and used to constrain the reconstructed rain field to the associated point values. Downscaling is performed by assuming the a posteriori estimates of the rain figures at each grid cell as the a priori large-scale conditioning values for reconstruction of the rain field at finer scale. The case study of an intense rain event recently observed in northern Italy is presented and results are discussed with reference to the modelling capabilities of the proposed methodology. Keywords: Reconstruction, downscaling, remote sensing, geostatistics, Meteosat


2015 ◽  
Vol 14 ◽  
pp. CIN.S22080 ◽  
Author(s):  
Askar Obulkasim ◽  
Mark A. Van De Wiel

Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes the various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that “haunt” high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information.
Our implementation of the semi-supervised HC tree snipping framework is generic, and can be combined with other algorithms that operate on detected clusters.
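The limitation of a fixed-height cut can be seen in a small Python sketch. It shows only the baseline that the abstract argues against, not the HCsnip procedure. It assumes single linkage on one-dimensional points, where cutting the tree at a given height is equivalent to splitting the sorted points wherever a gap exceeds that height. The function name and data are hypothetical.

```python
def fixed_height_cut_1d(values, height):
    """Clusters from cutting a single-linkage tree of 1-D points at `height`.

    Under single linkage, two points share a cluster exactly when a chain of
    points connects them with every consecutive gap <= height.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= height:
            clusters[-1].append(v)  # gap small enough: extend current cluster
        else:
            clusters.append([v])  # gap too large: start a new cluster
    return clusters


# Hypothetical data: the third group has a tight core plus one looser member.
points = [1.0, 1.2, 1.3, 5.0, 5.1, 9.0, 9.05, 9.1, 9.6]
coarse = fixed_height_cut_1d(points, 1.0)   # keeps 9.6 with the tight core
fine = fixed_height_cut_1d(points, 0.25)    # splits 9.6 off as a singleton
```

No single height recovers both the three broad groups and the nested structure inside the last one, which is the situation a variable-height snipping approach is meant to handle.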


2020 ◽  
Author(s):  
Eric R. Reed ◽  
Stefano Monti

Abstract. As high-throughput genomics assays become more efficient and cost-effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a “taxonomy-like” structure (https://github.com/montilab/K2Taxonomer). K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes, associated with better prognosis in breast cancer tissue bulk expression data.


2008 ◽  
Vol 8 (2) ◽  
pp. 4561-4602 ◽  
Author(s):  
L. Hoffmann ◽  
M. Kaufmann ◽  
R. Spang ◽  
R. Müller ◽  
J. J. Remedios ◽  
...  

Abstract. From July 2002 to March 2004 the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) aboard the European Space Agency's Environmental Satellite (Envisat) measured mid-infrared limb radiance spectra nearly continuously. These measurements are utilised to retrieve the global distribution of the chlorofluorocarbon CFC-11 by applying a new fast forward model for Envisat MIPAS and an accompanying optimal estimation retrieval processor. A detailed analysis shows that the total retrieval errors of the individual CFC-11 volume mixing ratios are typically below 10% and that the systematic components dominate. The contribution of a priori information to the retrieval results is less than 5 to 10%. The vertical resolution of the observations is about 3 to 4 km. The data are successfully validated by comparison with several other space experiments, an air-borne in-situ instrument, measurements from ground-based networks, and independent Envisat MIPAS analyses. The retrieval results from 425 000 Envisat MIPAS limb scans are compiled to provide a new climatological data set of CFC-11. The climatology shows significantly lower CFC-11 abundances in the lower stratosphere compared with the Reference Atmospheres for MIPAS (RAMstan V3.1) climatology. Depending on the atmospheric conditions, the differences between the climatologies are up to 30 to 110 ppt (45 to 150%) at 19 to 27 km altitude. Additionally, time series of CFC-11 mean abundance and variability for five latitudinal bands are presented. The observed CFC-11 distributions can be explained by the residual mean circulation and large-scale eddy-transports in the upper troposphere and lower stratosphere. The new CFC-11 data set is well suited for further scientific studies.


2020 ◽  
Vol 11 ◽  
Author(s):  
Eva Kozáková ◽  
Eduard Bakštein ◽  
Ondřej Havlíček ◽  
Ondřej Bečev ◽  
Pavel Knytl ◽  
...  

Background: Schizophrenia is often characterized by a general disruption of self-processing and self-demarcation. Previous studies have shown that self-monitoring and sense of agency (SoA, i.e., the ability to recognize one's own actions correctly) are altered in schizophrenia patients. However, research findings are inconclusive with regard to how SoA alterations are linked to clinical symptoms and their severity, or to cognitive factors. Methods: In a longitudinal study, we examined 161 first-episode schizophrenia patients and 154 controls with a continuous-report SoA task and a control task testing general cognitive/sensorimotor processes. Clinical symptoms were assessed with the Positive and Negative Syndrome Scale (PANSS). Results: In comparison to controls, patients performed worse in terms of recognition of self-produced movements even when controlling for confounding factors. Patients' SoA score correlated with the severity of PANSS-derived “Disorganized” symptoms and with a priori defined symptoms related to self-disturbances. In the follow-up, the changes in the two subscales were significantly associated with the change in SoA performance. Conclusion: We corroborated previous findings of altered SoA already in the early stage of schizophrenia. Decreased ability to recognize self-produced actions was associated with the severity of symptoms in two complementary domains: self-disturbances and disorganization. While the involvement of the former might indicate impairment in self-monitoring, the latter suggests the role of higher cognitive processes such as information updating or cognitive flexibility. The SoA alterations in schizophrenia are associated, at least partially, with the intensity of respective symptoms in a state-dependent manner.


2014 ◽  
Vol 111 (46) ◽  
pp. 16262-16267 ◽  
Author(s):  
Ruth Heller ◽  
Marina Bogomolov ◽  
Yoav Benjamini

2020 ◽  
Vol 2 (3) ◽  
pp. 379-416 ◽  
Author(s):  
Angelo A. Salatino ◽  
Thiviyan Thanapalasingam ◽  
Andrea Mannocci ◽  
Aliaksandr Birukou ◽  
Francesco Osborne ◽  
...  

Ontologies of research areas are important tools for characterizing, exploring, and analyzing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm on a very large data set of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a Web application that enables users to download, explore, and provide granular feedback on CSO. Users can use the portal to navigate and visualize sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data.

