Integrative Data Analysis from a Unifying Research Synthesis Perspective

Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on their experience with the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, the authors also note that IDA investigations require a wide range of expertise and considerable resources, and that minimum standards for reporting IDA studies may be needed to improve the transparency and quality of evidence.

2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
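The kind of structural metric the paper proposes can be illustrated with a toy computation over a handful of triples (the triples and the two metrics below are hypothetical stand-ins for demonstration, not the paper's actual metric definitions):

```python
# Toy structural metrics over an RDF-like triple set.
from collections import Counter

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob",   "foaf:name", '"Bob"'),
    ("ex:bob",   "foaf:knows", "ex:alice"),
]

def subject_out_degree(ts):
    """Number of triples per subject: a simple structural feature."""
    return Counter(s for s, _, _ in ts)

def predicate_usage(ts):
    """How often each predicate occurs: skewed usage hints at redundancy
    that indexes and compressors can exploit."""
    return Counter(p for _, p, _ in ts)
```

Distributions like these, computed over full data sets, are the sort of input from which redundancy and common structural patterns can be characterised.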


2021 ◽  
Author(s):  

ABSTRACT
Objectives: Evidence from randomised trials on long-term blood pressure (BP) reduction from pharmacologic treatment is limited. To investigate the effects of antihypertensive drugs on long-term BP change and examine its variation over time and among people with different clinical characteristics.
Design: Individual participant-level data meta-analysis.
Setting and data source: The Blood Pressure Lowering Treatment Trialists’ Collaboration, involving 51 large-scale long-term randomised clinical trials.
Participants: 352,744 people (42% women) with a mean age of 65 years and mean baseline systolic/diastolic BP of 152/87 mmHg, of whom 18% were current smokers, 50% had cardiovascular disease, 29% had diabetes, and 72% were taking antihypertensive treatment at baseline.
Intervention: Pharmacological BP-lowering treatment.
Outcome: Difference in longitudinal changes in systolic and diastolic BP between randomised treatment arms over an average follow-up of four years.
Results: Drugs were effective in lowering BP, with the maximum effect becoming apparent after 12 months of follow-up and gradual attenuation towards later years. Based on measures taken ≥12 months post-randomisation, more intense BP-lowering treatment changed systolic/diastolic BP (95% confidence interval) by −11.2 (−11.4 to −11.0)/−5.6 (−5.8 to −5.5) mmHg relative to less intense treatment; active treatment by −5.1 (−5.3 to −5.0)/−2.3 (−2.4 to −2.2) mmHg relative to placebo; and the active arm by −1.4 (−1.5 to −1.3)/−0.6 (−0.7 to −0.6) mmHg relative to the control arm in drug-class comparison trials. BP reductions were consistently observed across a wide range of baseline BP values and ages, and by sex, history of cardiovascular disease and diabetes, and prior antihypertensive treatment use.
Conclusion: Pharmacological agents were effective in lowering long-term BP among individuals with a wide range of characteristics, but the net between-group reductions were modest, partly attributable to the intended trial goals.
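The headline between-arm contrast is a difference in mean BP change with a normal-approximation confidence interval; a minimal sketch of that calculation (the means, SDs, and sample sizes below are hypothetical, not trial data):

```python
import math

def mean_diff_ci(m1, sd1, n1, m0, sd0, n0, z=1.96):
    """Between-arm difference in mean BP change (arm1 - arm0)
    with a normal-approximation 95% confidence interval."""
    diff = m1 - m0
    se = math.sqrt(sd1 ** 2 / n1 + sd0 ** 2 / n0)
    return diff, diff - z * se, diff + z * se

# Hypothetical example: more vs less intense treatment arms
d, lo, hi = mean_diff_ci(-16.3, 18.0, 5000, -11.2, 18.0, 5000)
```

An individual participant-level meta-analysis additionally pools such contrasts across trials with appropriate weighting; this shows only the within-trial building block.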


Author(s):  
Charles Auerbach ◽  
Wendy Zeitlin

Single-subject research designs have been used to build evidence for the effective treatment of problems across various disciplines, including social work, psychology, psychiatry, medicine, allied health fields, juvenile justice, and special education. This book serves as a guide for those desiring to conduct single-subject data analysis. The aim of this text is to introduce readers to the various functions available in SSD for R, a new, free, and innovative software package written by the book’s authors in R, the robust open-source statistical programming language. SSD for R has the most comprehensive functionality specifically designed for the analysis of single-subject research data currently available. It offers numerous graphing and charting functions for robust visual analysis. Besides simple line graphs, features are available to add mean, median, and standard deviation lines across phases to help visualize change over time, and graphs can be annotated with text. SSD for R also contains a wide variety of functions for statistical analyses traditionally conducted with single-subject data, including numerous descriptive statistics, effect size functions, and tests of statistical significance such as t tests, chi-square tests, and the conservative dual criteria. Finally, SSD for R can analyze group-level data. Readers are led step by step through the analytical process based on the characteristics of their data, with numerous examples and illustrations to help them understand the wide range of functions available in SSD for R and their application to data analysis and interpretation.
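SSD for R itself is an R package; as a language-neutral illustration of the phase-summary idea behind its mean and standard deviation overlay lines, here is a minimal Python sketch with hypothetical baseline ("A") and intervention ("B") phase data:

```python
import statistics

# Hypothetical single-subject scores: 5 baseline, then 5 intervention sessions
scores = [7, 8, 6, 9, 8, 4, 3, 4, 2, 3]
phases = ["A"] * 5 + ["B"] * 5

def phase_summary(scores, phases):
    """Mean and SD per phase: the values an overlay line would display."""
    out = {}
    for phase in sorted(set(phases)):
        vals = [s for s, p in zip(scores, phases) if p == phase]
        out[phase] = (statistics.mean(vals), statistics.stdev(vals))
    return out
```

Visual analysis then compares these phase-level summaries against the session-by-session line graph.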


2009 ◽  
Vol 83 (3) ◽  
pp. 563-589 ◽  
Author(s):  
David T. Merrett ◽  
Simon Ville

An expanding economy, new technologies, and changing consumer preferences provided growth opportunities for firms in interwar Australia. This period saw an increase in the number of large-scale firms in mining, manufacturing, and a wide range of service industries. Firms unable to rely solely on retained earnings to fund expansion turned to the domestic stock exchanges. A new data set of capital raisings constructed from reports of prospectuses published in the financial press forms the basis for the conclusion that many firms used substantial injections of equity finance to augment internally generated sources of funds. That they were able to do so indicates a strong increase in the capacity of local stock exchanges and a greater willingness of individuals to hold part of their wealth in transferable securities.


SIMULATION ◽  
2012 ◽  
Vol 88 (12) ◽  
pp. 1438-1455
Author(s):  
Ciprian Dobre

The scale, complexity and worldwide geographical spread of the Large Hadron Collider (LHC) computing and data analysis problems are unprecedented in scientific research. The complexity of processing and accessing this data is increased substantially by the size and global span of the major experiments, combined with the limited wide-area network bandwidth available. This paper discusses the latest generation of the MONARC (MOdels of Networked Analysis at Regional Centers) simulation framework, as a design and modeling tool for large-scale distributed systems applied to high-energy physics experiments. We present a simulation study designed to evaluate the capabilities of the current real-world distributed infrastructures deployed to support existing LHC physics analysis processes and the means by which the experiments band together to meet the technical challenges posed by the storage, access and computing requirements of LHC data analysis. The Compact Muon Solenoid (CMS) experiment, in particular, uses a general-purpose detector to investigate a wide range of physics. We present a simulation study designed to evaluate the capability of its underlying distributed processing infrastructure to support the physics analysis processes. The results, made possible by the MONARC model, demonstrate that the LHC infrastructures are well suited to support the data processes envisioned by the CMS computing model.
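The capacity questions such a simulation framework answers can be caricatured with a toy first-come-first-served model of a regional centre (the centre sizes, job counts, and job times below are hypothetical, and MONARC models far more, e.g. network transfer and data placement):

```python
import heapq

def makespan(n_jobs, n_cpus, job_time):
    """Finish time of n_jobs identical analysis jobs on n_cpus,
    each job assigned greedily to the next-free CPU."""
    cpus = [0.0] * n_cpus          # next-free time per CPU
    heapq.heapify(cpus)
    for _ in range(n_jobs):
        free_at = heapq.heappop(cpus)
        heapq.heappush(cpus, free_at + job_time)
    return max(cpus)

# e.g. a 100-job batch of 2-hour jobs at a 16-CPU vs a 64-CPU centre
small = makespan(100, 16, 2.0)
large = makespan(100, 64, 2.0)
```

Full discrete-event simulation replaces the fixed `job_time` with stochastic service and transfer times, which is where frameworks like MONARC earn their keep.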


2019 ◽  
Vol 80 (4) ◽  
pp. 617-637 ◽  
Author(s):  
Kathleen V. McGrath ◽  
Elizabeth A. Leighton ◽  
Mihaela Ene ◽  
Christine DiStefano ◽  
Diane M. Monrad

Survey research frequently involves the collection of data from multiple informants. Results, however, are usually analyzed by informant group, potentially ignoring important relationships across groups. When the same construct(s) are measured, integrative data analysis (IDA) allows pooling of data from multiple sources into one data set to examine information from multiple perspectives within the same analysis. Here, the IDA procedure is demonstrated via the examination of pooled data from student and teacher school climate surveys. This study contributes to the sparse literature regarding IDA applications in the social sciences, specifically in education. It also lays the groundwork for future educational researchers interested in the practical applications of the IDA framework to empirical data sets with complex model structures.
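The pooling step at the heart of IDA can be sketched as stacking records from the two informant groups into one data set with a source indicator, so informant group can enter the pooled analysis as a variable (the field names below are hypothetical):

```python
# Hypothetical per-informant records measuring the same construct (climate)
student = [{"school": 1, "climate": 3.8}, {"school": 2, "climate": 3.1}]
teacher = [{"school": 1, "climate": 4.2}, {"school": 2, "climate": 3.5}]

# Pool into one data set, tagging each record with its informant group
pooled = (
    [dict(rec, informant="student") for rec in student]
    + [dict(rec, informant="teacher") for rec in teacher]
)
```

Measurement-invariance checks across informant groups would precede any substantive analysis of the pooled construct; this sketch shows only the stacking.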


2020 ◽  
Author(s):  
Christoph Ogris ◽  
Yue Hu ◽  
Janine Arloth ◽  
Nikola S. Müller

ABSTRACT
Constantly decreasing costs of high-throughput profiling on many molecular levels generate vast amounts of so-called multi-omics data. Studying one biomedical question on two or more omic levels provides deeper insight into underlying molecular processes or disease pathophysiology. For the majority of multi-omics projects, data analysis is performed level-wise, followed by a combined interpretation of results. Few exceptions exist, for example the pairwise integration for quantitative trait analysis. However, the full potential of integrated data analysis is not yet leveraged, presumably due to the complexity of the data and a lack of toolsets. Here we propose a versatile approach to perform a multi-level integrated analysis: the Knowledge guIded Multi-Omics Network inference approach, KiMONo. KiMONo performs network inference using statistical modeling on top of a powerful knowledge-guided strategy that exploits prior information from biological sources. Within the resulting network, nodes represent features of all input types and edges refer to associations between them, e.g. underlying a disease. Our method infers the network by combining sparse grouped-LASSO regression with a genomic position-confined Biogrid protein-protein interaction prior. In a comprehensive evaluation, we demonstrate that the method is robust to noise and still performs well on low-sample-size data. Applied to the five-level data set of the publicly available Pan-cancer collection, KiMONo integrated mutation, epigenetic, transcriptomic, proteomic, and clinical information, detecting cancer-specific omic features. Moreover, we analysed a four-level data set from a major depressive disorder cohort, including genetic, epigenetic, transcriptional, and clinical data, demonstrating KiMONo’s analytical power to identify expression quantitative trait methylation sites and loci and its advantage over state-of-the-art methods. Our results show general applicability to the full spectrum of multi-omics data and demonstrate that KiMONo is a powerful approach to leveraging the full potential of such data sets. The method is freely available as an R package (https://github.com/cellmapslab/kimono).
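A much-simplified sketch of the knowledge-guided idea: only predictors supported by a prior enter a sparse fit (plain LASSO here, not the grouped LASSO KiMONo uses), and surviving nonzero coefficients become candidate network edges. The prior set, feature names, and data below are toy stand-ins for the Biogrid prior and real omics matrices:

```python
def lasso_cd(X, y, lam, n_iter=50):
    """Plain-Python LASSO via coordinate descent with soft-thresholding;
    assumes columns of X are on comparable scales."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

features = ["gene1", "gene2", "gene3"]
prior = {"gene1", "gene2"}            # toy stand-in for the PPI prior
keep = [j for j, f in enumerate(features) if f in prior]

# Toy data: the target depends on gene1 only; gene3 never enters (no prior)
X_full = [[1, 1, 0.3], [-1, 1, -0.4], [1, -1, 0.2], [-1, -1, 0.1]]
y = [2.0, -2.0, 2.0, -2.0]

X = [[row[j] for j in keep] for row in X_full]
beta = lasso_cd(X, y, lam=1.0)
edges = [features[keep[j]] for j, b in enumerate(beta) if b != 0.0]
```

The prior thus shrinks the candidate space before the sparsity penalty selects edges; KiMONo applies this pattern per target feature across all omic levels.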


2021 ◽  
Author(s):  
Abigail Z. Jacobs ◽  
Duncan J. Watts

Theories of organizations are sympathetic to long-standing ideas from network science that organizational networks should be regarded as multiscale and capable of displaying emergent properties. However, the historical difficulty of collecting individual-level network data for many (N ≫ 1) organizations, each of which comprises many (n ≫ 1) individuals, has hobbled efforts to develop specific, theoretically motivated hypotheses connecting micro- (i.e., individual-level) network structure with macro-organizational properties. In this paper we seek to stimulate such efforts with an exploratory analysis of a unique data set of aggregated, anonymized email data from an enterprise email system that includes 1.8 billion messages sent by 1.4 million users from 65 publicly traded U.S. firms spanning a wide range of sizes and 7 industrial sectors. We uncover wide heterogeneity among firms with respect to all measured network characteristics, and we find robust network and organizational variation as a result of size. Interestingly, we find no clear associations between organizational network structure and firm age, industry, or performance; however, we do find that centralization increases with geographical dispersion—a result that is not explained by network size. Although preliminary, these results raise new questions for organizational theory as well as new issues for collecting, processing, and interpreting digital network data. This paper was accepted by David Simchi-Levi, Special Issue of Management Science: 65th Anniversary.


Author(s):  
Ishtiaque Ahmed ◽  
Manan Darda ◽  
Neha Tikyani ◽  
Rachit Agrawal ◽  
...  

The COVID-19 pandemic has caused large-scale outbreaks in more than 150 countries worldwide, causing massive damage to the livelihood of many people. The ability to identify infected patients early and provide targeted treatment is one of the most important steps in the battle against COVID-19. One of the quickest ways to diagnose patients is to use radiography and radiology images to detect the disease. Early studies have shown that chest X-rays of patients infected with COVID-19 show distinctive abnormalities. To identify COVID-19 patients from chest X-ray images, we used various deep learning models based on previous studies. We first compiled a data set of 2,815 chest radiographs from public sources. The model produces reliable and stable results with an accuracy of 91.6%, a positive predictive value of 80%, a negative predictive value of 100%, a specificity of 87.5%, and a sensitivity of 100%. It is observed that the CNN-based architecture can diagnose COVID-19 disease. These outcomes can be further improved by increasing the data set size and by developing the CNN-based architecture used for training the model.
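The reported metrics all derive from a binary confusion matrix; the sketch below shows the standard formulas, with hypothetical counts chosen so the derived values land near the reported figures (they are not the paper's actual test-set counts):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
        "sensitivity": tp / (tp + fn),   # recall on COVID-positive cases
        "specificity": tn / (tn + fp),   # recall on COVID-negative cases
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 70 true negatives, 0 false negatives
m = diagnostic_metrics(tp=40, fp=10, tn=70, fn=0)
```

Note that a sensitivity and NPV of 100% together imply zero false negatives on the evaluated set, which is why larger test sets matter for confirming such figures.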


2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
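The RSS likelihood has the multivariate-normal form below, reconstructed here from the published description (consult the original paper for exact conventions). Here $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)^\top$ collects the univariate (single-SNP) effect estimates, $\hat{s}_j$ their standard errors, and $R$ the SNP correlation (LD) matrix:

```latex
\hat{\beta} \mid \beta \;\sim\; \mathcal{N}\!\left(\hat{S} R \hat{S}^{-1} \beta,\; \hat{S} R \hat{S}\right),
\qquad \hat{S} = \operatorname{diag}(\hat{s}_1, \ldots, \hat{s}_p),
```

so the observed univariate summaries are modeled as a linear distortion (through LD) of the multiple-regression coefficients $\beta$, which is what lets the analysis proceed without individual-level data.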

