Faculty Opinions recommendation of Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors.

Twin studies indicate that 30-40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGSBD) and clinical- (PGSMDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. Higher PGSBD, in adults, was associated with a higher likelihood of having suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGSMDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.


Anthropometric Survey of U.S. Army Personnel: Pilot Summary Statistics, 1988

10.21236/ada241952 ◽

1991 ◽

Cited By ~ 2

Author(s):

Sarah M. Donelson ◽

Claire C. Gordon

Keyword(s):

Summary Statistics ◽

Army Personnel


Summary Statistics of Implied Probability Density Functions and their Properties

SSRN Electronic Journal ◽

10.2139/ssrn.314392 ◽

2002 ◽

Cited By ~ 1

Author(s):

Damien P.G. Lynch ◽

Nikolaos Panigirtzoglou

Keyword(s):

Probability Density ◽

Probability Density Functions ◽

Density Functions ◽

Summary Statistics


Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches

Current Bioinformatics ◽

10.2174/1574893615666200127122818 ◽

2020 ◽

Vol 15 ◽

Author(s):

Omer Irshad ◽

Muhammad Usman Ghani Khan

Keyword(s):

Data Integration ◽

Semantic Integration ◽

Biological Data ◽

Cellular System ◽

Formal Specifications ◽

Integration Model ◽

Geographically Dispersed ◽

Proposed Model ◽

Data Heterogeneity ◽

Biological Entities

Aim: To help researchers and practitioners uncover the functional aspects of the human cellular system by performing exploratory searches on semantically integrated, heterogeneous, and geographically dispersed omics annotations.

Background: Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the poorly understood aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data is growing at an astronomical rate while becoming more heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of semantic associations as naturally possessed by the human cell.

Objective: To develop an aspect-oriented formal data integration model for semantically integrating heterogeneous and geographically dispersed omics annotations, and so provide exploratory querying on the integrated data.

Method: We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with

Result: To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered and multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects within or across the various associated omics layers.

Conclusion: Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.


Fishing during the early human occupations of the Atacama Desert coast: what if we standardize the data?

Archaeological and Anthropological Sciences ◽

10.1007/s12520-021-01387-0 ◽

2021 ◽

Vol 13 (9) ◽

Author(s):

Sandra Rebolledo ◽

Philippe Béarez ◽

Débora Zurro

Keyword(s):

Regional Analysis ◽

Atacama Desert ◽

Coastal Communities ◽

Archaeological Sites ◽

Common Component ◽

Exploratory Approach ◽

Data Heterogeneity ◽

Terminal Pleistocene ◽

Methodological Approaches ◽

Early Human

Abstract The Atacama Desert coast (18–30° S) presents one of the earliest chronologies in the South America region, whose first occupations date from ~ 13,000 cal BP. Since that time, coastal and marine resources have been a common component at sites along the littoral zone. Fish species have been particularly important, as have the fishing technologies developed and used by the coastal communities. However, even though several archaeological sites have been studied, there is no systematic macro-regional analysis of early fisheries along the Atacama Desert coast. Furthermore, differences in theoretical and methodological approaches, as well as research objectives, hinder comparisons between ichthyoarchaeological assemblages. Here, we present a comparative analysis of the Atacama Desert fish data obtained from publications and gray literature from ten archaeological sites dating from the Terminal Pleistocene to the Early Holocene. Through the standardization of contextual and ichthyoarchaeological information, we compared data using NISP, MNI, and weight to calculate fish density, richness, and ubiquity, in order to identify similarities and differences between assemblages. This exploratory approach aims to contribute to studies of fish consumption in the area, as well as proposing new methodological questions and solutions regarding data heterogeneity in archaeozoology.
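The three measures named in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: richness is taken as the number of taxa with a non-zero NISP, density as total NISP per cubic metre of excavated sediment, and ubiquity as the share of sites in which a taxon occurs. The site names, taxon labels, volumes, and counts are all invented for the example.

```python
# Hypothetical standardized records: NISP per taxon and excavated volume per site.
# Every name and number here is invented for illustration.
assemblages = {
    "site_1": {"volume_m3": 2.0, "nisp": {"taxon_a": 120, "taxon_b": 30}},
    "site_2": {"volume_m3": 0.5, "nisp": {"taxon_a": 40}},
    "site_3": {"volume_m3": 1.0, "nisp": {"taxon_b": 10, "taxon_c": 5}},
}


def richness(site):
    """Number of taxa with at least one identified specimen (assumed definition)."""
    return sum(1 for count in site["nisp"].values() if count > 0)


def density(site):
    """Total NISP per cubic metre of excavated sediment (assumed definition)."""
    return sum(site["nisp"].values()) / site["volume_m3"]


def ubiquity(sites, taxon):
    """Fraction of sites in which the taxon is present (assumed definition)."""
    present = sum(1 for s in sites.values() if s["nisp"].get(taxon, 0) > 0)
    return present / len(sites)


for name, site in assemblages.items():
    print(name, "richness:", richness(site), "density:", density(site))
print("ubiquity of taxon_a:", ubiquity(assemblages, "taxon_a"))
```

Expressing counts per unit volume and presence per site is what makes assemblages of very different sizes comparable; the same functions could be applied to MNI or weight in place of NISP.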


Summary statistics for quantitative data

Statistics at Square One ◽

10.1002/9781119402350.ch3 ◽

2021 ◽

pp. 31-45

Keyword(s):

Quantitative Data ◽

Summary Statistics


FRI0585 HIGH-THROUGHPUT METHODOLOGY FOR EMR-BASED IDENTIFICATION OF CLINICAL SUB-PHENOTYPES IN COMPLEX PATIENT POPULATIONS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3489 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 897.2-897

Author(s):

M. Maurits ◽

T. Huizinga ◽

M. Reinders ◽

S. Raychaudhuri ◽

E. Karlson ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Dimensionality Reduction ◽

High Throughput ◽

Brain Cancer ◽

Machine Learning Techniques ◽

Summary Statistics ◽

Medical Problems ◽

Learning Techniques ◽

Icd Codes

Background: Heterogeneity in disease populations complicates discovery of risk factors. To identify risk factors for subpopulations of diseases, we need analytical methods that can deal with unidentified disease subgroups.

Objectives: Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs).

Methods: We extracted longitudinal healthcare-interaction records coded by 1,853 PheCodes [1] for the 64,819 patients in Boston's Partners Biobank. Through dimensionality reduction using t-SNE [2] we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters post-t-SNE using DBSCAN [3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B).

Results: Summary statistics of both sets were comparable (Table 1).

Table 1. Summary statistics of the total Partners Biobank dataset and the 2 partitions.

                    Set A         Set B         Total
Entries             12,200,311    12,177,131    24,377,442
Patients            32,424        32,395        64,819
Patient-years       369,546.33    368,597.92    738,144.2
Unique ICD codes    25,056        24,953        26,305
Unique PheCodes     1,851         1,853         1,853

We found 284 clusters in set A and 295 in set B, of which 63.4% from set A could be mapped to a cluster in set B with a median (range) correlation of 0.24 (0.03 – 0.58). Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.

Figure 1. Two-dimensional representation of set A generated using dimensionality reduction (t-SNE) and clustering (DBSCAN).

Figure 2. Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.

Conclusion: We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.

References:

[1] Denny, J.C. et al. Bioinformatics (2010)

[2] van der Maaten et al. Journal of Machine Learning Research (2008)

[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. (1996)

Disclosure of Interests: Marc Maurits: None declared; Thomas Huizinga: Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi; Marcel Reinders: None declared; Soumya Raychaudhuri: None declared; Elizabeth Karlson: None declared; Erik van den Akker: None declared; Rachel Knevel: None declared
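The clustering step described in the Methods above can be illustrated with a minimal, pure-Python version of DBSCAN applied to points that are assumed to be an existing 2D embedding. This is a sketch only, not the authors' code: the t-SNE step is not reproduced, and the coordinates, `eps`, and `min_samples` values are invented for the example.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each 2D point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical 2D embedding: two tight groups and one isolated point.
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0),
]
labels = dbscan(embedding, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

DBSCAN does not need the number of clusters in advance and leaves isolated points unassigned, which suits noisy data in which the number of subgroups is unknown. The neighbour search here is quadratic in the number of points, so a dataset of the size reported above would call for an indexed implementation.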
