A System for Phenotype Harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program

Abstract: Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung and Blood Institute’s Trans-Omics for Precision Medicine program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 studies per phenotype (participants recruited 1948-2012). We discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.

A system for phenotype harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program

10.1101/2020.06.18.146423 ◽

2020 ◽

Author(s):

Adrienne M. Stilp ◽

Leslie S. Emery ◽

Jai G. Broome ◽

Erin J. Buth ◽

Alyna T. Khan ◽

...

Keyword(s):

Precision Medicine ◽

Association Studies ◽

Controlled Vocabulary ◽

National Institutes Of Health ◽

Design Data ◽

Data Set ◽

Data Repositories ◽

Phenotype Data ◽

National Heart Lung ◽

Centralized System

Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 TOPMed studies per phenotype. We discuss the challenges faced in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.

Precision Medicine Landscape of Genomic Testing for Patients With Cancer in the National Institutes of Health All of Us Database Using Informatics Approaches

JCO Clinical Cancer Informatics ◽

10.1200/cci.21.00152 ◽

2022 ◽

Author(s):

Jay G. Ronquillo ◽

William T. Lester

Keyword(s):

Precision Medicine ◽

Race And Ethnicity ◽

Gene Mutations ◽

National Institutes Of Health ◽

Genomic Testing ◽

Biomedical Data ◽

Data Repositories ◽

Cancer Data ◽

Patients With Cancer ◽

Significant Difference

PURPOSE The rapid growth of biomedical data ecosystems has catalyzed research for oncology and precision medicine. We leverage federal cloud-based precision medicine databases and tools to better understand the current landscape of precision medicine and genomic testing for patients with cancer. METHODS Retrospective observational study of genomic testing for patients with cancer in the National Institutes of Health All of Us Research Program, with the cancer cohort defined as having at least two documented or reported cancer diagnoses. RESULTS There were 5,678 (1.8%) All of Us participants in the cancer cohort, with a significant difference between cancer status by age category, sex, race, and ethnicity ( P < .001 for all). There were 295 (5.2%) patients with cancer who received genomic testing compared with 6,734 (2.2%) of noncancer patients, with 752 genomic tests commonly focused on gene mutations (primarily pharmacogenomics), molecular pathology, or clinical cytogenetic reports. CONCLUSION Although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage to grow the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.

Benefits of Compiling and Analyzing Hydraulic-Design Data for Bridges

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211023757 ◽

2021 ◽

pp. 036119812110237

Author(s):

Stephen T. Benedict ◽

Thomas P. Knight

Keyword(s):

South Carolina ◽

Bridge Scour ◽

Hydraulic Characteristics ◽

Bridge Design ◽

Design Data ◽

Data Set ◽

Engineering Knowledge ◽

Hydraulic Design ◽

Strong Measure ◽

Design Characteristics

The hydraulic design of bridges is a discipline that requires a strong measure of engineering judgment. Developing good engineering judgment can take years of experience, and generally increases one project at a time. A supplemental tool that can promote the development of engineering knowledge and judgment is to compile, analyze, and graphically present hydraulic data associated with stream and bridge-design characteristics from previously analyzed bridges. If the data set is sufficiently large, graphs developed from such an effort can provide the engineer with an enhanced picture of stream and bridge-design characteristics, helping them further develop their engineering knowledge and judgment. Furthermore, such graphs can function as project scoping tools and hydraulic-design review tools. Using selected data from approximately 300 bridge-scour studies in South Carolina, previously conducted by the U.S. Geological Survey, and limited hydraulic bridge-design data for approximately 200 bridges in South Carolina, trends in stream and bridge-hydraulic characteristics were evaluated including channel width, floodplain width, flood flow depths, stream slopes, bridge backwater, bridge flow velocity, and bridge lengths. Selected relationships are presented in this paper and should serve as a valuable tool for better understanding stream and bridge-hydraulic characteristics in South Carolina.

Mentoring upcoming researchers for non-communicable diseases’ research and practice in Malawi

Trials ◽

10.1186/s13063-020-05006-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Adamson S. Muula ◽

Mina C. Hosseinipour ◽

Martha Makwero ◽

Johnstone Kumwenda ◽

Prosper Lutala ◽

...

Keyword(s):

Research Capacity ◽

Communicable Diseases ◽

Lessons Learned ◽

National Institutes Of Health ◽

Non Communicable Diseases ◽

Mentorship Program ◽

Research And Practice ◽

Blood Institute ◽

Research Mentorship ◽

National Heart Lung

Abstract: The Malawi College of Medicine and its partners are building non-communicable diseases’ (NCDs’) research capacity through a grant from the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health. Several strategies are being implemented, including research mentorship for junior researchers interested in building careers in NCDs’ research. In this article, we present the rationale for and our experiences with this mentorship program over its 2 years of implementation. Lessons learned and the challenges are also shared.

A randomized controlled trial of low-dose hormone therapy on myocardial ischemia in postmenopausal women with no obstructive coronary artery disease: Results from the National Institutes of Health/National Heart, Lung, and Blood Institute–sponsored Women's Ischemia Syndrome Evaluation (WISE)

American Heart Journal ◽

10.1016/j.ahj.2010.03.024 ◽

2010 ◽

Vol 159 (6) ◽

pp. 987.e1-987.e7 ◽

Cited By ~ 22

Author(s):

C. Noel Bairey Merz ◽

Marian B. Olson ◽

Candace McClure ◽

Yu-Ching Yang ◽

James Symons ◽

...

Keyword(s):

Coronary Artery Disease ◽

Randomized Controlled Trial ◽

Low Dose ◽

Controlled Trial ◽

Obstructive Coronary Artery Disease ◽

National Institutes Of Health ◽

Blood Institute ◽

Randomized Controlled ◽

National Heart Lung ◽

Artery Disease

Association of EMCN with Susceptibility to Rheumatoid Arthritis in a Japanese Population

The Journal of Rheumatology ◽

10.3899/jrheum.100263 ◽

2010 ◽

Vol 38 (2) ◽

pp. 221-228 ◽

Cited By ~ 5

Author(s):

KAZUMASA NISHIMOTO ◽

KATSUNORI IKARI ◽

HIROTAKA KANEKO ◽

SO TSUKAHARA ◽

YUTA KOCHI ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Association Studies ◽

Functional Study ◽

Positive Staining ◽

Nucleotide Polymorphisms ◽

Data Set ◽

Nuclear Factors ◽

Validation Set ◽

Synovial Tissues ◽

Stratified Analysis

Objective. Endomucin, an endothelial-specific sialomucin, is thought to facilitate “lymphocyte homing” to synovial tissues, resulting in the major histopathologies of rheumatoid arthritis (RA). We examined the association between RA susceptibility and the gene coding endomucin, EMCN. Methods. Association studies were conducted with 2 DNA sample sets (initial set of 1504 patients, 752 controls; and validation set, 1113 patients, 940 controls) using 6 tag single-nucleotide polymorphisms (SNP) from the Japanese HapMap database. Immunohistochemistry for the expression of endomucin was conducted with synovial tissues from 4 patients with RA during total knee arthroplasty. Electromobility shift assays were performed for the functional study of identified polymorphisms. Results. Within the initial sample set, the strongest evidence of an association with RA susceptibility was SNP rs3775369 (OR 1.20, p = 0.0075). While the subsequent replication study did not initially confirm the observed significant association (OR 1.13, p = 0.062), an in-depth stratified analysis revealed significant association in patients testing positive to anti-cyclic citrullinated peptide (anti-CCP) antibody in the replication data set (OR 1.15, p = 0.044). Investigating 2 sample sets, significant associations were detected in overall and stratified samples with anti-CCP antibody status (OR 1.17, p = 0.0015). Positive staining for endomucin was detected in all patients. The allele associated with RA susceptibility had a higher binding affinity for HEK298-derived nuclear factors compared to the nonsusceptible allelic variant of rs3775369. Conclusion. A significant association between EMCN and RA susceptibility was detected in our Japanese study population. The EMCN allele conferring RA susceptibility may also contribute to the pathogenesis of RA.

Outcome and Profile of Women and Men Presenting With Acute Coronary Syndromes: A Report From TIMI IIIB

The TIMI IIIB Clinical Centers were supported by Grant R01-HL42311, the Central Units by Grant R01-HL42419 and the Data Coordinating Center by Grant R01-HL42428 from the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland.

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(97)00107-1 ◽

1997 ◽

Vol 30 (1) ◽

pp. 141-148 ◽

Cited By ~ 166

Author(s):

Judith S. Hochman ◽

Carolyn H. McCabe ◽

Peter H. Stone ◽

Richard C. Becker ◽

Christopher P. Cannon ◽

...

Keyword(s):

Acute Coronary Syndromes ◽

National Institutes Of Health ◽

Blood Institute ◽

National Heart ◽

National Heart Lung ◽

Coronary Syndromes

Funding Avenues for Research in Emergency Medicine at the National Institutes of Health and the National Heart, Lung, and Blood Institute

Academic Emergency Medicine ◽

10.1111/j.1553-2712.1996.tb03421.x ◽

1996 ◽

Vol 3 (3) ◽

pp. 202-204 ◽

Cited By ~ 4

Author(s):

Denise G. Simons-Morton

Keyword(s):

Emergency Medicine ◽

National Institutes Of Health ◽

Blood Institute ◽

National Heart ◽

National Heart Lung ◽

Research In Emergency Medicine

Climate, snow, and soil moisture data set for the Tuolumne and Merced river watersheds, California, USA

Earth System Science Data ◽

10.5194/essd-11-101-2019 ◽

2019 ◽

Vol 11 (1) ◽

pp. 101-110 ◽

Cited By ~ 1

Author(s):

James W. Roche ◽

Robert Rice ◽

Xiande Meng ◽

Daniel R. Cayan ◽

Michael D. Dettinger ◽

...

Keyword(s):

Soil Moisture ◽

Sierra Nevada ◽

Land Surface ◽

Forest Canopy ◽

Snow Water Equivalent ◽

Process Models ◽

Series Data ◽

Climate Data ◽

Data Set ◽

Data Repositories

Abstract. We present hourly climate data to force land surface process models and assessments over the Merced and Tuolumne watersheds in the Sierra Nevada, California, for the water year 2010–2014 period. Climate data (38 stations) include temperature and humidity (23), precipitation (13), solar radiation (8), and wind speed and direction (8), spanning an elevation range of 333 to 2987 m. Each data set contains raw data as obtained from the source (Level 0), data that are serially continuous with noise and nonphysical points removed (Level 1), and, where possible, data that are gap filled using linear interpolation or regression with a nearby station record (Level 2). All stations chosen for this data set were known or documented to be regularly maintained and components checked and calibrated during the period. Additional time-series data included are available snow water equivalent records from automated stations (8) and manual snow courses (22), as well as distributed snow depth and co-located soil moisture measurements (2–6) from four locations spanning the rain–snow transition zone in the center of the domain. Spatial data layers pertinent to snowpack modeling in this data set are basin polygons and 100 m resolution rasters of elevation, vegetation type, forest canopy cover, tree height, transmissivity, and extinction coefficient. All data are available from online data repositories (https://doi.org/10.6071/M3FH3D).
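
The Level 2 step described above (gap filling by linear interpolation) can be illustrated with a minimal Python sketch. This is not the authors' processing code: the function name `fill_gaps_linear`, the use of `None` to mark missing hourly values, and the sample series are all assumptions made for illustration. Gaps that lack a valid neighbour on either side are left unfilled here; regression against a nearby station, which the abstract also mentions, is not shown.

```python
from typing import List, Optional


def fill_gaps_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between neighbours.

    Hypothetical helper: leading and trailing gaps have only one valid
    neighbour, so they are left as None rather than extrapolated.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1          # index of last valid value before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1           # index of first valid value after the gap
            if start >= 0 and end < n:
                left, right = filled[start], filled[end]
                span = end - start
                for k in range(i, end):
                    filled[k] = left + (right - left) * (k - start) / span
            i = end
        else:
            i += 1
    return filled


# Made-up hourly series with a two-hour interior gap and a trailing gap.
series = [1.0, None, None, 4.0, None]
filled = fill_gaps_linear(series)
print(filled)
```

Leaving edge gaps untouched is a deliberate choice in this sketch: extrapolating beyond the last valid observation would introduce values that no measurement supports.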

Whip: Communicate and Test What to Expect from Data

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25317 ◽

2018 ◽

Vol 2 ◽

pp. e25317

Author(s):

Stijn Van Hoey ◽

Peter Desmet

Keyword(s):

Data Quality ◽

Controlled Vocabulary ◽

Published Data ◽

Data Set ◽

Data Publication ◽

Use Of Data ◽

Maximum Utility ◽

Group 2 ◽

Publication Guidelines ◽

Available Information

The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and re-use. Data consumers have certain requirements for the data they seek and need to be able to check if a data set conforms with these requirements. Data publishers aim to provide data with the highest possible quality and need to be able to identify potential errors that can be addressed with the available information at hand. The development and adoption of data publication guidelines is one approach to define and meet those requirements. However, the use of a guideline, the mapping decisions, and the requirements a dataset is expected to meet are generally not communicated with the provided data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data is structured following Darwin Core, how a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are human- and machine-readable, which communicates the specifications and allows them to be validated automatically in pipelines for data publication and quality assessment, such as Kurator. Whip can be formatted as a (YAML) text file that can be provided with the published data, communicating the specifications a dataset is expected to meet. The scope of these specifications can be specific to a dataset, but can also be used to express expected data quality and fitness for use of a publisher, consumer or community, allowing bottom-up and top-down adoption. As such, these specifications are complementary to the core set of data quality tests as currently under development by the TDWG Biodiversity Data Quality Task 2 Group 2. Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.
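
The idea of column-based constraints described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not whip's actual syntax or API: the rule keys (`allowed`, `min`, `max`), the `SPEC` dictionary, the helper names, and the sample rows are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical specification: one entry per column, each holding simple rules.
SPEC: Dict[str, Dict[str, Any]] = {
    "country": {"allowed": {"BE", "NL"}},           # controlled vocabulary
    "individualCount": {"min": 1, "max": 100},      # expected numeric range
}


def check_value(value: Any, rules: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable violations for one cell."""
    errors: List[str] = []
    if "allowed" in rules and value not in rules["allowed"]:
        errors.append(f"{value!r} is not in the controlled vocabulary")
    if "min" in rules or "max" in rules:
        try:
            number = float(value)
        except (TypeError, ValueError):
            errors.append(f"{value!r} is not numeric")
        else:
            if "min" in rules and number < rules["min"]:
                errors.append(f"{value!r} is below the minimum {rules['min']}")
            if "max" in rules and number > rules["max"]:
                errors.append(f"{value!r} is above the maximum {rules['max']}")
    return errors


def validate_rows(rows: List[Dict[str, Any]],
                  spec: Dict[str, Dict[str, Any]]) -> List[Tuple[int, str, str]]:
    """Check every row against every column rule; report (row, column, message)."""
    report: List[Tuple[int, str, str]] = []
    for index, row in enumerate(rows):
        for column, rules in spec.items():
            for message in check_value(row.get(column), rules):
                report.append((index, column, message))
    return report


# Made-up tabular data: the second row violates both column rules.
rows = [
    {"country": "BE", "individualCount": "3"},
    {"country": "FR", "individualCount": "250"},
]
report = validate_rows(rows, SPEC)
for entry in report:
    print(entry)
```

Because the specification is plain data rather than code, it could in principle be stored alongside a dataset in a text file, which is the property the abstract emphasises.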
