A System for Phenotype Harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program

Author(s): Adrienne M Stilp, Leslie S Emery, Jai G Broome, Erin J Buth, Alyna T Khan, ...

Abstract. Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung and Blood Institute’s Trans-Omics for Precision Medicine program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 studies per phenotype (participants recruited 1948–2012). We discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.

2020
Author(s): Adrienne M. Stilp, Leslie S. Emery, Jai G. Broome, Erin J. Buth, Alyna T. Khan, ...

Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung and Blood Institute’s Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 TOPMed studies per phenotype. We discuss the challenges faced in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.
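To make the idea of harmonization concrete, here is a minimal sketch of what mapping one phenotype (height) from heterogeneous study encodings onto a common definition can involve: study-specific variable names, unit conversion, and a simple quality-control check. It is not the TOPMed harmonization code. The study names, variable names (`HT_IN`, `height_cm`), and plausibility bounds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical study-specific encodings of one phenotype (height).
# Study names, variable names, and units are invented for illustration.
STUDY_SPECS: Dict[str, Dict[str, str]] = {
    "study_a": {"variable": "HT_IN", "unit": "in"},
    "study_b": {"variable": "height_cm", "unit": "cm"},
}

# Conversion factors to the common target unit (centimeters).
TO_CM = {"cm": 1.0, "in": 2.54}

# Assumed plausibility bounds used as a simple quality-control step.
MIN_CM, MAX_CM = 100.0, 250.0


@dataclass
class HarmonizedRecord:
    study: str
    subject_id: str
    height_cm: Optional[float]


def harmonize_height(study: str, rows: List[dict]) -> List[HarmonizedRecord]:
    """Map one study's raw height variable onto a common definition in cm."""
    spec = STUDY_SPECS[study]
    factor = TO_CM[spec["unit"]]
    records = []
    for row in rows:
        raw = row.get(spec["variable"])
        value = None if raw is None else round(raw * factor, 1)
        # QC: values outside the assumed plausible range are set to missing.
        if value is not None and not (MIN_CM <= value <= MAX_CM):
            value = None
        records.append(HarmonizedRecord(study, row["subject_id"], value))
    return records


if __name__ == "__main__":
    combined = (
        harmonize_height("study_a", [{"subject_id": "a1", "HT_IN": 70}])
        + harmonize_height("study_b", [{"subject_id": "b1", "height_cm": 165.0},
                                       {"subject_id": "b2"}])
    )
    for rec in combined:
        print(rec)
```

Keeping the per-study mapping in a data table (`STUDY_SPECS`) rather than in branching code means that extending the harmonization to another study only requires adding one entry.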


Author(s): Jay G. Ronquillo, William T. Lester

PURPOSE: The rapid growth of biomedical data ecosystems has catalyzed research in oncology and precision medicine. We leverage federal cloud-based precision medicine databases and tools to better understand the current landscape of precision medicine and genomic testing for patients with cancer.
METHODS: This was a retrospective observational study of genomic testing for patients with cancer in the National Institutes of Health All of Us Research Program, with the cancer cohort defined as participants having at least two documented or reported cancer diagnoses.
RESULTS: There were 5,678 (1.8%) All of Us participants in the cancer cohort, with significant differences in cancer status by age category, sex, race, and ethnicity (P < .001 for all). A total of 295 (5.2%) patients with cancer received genomic testing compared with 6,734 (2.2%) noncancer patients, with 752 genomic tests commonly focused on gene mutations (primarily pharmacogenomics), molecular pathology, or clinical cytogenetic reports.
CONCLUSION: Although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage for growing the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.


Author(s): Stephen T. Benedict, Thomas P. Knight

The hydraulic design of bridges is a discipline that requires a strong measure of engineering judgment. Good engineering judgment can take years to develop and is generally gained one project at a time. One supplemental way to promote the development of engineering knowledge and judgment is to compile, analyze, and graphically present hydraulic data on stream and bridge-design characteristics from previously analyzed bridges. If the data set is sufficiently large, the resulting graphs can give the engineer an enhanced picture of stream and bridge-design characteristics, helping to further develop engineering knowledge and judgment. Such graphs can also serve as project scoping tools and hydraulic-design review tools. Using selected data from approximately 300 bridge-scour studies in South Carolina previously conducted by the U.S. Geological Survey, together with limited hydraulic bridge-design data for approximately 200 South Carolina bridges, trends in stream and bridge-hydraulic characteristics were evaluated, including channel width, floodplain width, flood flow depths, stream slopes, bridge backwater, bridge flow velocity, and bridge lengths. Selected relationships are presented in this paper and should serve as a valuable tool for better understanding stream and bridge-hydraulic characteristics in South Carolina.


Trials, 2021, Vol 22 (1)
Author(s): Adamson S. Muula, Mina C. Hosseinipour, Martha Makwero, Johnstone Kumwenda, Prosper Lutala, ...

Abstract. The Malawi College of Medicine and its partners are building research capacity in non-communicable diseases (NCDs) through a grant from the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health. Several strategies are being implemented, including research mentorship for junior researchers interested in building careers in NCD research. In this article, we present the rationale for this mentorship program and our experiences over its 2 years of implementation. We also share lessons learned and challenges encountered.


2010, Vol 38 (2), pp. 221–228
Author(s): Kazumasa Nishimoto, Katsunori Ikari, Hirotaka Kaneko, So Tsukahara, Yuta Kochi, ...

Objective. Endomucin, an endothelial-specific sialomucin, is thought to facilitate “lymphocyte homing” to synovial tissues, resulting in the major histopathologies of rheumatoid arthritis (RA). We examined the association between RA susceptibility and EMCN, the gene encoding endomucin.
Methods. Association studies were conducted with 2 DNA sample sets (initial set: 1504 patients, 752 controls; validation set: 1113 patients, 940 controls) using 6 tag single-nucleotide polymorphisms (SNPs) from the Japanese HapMap database. Immunohistochemistry for endomucin expression was performed on synovial tissues obtained from 4 patients with RA during total knee arthroplasty. Electromobility shift assays were performed to study the function of the identified polymorphisms.
Results. Within the initial sample set, the strongest evidence of association with RA susceptibility was for SNP rs3775369 (OR 1.20, p = 0.0075). Although the replication study did not confirm this association in the overall sample (OR 1.13, p = 0.062), an in-depth stratified analysis revealed a significant association in patients testing positive for anti-cyclic citrullinated peptide (anti-CCP) antibody in the replication data set (OR 1.15, p = 0.044). In the analysis of both sample sets combined, significant associations were detected in the overall samples and in samples stratified by anti-CCP antibody status (OR 1.17, p = 0.0015). Positive staining for endomucin was detected in all patients. The allele associated with RA susceptibility had a higher binding affinity for HEK298-derived nuclear factors than the nonsusceptible allele of rs3775369.
Conclusion. A significant association between EMCN and RA susceptibility was detected in our Japanese study population. The EMCN allele conferring RA susceptibility may also contribute to the pathogenesis of RA.
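Odds ratios such as “OR 1.20, p = 0.0075” in SNP association studies are often computed from a 2×2 table of allele counts in cases and controls. The abstract does not state which test the authors used. The sketch below is therefore a generic allelic association test: an odds ratio with a Woolf (log-scale Wald) 95% confidence interval and a Pearson chi-square p-value with 1 degree of freedom. The example counts are hypothetical, not the study's data.

```python
import math
from typing import Tuple


def allelic_odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, Tuple[float, float]]:
    """Allelic odds ratio with a Woolf (log-scale Wald) 95% confidence interval.

    a = risk alleles in cases,    b = other alleles in cases,
    c = risk alleles in controls, d = other alleles in controls.
    All counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


def chi_square_1df(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its 1-df p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts, NOT taken from the study.
    counts = (1200, 1808, 540, 964)
    print(allelic_odds_ratio(*counts))
    print(chi_square_1df(*counts))
```

A stratified analysis like the anti-CCP-positive subgroup would apply the same calculation to the subset of cases in that stratum. Combining sample sets formally would usually call for a stratified method such as Mantel-Haenszel rather than simply pooling counts.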


2019, Vol 11 (1), pp. 101–110
Author(s): James W. Roche, Robert Rice, Xiande Meng, Daniel R. Cayan, Michael D. Dettinger, ...

Abstract. We present hourly climate data for forcing land surface process models and assessments over the Merced and Tuolumne watersheds in the Sierra Nevada, California, for water years 2010–2014. Climate data (38 stations) include temperature and humidity (23), precipitation (13), solar radiation (8), and wind speed and direction (8), spanning an elevation range of 333 to 2987 m. Each data set contains raw data as obtained from the source (Level 0); data that are serially continuous, with noise and nonphysical points removed (Level 1); and, where possible, data that are gap filled using linear interpolation or regression with a nearby station record (Level 2). All stations chosen for this data set were known or documented to have been regularly maintained, with components checked and calibrated during the period. Additional time-series data include available snow water equivalent records from automated stations (8) and manual snow courses (22), as well as distributed snow depth and co-located soil moisture measurements (2–6) from four locations spanning the rain–snow transition zone in the center of the domain. Spatial data layers pertinent to snowpack modeling in this data set are basin polygons and 100 m resolution rasters of elevation, vegetation type, forest canopy cover, tree height, transmissivity, and extinction coefficient. All data are available from online data repositories (https://doi.org/10.6071/M3FH3D).
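The Level 1 to Level 2 step described above (gap filling by linear interpolation or by regression with a nearby station) can be sketched as follows. This is a minimal illustration, not the data set's processing code. It assumes that short interior gaps, up to a hypothetical `max_gap` hours, are interpolated linearly and that remaining gaps are filled from an ordinary least-squares fit against one neighboring station. The abstract gives neither the actual thresholds nor the station pairings, and the series values are invented.

```python
from typing import List, Optional

Series = List[Optional[float]]  # hourly values; None marks a missing hour


def fill_linear(values: Series, max_gap: int) -> Series:
    """Fill interior gaps of at most max_gap missing values by linear interpolation.

    Gaps at the start or end of the record, or longer than max_gap, are left as None.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first missing index in this gap
        while i < n and out[i] is None:
            i += 1
        end = i                        # first valid index after the gap (or n)
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
    return out


def fill_regression(target: Series, neighbor: Series) -> Series:
    """Fill remaining gaps in target from a nearby station via ordinary least squares."""
    pairs = [(x, y) for x, y in zip(neighbor, target) if x is not None and y is not None]
    if len(pairs) < 2:
        return list(target)
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        return list(target)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    intercept = mean_y - slope * mean_x
    return [
        y if y is not None else (intercept + slope * x if x is not None else None)
        for x, y in zip(neighbor, target)
    ]


if __name__ == "__main__":
    station = [2.0, None, None, 5.0, None, None, None, 9.0]  # hypothetical values
    nearby = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]        # hypothetical neighbor
    level2 = fill_regression(fill_linear(station, max_gap=2), nearby)
    print(level2)
```

Interpolation is applied first because it is most defensible over short gaps. The regression fill relies on the neighbor being well correlated with the target, which is why such a step is applied only "where possible."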


2018, Vol 2, pp. e25317
Author(s): Stijn Van Hoey, Peter Desmet

The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and reuse. Data consumers have specific requirements for the data they seek and need to be able to check whether a data set conforms to these requirements. Data publishers aim to provide data of the highest possible quality and need to be able to identify potential errors that can be addressed with the information at hand. Developing and adopting data publication guidelines is one approach to defining and meeting those requirements. However, the guideline used, the mapping decisions, and the requirements a data set is expected to meet are generally not communicated with the published data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data are structured following Darwin Core, whether a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are both human- and machine-readable: they communicate the specifications and allow them to be validated automatically in pipelines for data publication and quality assessment, such as Kurator. Whip specifications can be written as a YAML text file provided with the published data, communicating the specifications a data set is expected to meet. These specifications can be scoped to a single data set, but they can also express the expected data quality and fitness for use of a publisher, consumer, or community, allowing both bottom-up and top-down adoption. As such, these specifications complement the core set of data quality tests currently under development by the TDWG Biodiversity Data Quality Task Group 2. Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.
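The kind of column-based checking described above can be illustrated with a minimal validator. It is a hedged sketch only: the dictionary layout of `SPEC` is a hypothetical stand-in and is **not** whip's actual YAML syntax, and `validate` is an invented helper, not part of whip or Kurator. The column names and `basisOfRecord` values are standard Darwin Core terms. The rules mirror two examples from the text: controlled vocabulary values and expected minimum/maximum values.

```python
from typing import Dict, List, Tuple

# Hypothetical column specifications in the spirit of the rules described
# (controlled vocabulary, minimum and maximum values). This dict layout is
# an illustration only; it is NOT whip's actual YAML syntax.
SPEC: Dict[str, dict] = {
    "basisOfRecord": {"allowed": ["HumanObservation", "PreservedSpecimen"]},
    "decimalLatitude": {"min": -90.0, "max": 90.0},
    "decimalLongitude": {"min": -180.0, "max": 180.0},
}


def validate(rows: List[Dict[str, str]], spec: Dict[str, dict]) -> List[Tuple[int, str, str]]:
    """Check each row of tabular data against column-based rules; return (row, column, message)."""
    errors = []
    for i, row in enumerate(rows):
        for column, rules in spec.items():
            value = row.get(column, "")
            if value == "":
                continue  # assumption: empty values are not checked in this sketch
            if "allowed" in rules and value not in rules["allowed"]:
                errors.append((i, column, f"{value!r} is not an allowed value"))
            if "min" in rules or "max" in rules:
                try:
                    number = float(value)
                except ValueError:
                    errors.append((i, column, f"{value!r} is not numeric"))
                    continue
                if "min" in rules and number < rules["min"]:
                    errors.append((i, column, f"{number} is below minimum {rules['min']}"))
                if "max" in rules and number > rules["max"]:
                    errors.append((i, column, f"{number} is above maximum {rules['max']}"))
    return errors


if __name__ == "__main__":
    rows = [
        {"basisOfRecord": "HumanObservation", "decimalLatitude": "51.05", "decimalLongitude": "3.72"},
        {"basisOfRecord": "humanobservation", "decimalLatitude": "95", "decimalLongitude": "abc"},
    ]
    for error in validate(rows, SPEC):
        print(error)
```

Because the specification is plain data, the same object could be serialized alongside a published data set and re-run by a consumer, which is the motivation the talk gives for a machine-readable format.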

