Defining Phenotypes from Clinical Data to Drive Genomic Research

2018 ◽  
Vol 1 (1) ◽  
pp. 69-92 ◽  
Author(s):  
Jamie R. Robinson ◽  
Wei-Qi Wei ◽  
Dan M. Roden ◽  
Joshua C. Denny

The rise in available longitudinal patient information in electronic health records (EHRs) and their coupling to DNA biobanks have resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs have the benefit of providing a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenomes available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients but now typically involve applications of rule-based and machine learning algorithms operating on sometimes huge corpora for both genome-wide and phenome-wide approaches. In this review, we highlight the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomic discovery. Use of EHR data has proven a powerful method for elucidating genomic influences on diseases, traits, and drug-response phenotypes and will continue to have increasing applications in large cohort studies.

2019 ◽  
Author(s):  
Laura V. Milko ◽  
Flavia Chen ◽  
Kee Chan ◽  
Amy M. Brower ◽  
Pankaj B. Agrawal ◽  
...  

ABSTRACT The National Institutes of Health (NIH) funded the Newborn Sequencing In Genomic medicine and public HealTh (NSIGHT) Consortium to investigate the implications, challenges and opportunities associated with the possible use of genomic sequence information in the newborn period. Following announcement of the NSIGHT awardees in 2013, the Food and Drug Administration (FDA) contacted investigators and requested that pre-submissions to investigational device exemptions (IDE) be submitted for the use of genomic sequencing under Title 21 of the Code of Federal Regulations (21 CFR) part 812. IDE regulation permits clinical investigation of medical devices that have not been approved by the FDA. To our knowledge, this marked the first time the FDA determined that NIH-funded clinical genomic research projects are subject to IDE regulation. Here we review the history of and rationale behind FDA oversight of clinical research and the NSIGHT Consortium’s experiences in navigating the IDE process. Overall, NSIGHT investigators found that FDA’s application of existing IDE regulations and medical device definitions aligned imprecisely with the aims of publicly funded exploratory clinical research protocols. IDE risk assessments by the FDA were similar to, but distinct from, protocol risk assessments conducted by local Institutional Review Boards (IRBs), and had the potential to reflect novel oversight of emerging genomic technologies. However, the pre-IDE and IDE process delayed the start of NSIGHT research studies by an average of 10 months, and significantly limited the scope of investigation in two of the four NIH-approved projects. Based on the experience of the NSIGHT Consortium, we conclude that policies and practices governing the development and use of novel genomic technologies in clinical research urgently need clarification in order to mitigate potentially conflicting or redundant oversight by IRBs, NIH, FDA, and state authorities.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ricardo José Gonzaga Pimenta ◽  
Alexandre Hild Aono ◽  
Roberto Carlos Villavicencio Burbano ◽  
Alisson Esdras Coutinho ◽  
Carla Cristina da Silva ◽  
...  

Abstract Sugarcane yellow leaf (SCYL), caused by the sugarcane yellow leaf virus (SCYLV), is a major disease affecting sugarcane, a leading sugar and energy crop. Despite damages caused by SCYLV, the genetic base of resistance to this virus remains largely unknown. Several methodologies have arisen to identify molecular markers associated with SCYLV resistance, which are crucial for marker-assisted selection and understanding response mechanisms to this virus. We investigated the genetic base of SCYLV resistance using dominant and codominant markers and genotypes of interest for sugarcane breeding. A sugarcane panel inoculated with SCYLV was analyzed for SCYL symptoms, and viral titer was estimated by RT-qPCR. This panel was genotyped with 662 dominant markers and 70,888 SNPs and indels with allele proportion information. We used polyploid-adapted genome-wide association analyses and machine-learning algorithms coupled with feature selection methods to establish marker-trait associations. While each approach identified unique marker sets associated with phenotypes, convergences were observed between them and demonstrated their complementarity. Lastly, we annotated these markers, identifying genes encoding emblematic participants in virus resistance mechanisms and previously unreported candidates involved in viral responses. Our approach could accelerate sugarcane breeding targeting SCYLV resistance and facilitate studies on biological processes leading to this trait.


2019 ◽  
Author(s):  
Edward W Huang ◽  
Ameya Bhope ◽  
Jing Lim ◽  
Saurabh Sinha ◽  
Amin Emad

ABSTRACT Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue, but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. Then, we developed a new algorithm called TG-LASSO that explicitly integrates information on samples’ tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide significantly accurate prediction. However, including the network information or common methods of including information on the tissue of origin did not improve the results. On the other hand, TG-LASSO improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with the drug response, including known targets and pathways involved in the drugs’ mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival. In summary, our analysis suggests that preclinical samples can be used to predict CDR of patients and identify biomarkers of drug sensitivity and survival.

AUTHOR SUMMARY Cancer is among the leading causes of death globally, and prediction of the drug response of patients to different treatments based on their clinical and molecular profiles can enable individualized cancer medicine. Machine learning algorithms have the potential to play a significant role in this task; however, these algorithms are designed on the premise that a large number of labeled training samples are available and that these samples are accurate representations of the profiles of real tumors. Due to ethical and technical reasons, it is not possible to screen humans for many drugs, which significantly limits the size of training data. To overcome this data scarcity problem, machine learning models can be trained using large databases of preclinical samples (e.g. cancer cell line cultures). However, due to the major differences between preclinical samples and real tumors, it is unclear how accurately such preclinical-to-clinical computational models can predict the clinical drug response of cancer patients. Here, we first systematically evaluate a variety of linear and nonlinear machine learning algorithms for this task using two large databases of preclinical (GDSC) and tumor (TCGA) samples. Then, we present a novel method called TG-LASSO that utilizes a new approach for explicitly incorporating the tissue of origin of samples in the prediction task. Our results show that TG-LASSO outperforms all other algorithms and can accurately distinguish resistant and sensitive patients for the majority of the tested drugs. Follow-up analyses reveal that this method can also identify biomarkers of drug sensitivity in each cancer type.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2031 ◽  
Author(s):  
Israel Amirav ◽  
Mary Roduta Roberts ◽  
Huda Mussaffi ◽  
Avigdor Mandelberg ◽  
Yehudah Roth ◽  
...  

Rationale: Primary ciliary dyskinesia (PCD) is underdiagnosed and underestimated. Most clinical research has used some form of questionnaire to capture data, but none has been critically evaluated, particularly with respect to its end-user feasibility and utility. Objective: To critically appraise a clinical data collection questionnaire for PCD used in a large national PCD consortium in order to apply conclusions in future PCD research. Methods: We describe the development, validation and revision process of a clinical questionnaire for PCD and its evaluation during a national clinical PCD study with respect to data collection and analysis, initial completion rates and user feedback. Results: 14 centers participating in the consortium successfully completed the revised version of the questionnaire for 173 patients, with completion rates varying across items. While content and internal consistency analysis demonstrated validity, there were methodological deficiencies impacting completion rates and end-user utility. These deficiencies were addressed, resulting in a more valid questionnaire. Conclusions: Our experience may be useful for future clinical research in PCD. Based on the feedback collected on the questionnaire through analysis of completion rates, judgmental analysis of the content, and feedback from experts and end users, we suggest a practicable framework for development of similar tools for future PCD research.


2021 ◽  
Author(s):  
Moataz Dowaidar

The discovery of a genome-wide correlation with obesity-related genes has revealed new information about the genetics of obesity. Given the low proportion of obesity heritability explained by available SNPs, it's not shocking that these SNPs aren't scientifically effective as methods for assessing who would acquire obesity. The roles of the majority of loci, the majority of which map to non-coding sequences, will take thorough analysis to determine the responsible gene at each locus, which may not be the closest gene. This mechanistic information, as well as the resulting elucidation of the pathophysiology of obesity, will allow the creation of new therapies, which could be the primary advantage of these genetic discoveries. Fortunately, a lack of mechanistic information hasn't stopped researchers from using SNPs and genetic risk ratings to shed light on how obesity biology interacts with environmental and lifestyle influences. These findings suggest that an unhealthy lifestyle may amplify the genetic risk of obesity, despite the fact that environmental studies of obesity genes may be distorted by inaccuracies in diet and physical activity measurement. More research is required to confirm this theory and to identify the specific dietary components (such as sugar-sweetened beverages) that interfere with genetic variants. This study could contribute to personalized obesity prevention and care measures in the future (pending confirmation in clinical trials of genetic-risk-guided interventions). Obesity genetics has offered researchers the opportunity to examine causal interactions between obesity and its various possible complications. However, since the majority of the studies discussed above were conducted on people of European ethnicity, more research is required in minority ethnic groups with a high risk of obesity to understand the role of biology, climate, and relationships among these factors in explaining their increased risk.


2020 ◽  
Vol 20 (1_suppl) ◽  
pp. 31S-39S ◽  
Author(s):  
Jana E. Jones ◽  
Miya R. Asato ◽  
Mesha-Gay Brown ◽  
Julia L. Doss ◽  
Elizabeth A. Felton ◽  
...  

Epilepsy represents a complex spectrum disorder, with patients sharing seizures as a common symptom and manifesting a broad array of additional clinical phenotypes. To understand this disorder and treat individuals who live with epilepsy, it is important not only to identify pathogenic mechanisms underlying epilepsy but also to understand their relationships with other health-related factors. Benchmarks Area IV focuses on the impact of seizures and their treatment on quality of life, development, cognitive function, and other aspects and comorbidities that often affect individuals with epilepsy. Included in this review is a discussion on sudden unexpected death in epilepsy and other causes of mortality, a major area of research focus with still many unanswered questions. We also draw attention to special populations, such as individuals with nonepileptic seizures and pregnant women and their offspring. In this study, we review the progress made in these areas since the 2016 review of the Benchmarks Area IV and discuss challenges and opportunities for future study.


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Carly A. Rodriguez ◽  
Emiliano Valle ◽  
Jerome Galea ◽  
Milagros Wong ◽  
Lenka Kolevic ◽  
...  

Abstract Background: The global HIV burden among adolescents ages 10–19 is growing. This population concurrently confronts the multifaceted challenges of adolescence and living with HIV. With the goal of informing future interventions tailored to this group, we assessed sexual activity, HIV diagnosis disclosure, combination antiretroviral therapy (cART) adherence, and drug use among adolescents living with HIV (ALHIV) in Lima, Peru. Methods: Adolescents at risk or with a history of suboptimal cART adherence completed a self-administered health behaviors survey and participated in support group sessions, which were audio recorded and used as a qualitative data source. Additionally, we conducted in-depth interviews with caregivers and care providers of ALHIV. Thematic content analysis was performed on the group transcripts and in-depth interviews and integrated with data from the survey to describe adolescents’ health-related behaviors. Results: We enrolled 34 ALHIV, of whom 32 (14 male, 18 female, median age 14.5 years) completed the health behavior survey. Nine (28%) adolescents reported prior sexual intercourse, a minority of whom (44%) reported using a condom. cART adherence was highest in the 10–12 age group with 89% reporting ≤2 missed doses in the last month, compared to 36% in adolescents 13 years or older. Over 80% of adolescents had never disclosed their HIV status to a friend or romantic partner. Adolescents, caregivers, and health service providers described sexual health misinformation and difficulty having conversations about sexual health and HIV. Conclusions: In this group of ALHIV, adherence to cART declined with age and condom use among sexually active adolescents was low. Multifactorial interventions addressing sexual health, gaps in HIV-related knowledge, and management of disclosure and romantic relationships are urgently needed for this population.

