scholarly journals Leveraging Genetic Reports and Electronic Health Records for Predicting Primary Cancers Based on FHIR and RDF (Preprint)

2020 ◽  
Author(s):  
Nansu Zong ◽  
Victoria Ngo ◽  
Daniel J. Stone ◽  
Andrew Wen ◽  
Yiqing Zhao ◽  
...  

BACKGROUND Precision oncology has the potential to leverage clinical and genomic data to advance disease prevention, diagnosis, and treatment. A key research area is the early detection of primary cancers and the prediction of cancers of unknown primary, in order to support optimal treatment decisions. OBJECTIVE This study presents a methodology that harmonizes phenotypic and genetic data features to classify primary cancer types and predict unknown primaries. METHODS We extracted genetic data elements from the oncology genetic reports of 1,011 cancer patients, together with the corresponding phenotypic data from Mayo Clinic electronic health records (EHRs). We modeled both the genetic and the EHR data with HL7 Fast Healthcare Interoperability Resources (FHIR). The semantic web Resource Description Framework (RDF) was used to generate a network-based data representation (i.e., a patient-phenotypic-genetic network). On the resulting RDF data graph, the graph embedding algorithm Node2vec was applied to generate features, and multiple machine learning and deep learning backbone models were then used for cancer prediction. RESULTS Across six machine learning tasks designed for the experiment, the proposed method achieved favorable results in classifying primary cancer types and predicting unknown primaries. To demonstrate interpretability, the phenotypic and genetic features that contributed most to the prediction of each cancer were identified and validated against a literature review. CONCLUSIONS Cancer types can be predicted with satisfactory precision from existing EHR data. Integrating genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early, at the diagnosis stage, for cancer patients.
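The abstract names Node2vec on the RDF graph but gives no parameters. A minimal Python sketch of the walk-generation step might look like the following, assuming a toy set of patient, phenotype and gene triples; the identifiers, predicates, and the `p`, `q`, and walk-length values are all hypothetical. Predicate labels are dropped so the triples become an undirected graph, and the second-order bias uses the return parameter `p` and in-out parameter `q` described for node2vec. In node2vec the walks are then fed to a skip-gram model to learn node embeddings, a step not shown here.

```python
import random
from collections import defaultdict

# Hypothetical RDF-style triples (subject, predicate, object); illustrative only.
TRIPLES = [
    ("patient:1", "hasCondition", "phenotype:hypertension"),
    ("patient:1", "hasVariant", "gene:TP53"),
    ("patient:2", "hasCondition", "phenotype:hypertension"),
    ("patient:2", "hasVariant", "gene:KRAS"),
    ("patient:3", "hasVariant", "gene:TP53"),
]

def build_graph(triples):
    """Collapse triples into an undirected adjacency map, ignoring predicates."""
    adj = defaultdict(set)
    for s, _pred, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased random walk in the style of node2vec (unweighted edges)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward, away from the previous node
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order each round."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)
        for n in order:
            walks.append(node2vec_walk(adj, n, length, p, q, rng))
    return walks

adj = build_graph(TRIPLES)
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

A smaller `q` favors outward, depth-first-like exploration across patients that share phenotypes or genes; a larger `q` keeps walks local.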

BMJ Open ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. e043487
Author(s):  
Hao Luo ◽  
Kui Kai Lau ◽  
Gloria H Y Wong ◽  
Wai-Chi Chan ◽  
Henry K F Mak ◽  
...  

Introduction: Dementia is a group of disabling disorders that can be devastating for people living with it and for their families. Data-informed decision-making strategies that identify individuals at high risk of dementia are essential for large-scale prevention and early intervention. This population-based case–control study aims to develop and validate a clinical algorithm for predicting dementia diagnosis, based on the cognitive footprint in personal and medical history. Methods and analysis: We will use territory-wide electronic health records from the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong between 1 January 2001 and 31 December 2018. All individuals who were at least 65 years old by the end of 2018 will be identified from CDARS. Each individual who received a diagnosis of dementia will be matched 1:1 by age, gender and index date with a randomly sampled control who did not receive any diagnosis of dementia. Exposure to potential protective/risk factors will be included in both conventional logistic regression and machine learning models. Established risk factors of interest will include diabetes mellitus, midlife hypertension, midlife obesity, depression, head injuries and low education. Exploratory risk factors will include vascular disease, infectious disease and medication. The prediction accuracy of several state-of-the-art machine learning algorithms will be compared. Ethics and dissemination: This study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 18-225). Patients’ records are anonymised to protect privacy. Study results will be disseminated through peer-reviewed publications. Code for the resulting dementia risk prediction algorithm will be made publicly available on the website of the Tools to Inform Policy: Chinese Communities’ Action in Response to Dementia project (https://www.tip-card.hku.hk/).
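The 1:1 case–control matching described in this protocol can be sketched in Python. This is a simplified illustration, not the study's procedure: it assumes yearly granularity, exact matching on gender and birth year (equivalent to the same age at the shared index year), greedy matching without replacement, and a hypothetical `Person` record. The abstract does not specify how ties, calipers, or eligibility at the index date are handled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Person:
    """Hypothetical minimal patient record."""
    pid: str
    birth_year: int
    gender: str
    dementia_year: Optional[int] = None  # year of first dementia diagnosis; None if never diagnosed

def match_controls(people: List[Person], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Greedy 1:1 matching without replacement.

    Each case (a person with a dementia diagnosis) is paired with one unused,
    never-diagnosed control of the same gender and birth year, so both have
    the same age at the case's index year. The control inherits that index year.
    Returns (case id, control id, index year) tuples.
    """
    rng = random.Random(seed)
    cases = sorted((p for p in people if p.dementia_year is not None), key=lambda p: p.pid)
    pool = [p for p in people if p.dementia_year is None]
    used = set()
    pairs = []
    for case in cases:
        candidates = [
            c for c in pool
            if c.pid not in used
            and c.gender == case.gender
            and c.birth_year == case.birth_year
        ]
        if not candidates:
            continue  # unmatched case; a real study would report how many are dropped
        ctrl = rng.choice(candidates)  # random sampling among eligible controls
        used.add(ctrl.pid)
        pairs.append((case.pid, ctrl.pid, case.dementia_year))
    return pairs

people = [
    Person("A", 1940, "F", 2010),
    Person("B", 1945, "M", 2012),
    Person("F", 1930, "F", 2015),  # no eligible control exists for this case
    Person("C", 1940, "F"),
    Person("D", 1945, "M"),
    Person("E", 1940, "M"),
]
pairs = match_controls(people)
```

Matching without replacement keeps each control in at most one pair, which keeps the 1:1 ratio stated in the protocol.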


2021 ◽  
Author(s):  
Nawar Shara ◽  
Kelley M. Anderson ◽  
Noor Falah ◽  
Maryam F. Ahmad ◽  
Darya Tavazoei ◽  
...  

BACKGROUND Healthcare data are becoming fragmented as patients seek care from diverse sources, and these disparate health records negatively affect patient care. Machine learning (ML) can be a disruptive force in its ability to inform and improve patient care and outcomes [6]. However, the differences between individuals’ health records, the lack of health-data standards, and systemic issues that render the data unreliable and fail to create a single view of each patient all create challenges for ML. While these problems exist throughout healthcare, they are especially prevalent in maternal health, where they exacerbate the maternal morbidity and mortality (MMM) crisis in the United States. OBJECTIVE Maternal patient records were extracted from the electronic health records (EHRs) of a large tertiary healthcare system and assembled through a systematic method into complete, patient-specific datasets, so that an ML-based risk-assessment algorithm could identify maternal cardiovascular risk before evidence of diagnosis or intervention appears in the patient’s record. METHODS We outline the effort required to define the specifications of the computational systems, the dataset, and access to relevant systems, while ensuring that data-security requirements, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for their use by a proprietary risk-stratification algorithm. This algorithm is designed to establish patient-specific baselines and to identify cardiovascular risk from deviations from those baselines, in order to inform early interventions. RESULTS Patient records can be made actionable for the effective use of ML, specifically to identify cardiovascular risk in pregnant patients.
CONCLUSIONS Once data have been acquired, concatenated, anonymized, and normalized across multiple EHRs, an ML-based tool can provide early identification of cardiovascular risk in pregnant patients. CLINICALTRIAL N/A
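The risk-stratification algorithm in this abstract is proprietary, so its details are unknown. As a generic illustration of patient-specific baselines only, the Python sketch below flags later readings that deviate from a baseline formed by a patient's own first few measurements, using a z-score. The function name, baseline window, threshold, and the floor on the standard deviation are arbitrary assumptions; a z-score rule is a stand-in technique, not the authors' method.

```python
from statistics import mean, stdev
from typing import List, Tuple

def baseline_deviations(
    readings: List[float],
    baseline_n: int = 3,
    z_threshold: float = 2.0,
    min_sd: float = 1.0,
) -> List[Tuple[int, float, float]]:
    """Flag readings that deviate from a patient's own baseline (hypothetical rule).

    readings: chronological values of one measure for one patient
    (for example systolic blood pressure). The first `baseline_n` values
    define the baseline; later values are flagged when |z| >= z_threshold.
    `min_sd` floors the baseline spread so a flat baseline cannot divide by zero.
    Returns (index, value, z) for each flagged reading.
    """
    if baseline_n < 2 or len(readings) <= baseline_n:
        return []  # stdev needs two points, and readings must extend past the baseline
    base = readings[:baseline_n]
    mu = mean(base)
    sd = max(stdev(base), min_sd)
    flagged = []
    for i, value in enumerate(readings[baseline_n:], start=baseline_n):
        z = (value - mu) / sd
        if abs(z) >= z_threshold:
            flagged.append((i, value, z))
    return flagged
```

Comparing each patient against her own history, rather than a population reference range, is what lets such a rule react to changes that remain within normal population limits.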


Circulation ◽  
2018 ◽  
Vol 137 (suppl_1) ◽  
Author(s):  
Tekeda F Ferguson ◽  
Sunayana Kumar ◽  
Denise Danos

Purpose: Together with earlier breast cancer diagnosis in women and a rapidly aging population, advances in cancer therapies have quickly made cardiotoxicity a major health concern for breast cancer patients. Frequent cardiotoxicity outcomes include reduced left ventricular ejection fraction (LVEF), myocardial infarction, asymptomatic or hospitalized heart failure, arrhythmias, hypertension, and thromboembolism. The purpose of this study was to use an electronic health records system to determine whether women with breast cancer had increased odds of heart disease. Methods: Data from the Research Action for Health Network (REACHnet) were used for the analysis. REACHnet is a clinical data research network that uses a common data model to extract electronic health records (EHRs) from health networks in Louisiana (n=100,000). Women over the age of 30 with available data (n=35,455) were included in the analysis. ICD-9 diagnosis codes were used to identify breast cancer patients and to classify heart disease (HD) as hypertensive HD, ischemic HD, pulmonary HD, or other HD. Additional EHR variables considered were smoking status and patient vitals. Chi-square tests and crude and adjusted logistic regression models were computed using SAS 9.4. Results: Based on diagnosis codes, our study team estimated that 28.6% of women over the age of 30 with a breast cancer diagnosis (n=816) also had a heart disease diagnosis, compared with 15.6% of women without a breast cancer diagnosis. Among patients with heart disease, the distribution of heart disease types did not differ significantly by breast cancer status (p=0.87). The crude odds ratio of a CVD diagnosis among breast cancer cases, relative to cancer-free women, was 2.21 (1.89, 2.58).
After adjusting for age (30-49, 50-64, 65+), race (black/white), and comorbidities (obesity/overweight, diabetes, current smoker), breast cancer remained associated with increased odds of heart disease (OR: 1.24 (1.05, 1.47)). Conclusion: The short- and long-term consequences of cardiotoxicity for the risk-to-benefit ratio of cancer treatment, for survivorship, and for competing causes of mortality are increasingly being acknowledged. Our next efforts will focus on advancing predictive risk modeling. Maximizing benefits while reducing cardiac risks needs to become a priority in oncologic management, as does monitoring for late-term toxic effects.
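The odds ratios above were computed in SAS 9.4, and the abstract does not state the interval type or method. For readers checking such figures, a minimal Python sketch of a crude odds ratio from a 2×2 table with a Woolf (log-based) 95% confidence interval follows; the function name is illustrative and the counts in the usage line are hypothetical, not REACHnet data. Adjusted odds ratios such as the 1.24 above require a multivariable logistic regression and are not reproduced by this calculation.

```python
import math
from typing import Tuple

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout:              outcome   no outcome
        exposed                 a          b
        unexposed               c          d
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts for illustration only.
or_, lo, hi = odds_ratio_ci(10, 20, 30, 40)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, as in the reported (1.89, 2.58) around 2.21.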


Author(s):  
Laxmi Kumari Pathak ◽  
Pooja Jha

Chronic kidney disease (CKD) is a disorder in which the kidneys are damaged and become unable to filter blood properly, which impairs a person’s ability to remain healthy. Progress in the biosciences has produced vast volumes of knowledge from electronic health records (EHRs). Heart disorders, anemia, bone disease, elevated potassium, and abnormal calcium levels are among the most prevalent complications of kidney failure. Early identification of CKD can greatly improve quality of life. To this end, various machine learning techniques that use EHR data to predict CKD have been introduced. This chapter studies machine learning algorithms such as support vector machine, random forest, probabilistic neural network, Apriori, ZeroR, OneR, naive Bayes, J48, IBk (k-nearest neighbor), and ensemble methods, and compares their accuracy. The study aims to find the technique best suited to the early detection of CKD and whose predictions medical professionals can easily interpret.
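Several of the listed names (ZeroR, OneR, J48, IBk) correspond to classifiers in the Weka toolkit. To make the two simplest baselines concrete, here is a minimal Python sketch of ZeroR and OneR on categorical attributes. The tiny dataset and attribute meanings are hypothetical, the sketch skips the numeric-attribute discretization that Weka's OneR performs, and error counts are on training data only, whereas a fair accuracy comparison would use held-out data or cross-validation.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

def zero_r(labels: Sequence[Hashable]) -> Hashable:
    """ZeroR: ignore all attributes and always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]) -> Tuple[int, Dict[Hashable, Hashable], int]:
    """OneR for categorical attributes.

    For each attribute, map every observed value to its majority class, count
    the training errors of that one-attribute rule, and keep the attribute
    with the fewest errors. Returns (attribute index, rule, error count).
    """
    best = None
    for j in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[j]][y] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(rule[row[j]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (j, rule, errors)
    return best

def accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical records: (hypertension, anemia) -> CKD label
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no"),
        ("yes", "yes"), ("no", "yes"), ("yes", "no")]
labels = ["ckd", "ckd", "notckd", "notckd", "ckd", "notckd", "ckd"]

attr, rule, errs = one_r(rows, labels)
baseline = accuracy([zero_r(labels)] * len(labels), labels)
```

ZeroR sets the floor any useful model must beat, and OneR's single-attribute rule is also easy for clinicians to read, which ties into the interpretability goal stated above.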

