Predicting a diagnosis of ankylosing spondylitis using primary care health records – a machine learning approach

Author(s):  
Jonathan Kennedy ◽  
Natasha Kennedy ◽  
Roxanne Cooksey ◽  
Ernest Choy ◽  
Stefan Siebert ◽  
...  

Abstract: Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, confirming a diagnosis (via x-rays) can take a decade from symptom onset. The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future. The Secure Anonymised Information Linkage databank was used. Patients with ankylosing spondylitis were identified using their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data were analysed separately for men and women. The model was developed using feature/variable selection and principal component analysis to develop decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation than in men, with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data, but in the general population, where prevalence is very low (0.09% of the population in this dataset), the positive predictive value would be very low (0.33%-0.25%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS and, in test datasets with artificially high prevalence, such models will perform well. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis.
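The drop in positive predictive value described above follows from Bayes' theorem. The sketch below is illustrative only: the abstract does not report the models' sensitivity or specificity, so the 80%/80% figures and the 50% test-set prevalence are hypothetical assumptions; only the 0.09% prevalence comes from the abstract.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+, disease-)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier (assumed values, NOT reported by the study).
SENS, SPEC = 0.80, 0.80

# Hypothetical balanced test set versus the 0.09% prevalence from the abstract.
balanced = ppv(SENS, SPEC, 0.5)
general = ppv(SENS, SPEC, 0.0009)

print(f"PPV at 50% prevalence:   {balanced:.1%}")
print(f"PPV at 0.09% prevalence: {general:.2%}")
```

With these assumed inputs the same classifier falls from a PPV of 80% to well under 1%, which is the qualitative effect the abstract describes.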

Author(s):  
Sylvia Aponte-Hao ◽  
Bria Mele ◽  
Dave Jackson ◽  
Alan Katz ◽  
Charles Leduc ◽  
...  

Introduction: Frailty is a geriatric syndrome that is predictive of heightened vulnerability for disability, hospitalization, and mortality. Annually an estimated 250,000 frail Canadians die, and this estimate is expected to double in the next 40 years as Canadians grow older. Currently there is no single accepted clinical definition of frailty. Objectives and Approach: The objective of this study was to develop an operational definition of frailty using machine learning that can be applied to a primary care electronic medical record (EMR) database. The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) is a pan-Canadian network of primary care practices that collect de-identified patient information (such as encounter diagnoses, health conditions, and laboratory data) from EMRs. A total of 780 patients from CPCSSN were randomly selected and assessed by physicians using the Rockwood Clinical Frailty Scale (as frail or not frail), and their clinical characteristics from CPCSSN were used to develop the definition using machine learning. Results: A total of 8,044 clinical features were extracted from these tables: billing, problem list, encounter diagnosis, labs, medications and referrals. A chi-squared automatic interaction detector (CHAID) approach was selected as the best approach. The bootstrapping process used a cost matrix that prioritized high sensitivity and positive predictive value. 10-fold cross validation was used for validity measures. Key features factored into the algorithm included: diagnosis of dementia (ICD-9 code 290), medications furosemide and vitamins, and use of the key word “obstruction” within the billing table. The validation measures with 95% confidence intervals are as follows: sensitivity of 28% (95% CI: 21% to 36%), specificity of 94% (95% CI: 93% to 96%), positive predictive value of 53% (95% CI: 42% to 64%), negative predictive value of 86% (95% CI: 83% to 88%).
Conclusion/Implications: No other primary care specific frailty screening tools have sufficient validity. These results suggest heterogeneous diseases require clearly defined features and potentially more sophisticated algorithms to account for heterogeneity. Further research utilizing continuous features and continuous frailty scores may be more suitable in the creation of a case detection algorithm.


2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth. Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1, 1997 and December 31, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD. Main outcomes and measures: Routinely available parental sociodemographic information, medical histories and prescribed medications data until the offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV). Results: All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset. Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
Key points. Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents? Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%. Meaning: The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at a high risk of ASD.


2018 ◽  
Vol 23 (suppl_1) ◽  
pp. e37-e37
Author(s):  
Vinusha Gunaseelan ◽  
Patricia Parkin ◽  
Imaan Bayoumi ◽  
Patricia Jiang ◽  
Alexandra Medline ◽  
...  

Abstract BACKGROUND The Canadian Paediatric Society (CPS) recommends that every Canadian physician caring for young children provide an enhanced 18-month well-baby visit including the use of a developmental screening tool, such as the Nipissing District Developmental Screen (NDDS). The Province of Ontario implemented an enhanced 18-month well-baby visit specifically emphasizing the NDDS, which is now widely used in Ontario primary care. However, the diagnostic accuracy of the NDDS in identifying early developmental delays in real-world clinical settings is unknown. OBJECTIVES To assess the predictive validity of the NDDS in primary care for identifying developmental delay and prompting a specialist referral at the 18-month health supervision visit. DESIGN/METHODS This was a prospective longitudinal cohort study enrolling healthy children from primary care practices. Parents completed the 18-month NDDS during their child’s scheduled health supervision visit between January 2012 and February 2015. Using a standardized data collection form, research personnel abstracted data from the child’s health records regarding the child’s developmental outcomes following the 18-month assessment. Data collected included confirmed diagnoses of a developmental delay, specialist referrals, family history, and interventions. Research personnel were blinded to the results of the NDDS. We assessed the diagnostic test properties of the NDDS with a confirmed diagnosis of developmental delay as the criterion measure. The specificity, sensitivity, positive predictive value, and negative predictive value were calculated, with 95% confidence intervals. RESULTS We included 255 children with a mean age of 18.5 months (range, 17.5–20.6); 139 (55%) were male. In total, 102 (40%) screened positive (1+ flag result on the NDDS).
A total of 48 (19%) children were referred, and 23 (9%) had a confirmed diagnosis of a developmental delay (speech and language: 14; gross motor: 4; autism spectrum disorder: 3; global developmental delay: 1; developmental delay: 1). The sensitivity was 74% (95% CI: 52–90%), specificity was 63% (95% CI: 57–70%), positive predictive value was 17% (95% CI: 10–25%), and the negative predictive value was 96% (95% CI: 92–99%). CONCLUSION For developmental screening tools, sensitivity of 70%–80% and specificity of 80% have been suggested. The NDDS has moderate sensitivity and specificity in identifying developmental delay at the 18-month health supervision visit. The 1+ NDDS flag cut-point may lead to overdiagnosis, with more children with typical development being referred, leading to longer wait times for specialist referrals among children in need. Future work includes investigating the diagnostic accuracy of combining the NDDS with other screening tools.
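The four reported test properties can be reproduced from a 2×2 table. The sketch below is a reconstruction under one assumption: the abstract gives the totals (255 children, 102 screen-positive, 23 confirmed delays) but not the cell counts, so the 17 true positives are inferred from the reported 74% sensitivity (17/23) and are not stated in the source.

```python
# Totals reported in the abstract.
total = 255
confirmed = 23          # confirmed developmental delay
screen_positive = 102   # 1+ flag on the NDDS

# ASSUMPTION: inferred from the reported 74% sensitivity; not given in the abstract.
true_pos = 17

# Remaining cells of the 2x2 table follow from the totals.
false_neg = confirmed - true_pos
false_pos = screen_positive - true_pos
true_neg = total - confirmed - false_pos

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

Under this assumption the rounded values match the reported 74%, 63%, 17%, and 96%.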


Author(s):  
O. Uspenskaya-Cadoz ◽  
C. Alamuri ◽  
L. Wang ◽  
M. Yang ◽  
S. Khinda ◽  
...  

Background: Recruiting patients for clinical trials of potential therapies for Alzheimer’s disease (AD) remains a major challenge, with demand for trial participants at an all-time high. The AD treatment R&D pipeline includes around 112 agents. In the United States alone, 150 clinical trials are seeking 70,000 participants. Most people with early cognitive impairment consult primary care providers, who may lack time, diagnostic skills and awareness of local clinical trials. Machine learning and predictive analytics offer promise to boost enrollment by predicting which patients have prodromal AD, and which will go on to develop AD. Objectives: The authors set out to develop a machine learning predictive model that identifies prodromal AD patients in the general population, to aid early AD detection by primary care physicians and timely referral to expert sites for biomarker confirmation of diagnosis and clinical trial enrollment. Design: The authors use a classification machine learning algorithm to extract patterns within healthcare claims and prescription data three years prior to AD diagnosis/AD drug initiation. Setting: The study focused on subjects included within proprietary IQVIA US data assets (claims and prescription databases). Patient information was extracted from January 2010 to July 2018, for cohorts aged between 50 and 85 years. Participants: A total of 88,298,289 subjects aged between 50 and 85 years were identified. For the positive cohort, 667,288 subjects were identified who had 24 months of medical history and at least one record with AD or AD treatment. For the negative cohort, 3,670,254 patients were selected who had a similar length of medical history and who were matched to positive cohort subjects based on the prevalence rate. The scoring cohort was selected based on availability of recent medical data of 2-5 years and included 72,670,283 subjects between the ages of 50 and 85 years. Intervention (if any): None. 
Measurements: A list of clinically relevant and interpretable predictors was generated and extracted from the data sets for each subject, including pharmacological treatments (NDC/product), office/specialist visits (specialty), tests and procedures (HCPCS and CPT), and diagnosis (ICD). The positive cohort was defined as patients who have an AD diagnosis/AD treatment, with a 3-year offset as an estimate for prodromal AD diagnosis. Supervised ML techniques were used to develop algorithms to predict the occurrence of prodromal AD cases. The sample dataset was divided randomly into a training dataset and a test dataset. The classification models were trained and executed in the PySpark framework. Training and evaluation of LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier were executed using PySpark’s mllib module. The area under the precision-recall curve (AUCPR) was used to compare the results of the various models. Results: The AUCPRs are 0.426, 0.157, 0.436, and 0.440 for LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier, respectively, meaning that GBTClassifier (Gradient Boosted Tree) outperforms the other three classifiers. The GBT model identified 222,721 subjects in the prodromal AD stage with 80% precision. Some 76% of identified prodromal AD patients were in the primary care setting. Conclusions: Applying the developed predictive model to 72,670,283 U.S. residents, 222,721 prodromal AD patients were identified, the majority of whom were in the primary care setting. This could drive major advances in AD research by enabling more accurate and earlier prodromal AD diagnosis at the primary care physician level, which would facilitate timely referral to expert sites for in-depth assessment and potential enrollment in clinical trials.
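To illustrate how models can be ranked by AUCPR, the sketch below computes a step-wise estimate of the area under the precision-recall curve (average precision) in plain Python. It is not the authors' PySpark pipeline: the estimator is a stand-in chosen for brevity, and the labels and scores are made-up toy data, not study results.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Assumes binary labels (0/1) and no tied scores.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank, weighted by the recall step 1/total_pos.
            area += (hits / rank) / total_pos
    return area


# Toy data (hypothetical): 3 positives among 8 subjects, scored by two models.
labels = [1, 0, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
scores_b = [0.2, 0.9, 0.4, 0.8, 0.7, 0.3, 0.6, 0.5]

ap_a = average_precision(labels, scores_a)
ap_b = average_precision(labels, scores_b)
print(f"model A: {ap_a:.3f}, model B: {ap_b:.3f}")
```

A classifier that ranks at random has an expected AUCPR close to the positive-class prevalence, so AUCPR values are best read against that baseline rather than against 0.5.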


2016 ◽  
Vol 76 (1) ◽  
pp. 119-125 ◽  
Author(s):  
Aase Haj Hensvold ◽  
Thomas Frisell ◽  
Patrik K E Magnusson ◽  
Rikard Holmdahl ◽  
Johan Askling ◽  
...  

Objective: Anti-citrullinated protein antibodies (ACPA) are highly specific for rheumatoid arthritis (RA), but the diagnostic accuracy of ACPA in the general population has not been thoroughly assessed. We aimed to assess the diagnostic accuracy of ACPA for RA in the general population and to further characterise the citrullinated peptide recognition pattern. Methods: Serum samples from a large population-representative twin cohort consisting of 12 590 individuals were analysed for the presence of ACPA using anti-CCP2 ELISA. All ACPA-positive samples were further tested on ELISAs for four peptide-specific ACPA. RA cases were identified by linkage to the Swedish National Patient Register at inclusion and after a median follow-up of 37 months (IQR 31–49). Results: 350 out of 12 590 individuals had a positive anti-CCP2 test, measuring ACPA. Of these, 103 had an RA diagnosis at the time of blood donation and inclusion. During a median follow-up of 3 years, an additional 21 of the remaining 247 ACPA-positive individuals developed RA. Overall, a positive anti-CCP2 test had a positive predictive value of 29% for prevalent RA at inclusion (negative predictive value of 99.6%). High titres (>3× cut-off) of anti-CCP2 increased the positive predictive value to 48% (negative predictive value of 99.5%). ACPA-positive individuals without RA had lower anti-CCP2 titres and fewer peptide-specific ACPA than ACPA-positive patients with RA and higher C-reactive protein levels than ACPA-negative individuals without RA. Conclusion: Presence of ACPA and especially high titres of anti-CCP2 have a high diagnostic accuracy for an RA diagnosis in a population setting.
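The reported 29% positive predictive value follows directly from the counts in the Results. The short sketch below redoes that arithmetic; the follow-up proportion among initially RA-free individuals and the cumulative proportion are derived here for illustration and are not figures reported in the abstract.

```python
# Counts reported in the abstract.
acpa_positive = 350
ra_at_inclusion = 103
ra_during_follow_up = 21

# PPV for prevalent RA at inclusion (reported as 29%).
ppv_prevalent = ra_at_inclusion / acpa_positive

# ACPA-positive individuals without RA at inclusion (reported as 247).
remaining = acpa_positive - ra_at_inclusion

# Derived for illustration only; not reported in the abstract.
incident_share = ra_during_follow_up / remaining
cumulative_share = (ra_at_inclusion + ra_during_follow_up) / acpa_positive

print(f"PPV for prevalent RA: {ppv_prevalent:.1%}")
print(f"Developed RA during follow-up: {incident_share:.1%} of {remaining}")
print(f"RA at inclusion or during follow-up: {cumulative_share:.1%}")
```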


2020 ◽  
Author(s):  
Raymond F Palmer ◽  
Carlos Roberto Jaén ◽  
Roger B. Perales ◽  
Rodolfo Rincon ◽  
Jacqueline Viramontes ◽  
...  

Abstract Background: The 50-item Quick Environmental Exposure and Sensitivity Inventory (QEESI) is a validated questionnaire used worldwide to assess intolerances to chemicals, foods, and/or drugs and has become the gold standard for assessing chemical intolerance (CI). Despite a reported prevalence of 8-33%, CI often goes undiagnosed in epidemiological studies and routine primary care. To enhance the QEESI’s utility, we developed the Brief Environmental Exposure and Sensitivity Inventory (BREESI) as a 3-item CI screening instrument. We tested the BREESI’s potential to predict whether an individual is likely to respond adversely to structurally unrelated chemicals, foods, and drugs. Methods: We recruited 286 adult participants from a university-based primary care clinic and through online participation. The positive and negative predictive values of the BREESI items were calculated against the full QEESI scores. Results: 90% of participants answering “yes” to all three items on the BREESI were classified as very suggestive of CI based upon the QEESI chemical intolerance and symptom scores both ≥ 40 (positive predictive value = 90%). For participants endorsing two items, 92% were classified as either very suggestive (39%) or Suggestive (53%) of CI (positive predictive value = 87%). Of those endorsing only one item, only 13% were found to be very suggestive of CI. However, 70% were classified as Suggestive. Of those answering “No” to all of the BREESI items, 99% were classified as not suggestive of CI (i.e., negative predictive value = 99%). Conclusions: The BREESI is a versatile screening tool for rapidly determining potential CI, with clinical and epidemiological applications. Together, the validated BREESI and QEESI provide much needed diagnostic tools that will help inform treatment protocols and teach health care professionals about Toxicant Induced Loss of Tolerance – the mechanism driving CI.


2020 ◽  
Vol 9 (12) ◽  
pp. 6115
Author(s):  
Mohammed AlAteeq ◽  
Abdelelah A Alseraihi ◽  
Abdulaziz A Alhussaini ◽  
Sultan A Binhasan ◽  
Emad A Ahmari

Author(s):  
Zirui Meng ◽  
Minjin Wang ◽  
Huan Song ◽  
Shuo Guo ◽  
Yanbing Zhou ◽  
...  

ABSTRACT Background: COVID-19 has been spreading globally since its emergence, but diagnostic resources are relatively insufficient. Results: To relieve the shortage of resources for diagnosing COVID-19, we developed a machine learning-based diagnosis model on the basis of laboratory examination indicators from a total of 620 samples, and subsequently implemented it as a COVID-19 diagnosis aid app to facilitate its adoption. Conclusions: External validation showed satisfactory model prediction performance (i.e., the positive predictive value and negative predictive value were 86.35% and 84.62%, respectively), which supports the promising use of this tool for extensive screening.

