cohort identification Latest Research Papers

The standard approach to expert-in-the-loop machine learning is active learning, in which an expert is repeatedly asked to annotate one or more records and the machine finds a classifier that respects all annotations made up to that point. We propose an alternative approach, IQRef, in which the expert iteratively designs a classifier and the machine helps him or her determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that the two approaches are complementary, and we recommend that they be employed in conjunction to address the problem of cohort identification in health research.
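The hold-out loop described in this abstract can be illustrated with a minimal sketch. It is not the IQRef implementation: the records, the two rule functions, and the name `evaluate_on_holdout` are hypothetical, and the abstract does not say which statistics IQRef reports (precision and recall are assumed here). The sketch also does not model the safeguards for re-using hold-out data that the abstract refers to.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

def evaluate_on_holdout(
    classifier: Callable[[Record], bool],
    holdout: List[Tuple[Record, bool]],
) -> Dict[str, float]:
    """Score an expert-written classifier on a fixed, annotated hold-out sample."""
    tp = fp = fn = tn = 0
    for record, label in holdout:
        predicted = classifier(record)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Hypothetical annotated hold-out sample (invented for illustration only).
HOLDOUT: List[Tuple[Record, bool]] = [
    ({"note": "keratoconus noted in left eye"}, True),
    ({"note": "family history of keratoconus"}, False),
    ({"note": "routine exam, no findings"}, False),
    ({"note": "corneal ectasia, keratoconus suspected"}, True),
]

def rule_v1(record: Record) -> bool:
    # First attempt by the expert: a plain keyword match.
    return "keratoconus" in record["note"]

def rule_v2(record: Record) -> bool:
    # Refined attempt: exclude family-history mentions.
    return "keratoconus" in record["note"] and "family history" not in record["note"]

for name, rule in [("rule_v1", rule_v1), ("rule_v2", rule_v2)]:
    stats = evaluate_on_holdout(rule, HOLDOUT)
    print(name, f"precision={stats['precision']:.2f}", f"recall={stats['recall']:.2f}")
```

Each refinement is scored against the same fixed sample, which is what lets the expert judge whether a further iteration is still paying off.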

Diagnostic accuracy of ICD code versus discharge summary-based query for endocarditis cohort identification

Medicine ◽

10.1097/md.0000000000028354 ◽

2021 ◽

Vol 100 (51) ◽

pp. e28354

Author(s):

H. Nina Kim ◽

Ayushi Gupta ◽

Kristine Lan ◽

Jenell Stewart ◽

Shireesha Dhanireddy ◽

...

Keyword(s):

Diagnostic Accuracy ◽

Discharge Summary ◽

Cohort Identification ◽

Code Versus

Cohort Identification Using Semantic Web Technologies: Triplestores as Engines for Complex Computable Phenotyping

10.1101/2021.12.02.21267186 ◽

2021 ◽

Author(s):

Emily R. Pfaff ◽

Robert Bradford ◽

Marshall Clark ◽

James P. Balhoff ◽

Rujin Wang ◽

...

Keyword(s):

Risk Model ◽

Heterogeneous Data ◽

Opioid Use ◽

Social Vulnerability Index ◽

Electronic Health Record Data ◽

Semantic Web Technologies ◽

Opioid Prescription ◽

Post Surgery ◽

Cohort Identification ◽

Opioid Users

Abstract Background Computable phenotypes are increasingly important tools for patient cohort identification. As part of a study of risk of chronic opioid use after surgery, we used a Resource Description Framework (RDF) triplestore as our computable phenotyping platform, hypothesizing that the unique affordances of triplestores may aid in making complex computable phenotypes more interoperable and reproducible than traditional relational database queries. To identify and model risk for new chronic opioid users post-surgery, we loaded several heterogeneous data sources into a Blazegraph triplestore: (1) electronic health record data; (2) claims data; (3) American Community Survey data; and (4) Centers for Disease Control Social Vulnerability Index, opioid prescription rate, and drug poisoning rate data. We then ran a series of queries to execute each of the rules in our “new chronic opioid user” phenotype definition to ultimately arrive at our qualifying cohort. Results Of the 4,163 patients in the denominator, our computable phenotype identified 248 patients as new chronic opioid users after their index surgical procedure. After validation against charts, 228 of the 248 were revealed to be true positive cases, giving our phenotype a PPV of 0.92. Conclusion We successfully used the triplestore to execute the new chronic opioid user phenotype logic, and in doing so noted some advantages of the triplestore in terms of schemalessness, interoperability, and reproducibility. Future work will use the triplestore to create the planned risk model and leverage the additional links with ontologies, and ontological reasoning.
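The PPV reported in this abstract follows directly from the counts it gives. The short sketch below reproduces that arithmetic; the function name is illustrative, and only the three counts (4,163, 248, 228) come from the abstract.

```python
def positive_predictive_value(true_positives: int, flagged: int) -> float:
    """PPV = true positives / all cases the phenotype flagged as positive."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    return true_positives / flagged

DENOMINATOR = 4163   # patients in the denominator (from the abstract)
FLAGGED = 248        # identified by the computable phenotype
CONFIRMED = 228      # true positives after chart validation

ppv = positive_predictive_value(CONFIRMED, FLAGGED)
false_positives = FLAGGED - CONFIRMED
flag_rate = FLAGGED / DENOMINATOR

print(f"PPV = {ppv:.2f}")                      # PPV = 0.92
print(f"false positives = {false_positives}")  # false positives = 20
print(f"share of denominator flagged = {flag_rate:.1%}")
```

PPV says nothing about patients the phenotype missed; estimating sensitivity would need chart review of unflagged patients, which the abstract does not report.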

Transcriptome analysis of Plasmodium falciparum isolates from Benin reveals specific gene expression associated with cerebral malaria

10.1101/2021.11.08.467248 ◽

2021 ◽

Author(s):

Emilie Guillochon ◽

Jérémy Fraering ◽

Valentin Joste ◽

Claire Kamaliddin ◽

Bertin Vianou ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Cerebral Malaria ◽

Specific Binding ◽

Surface Proteins ◽

Specific Gene ◽

Binding Motif ◽

Rna Seq ◽

Brain Endothelium ◽

Cohort Identification ◽

Upregulated Genes

The host and parasitic factors leading to cerebral malaria (CM) are not yet fully elucidated and CM Plasmodium falciparum isolates transcriptome profile remains largely unknown. Based on RNA-seq data from 15 CM and 15 uncomplicated malaria (UM) children from Benin, we identified an increased ring stage signature in CM parasites. Reduced circulating time may result from a higher adherence ability of CM isolates and consistent with this hypothesis, we measured an overexpression of var genes in CM. var genes domains expression was more restricted in CM isolates compared to UM, reflecting the specific binding to receptors in host brain endothelium capillaries. However, ICAM-1 binding motif was found expressed in both CM and UM, questioning its role in PfEMP1 adhesion to ICAM-1 receptor. UM isolates increased circulation time may also be modulated by a more efficient immune response against infected erythrocytes surface proteins, which we could not demonstrate on our cohort. Identification of deregulated genes involved in adhesion, excluding variant surface antigens, also supports the hypothesis of an increased CM adhesion capacity. Finally, numerous upregulated genes involved in entry into host pathway were found, reflecting a greater erythrocytes invasion capacity of CM parasites.

Development of a repository of computable phenotype definitions using the clinical quality language

JAMIA Open ◽

10.1093/jamiaopen/ooab094 ◽

2021 ◽

Vol 4 (4) ◽

Author(s):

Pascal S Brandt ◽

Jennifer A Pacheco ◽

Luke V Rasmussen

Keyword(s):

Structured Data ◽

Future Research ◽

Test Cases ◽

Inclusion Criteria ◽

Specialized Knowledge ◽

Development Environment ◽

Clinical Quality ◽

Cohort Identification ◽

Value Sets ◽

Published Research

Abstract Objective The objective of this study is to create a repository of computable, technology-agnostic phenotype definitions for the purposes of analysis and automatic cohort identification. Materials and Methods We selected phenotype definitions from PheKB and excluded definitions that did not use structured data or were not used in published research. We translated these definitions into the Clinical Quality Language (CQL) and Fast Healthcare Interoperability Resources (FHIR) and validated them using code review and automated tests. Results A total of 33 phenotype definitions met our inclusion criteria. We developed 40 CQL libraries, 231 value sets, and 347 test cases. To support these test cases, a total of 1624 FHIR resources were created as test data. Discussion and Conclusion Although a number of challenges were encountered while translating the phenotypes into structured form, such as requiring specialized knowledge, or imprecise, ambiguous, and conflicting language, we have created a repository and a development environment that can be used for future research on computable phenotypes.

Toward Using Twitter Data to Monitor Covid-19 Vaccine Safety in Pregnancy: A Proof-of-Concept Study of Cohort Identification (Preprint)

JMIR Formative Research ◽

10.2196/33792 ◽

2021 ◽

Author(s):

Ari Z Klein ◽

Karen O'Connor ◽

Graciela Gonzalez-Hernandez

Keyword(s):

Vaccine Safety ◽

Proof Of Concept ◽

Concept Study ◽

Twitter Data ◽

Cohort Identification ◽

In Pregnancy

Agreement between neuroimages and reports for natural language processing-based detection of silent brain infarcts and white matter disease

BMC Neurology ◽

10.1186/s12883-021-02221-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Lester Y. Leung ◽

Sunyang Fu ◽

Patrick H. Luetmer ◽

David F. Kallmes ◽

Neel Madan ◽

...

Keyword(s):

Natural Language Processing ◽

White Matter ◽

Natural Language ◽

Language Processing ◽

Interrater Reliability ◽

Clinical Care ◽

Routine Care ◽

White Matter Disease ◽

Brain Infarcts ◽

Cohort Identification

Abstract Background There are numerous barriers to identifying patients with silent brain infarcts (SBIs) and white matter disease (WMD) in routine clinical care. A natural language processing (NLP) algorithm may identify patients from neuroimaging reports, but it is unclear if these reports contain reliable information on these findings. Methods Four radiology residents reviewed 1000 neuroimaging reports (RI) of patients age > 50 years without clinical histories of stroke, TIA, or dementia for the presence, acuity, and location of SBIs, and the presence and severity of WMD. Four neuroradiologists directly reviewed a subsample of 182 images (DR). An NLP algorithm was developed to identify findings in reports. We assessed interrater reliability for DR and RI, and agreement between these two and with NLP. Results For DR, interrater reliability was moderate for the presence of SBIs (k = 0.58, 95 % CI 0.46–0.69) and WMD (k = 0.49, 95 % CI 0.35–0.63), and moderate to substantial for characteristics of SBI and WMD. Agreement between DR and RI was substantial for the presence of SBIs and WMD, and fair to substantial for characteristics of SBIs and WMD. Agreement between NLP and DR was substantial for the presence of SBIs (k = 0.64, 95 % CI 0.53–0.76) and moderate (k = 0.52, 95 % CI 0.39–0.65) for the presence of WMD. Conclusions Neuroimaging reports in routine care capture the presence of SBIs and WMD. An NLP can identify these findings (comparable to direct imaging review) and can likely be used for cohort identification.
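The agreement figures in this abstract are kappa statistics. As a rough illustration, the sketch below computes Cohen's kappa for two raters and maps a value onto the Landis and Koch descriptive bands. Both choices are assumptions: the abstract does not say which kappa variant was used for its four-rater panels, nor which scale produced the labels "moderate" and "substantial". The rating vectors are invented.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c]
                   for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

def landis_koch_label(kappa: float) -> str:
    """Descriptive band for a kappa value (Landis and Koch scale, assumed here)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical presence/absence calls by two readers on ten scans.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

Under this scale the abstract's reported values of 0.49, 0.52, and 0.58 fall in the "moderate" band and 0.64 in the "substantial" band, which matches the wording used there.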

Using Machine Learning for Early Prediction of Cardiogenic Shock in Patients with Acute Heart Failure

10.21203/rs.3.rs-453102/v1 ◽

2021 ◽

Author(s):

Faisal Rahman ◽

Noam Finkelstein ◽

Anton Alyakin ◽

Nisha Gilotra ◽

Jeff Trost ◽

...

Keyword(s):

Heart Failure ◽

High Risk ◽

Cardiogenic Shock ◽

Risk Model ◽

Acute Decompensated Heart Failure ◽

Clinical Care ◽

Delayed Diagnosis ◽

Low Risk ◽

Cohort Identification ◽

High Risk Cohort

Abstract Objective: Despite technological and treatment advancements over the past two decades, cardiogenic shock (CS) mortality has remained between 40-60%. A number of factors can lead to delayed diagnosis of CS, including gradual onset and nonspecific symptoms. Our objective was to develop an algorithm that can continuously monitor heart failure patients, and partition them into cohorts of high- and low-risk for CS. Methods: We retrospectively studied 24,461 patients hospitalized with acute decompensated heart failure, 265 of whom developed CS, in the Johns Hopkins Healthcare system. Our cohort identification approach is based on logistic regression, and makes use of vital signs, lab values, and medication administrations recorded during the normal course of care. Results: Our algorithm identified patients at high-risk of CS. Patients in the high-risk cohort had 10.2 times (95% confidence interval 6.1-17.2) higher prevalence of CS than those in the low-risk cohort. Patients who experienced cardiogenic shock while in the high-risk cohort were first deemed high-risk a median of 1.7 days (interquartile range 0.8 to 4.6) before cardiogenic shock diagnosis was made by their clinical team. Conclusions: This risk model was able to predict patients at higher risk of CS in a time frame that allowed a change in clinical care. Future studies need to evaluate if CS analysis of high-risk cohort identification may affect outcomes.
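To make the headline comparison in this abstract concrete, the sketch below computes the overall CS prevalence from the two counts the abstract gives, and shows how a prevalence ratio between a high-risk and a low-risk cohort is formed. The abstract does not report the cohort sizes, so the counts passed to `prevalence_ratio` are hypothetical placeholders and do not reproduce the reported 10.2.

```python
def prevalence(cases: int, total: int) -> float:
    """Fraction of a cohort that has the condition."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return cases / total

def prevalence_ratio(cases_high: int, n_high: int,
                     cases_low: int, n_low: int) -> float:
    """Prevalence in the high-risk cohort divided by prevalence in the low-risk cohort."""
    low = prevalence(cases_low, n_low)
    if low == 0.0:
        raise ValueError("low-risk prevalence is zero; ratio is undefined")
    return prevalence(cases_high, n_high) / low

# Counts stated in the abstract.
overall = prevalence(265, 24461)
print(f"overall CS prevalence = {overall:.2%}")  # overall CS prevalence = 1.08%

# Hypothetical cohort split (NOT from the abstract), for illustration only.
ratio = prevalence_ratio(cases_high=50, n_high=1000, cases_low=10, n_low=2000)
print(f"illustrative prevalence ratio = {ratio:.1f}")
```

With a baseline prevalence near 1%, even a large ratio leaves most high-risk patients without CS, which is one reason the abstract calls for outcome studies.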

Type 1 Diabetes Management With Technology: Patterns of Utilization and Effects on Glucose Control Using Real-World Evidence

10.2337/figshare.13641302 ◽

2021 ◽

Author(s):

Ran Sun ◽

Imon Banerjee ◽

Shengtian Sang ◽

Jennifer Joseph ◽

Jennifer Schneider ◽

...

Keyword(s):

Type 1 Diabetes ◽

Language Processing ◽

Diabetes Management ◽

Clinical Care ◽

Glucose Monitoring ◽

Diabetes Diagnosis ◽

Cohort Identification ◽

Patterns Of Utilization ◽

Device Use

Key Points · About one-third of patients with type 1 diabetes were found to use continuous glucose monitoring (CGM) and/or continuous subcutaneous insulin infusion (CSII) in routine clinical care. · Disparities exist in CGM and CSII adoption, with device use more common in patients of higher socioeconomic status. · Mining clinical narratives with natural language processing techniques can be applied successfully for medical device surveillance and cohort identification for observational studies. · CGM use in conjunction with CSII after type 1 diabetes diagnosis is more effective than other therapy regimens and may translate to improved long-term glycemic control.

Type 1 Diabetes Management With Technology: Patterns of Utilization and Effects on Glucose Control Using Real-World Evidence

10.2337/figshare.13641302.v1 ◽

2021 ◽

Author(s):

Ran Sun ◽

Imon Banerjee ◽

Shengtian Sang ◽

Jennifer Joseph ◽

Jennifer Schneider ◽

...

Keyword(s):

Type 1 Diabetes ◽

Language Processing ◽

Diabetes Management ◽

Clinical Care ◽

Glucose Monitoring ◽

Diabetes Diagnosis ◽

Cohort Identification ◽

Patterns Of Utilization ◽

Device Use

Key Points · About one-third of patients with type 1 diabetes were found to use continuous glucose monitoring (CGM) and/or continuous subcutaneous insulin infusion (CSII) in routine clinical care. · Disparities exist in CGM and CSII adoption, with device use more common in patients of higher socioeconomic status. · Mining clinical narratives with natural language processing techniques can be applied successfully for medical device surveillance and cohort identification for observational studies. · CGM use in conjunction with CSII after type 1 diabetes diagnosis is more effective than other therapy regimens and may translate to improved long-term glycemic control.

cohort identification
Recently Published Documents

Computer-Assisted Cohort Identification in Practice

Diagnostic accuracy of ICD code versus discharge summary-based query for endocarditis cohort identification

Cohort Identification Using Semantic Web Technologies: Triplestores as Engines for Complex Computable Phenotyping

Transcriptome analysis of Plasmodium falciparum isolates from Benin reveals specific gene expression associated with cerebral malaria

Development of a repository of computable phenotype definitions using the clinical quality language

Toward Using Twitter Data to Monitor Covid-19 Vaccine Safety in Pregnancy: A Proof-of-Concept Study of Cohort Identification (Preprint)

Agreement between neuroimages and reports for natural language processing-based detection of silent brain infarcts and white matter disease

Using Machine Learning for Early Prediction of Cardiogenic Shock in Patients with Acute Heart Failure

Type 1 Diabetes Management With Technology: Patterns of Utilization and Effects on Glucose Control Using Real-World Evidence

Type 1 Diabetes Management With Technology: Patterns of Utilization and Effects on Glucose Control Using Real-World Evidence
