Extracting Patient Case Profiles with Domain-Specific Semantic Categories
The fast-growing body of online clinical case study articles is a useful source of domain-specific knowledge for improving healthcare systems. However, current studies focus mainly on the abstract of a published case study, which contains little information about the patient's detailed case profile, such as signs and symptoms and important laboratory test results from the diagnostic and treatment procedures. This paper proposes a novel category set covering the wide variety of semantics in clinical case study descriptions that distinguish one patient case from another. A manually annotated corpus of over 5000 sentences from 75 journal articles of clinical case studies has been created. A sentence classification system that identifies 13 classes of clinically relevant content has been developed. A gold standard for assessing the automatic classifications has been established by manual annotation. A maximum entropy (MaxEnt) classifier is shown to produce better results than a Support Vector Machine (SVM) classifier on the corpus.