BACKGROUND
Interoperability and secondary use of data remain challenges in healthcare. In particular, the reuse of clinical free-text is an unresolved problem. SNOMED CT is growing into the universal language of healthcare and presents characteristics similar to those of a natural language. Using it to represent clinical free-text could improve interoperability.
OBJECTIVE
Although the use of SNOMED and SNOMED CT has already been the subject of reviews, its specific use to process and represent unstructured data such as clinical free-text has not been the focus of an evaluation. This work aims to better understand the use of SNOMED CT for natural language processing (NLP) in medicine by reviewing its application to clinical free-text.
METHODS
A scoping review was performed by searching MEDLINE, Embase, and Web of Science for publications featuring free-text processing and SNOMED CT. A recursive review of references was conducted to broaden the scope of the search. The review covered the type of data processed, the language targeted, the goal of the mapping to SNOMED CT, the method used, and the specific software used.
RESULTS
A final set of 76 publications was selected for extensive study. The most frequent document types are complementary exam reports (23.68%) and narrative notes (21.05%). The language focus is English in 90.79% of publications. The mapping to SNOMED CT is the final goal of the research in 21.05% of publications, part of the final goal in 32.89%, and a step toward another goal in 46.05%. The main targets of the mapping to SNOMED CT are information extraction (38.94%), use as a feature in a classification task (23.01%), and data normalization (20.35%). The method used for the mapping is rule-based in 69.74% of publications, manual in 14.47%, hybrid in 10.53%, and machine learning in 5.26%. Twelve different software tools were used to map text to SNOMED CT concepts, the most frequent being Medtex, MCVS, and MTERMS. The full terminology was used in 64.47% of publications, while only a subset of it was used in 30.26%. Post-coordination was proposed in 17.11% of publications, and only 5.26% of publications specifically mentioned the use of the SNOMED CT compositional grammar.
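To make the dominant rule-based category concrete, the following is a minimal sketch of a dictionary lookup that maps surface terms in free-text to SNOMED CT concepts. It is not the method of any reviewed publication or tool. The lexicon, the function name `map_to_concepts`, and the sample note are hypothetical, and the concept identifiers are illustrative and should be checked against a current SNOMED CT release.

```python
import re

# Hypothetical mini-lexicon: surface term -> (concept ID, preferred term).
# Identifiers are illustrative; verify them against a current SNOMED CT release.
LEXICON = {
    "myocardial infarction": ("22298006", "Myocardial infarction"),
    "heart attack": ("22298006", "Myocardial infarction"),
    "hypertension": ("38341003", "Hypertensive disorder"),
    "diabetes mellitus": ("73211009", "Diabetes mellitus"),
}


def map_to_concepts(text, lexicon=LEXICON):
    """Return sorted (start, surface, concept_id, preferred_term) tuples.

    Longer lexicon terms are tried first, and a character span that has
    already been matched is not matched again by a shorter term.
    """
    normalized = text.lower()
    taken = [False] * len(normalized)
    matches = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, normalized):
            if any(taken[m.start():m.end()]):
                continue  # overlaps an earlier, longer match
            for i in range(m.start(), m.end()):
                taken[i] = True
            concept_id, preferred = lexicon[term]
            matches.append((m.start(), text[m.start():m.end()], concept_id, preferred))
    return sorted(matches)


note = "History of hypertension. Admitted after a heart attack."
for start, surface, concept_id, preferred in map_to_concepts(note):
    print(f"{start:3d}  {surface!r} -> {concept_id} |{preferred}|")
```

A sketch like this ignores negation, abbreviations, misspellings, and word-order variation, which is part of why such mappings are hard to transfer to other languages or document types.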
CONCLUSIONS
SNOMED CT has been widely used to process free-text data, most frequently with rule-based approaches and in English. However, to date there is no easy solution for mapping free-text to SNOMED CT concepts, especially for languages other than English or when post-coordination is needed. Most solutions treat SNOMED CT as a simple terminology rather than as a compositional bag of ontologies. Since 2012, the number of publications on this subject per year has been decreasing. However, the need for formal semantic representation of free-text in healthcare is high, and automatic encoding into a compositional ontology could be a way to achieve interoperability.
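As an illustration of what treating SNOMED CT compositionally can mean in practice, the sketch below formats a focus concept and its attribute/value refinements as a post-coordinated expression in the style of the SNOMED CT compositional grammar. The helper `render_expression` is hypothetical, the concept identifiers are illustrative and should be verified against a current release, and attribute groups and nested expressions are deliberately left out.

```python
def render_expression(focus, refinements):
    """Format a focus concept and (attribute, value) pairs as one expression string.

    Each concept is a (concept_id, term) tuple. Only the simple
    "focus : attribute = value, attribute = value" form is covered.
    """
    def fmt(concept):
        concept_id, term = concept
        return f"{concept_id} |{term}|"

    if not refinements:
        return fmt(focus)
    pairs = ", ".join(f"{fmt(attr)} = {fmt(value)}" for attr, value in refinements)
    return f"{fmt(focus)} : {pairs}"


# Illustrative concepts; identifiers should be checked against a current release.
fracture = ("125605004", "Fracture of bone")
finding_site = ("363698007", "Finding site")
femur = ("71341001", "Bone structure of femur")

print(render_expression(fracture, [(finding_site, femur)]))
```

A plain terminology lookup would return a single pre-coordinated code or nothing at all. An expression of this kind instead lets a phrase such as "fracture of the femur" be represented by combining existing concepts.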