Abstract

Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has great potential to improve routine clinical care, to support clinical research, and to advance the personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated, and an essential prerequisite for this is information extraction from clinical documents.

A body of work and a good selection of openly available tools for information extraction and semantic integration in the medical domain exist, yet almost exclusively for English-language documents. For German texts the situation is rather different: research work is sparse, tools are proprietary or unpublished, and hardly any freely available textual resources exist. In this survey, we (1) describe the challenges of information extraction from German medical documents and the hurdles posed to research in this area, (2) specifically address the problems of missing German-language resources and privacy implications, and (3) identify the steps necessary to overcome these hurdles and fuel research in semantic integration of textual clinical data.