Big data techniques for the secondary use of clinical data in the generation of medical knowledge. The MIMIC solution

Author(s):  
X. Borrat ◽  
L.A. Celi ◽  
C. Ferrando


2020 ◽  
Vol 30 (Supplement_5)

Abstract
Countries have a wide range of lifestyles, environmental exposures and different health(care) systems, providing a large natural experiment to be investigated. Through pan-European comparative studies, the underlying determinants of population health can be explored, yielding rich new insights into the dynamics of population health and care, such as the safety, quality, effectiveness and costs of interventions. Additionally, in the big data era, secondary use of data has become one of the major cornerstones of digital transformation for health systems improvement. Several countries are reviewing governance models and regulatory frameworks for data reuse. Precision medicine and public health intelligence share the same population-based approach; as such, aligning secondary-use-of-data initiatives will increase the cost-efficiency of the data conversion value chain by ensuring that different stakeholders' needs are accounted for from the beginning. At EU level, the European Commission has been raising awareness of the need to create adequate data ecosystems for innovative use of big data for health, especially ensuring responsible development and deployment of data science and artificial intelligence technologies in the medical and public health sectors. To this end, the Joint Action on Health Information (InfAct) is setting up the Distributed Infrastructure on Population Health (DIPoH). DIPoH provides a framework for international and multi-sectoral collaborations in health information. More specifically, DIPoH facilitates the sharing of research methods, data and results through the participation of countries and already existing research networks. DIPoH's efforts include harmonization and interoperability, strengthening of research capacity in Member States (MSs), and providing European and worldwide perspectives on national data. In order to be embedded in the health information landscape, DIPoH aims to interact with existing (inter)national initiatives to identify common interfaces, avoid duplication of work and establish a sustainable long-term health information research infrastructure.
In this workshop, InfAct lays down DIPoH's core elements in coherence with national and European initiatives and actors, i.e. To-Reach, eHAction, the French Health Data Hub and ECHO. Pitch presentations on DIPoH and its national nodes will set the scene. In the format of a round table, possible collaborations with existing initiatives at (inter)national level will be debated with the audience. Synergies will be sought, reflections on community needs will be made, and expectations of services will be discussed. The workshop will increase delegates' knowledge of the latest health information infrastructure and initiatives that strive for better public health and health systems in countries. The workshop also serves as a capacity-building activity to promote cooperation between initiatives and actors in the field.
Key messages
DIPoH is an infrastructure that aims to interact with existing (inter)national initiatives to identify common interfaces, avoid duplication and enable a long-term health information research infrastructure.
National nodes can improve coordination, communication and cooperation between health information stakeholders in a country, potentially reducing overlap and duplication of research and fieldwork.


2013 ◽  
Vol 07 (04) ◽  
pp. 377-405 ◽  
Author(s):  
TRAVIS GOODWIN ◽  
SANDA M. HARABAGIU

The introduction of electronic medical records (EMRs) has enabled access to unprecedented volumes of clinical data, in both structured and unstructured formats. A significant amount of this clinical data is expressed within the narrative portion of the EMRs, requiring natural language processing techniques to unlock the medical knowledge referred to by physicians. This knowledge, derived from the practice of medical care, complements medical knowledge already encoded in various structured biomedical ontologies. Moreover, the clinical knowledge derived from EMRs also exhibits relational information between medical concepts, derived from the cohesion property of clinical text, an attractive attribute that is currently missing from the vast biomedical knowledge bases. In this paper, we describe an automatic method of generating a graph of clinically related medical concepts by considering the belief values associated with those concepts. The belief value is an expression of the clinician's assertion that the concept is qualified as present, absent, suggested, hypothetical, ongoing, etc. Because the method detailed in this paper takes into account the hedging used by physicians when authoring EMRs, the resulting graph encodes qualified medical knowledge wherein each medical concept has an associated assertion (or belief value) and such qualified medical concepts are spanned by relations of different strengths, derived from the clinical contexts in which concepts are used. In this paper, we discuss the construction of a qualified medical knowledge graph (QMKG) and treat it as a Big Data problem addressed by using MapReduce for deriving the weighted edges of the graph. To be able to assess the value of the QMKG, we demonstrate its usage for retrieving patient cohorts by enabling query expansion that produces greatly enhanced results compared with state-of-the-art methods.
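
The authors' pipeline is not reproduced here, but the map/reduce pattern the abstract describes (counting co-occurrences of qualified concepts across notes and summing them into weighted graph edges) can be sketched in a few lines of Python. The note structure, function names and toy concepts below are illustrative assumptions, not the paper's implementation:

    from collections import defaultdict
    from itertools import combinations

    # Each note is reduced to its qualified concepts: (concept, assertion)
    # pairs such as ("pneumonia", "present") or ("tb", "absent").

    def map_note(qualified_concepts):
        # Mapper: emit a count of 1 for every pair of qualified concepts
        # that co-occur within the same clinical note.
        for pair in combinations(sorted(qualified_concepts), 2):
            yield pair, 1

    def reduce_edges(records):
        # Reducer: sum the per-note counts into weighted graph edges.
        edges = defaultdict(int)
        for pair, count in records:
            edges[pair] += count
        return edges

    notes = [  # toy corpus standing in for EMR narratives
        [("fever", "present"), ("pneumonia", "present"), ("tb", "absent")],
        [("fever", "present"), ("pneumonia", "present")],
    ]

    records = (rec for note in notes for rec in map_note(note))
    for (a, b), weight in sorted(reduce_edges(records).items()):
        print(a, "<->", b, "weight:", weight)

In a real MapReduce deployment the mapper and reducer would run distributed over the full EMR corpus; the in-memory generator here only mimics that dataflow.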


2017 ◽  
Vol 6 (2) ◽  
pp. 12
Author(s):  
Abhith Pallegar

The objective of the paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing area of Big Data. We can harness network efficiencies by analyzing diverse medical data and probe how we can effectively lower the economic cost of finding cures for rare diseases. Most rare diseases are due to genetic abnormalities, and many forms of cancer develop due to genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in the field of medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich observable statistical data that can be leveraged for finding solutions. Network effects can be squeezed from working with diverse data sets that enable us to generate the highest-quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies like Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that could prevent diseases of genetic origin. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data with the rapid growth of computing power, and some of the network efficiencies gained from this endeavor.


Large volumes of data are generated and stored across many fields; such collections are referred to as big data. Big data in healthcare comprises the clinical records of every patient, in huge amounts, maintained in Electronic Health Records (EHRs). More than 80% of clinical data is in unstructured format and is stored in hundreds of forms. The challenge for data storage and analysis is handling such large datasets with efficiency and scalability. The Hadoop MapReduce framework can store and process any kind of data at speed; it is not solely a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction over these datasets is handled by a machine learning algorithm. This work focuses on the Extreme Learning Machine (ELM), which can find disease risk predictions in an optimized way when combined with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm carries out the computation effectively and achieves good performance in both accuracy and efficiency.
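
The abstract names the Extreme Learning Machine without giving its formulation. As a rough sketch only: an ELM assigns random, untrained hidden-layer weights and fits just the output weights by a least-squares solve. The minimal NumPy version below uses synthetic data and illustrative names; the Cuckoo Search/SVM hybrid of the proposed method is not shown.

    import numpy as np

    rng = np.random.default_rng(0)

    def elm_fit(X, y, n_hidden=50):
        # Random, untrained hidden layer; only the output weights beta are fitted.
        W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
        b = rng.normal(size=n_hidden)                 # random biases
        H = np.tanh(X @ W + b)                        # hidden activations
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares solve
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    # Synthetic "disease risk" data: 200 patients, 5 clinical variables.
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

    W, b, beta = elm_fit(X, y)
    accuracy = ((elm_predict(X, W, b, beta) > 0.5) == y).mean()
    print("training accuracy:", accuracy)

Because the hidden layer is never trained, fitting reduces to a single linear solve, which is what makes ELMs attractive for large clinical datasets.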


2017 ◽  
Vol 26 (01) ◽  
pp. 28-37
Author(s):  
F. J. Martin-Sanchez ◽  
V. Aguiar-Pulido ◽  
G. H. Lopez-Campos ◽  
N. Peek ◽  
L. Sacchi

Summary
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of these data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, the study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality.
Methods: A series of literature searches in the main bibliographic databases were conducted in order to assess the extent to which existing patient data have been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey.
Results and Conclusions: Although most examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of the health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.


2020 ◽  
pp. practneurol-2020-002688
Author(s):  
Stephen D Auger ◽  
Benjamin M Jacobs ◽  
Ruth Dobson ◽  
Charles R Marshall ◽  
Alastair J Noyce

Modern clinical practice requires the integration and interpretation of ever-expanding volumes of clinical data. There is, therefore, an imperative to develop efficient ways to process and understand these large amounts of data. Neurologists work to understand the function of biological neural networks, but artificial neural networks and other forms of machine learning algorithm are likely to be increasingly encountered in clinical practice. As their use increases, clinicians will need to understand the basic principles and common types of algorithm. We aim to provide a coherent introduction to this jargon-heavy subject and equip neurologists with the tools to understand, critically appraise and apply insights from this burgeoning field.


2015 ◽  
Vol 22 (3) ◽  
pp. 600-607 ◽  
Author(s):  
Bonnie L Westra ◽  
Gail E Latimer ◽  
Susan A Matney ◽  
Jung In Park ◽  
Joyce Sensmeier ◽  
...  

Abstract
Background: There is wide recognition that, with the rapid implementation of electronic health records (EHRs), large data sets are available for research. However, essential standardized nursing data are seldom integrated into EHRs and clinical data repositories. Many diverse activities exist to implement standardized nursing languages in EHRs; however, these activities are not coordinated, resulting in duplicated effort rather than a shared learning environment and shared resources.
Objective: The purpose of this paper is to describe the historical context of nursing terminologies, the challenges to using nursing data for purposes other than documentation of care, and a national action plan for implementing and using sharable and comparable nursing data for quality reporting and translational research.
Methods: In 2013 and 2014, the University of Minnesota School of Nursing hosted a diverse group of nurses in the Nursing Knowledge: Big Data and Science to Transform Health Care consensus conferences. These conferences were held to develop a national action plan and to harmonize the existing and new efforts of multiple individuals and organizations, expediting the integration of standardized nursing data within EHRs and ensuring their availability in clinical data repositories for secondary use. This harmonization will address the implementation of standardized nursing terminologies and subsequent access to and use of clinical nursing data.
Conclusion: Foundational to integrating nursing data into clinical data repositories for big data and science is the implementation of standardized nursing terminologies, common data models, and information structures within EHRs. The 2014 National Action Plan for Sharable and Comparable Nursing Data for Transforming Health and Healthcare builds on and leverages the existing but separate long-standing efforts of many individuals and organizations. The plan is action-focused, with accountability designated for coordinating and tracking progress.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Emilie Baro ◽  
Samuel Degoul ◽  
Régis Beuscart ◽  
Emmanuel Chazard

Objective: The aim of this study was to provide a definition of big data in healthcare.
Methods: A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed.
Results: A total of 196 papers were included. Big data can be defined as datasets with log(n × p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues.
Conclusion: Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Inversely, data can be reused without necessarily being big, for example, in the secondary use of Electronic Medical Records (EMR) data.
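
The volume criterion is compact enough to compute directly. A small sketch follows, assuming the logarithm is base 10 (consistent with the threshold of 7 being reached at n × p = 10^7):

    import math

    def is_big_data(n, p):
        # Volume criterion from the abstract: log10(n * p) >= 7.
        return math.log10(n * p) >= 7

    print(is_big_data(10_000, 1_000))  # 10^7 exactly -> True
    print(is_big_data(5_000, 100))     # log10(5e5) is about 5.7 -> False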

