EHR Big Data Deep Phenotyping

2014 ◽  
Vol 23 (01) ◽  
pp. 206-211 ◽  
Author(s):  
L. Lenert ◽  
G. Lopez-Campos ◽  
L. J. Frey

Summary. Objectives: Given the quickening pace of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide a methodology using big data technology to support the definition of deep phenotypes in medical records. Methods: As the vast stores of genomic information increase with next-generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and the adoption of Electronic Health Records (EHR) in medicine provide a unique opportunity to integrate phenotype and genotype data into medical records. The method by which collections of clinical findings and other health-related data are leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct patient phenotypes. We focus on a practical problem around data integration for deep phenotype identification within EHR data. Big data approaches are described that enable scalable markup of EHR events, which can be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships. Conclusions: Stead and colleagues' 2005 concept of using light standards to increase the productivity of software systems by riding the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research.
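As a rough illustration of the kind of analysis such markup enables, the hypothetical sketch below compares two patients' marked-up EHR event streams by semantic overlap and temporal proximity. The event representation, function names, and 30-day window are assumptions for illustration, not the authors' method.

```python
from datetime import datetime

# Hypothetical event representation: each patient's record is a list of
# (concept_code, timestamp) pairs extracted from marked-up EHR events.
PatientEvents = list[tuple[str, datetime]]

def semantic_similarity(a: PatientEvents, b: PatientEvents) -> float:
    """Jaccard overlap of the concept codes two patients share."""
    codes_a = {code for code, _ in a}
    codes_b = {code for code, _ in b}
    if not (codes_a or codes_b):
        return 0.0
    return len(codes_a & codes_b) / len(codes_a | codes_b)

def temporal_similarity(a: PatientEvents, b: PatientEvents,
                        window_days: int = 30) -> float:
    """Fraction of shared codes whose first occurrences fall within a window."""
    shared = {code for code, _ in a} & {code for code, _ in b}
    if not shared:
        return 0.0
    first_a = {c: min(t for code, t in a if code == c) for c in shared}
    first_b = {c: min(t for code, t in b if code == c) for c in shared}
    close = sum(1 for c in shared
                if abs(first_a[c] - first_b[c]).days <= window_days)
    return close / len(shared)
```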

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Emilie Baro ◽  
Samuel Degoul ◽  
Régis Beuscart ◽  
Emmanuel Chazard

Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with log(n × p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Conversely, data can be reused without being necessarily big, for example, in the secondary use of Electronic Medical Records (EMR) data.
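A minimal sketch of this volume criterion, assuming the base-10 logarithm used in the paper (the function name and sample figures are illustrative):

```python
import math

def is_big_data(n: int, p: int, threshold: float = 7.0) -> bool:
    """Apply the volume criterion log(n * p) >= 7.

    n: number of statistical individuals (rows)
    p: number of variables (columns)
    Assumes a base-10 logarithm, consistent with the paper's definition.
    """
    return math.log10(n * p) >= threshold

# An EMR extract with 200,000 patients and 80 variables qualifies:
print(is_big_data(200_000, 80))  # log10(1.6e7) ~ 7.2 -> True
# A typical clinical trial with 500 subjects and 40 variables does not:
print(is_big_data(500, 40))      # log10(2e4) ~ 4.3 -> False
```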


2018 ◽  
Vol 27 (04) ◽  
pp. 1860006 ◽
Author(s):  
Nikolaos Tsapanos ◽  
Anastasios Tefas ◽  
Nikolaos Nikolaidis ◽  
Ioannis Pitas

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the well-known k-Means. Although very popular, it is unable to handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick in order to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. With respect to the challenges of Big Data research, a field that has established itself in the last few years and involves performing tasks on extremely large amounts of data, several adaptations of Kernel k-Means have been proposed, each of which has different requirements in processing power and running time, while also incurring different trade-offs in performance. In this paper, we present several issues and techniques involving the usage of Kernel k-Means for Big Data clustering, and examine how the combination of each component in a clustering framework fares in terms of resources, time, and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
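In Kernel k-Means, the squared distance from a point x to a cluster centre μ_C is computed from the kernel alone: ||φ(x) − μ_C||² = K(x,x) − (2/|C|) Σ_{y∈C} K(x,y) + (1/|C|²) Σ_{y,z∈C} K(y,z). A minimal NumPy sketch of the plain (non-distributed) algorithm on a precomputed kernel matrix, not one of the paper's Big Data adaptations, might look like this:

```python
import numpy as np

def kernel_kmeans(K: np.ndarray, k: int, n_iter: int = 50,
                  seed: int = 0) -> np.ndarray:
    """Minimal Kernel k-Means on a precomputed n x n kernel matrix K.

    Distances to cluster centres are evaluated entirely through K,
    so the feature space is never materialised (the kernel trick).
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)          # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            size = mask.sum()
            if size == 0:                        # re-seed an empty cluster
                labels[rng.integers(0, n)] = c
                mask = labels == c
                size = 1
            # ||phi(x) - mu_c||^2 via the kernel expansion above
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Usage with an RBF kernel on data X (n x d):
# sq = ((X[:, None] - X[None]) ** 2).sum(-1)
# K = np.exp(-sq / (2 * 1.0 ** 2))
# labels = kernel_kmeans(K, k=3)
```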


2022 ◽  
Vol 4 ◽  
Author(s):  
Michael Rapp ◽  
Moritz Kulessa ◽  
Eneldo Loza Mencía ◽  
Johannes Fürnkranz

Early outbreak detection is a key aspect of the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases in infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. As a first step toward the goal of supporting epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In a preliminary experimental study, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results show the potential of the proposed approach to find patterns that correlate with the reported infections and to identify indicators that are related to the respective diseases. They also motivate the need for additional measures to overcome practical limitations, such as the requirement to deal with noisy and unbalanced data, and demonstrate the importance of incorporating feedback from domain experts into the learning procedure.
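A hypothetical sketch of the core scoring idea, correlating a candidate pattern's weekly counts in emergency-department data with reported infections in the same region, might look as follows; the column names, pattern format, and choice of Pearson correlation are assumptions rather than the authors' exact method:

```python
import pandas as pd

def pattern_score(ed_visits: pd.DataFrame,
                  reported: pd.DataFrame,
                  pattern: dict[str, object]) -> float:
    """Pearson correlation between a pattern's weekly ED counts and
    officially reported case counts, matched by region and week.

    ed_visits: one row per ED visit, with columns 'region', 'week',
               plus indicator columns (e.g. 'chief_complaint').
    reported:  columns 'region', 'week', 'cases'.
    pattern:   indicator/value pairs, e.g. {"chief_complaint": "fever"}.
    """
    mask = pd.Series(True, index=ed_visits.index)
    for column, value in pattern.items():
        mask &= ed_visits[column] == value
    weekly = (ed_visits[mask]
              .groupby(["region", "week"])
              .size()
              .rename("pattern_count")
              .reset_index())
    merged = weekly.merge(reported, on=["region", "week"], how="inner")
    return merged["pattern_count"].corr(merged["cases"])

# A greedy search over candidate indicator combinations could then keep
# the highest-scoring patterns for review by domain experts.
```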


2021 ◽  
Author(s):  
PRANJAL KUMAR ◽  
Siddhartha Chauhan

Abstract. Big data analysis and Artificial Intelligence (AI) have recently received significant attention for creating new opportunities in the health sector to aggregate and collect large-scale data. Today, our genomes and microbiomes can be sequenced, and all information exchanged between physicians and patients in Electronic Health Records (EHR) can, at least in theory, be collected and traced. Social media and mobile devices provide abundant health-related data on activity, diet, social contacts, and so on. However, it is increasingly difficult to use this information to answer health questions, in particular because the data come from various domains, live in different infrastructures, and are of very variable quality. The massive collection and aggregation of personal data come with a number of ethical, policy, methodological, and technological challenges. It should also be acknowledged that large-scale clinical evidence confirming the promise of big data and AI in health care is still lacking. This paper explores the complexities of big data and AI in healthcare, as well as their benefits and prospects.


Author(s):  
Anitha S. Pillai ◽  
Bindu Menon

Advancement in technology has paved the way for the growth of big data. We are able to exploit these data to a great extent, as the costs of collecting, storing, and analyzing large volumes of data have plummeted considerably. There is an exponential increase in the amount of health-related data being generated by smart devices, and proper mining of these data is essential for knowledge discovery and therapeutic product development. The expanding field of big data analytics is playing a vital role in healthcare practices and research. A large number of people are affected by Alzheimer's Disease (AD), and as a result it becomes very challenging for family members to care for these individuals. The objective of this chapter is to highlight how deep learning can be used for the early diagnosis of AD and to present the outcomes of research studies by both neurologists and computer scientists. The chapter gives an introduction to big data, deep learning, AD, biomarkers, and brain images, and concludes by suggesting blood biomarkers as an ideal solution for the early detection of AD.
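As an illustrative sketch only (the chapter surveys deep-learning studies rather than prescribing code), a small feed-forward classifier over a blood-biomarker panel could be prototyped as follows; the synthetic data, feature count, and network size are invented for demonstration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a blood-biomarker panel: 500 subjects,
# 12 biomarker measurements each, with labels driven by two markers.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

# A small feed-forward network; real AD screening would need a curated
# cohort, validated biomarkers, and careful clinical evaluation.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
print("held-out accuracy:", clf.score(scaler.transform(X_test), y_test))
```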


2019 ◽  
Vol 6 (1) ◽  
pp. 205395171983011 ◽  
Author(s):  
Alessandro Blasimme ◽  
Effy Vayena ◽  
Ine Van Hoyweghen

In this paper, we discuss how access to health-related data by private insurers, beyond affecting the interests of prospective policy-holders, can also influence their propensity to make personal data available for research purposes. We take the case of national precision medicine initiatives as an illustrative example of this possible tendency. Precision medicine pools together unprecedented amounts of genetic as well as phenotypic data. The possibility that private insurers could claim access to such rapidly accumulating biomedical Big Data, or to health-related information derived from it, would discourage people from enrolling in precision medicine studies. Should that be the case, the economic value of personal data for the insurance industry would end up affecting the public value of data as a scientific resource. In what follows we articulate three principles – trustworthiness, openness and evidence – to address this problem and tame its potentially harmful effects on the development of precision medicine and, more generally, on the advancement of medical science.


2020 ◽  
Vol 9 (4) ◽  
pp. 1107 ◽  
Author(s):  
Charat Thongprayoon ◽  
Wisit Kaewput ◽  
Karthik Kovvuru ◽  
Panupong Hansrivijit ◽  
Swetha R. Kanduri ◽  
...  

Kidney diseases are among the major health burdens experienced all over the world, linked to high economic costs, mortality, and morbidity. The importance of collecting large quantities of health-related data among human cohorts, what scholars refer to as "big data", has been increasingly recognized, with the establishment of large cohorts and the use of electronic health records (EHRs) in nephrology and transplantation. These data are valuable and can potentially be utilized by researchers to advance knowledge in the field. Furthermore, progress in big data is stimulating the flourishing of artificial intelligence (AI), an excellent tool for handling and subsequently processing great amounts of data, which may be applied to extract more information on the effectiveness of medicine in kidney-related complications for the purpose of more precise phenotyping and outcome prediction. In this article, we discuss the advances and challenges in big data and the use of EHRs and AI, with emphasis on their application in nephrology and transplantation.


2018 ◽  
Author(s):  
Tuan-Yen Wu

Five V's form the core concept of big data: Volume, Velocity, Variety, Veracity, and Value. However, medical applications lag behind in all five aspects. The vast majority of medical databases do not meet the definition of "big data". The solution is to expand databases through automated medical records and to connect various databases. However, great challenges are encountered on this path: electronic medical records follow the strict rules of relational databases, integrating databases requires a tremendous amount of manpower, and privacy/security concerns are of paramount importance. Only if these issues are addressed correctly can we enjoy the convenience big data gives us.


2022 ◽  
pp. 979-992
Author(s):  
Pavani Konagala

A large volume of data is stored electronically, so much that it is very difficult to measure its total volume. These data come from various sources: stock exchanges, which may generate terabytes of data every day; Facebook, which may require about one petabyte of storage; and internet archives, which may store up to two petabytes of data. It is therefore very difficult to manage such data using relational database management systems, and at this scale, reading data from and writing it to disk takes more time. The storage and analysis of this massive data has thus become a big problem. Big data techniques offer a solution by specifying methods to store and analyze very large data sets. This chapter presents a brief study of big data techniques for analyzing such data, including Hadoop characteristics, Hadoop architecture, the advantages of big data, and the big data ecosystem. Further, it includes a comprehensive study of Apache Hive for querying health-related data and U.S. government mortality data.
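As a hypothetical sketch of the kind of Hive-based analysis the chapter describes, the following queries a mortality table through the PyHive client; the host, database, table, and column names are all assumptions:

```python
from pyhive import hive  # assumes a reachable HiveServer2 instance

# Connection parameters and schema below are illustrative only.
conn = hive.Connection(host="hive-server.example.org", port=10000,
                       database="health")
cursor = conn.cursor()
cursor.execute("""
    SELECT cause_of_death, COUNT(*) AS n_deaths
    FROM us_deaths
    WHERE death_year = 2015
    GROUP BY cause_of_death
    ORDER BY n_deaths DESC
    LIMIT 10
""")
for cause, n_deaths in cursor.fetchall():
    print(cause, n_deaths)
conn.close()
```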

