Large-Scale Analysis of Genetic and Clinical Patient Data

2018 ◽  
Vol 1 (1) ◽  
pp. 263-274 ◽  
Author(s):  
Marylyn D. Ritchie

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis, along with complex phenomics, will be discussed. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Amir Bahmani ◽  
Arash Alavi ◽  
Thore Buergel ◽  
Sushil Upadhyayula ◽  
Qiwen Wang ◽  
...  

Abstract. The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only generate enormous opportunities for improving health outcomes but also raise new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed for any big data health project to store, organize, and process complex biomedical data sets; support real-time data analysis at both the individual and cohort levels; and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as the collection and visualization of diverse data types (wearable, clinical, omics) at a personal level, the investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 251 ◽
Author(s):  
John Van Horn ◽  
Sumiko Abe ◽  
José Luis Ambite ◽  
Teresa K. Attwood ◽  
Niall Beard ◽  
...  

The increasing richness and diversity of biomedical data types creates major organizational and analytical impediments to rapid translational impact in the context of training and education. As biomedical data sets increase in size, variety, and complexity, they challenge conventional methods for sharing, managing, and analyzing those data. In May 2017, we convened a two-day meeting between the BD2K Training Coordinating Center (TCC), ELIXIR Training/TeSS, GOBLET, H3ABioNet, EMBL-ABR, bioCADDIE, and the CSIRO, in Huntington Beach, California, to compare and contrast our respective activities, and how these might be leveraged for wider impact on an international scale. Discussions focused on the role of i) training for biomedical data science; ii) the need to promote core competencies; and iii) the development of career paths. These led to specific conversations about i) the value of standardizing and sharing data science training resources; ii) challenges in encouraging adoption of training material standards; iii) strategies and best practices for the personalization and customization of learning experiences; iv) processes of identifying stakeholders and determining how they should be accommodated; and v) joint partnerships to lead the world on data science training in ways that benefit all stakeholders. Generally, international cooperation was viewed as essential for accommodating the widest possible participation in the modern bioscience enterprise, providing skills in a truly “FAIR” manner, and addressing the importance of data science understanding worldwide. Several recommendations for the exchange of educational frameworks are made, along with potential sources of support, and plans for further cooperative efforts are presented.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiujin Li ◽  
Hailiang Song ◽  
Zhe Zhang ◽  
Yunmao Huang ◽  
Qin Zhang ◽  
...  

Abstract. Background. With the emphasis on analysing genotype-by-environment interactions within the framework of genomic selection and genome-wide association analysis, there is an increasing demand for reliable tools that can simulate large-scale genomic data in order to assess related approaches. Results. We propose a theory for simulating large-scale genomic data on genotype-by-environment interactions and added this new function to our previously developed tool GPOPSIM. A simulated threshold trait with large-scale genomic data was also added. Validation of the simulated data indicated that GPOPSIM2.0 is an efficient tool for mimicking the phenotypic data of quantitative traits, threshold traits, and genetically correlated traits with large-scale genomic data while taking genotype-by-environment interactions into account. Conclusions. This tool is useful for assessing methods for genotype-by-environment interactions and threshold traits.
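The simulation idea described in the abstract, additive genetic effects whose values differ across environments, can be sketched independently of the GPOPSIM2.0 code. The snippet below is a minimal illustration under stated assumptions (the sample sizes, heritability `h2`, and cross-environment effect correlation `r` are invented for the example; this is not the package's actual algorithm): drawing per-SNP effects with correlation r < 1 between two environments induces genotype-by-environment interaction, and a threshold trait is obtained by dichotomizing one liability.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_snp, n_env = 500, 200, 2

# SNP genotypes coded 0/1/2, then centered
maf = rng.uniform(0.1, 0.5, n_snp)
G = rng.binomial(2, maf, size=(n_ind, n_snp)).astype(float)
G -= G.mean(axis=0)

# Per-environment additive effects drawn with correlation r across
# environments; r < 1 is what creates genotype-by-environment interaction.
r = 0.5
cov = np.array([[1.0, r], [r, 1.0]]) / n_snp
B = rng.multivariate_normal(np.zeros(n_env), cov, size=n_snp)  # (n_snp, n_env)

h2 = 0.4  # assumed heritability within each environment
Y = np.empty((n_ind, n_env))
for e in range(n_env):
    g = G @ B[:, e]                       # genetic values in environment e
    var_resid = g.var() * (1 - h2) / h2   # residual variance for target h2
    Y[:, e] = g + rng.normal(0.0, np.sqrt(var_resid), n_ind)

# A threshold (binary) trait: individuals above the liability median are cases
binary = (Y[:, 0] > np.median(Y[:, 0])).astype(int)
```

The genetic correlation of an individual's values across the two environments is governed by `r`, so setting `r = 1` recovers a no-interaction model.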


2021 ◽  
Author(s):  
Enrico Moiso ◽  
Paolo Provero

Alteration of metabolic pathways in cancer has been investigated for many years, beginning well before the discovery of the role of oncogenes and tumor suppressors, and the last few years have witnessed a renewed interest in this topic. Large-scale molecular and clinical data on tens of thousands of samples allow us today to tackle the problem from a general point of view. Here we show that transcriptomic profiles of tumors can be exploited to define metabolic cancer subtypes, which can then be systematically investigated for association with other molecular and clinical data. We find thousands of significant associations between metabolic subtypes and molecular features such as somatic mutations, structural variants, epigenetic modifications, and protein abundance and activation, and with clinical/phenotypic data including survival probability, tumor grade, and histological type. Our work provides a methodological framework and a rich database of statistical associations, accessible from https://metaminer.unito.it, that will contribute to the understanding of the role of metabolic alterations in cancer and to the development of precision therapeutic strategies.
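One standard way to test the kind of subtype-feature association the abstract reports is a contingency-table test between subtype labels and a binary molecular feature (e.g., presence of a somatic mutation). The sketch below uses synthetic data and SciPy's chi-square test; the subtype counts and mutation rates are invented for illustration and are not drawn from the MetaMiner database.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
n = 300
subtype = rng.integers(0, 3, n)            # three hypothetical metabolic subtypes
# A mutation artificially enriched in subtype 0
p_mut = np.where(subtype == 0, 0.6, 0.2)
mutated = rng.random(n) < p_mut

# Build the 3x2 table: rows = subtype, columns = (wild-type, mutated)
table = np.zeros((3, 2))
for s, m in zip(subtype, mutated):
    table[s, int(m)] += 1

chi2, p, dof, _ = chi2_contingency(table)
```

In a genome-wide screen the resulting p-values would additionally need multiple-testing correction (e.g., Benjamini-Hochberg) before calling associations significant.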


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity, and variety of data. Big Data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is given; for each type, the paper describes the process steps and tools, along with a banking application. Some of the research challenges of Big Data analytics, and possible solutions to them, are also discussed.


2020 ◽  
Vol 3 (1) ◽  
pp. 43-59
Author(s):  
Peter M. Kasson

Infectious disease research spans scales from the molecular to the global—from specific mechanisms of pathogen drug resistance, virulence, and replication to the movement of people, animals, and pathogens around the world. All of these research areas have been impacted by the recent growth of large-scale data sources and data analytics. Some of these advances rely on data or analytic methods that are common to most biomedical data science, while others leverage the unique nature of infectious disease, namely its communicability. This review outlines major research progress in the past few years and highlights some remaining opportunities, focusing on data or methodological approaches particular to infectious disease.


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Yu-Bing Li ◽  
Xue-Zhong Zhou ◽  
Run-Shun Zhang ◽  
Ying-Hui Wang ◽  
Yonghong Peng ◽  
...  

Background. Traditional Chinese medicine (TCM) is an individualized medicine that treats patients by observing their symptoms and signs (symptoms in brief). We aim to extract meaningful herb-symptom relationships from large-scale TCM clinical data. Methods. To investigate the correlations between the symptoms of patients and the herbs they were prescribed, we use four clinical data sets collected from TCM outpatient settings and calculate the similarity between each patient pair, in terms of both the herb constituents of their prescriptions and their manifesting symptoms, using the cosine measure. To address the large-scale multiple-testing problem in the detection of herb-symptom associations, and the dependence between herbs with similar efficacies, we propose a network-based correlation analysis (NetCorrA) method to detect herb-symptom associations. Results. The results show that there are strong positive correlations between symptom similarity and herb similarity, which indicates that herb-symptom correspondence is a clinical principle adhered to by most TCM physicians. Furthermore, the NetCorrA method obtains meaningful herb-symptom associations and performs better than the chi-square correlation method by filtering out false positive associations. Conclusions. Symptoms play a significant role in the prescription of herb treatments. The herb-symptom correspondence principle indicates that clinical phenotypic targets (i.e., symptoms) of herbs exist and would be valuable for further investigation.


2020 ◽  
Author(s):  
Silu Huang ◽  
Charles Blatti ◽  
Saurabh Sinha ◽  
Aditya Parameswaran

Abstract. Motivation. A common but critical task in genomic data analysis is finding features that separate and thereby help explain differences between two classes of biological objects, e.g., genes that explain the differences between healthy and diseased patients. As lower-cost, high-throughput experimental methods greatly increase the number of samples that are assayed as objects for analysis, computational methods are needed to quickly provide insights into high-dimensional datasets with tens of thousands of objects and features. Results. We develop an interactive exploration tool called Genvisage that rapidly discovers the most discriminative feature pairs that best separate two classes in a dataset, and displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially when the numbers of objects and features are large, we propose a suite of optimizations to make Genvisage more responsive and demonstrate that our optimizations lead to a 400X speedup over competitive baselines for multiple biological data sets. With this speedup, Genvisage enables the exploration of more large-scale datasets and alternate hypotheses in an interactive and interpretable fashion. We apply Genvisage to uncover pairs of genes whose transcriptomic responses significantly discriminate treatments of several chemotherapy drugs. Availability. Free webserver at http://genvisage.knoweng.org:443/ with source code at https://github.com/KnowEnG/Genvisage.
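Genvisage's core task, scoring every feature pair for how well it separates two classes, can be sketched with a brute-force baseline (the kind of exhaustive computation its optimizations accelerate). The nearest-centroid scoring rule and the toy data below are assumptions for illustration, not Genvisage's actual separability measure.

```python
import itertools
import numpy as np

def pair_score(X, y, i, j):
    """Separation accuracy of features (i, j) under a nearest-centroid rule."""
    P = X[:, [i, j]]
    mu0, mu1 = P[y == 0].mean(axis=0), P[y == 1].mean(axis=0)
    pred = np.linalg.norm(P - mu1, axis=1) < np.linalg.norm(P - mu0, axis=1)
    return float((pred == y.astype(bool)).mean())

def top_pairs(X, y, k=3):
    """Exhaustively score all feature pairs; return the k best (score, i, j)."""
    scores = [(pair_score(X, y, i, j), i, j)
              for i, j in itertools.combinations(range(X.shape[1]), 2)]
    return sorted(scores, reverse=True)[:k]

# Toy data: features 0 and 1 jointly separate the classes; the rest are noise
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 10))
X[y == 1, 0] += 2.0
X[y == 1, 1] += 2.0

best = top_pairs(X, y)
```

The exhaustive loop is quadratic in the number of features, which is exactly why pruning and early-termination optimizations like Genvisage's matter at genomic scale.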


2019 ◽  
Author(s):  
Zachary B. Abrams ◽  
Caitlin E. Coombes ◽  
Suli Li ◽  
Kevin R. Coombes

Abstract. Summary. Unsupervised data analysis in many scientific disciplines is based on calculating distances between observations and finding ways to visualize those distances. These kinds of unsupervised analyses help researchers uncover patterns in large-scale data sets. However, researchers can select from a vast number of different distance metrics, each designed to highlight different aspects of different data types. There are also numerous visualization methods, with their own strengths and weaknesses. To help researchers perform unsupervised analyses, we developed the Mercator R package. Mercator enables users to see important patterns in their data by generating multiple visualizations using different standard algorithms, making it particularly easy to compare and contrast the results arising from different metrics. By allowing users to select the distance metric that best fits their needs, Mercator helps researchers perform unsupervised analyses that use pattern identification through computation and visual inspection. Availability and Implementation. Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html). Contact. [email protected]. Supplementary information. Supplementary data are available at Bioinformatics online.
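Mercator itself is an R package; the Python sketch below only illustrates the underlying point that the choice of distance metric changes which structure an unsupervised analysis emphasizes. The binary data set and the within/between-group summary are invented for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Binary feature matrix: 30 samples, the first 15 share a block of features
X = (rng.random((30, 40)) < 0.3).astype(float)
X[:15, :10] = 1.0

# Compute the same pairwise distances under several metrics
metrics = ["euclidean", "jaccard", "hamming"]
dists = {m: squareform(pdist(X, metric=m)) for m in metrics}

# Compare mean within-group vs between-group distance per metric
summary = {}
for m, D in dists.items():
    within = D[:15, :15][np.triu_indices(15, 1)].mean()
    between = D[:15, 15:].mean()
    summary[m] = (within, between)
```

In a Mercator-style workflow, each distance matrix would then be fed to multiple visualizations (hierarchical clustering, MDS, t-SNE) so the analyst can inspect how the metric choice shapes the apparent clusters.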

