Scalable analysis of multi-modal biomedical data

GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Jaclyn Smith ◽  
Yao Shi ◽  
Michael Benedikt ◽  
Milos Nikolic

Abstract
Background: Targeted diagnosis and treatment options depend on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomic sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough view of a disease's impact on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Scalable data integration solutions therefore play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex data types.
Solution: To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficult aspects of designing distributed analyses over complex biomedical data types.
Performance: We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system outperforms the common alternative, based on "flattening" complex data structures, and runs efficiently in cases where alternative approaches cannot run at all.
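
To make the contrast with "flattening" concrete, the sketch below (plain Python; the record layout and field names are invented for illustration and are not TraNCE's API) unnests nested patient records into flat tuples and then regroups them into per-patient feature vectors. The regrouping joins are one reason a flattening-based plan can become expensive at scale.

```python
# Hypothetical sketch of the "flattening" approach the abstract contrasts
# against: nested patient records (a complex data type) are unnested into
# flat rows, then regrouped into feature vectors for learning analysis.
patients = [
    {"id": "p1", "samples": [
        {"gene": "TP53", "expr": 2.1},
        {"gene": "BRCA1", "expr": 0.7},
    ]},
    {"id": "p2", "samples": [
        {"gene": "TP53", "expr": 1.4},
    ]},
]

# Step 1: flatten the nested collection into (patient, gene, expr) tuples.
flat = [
    (p["id"], s["gene"], s["expr"])
    for p in patients
    for s in p["samples"]
]

# Step 2: regroup the flat tuples into per-patient feature vectors over a
# fixed gene panel -- the joins/regroupings that are costly at scale.
panel = ["TP53", "BRCA1"]
features = {
    p["id"]: [
        next((e for pid, g, e in flat if pid == p["id"] and g == gene), 0.0)
        for gene in panel
    ]
    for p in patients
}

print(features)  # {'p1': [2.1, 0.7], 'p2': [1.4, 0.0]}
```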


2018 ◽  
Vol 1 (1) ◽  
pp. 263-274 ◽  
Author(s):  
Marylyn D. Ritchie

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis, along with complex phenomics, will be discussed. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.


2013 ◽  
Vol 433-435 ◽  
pp. 1853-1856
Author(s):  
Ting Hong Zhao ◽  
Peng Fei Zhang ◽  
Hui Min Hou

Irrigation district information construction is an effective way to improve management and to rationally allocate and effectively utilize irrigation water resources. Addressing characteristics such as the large volume of monitoring data, complex data types, high real-time requirements, and strong spatial correlation, this paper combines Multi-Agent theory with an irrigation district information system, using the GSM communication network as the system's communication layer, to establish an agricultural irrigation district information system based on Multi-Agent and GSM. The system makes full use of the intelligence of individual Agents and the communication and coordination capabilities of a Multi-Agent system, providing comprehensive technical support for irrigation management and decision making.
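
As a rough illustration of the Multi-Agent pattern the paper describes, the hypothetical sketch below has field-monitoring agents report readings through a communication channel (GSM in the paper; a simple queue here) to a coordination agent that supports irrigation decisions. All class names, thresholds, and message fields are invented for illustration.

```python
# Hypothetical Multi-Agent sketch: monitoring agents push readings over a
# channel (a queue standing in for the GSM link), and a coordination agent
# turns them into irrigation decisions.
import queue

channel = queue.Queue()  # stand-in for the GSM communication network

class MonitoringAgent:
    def __init__(self, station_id):
        self.station_id = station_id

    def report(self, soil_moisture):
        # In the real system this message would travel over GSM.
        channel.put({"station": self.station_id, "moisture": soil_moisture})

class CoordinationAgent:
    THRESHOLD = 0.30  # illustrative soil-moisture threshold

    def decide(self):
        decisions = []
        while not channel.empty():
            msg = channel.get()
            action = "irrigate" if msg["moisture"] < self.THRESHOLD else "hold"
            decisions.append((msg["station"], action))
        return decisions

MonitoringAgent("station-1").report(0.22)
MonitoringAgent("station-2").report(0.41)
print(CoordinationAgent().decide())  # [('station-1', 'irrigate'), ('station-2', 'hold')]
```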


2020 ◽  
Author(s):  
Cemal Erdem ◽  
Ethan M. Bensman ◽  
Arnab Mutsuddy ◽  
Michael M. Saint-Antoine ◽  
Mehdi Bouhaddou ◽  
...  

Abstract
The current era of big biomedical data accumulation and availability brings data integration opportunities for leveraging its totality to make new discoveries and/or clinically predictive models. Black-box statistical and machine learning methods are powerful for such integration but often cannot provide mechanistic reasoning, particularly at the single-cell level. While single-cell mechanistic models clearly enable such reasoning, they are predominantly "small-scale" and struggle with the scalability and reusability required for meaningful data integration. Here, we present an open-source pipeline for scalable, single-cell mechanistic modeling from simple, annotated input files that can serve as a foundation for mechanistic data integration. As a test case, we convert one of the largest existing single-cell mechanistic models to this format, demonstrating the robustness and reproducibility of the approach. We show that the model's cell line context can be changed with a simple replacement of input file parameter values. We next use this new model to test alternative mechanistic hypotheses for the experimental observation that interferon-gamma (IFNG) inhibits epidermal growth factor (EGF)-induced cell proliferation. Model-based analysis suggested, and experiments support, that these observations are better explained by IFNG-induced SOCS1 expression sequestering activated EGF receptors, thereby downregulating AKT activity, rather than by direct IFNG-induced upregulation of p21 expression. Overall, this new pipeline enables large-scale, single-cell, and mechanistically transparent modeling as a data integration modality complementary to machine learning.
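
The favored hypothesis lends itself to a compact dynamical sketch. The toy ODE model below (SciPy; rate constants, species, and the AKT proxy are invented for illustration and do not reproduce the published model) captures the proposed mechanism: IFNG induces SOCS1, SOCS1 sequesters activated EGF receptors, and AKT activity falls with the free active receptor pool.

```python
# Toy ODE sketch of the SOCS1-sequestration hypothesis: IFNG induces SOCS1,
# SOCS1 binds activated EGFR, and AKT activity tracks free active EGFR.
# All parameter values are illustrative, not the published model.
from scipy.integrate import solve_ivp

def rhs(t, y, ifng):
    socs1, egfr_free, egfr_seq = y
    k_ind, k_deg = 0.5, 0.1      # SOCS1 induction by IFNG / degradation
    k_bind, k_rel = 1.0, 0.05    # sequestration / release of active EGFR
    d_socs1 = k_ind * ifng - k_deg * socs1
    bind = k_bind * socs1 * egfr_free - k_rel * egfr_seq
    return [d_socs1, -bind, bind]

y0 = [0.0, 1.0, 0.0]  # no SOCS1; all activated EGFR initially free
for ifng in (0.0, 1.0):
    sol = solve_ivp(rhs, (0.0, 50.0), y0, args=(ifng,))
    akt = sol.y[1, -1]  # proxy: AKT activity ~ free active EGFR
    print(f"IFNG={ifng}: free active EGFR (AKT proxy) = {akt:.2f}")
```

With IFNG present, SOCS1 accumulates and the free active receptor pool (and hence the AKT proxy) collapses, qualitatively matching the inhibition of EGF-induced proliferation described in the abstract.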


2021 ◽  
Vol 43 (1) ◽  
Author(s):  
Eric R. Labelle ◽  
Julia Kemmerer

Despite the extensive use of cut-to-length mechanized systems, harvester data remains largely underutilized by most stakeholders in Germany. Therefore, the goal of this study was to determine how business processes should be restructured to allow for continuous use of forest machine data, with the main focus on harvester production data, along the German wood supply chain. We also wanted to identify possible benefits and challenges of the restructuring through a qualitative analysis of the newly designed business process. The Bavarian State Forest Enterprise was chosen for a case study approach. Based on expert interviews, the current (as-is) and future (to-be) processes were modeled. Results obtained from the qualitative data indicated that an integration of harvester data is achievable in Germany. Harvester data from forest operations can be provided to all subsequent activities along the supply chain. Core changes were the addition of a digital work order, the data exchange between harvester and forwarder, the pile order, and the exchange of production data. Benefits for every stakeholder were determined. Through the reengineered process, harvesting and timber information are available and known at an earlier stage of the process, intermediate information stations could be eliminated, and working comfort could be improved. Ecological benefits could also be achieved through an anticipated reduction of CO2 emissions and the protection of sensitive nature areas. Negative consequences of harvester data integration could appear in the social sphere and relate to the reduction of personal contact. Challenges for practical implementation, besides the legal situation, could be the availability of on-board computers in forwarders, the cost of new IT applications, the willingness of stakeholders to cooperate, and the availability of internet access. Further research should focus on the combination of harvester data with other data types and the practical implementation of the to-be (TB) process.


2014 ◽  
Vol 42 (3) ◽  
pp. 344-355 ◽  
Author(s):  
Gail E. Henderson ◽  
Susan M. Wolf ◽  
Kristine J. Kuczynski ◽  
Steven Joffe ◽  
Richard R. Sharp ◽  
...  

Large-scale sequencing tests, including whole-exome and whole-genome sequencing (WES/WGS), are rapidly moving into clinical use. Sequencing is already being used clinically to identify therapeutic opportunities for cancer patients who have run out of conventional treatment options, to help diagnose children with puzzling neurodevelopmental conditions, and to clarify appropriate drug choices and dosing in individuals. To evaluate and support clinical applications of these technologies, the National Human Genome Research Institute (NHGRI) and National Cancer Institute (NCI) have funded studies on clinical and research sequencing under the Clinical Sequencing Exploratory Research (CSER) program as well as studies on return of results (RoR). Most of these studies use sequencing in real-world clinical settings and collect data on both the application of sequencing and the impact of receiving genomic findings on study participants. They are occurring in the context of controversy over how to obtain consent for exome and genome sequencing.


CONVERTER ◽  
2021 ◽  
pp. 153-168
Author(s):  
Junjie Wu, et al.

Objectives: To explore the factors behind differing findings on the relationship between strategic change and organizational performance.
Methods: This paper takes the correlation coefficient between strategic change and organizational performance as the effect size, and conducts meta-integration and meta-regression analyses on 23 key studies covering 7,225 enterprise samples from 2008 to 2018.
Results: First, the meta-integration method is used to estimate the overall results of existing empirical studies; strategic change is significantly positively correlated with organizational performance. Second, because many moderating factors lead to divergent conclusions in research on the strategic change-organizational performance relationship, meta-regression is used to explore the impact of 10 moderating factors on that relationship: (1) positive strategic change has a stronger moderating effect; (2) large enterprises achieve better strategic-change performance than small and medium-sized enterprises; (3) the more recent the year of publication, the less supportive the evidence for the relationship; (4) data type makes no significant difference to the relationship, but data time span has a significant negative moderating effect.
Conclusions: This study shows that quantitative meta-analytic study of the literature not only helps to resolve existing theoretical disputes but also helps to open new theoretical research directions in the context of the COVID-19 epidemic, while providing a novel framework for quantitative literature analysis.
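
For readers unfamiliar with the meta-integration step, the sketch below shows the standard pooling of correlation effect sizes via the Fisher z transform with inverse-variance weights (a fixed-effect pooling for brevity; the study values are made up and are not the paper's 23 studies).

```python
# Minimal sketch of pooling correlation coefficients: Fisher z-transform,
# inverse-variance weighting, back-transform to r with a 95% CI.
import math

studies = [(0.21, 310), (0.35, 150), (0.12, 420)]  # (correlation r, sample size n)

weighted_sum, weight_total = 0.0, 0.0
for r, n in studies:
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z transform of r
    w = n - 3                              # inverse of var(z) = 1/(n-3)
    weighted_sum += w * z
    weight_total += w

z_pooled = weighted_sum / weight_total
r_pooled = math.tanh(z_pooled)             # back-transform z to r
se = 1 / math.sqrt(weight_total)
lo, hi = math.tanh(z_pooled - 1.96 * se), math.tanh(z_pooled + 1.96 * se)
print(f"pooled r = {r_pooled:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A meta-regression, as used in the paper, would then regress the per-study z values on moderator variables (e.g., firm size, publication year, data time span) with these same weights.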


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Amir Bahmani ◽  
Arash Alavi ◽  
Thore Buergel ◽  
Sushil Upadhyayula ◽  
Qiwen Wang ◽  
...  

Abstract
The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only generate enormous opportunities for improving health outcomes but also raise new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed for any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual level and the cohort level, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as the collection and visualization of diverse data types (wearable, clinical, omics) at a personal level, the investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.
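
As a rough illustration of individual-level, real-time analysis of the kind mentioned above (e.g., presymptomatic illness detection from wearables), the hypothetical sketch below flags resting-heart-rate readings that deviate from a participant's own baseline. The threshold, window, and data are invented; this is not the PHD codebase.

```python
# Hypothetical per-participant anomaly check: flag wearable resting-heart-rate
# readings that sit far above the participant's own baseline distribution.
from statistics import mean, stdev

def flag_anomalies(baseline, recent, z_threshold=2.5):
    mu, sigma = mean(baseline), stdev(baseline)
    return [
        (i, hr) for i, hr in enumerate(recent)
        if sigma > 0 and (hr - mu) / sigma > z_threshold
    ]

baseline_rhr = [58, 60, 59, 61, 57, 60, 59, 58, 60, 59]  # beats/min
recent_rhr = [60, 59, 68, 71]                            # two elevated readings

print(flag_anomalies(baseline_rhr, recent_rhr))  # [(2, 68), (3, 71)]
```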


Author(s):  
David A. Chambers ◽  
Eitan Amir ◽  
Ramy R. Saleh ◽  
Danielle Rodin ◽  
Nancy L. Keating ◽  
...  

The concept of “big data” research—the aggregation and analysis of biologic, clinical, administrative, and other data sources to drive new advances in biomedical knowledge—has been embraced by the cancer research enterprise. Although much of the conversation has concentrated on the amalgamation of basic biologic data (e.g., genomics, metabolomics, tumor tissue), new opportunities to extend potential contributions of big data to clinical practice and policy abound. This article examines these opportunities through discussion of three major data sources: aggregated clinical trial data, administrative data (including insurance claims data), and data from electronic health records. We will discuss the benefits of data use to answer key oncology practice and policy research questions, along with limitations inherent in these complex data sources. Finally, the article will discuss overarching themes across data types and offer next steps for the research, practice, and policy communities. The use of multiple sources of big data has the promise of improving knowledge and providing more accurate data for clinicians and policy decision makers. In the future, optimization of machine learning may allow for current limitations of big data analyses to be attenuated, thereby resulting in improved patient care and outcomes.


Author(s):  
Geoff Neideck

Introduction
Demand continues to grow for accessible, large-scale linked data assets to answer complex cross-sector and cross-jurisdiction research questions. To meet this demand, a number of Multi-source, Enduring Linked Data Assets (MELDAs) have emerged, including the National Integrated Health Service Infrastructure (NIHSI), the National Disability Data Asset (NDDA), and the Multi-Agency Data Integration Project (MADIP). Using these MELDAs has proven much more efficient than project-specific linkages and provides consistent national data assets for multiple uses. However, the development of these assets raises new challenges, including complex data models, governance and access arrangements, and new approaches to analysis.
Objectives and Approach
Through developing the NIHSI Analytical Asset in collaboration with state/territory and federal government partners, the Australian Institute of Health and Welfare (AIHW) has identified challenges in traditional linkage approaches that require innovative solutions to ensure high-quality linkage. As the AIHW commences scoping of new MELDAs, we are taking lessons from building the NIHSI and applying them to future designs.
Results
The AIHW's development of MELDAs across jurisdictions and portfolios provides new learnings on how to address advanced real-world data integration issues. This review focuses on lessons learnt at the AIHW in working with new data-sharing arrangements, applications of technologies, and innovative approaches to streamline MELDA processes.
Conclusion/Implications
The learnings from the AIHW's development of MELDAs will assist others developing enduring assets to establish effective sharing arrangements, governance, and technical solutions that ensure efficient management. These learnings will save time and resources and prompt further discussion on a gold standard for building MELDAs going forward.
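
For context on what "linkage" involves at a technical level, the hypothetical sketch below shows one traditional, deterministic approach: datasets are joined on a hashed linkage key derived from identifying fields, so raw identifiers are never exchanged. Field choices, salting, and data are invented for illustration; production MELDA linkage is considerably more sophisticated (e.g., probabilistic matching).

```python
# Hypothetical deterministic linkage sketch: two datasets are joined on a
# salted hash of identifying fields rather than on raw identifiers.
import hashlib

def linkage_key(name, dob, salt="shared-secret-salt"):  # salt is illustrative
    raw = f"{name.strip().lower()}|{dob}|{salt}"
    return hashlib.sha256(raw.encode()).hexdigest()

health = {linkage_key("Ada Lovelace", "1815-12-10"): {"admissions": 2}}
welfare = {linkage_key("Ada Lovelace", "1815-12-10"): {"payments": 5}}

# Join on the shared hashed key: only records present in both sources link.
linked = {
    key: {**health[key], **welfare[key]}
    for key in health.keys() & welfare.keys()
}
print(linked)  # one linked record, keyed by the shared hash
```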

