Perspectives on Data Integration in Human Complex Disease Analysis

Biotechnology ◽  
2019 ◽  
pp. 1826-1866
Author(s):  
Kristel Van Steen ◽  
Nuria Malats

The identification of causal or predictive variants, genes and mechanisms for disease-associated traits is complicated by the “complex” networks of molecular phenotypes involved. Present technology and computing power allow building and processing large collections of these data types. However, the rapid pace of data generation is counterweighted by the slow pace of data integration methods development. Most currently available integrative analytic tools pair omics data and focus on between-data-source relationships, making strong assumptions about within-data-source architectures. Only a limited number of initiatives aim to find optimal ways to analyze multiple, possibly related, omics databases while fully acknowledging the specific characteristics of each data type. A thorough understanding of the underlying assumptions of integrative methods is needed to draw sound conclusions afterwards. In this chapter, the authors discuss how the field of “integromics” has evolved and give pointers towards essential research developments in this context.


2020 ◽  
Vol 29 (10) ◽  
pp. 2851-2864
Author(s):  
Manuel Ugidos ◽  
Sonia Tarazona ◽  
José M Prats-Montalbán ◽  
Alberto Ferrer ◽  
Ana Conesa

The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which several different omic data types are generated, at least not at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained in different labs can be combined to construct a multiomic study. However, data obtained in different labs or at different moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects in the same data type obtained in different studies, they cannot be applied to correct lab or batch effects across omics, which impairs multiomic meta-analysis. Fortunately, in many cases at least one omic platform (e.g. gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch-effect correction. We have developed MultiBaC (Multiomic Batch-effect Correction), a strategy to correct batch effects in multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch-effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem and show that it improves the detection of meaningful biological effects.
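As a rough illustration of the shared-platform idea (a minimal sketch, not the published MultiBaC method, which builds PLS prediction models), the snippet below uses a gene-expression platform measured in both labs to estimate a lab offset, then transfers a lab-specific proteomics layer through a least-squares model. All data, dimensions and variable names are synthetic.

```python
# Sketch of cross-omics batch handling via a shared platform.
# Mean-centering and ordinary least squares stand in for the real machinery.
import numpy as np

rng = np.random.default_rng(0)

# Lab A measured gene expression (shared) and proteomics (lab-specific);
# lab B measured only gene expression, with an additive batch offset.
n_a, n_b, n_genes, n_prot = 30, 25, 50, 20
expr_a = rng.normal(size=(n_a, n_genes))
batch_offset = rng.normal(scale=2.0, size=n_genes)        # lab B's batch effect
expr_b = rng.normal(size=(n_b, n_genes)) + batch_offset
w = rng.normal(size=(n_genes, n_prot))                    # true expr -> protein map
prot_a = expr_a @ w + rng.normal(scale=0.1, size=(n_a, n_prot))

# Step 1: estimate the batch effect on the shared platform
# (here: the difference of feature means between labs).
offset_hat = expr_b.mean(axis=0) - expr_a.mean(axis=0)
expr_b_corrected = expr_b - offset_hat

# Step 2: learn a predictive model from the shared omic to the
# lab-specific omic on lab A's samples.
w_hat, *_ = np.linalg.lstsq(expr_a, prot_a, rcond=None)

# Step 3: predict the missing omic for lab B from its corrected shared
# omic, yielding proteomic values on lab A's scale for integration.
prot_b_pred = expr_b_corrected @ w_hat
print(prot_b_pred.shape)  # (25, 20)
```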


Epigenomics ◽  
2021 ◽  
Author(s):  
Amy L Non

Aim: Social scientists have placed particularly high expectations on the study of epigenomics to explain how exposure to adverse social factors like poverty, child maltreatment and racism – particularly early in childhood – might contribute to complex diseases. However, progress has stalled, reflecting many of the same challenges faced in genomics, including overhype, lack of diversity in samples, limited replication and difficulty interpreting the significance of findings. Materials & methods: This review focuses on the future of social epigenomics by discussing the progress made, ongoing methodological and analytical challenges, and suggestions for improvement. Results & conclusion: Recommendations include more diverse sample types and cross-cultural, longitudinal and multi-generational studies. True integration of social and epigenomic data will require increased access to both data types in publicly available databases, enhanced data integration frameworks, and more collaborative efforts between social scientists and geneticists.


2019 ◽  
pp. 254-277 ◽  
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), give end-users access to heterogeneous data stored in different formats from various sources, integration remains time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking and integrating. We mainly adopt four kinds of geospatial data sources to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
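The four linking metrics have standard definitions in record linkage; the sketch below computes them under those usual conventions (the chapter may compute them differently), and the toy counts are invented.

```python
# Standard record-linkage evaluation of a candidate-pair generator.
def linkage_metrics(n_source_a, n_source_b, candidate_pairs, true_matches):
    """candidate_pairs and true_matches are sets of (id_a, id_b) tuples."""
    total_pairs = n_source_a * n_source_b            # all comparable pairs
    tp = len(candidate_pairs & true_matches)         # true matches retained
    rr = 1.0 - len(candidate_pairs) / total_pairs    # Reduction Ratio
    pc = tp / len(true_matches)                      # Pairs Completeness (recall)
    pq = tp / len(candidate_pairs)                   # Pairs Quality (precision)
    f = 2 * pc * pq / (pc + pq) if pc + pq else 0.0  # harmonic mean of PC and PQ
    return rr, pc, pq, f

# Toy example: 100 x 100 records, 120 candidates, 90 of 100 matches kept.
cands = {(i, i) for i in range(90)} | {(i, i + 1) for i in range(30)}
truth = {(i, i) for i in range(100)}
print(linkage_metrics(100, 100, cands, truth))
# -> RR 0.988, PC 0.90, PQ 0.75, F-score ~0.82
```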


2019 ◽  
pp. 230-253
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as the Web Feature Service (WFS), give end-users access to heterogeneous data stored in different formats from various sources, integration remains time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieving, modeling, linking and integrating. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity, with the help of Karma. Our main contribution focuses on the third problem. Previous work has defined sets of semantic rules for performing the linking process; however, geospatial data carries specific geospatial relationships that are significant for linking but cannot be handled directly by Semantic Web techniques. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous work runs into a complicated problem when the geospatial data sources are in different languages; in contrast, our proposed linking algorithms include a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data is integrated by eliminating data redundancy and combining the complementary properties of the linked records. We mainly adopt four kinds of geospatial data sources, namely OpenStreetMap (OSM), Wikimapia, USGS and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
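As a hedged sketch of how a geospatial relationship can drive the linking step, the snippet below generates candidate record pairs by point proximity; the distance threshold, record fields and identifiers are invented, and the chapter's actual linking algorithms (semantic rules plus translation) are richer.

```python
# Candidate-pair generation from a spatial relationship: keep only pairs
# whose coordinates lie within a given great-circle distance.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def candidate_pairs(records_a, records_b, max_km=0.5):
    """Return (id_a, id_b) pairs whose points are within max_km of each other."""
    return [
        (a["id"], b["id"])
        for a in records_a
        for b in records_b
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    ]

osm = [{"id": "osm/1", "lat": 40.7580, "lon": -73.9855}]
usgs = [{"id": "usgs/9", "lat": 40.7585, "lon": -73.9850},
        {"id": "usgs/7", "lat": 41.0000, "lon": -74.5000}]
print(candidate_pairs(osm, usgs))  # [('osm/1', 'usgs/9')]
```

A production system would replace the brute-force double loop with a spatial index (e.g. an R-tree) so that only nearby records are ever compared.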


Author(s):  
Nada M. Alhakkak

BigGIS is a new product that resulted from developing GIS in the “Big Data” area; it is used for storing and processing big geographical data and helps in solving the associated issues. This chapter describes M2BG, an optimized BigGIS framework in a MapReduce environment. The suggested framework is integrated into the MapReduce environment in order to solve storage issues and benefit from the Hadoop ecosystem. M2BG comprises two steps: a BigGIS warehouse and BigGIS MapReduce. The first step contains three main layers: the Data Source and Storage Layer (DSSL), the Data Processing Layer (DPL), and the Data Analysis Layer (DAL). The second step is responsible for clustering, using swarms as inputs for the Hadoop phase. Jobs are then scheduled in the map part with a preemptive priority scheduling algorithm, under which some data types are classified as critical and others as ordinary, while the reduce part uses a merge sort algorithm. M2BG should further address security and be implemented with real data, first in a simulated environment and later in the real world.
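The scheduling idea can be pictured as a priority queue in which “critical” data types always dispatch before “ordinary” ones. The sketch below is a hypothetical illustration under that reading, not M2BG's implementation; strictly, it shows priority-ordered dispatch, and preempting an already-running task would additionally require interrupting it. All task names are invented.

```python
# Priority-ordered dispatch of map tasks: critical before ordinary,
# ties broken by arrival order.
import heapq
from dataclasses import dataclass, field

CRITICAL, ORDINARY = 0, 1  # lower value = higher priority

@dataclass(order=True)
class MapTask:
    priority: int
    arrival: int
    payload: str = field(compare=False)  # excluded from ordering

def run_mapper(tasks):
    """Pop tasks in (priority, arrival) order, so a critical task that
    arrives later still runs before any waiting ordinary task."""
    queue = []
    for t in tasks:
        heapq.heappush(queue, t)
    while queue:
        yield heapq.heappop(queue).payload

tasks = [
    MapTask(ORDINARY, 1, "tile-0042"),
    MapTask(CRITICAL, 2, "flood-sensor-07"),  # jumps ahead of the waiting tile
    MapTask(ORDINARY, 3, "tile-0043"),
]
print(list(run_mapper(tasks)))  # ['flood-sensor-07', 'tile-0042', 'tile-0043']
```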


