Challenges in the Integration of Omics and Non-Omics Data

Genes ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 238 ◽  
Author(s):  
Evangelina López de Maturana ◽  
Lola Alonso ◽  
Pablo Alarcón ◽  
Isabel Adoración Martín-Antoniano ◽  
Silvia Pineda ◽  
...  

Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented in clinical or public health settings. Clinical/epidemiological data tend to explain most of the variation in health-related traits, and their joint modeling with omics data is crucial to increase an algorithm's predictive ability. Only a small number of published studies performed a "real" integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies for OnO data integration. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies for OnO data integration.
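The three integrative strategies the review identifies (independent, conditional, and joint modeling) differ in whether clinical and omics features enter one model together. A minimal sketch of joint modeling, with clinical covariates and a high-dimensional omics block sharing a single penalized design matrix, might look as follows; all data and feature choices are synthetic assumptions, not any reviewed paper's method.

```python
# Minimal sketch of "joint modeling" of omics and non-omics (OnO) data:
# clinical covariates and omics features enter one penalized model together.
# All data are synthetic; feature choices are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
clinical = rng.normal(size=(n, 3))    # low-dimensional non-omics block (e.g. age, BMI, ...)
omics = rng.normal(size=(n, 500))     # high-throughput omics block (e.g. RNA expression)

# Outcome depends on both layers: one clinical covariate and five omics features
logit = clinical[:, 0] + omics[:, :5].sum(axis=1)
y = (logit + rng.normal(size=n) > 0).astype(int)

X = np.hstack([clinical, omics])      # joint design matrix
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=0.1, max_iter=1000))
model.fit(X, y)
acc = model.score(X, y)               # in-sample accuracy of the joint model
```

Independent modeling would instead fit the two blocks separately and combine predictions afterwards; conditional modeling would fit the omics block conditional on the clinical covariates.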

Author(s):  
Vikrant Tiwari ◽  
Nimisha Sharma

In the absence of detailed COVID-19 epidemiological data or large benchmark studies, an effort has been made to explore and correlate parameters like the environment, economic indicators, and large-scale exposure to different prevalent diseases with COVID-19 spread and severity among the different countries affected by COVID-19. Data on environmental, socio-economic, and other important infectious disease factors were collected from reliable, open-source resources like the World Health Organization, the World Bank, etc. This large data set was then used to understand the worldwide spread of COVID-19 using simple statistical tools. Important observations made in this study are the high degree of resemblance in the pattern of temperature and humidity distribution among the cities severely affected by COVID-19. Further, it is surprising to see that, in spite of the presence of many environmental parameters that are considered favorable (like clean air, clean water, EPI, etc.), many countries are suffering from the severe consequences of this disease. Lastly, a noticeable segregation among the locations affected by different prevalent diseases (like malaria, HIV, tuberculosis, and cholera) was also observed. Among the considered environmental factors, temperature, humidity, and EPI should be important parameters in understanding and modelling COVID-19 spread. Further, contrary to intuition, countries with strong economies, good health infrastructure, and cleaner environments suffered disproportionately from the severity of this disease. Therefore, policymakers should sincerely review their countries' preparedness for potential future contagious diseases, whether natural or man-made.
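Country-level analyses of this kind ("simple statistical tools") amount to pairwise correlations between indicators. A sketch with an entirely hypothetical table, not the study's data:

```python
# Pearson correlations among hypothetical country-level indicators; the numbers
# are illustrative only, not the study's data set.
import numpy as np

# columns: mean temperature (deg C), relative humidity (%), EPI score, cases per million
table = np.array([
    [ 6.0, 72.0, 80.0, 3200.0],
    [ 9.0, 68.0, 75.0, 2800.0],
    [26.0, 45.0, 40.0,  350.0],
    [30.0, 80.0, 35.0,  500.0],
    [12.0, 60.0, 70.0, 2100.0],
    [22.0, 55.0, 50.0,  900.0],
])
r = np.corrcoef(table, rowvar=False)   # 4x4 pairwise correlation matrix
temp_vs_cases = r[0, 3]                # temperature vs. case burden
epi_vs_cases = r[2, 3]                 # EPI vs. case burden
```

In this toy table, temperature correlates negatively and EPI positively with case counts, mirroring the counter-intuitive pattern the authors report for cleaner, wealthier countries.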


2018 ◽  
Vol 62 (4) ◽  
pp. 563-574 ◽  
Author(s):  
Charlotte Ramon ◽  
Mattia G. Gollub ◽  
Jörg Stelling

At genome scale, it is not yet possible to devise detailed kinetic models for metabolism because data on the in vivo biochemistry are too sparse. Predictive large-scale models for metabolism most commonly use the constraint-based framework, in which network structures constrain possible metabolic phenotypes at steady state. However, these models commonly leave many possibilities open, making them less predictive than desired. With increasingly available –omics data, it is appealing to increase the predictive power of constraint-based models (CBMs) through data integration. Many corresponding methods have been developed, but data integration is still a challenge and existing methods perform less well than expected. Here, we review main approaches for the integration of different types of –omics data into CBMs focussing on the methods’ assumptions and limitations. We argue that key assumptions – often derived from single-enzyme kinetics – do not generally apply in the context of networks, thereby explaining current limitations. Emerging methods bridging CBMs and biochemical kinetics may allow for –omics data integration in a common framework to provide more accurate predictions.
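The constraint-based framework described here can be illustrated with a toy flux balance analysis: steady-state mass balance (S·v = 0) plus flux bounds, with an objective flux maximized by linear programming. The three-reaction network below is a made-up example, not a genome-scale model.

```python
# Toy constraint-based model: metabolites A and B, reactions
#   R1: -> A (uptake, bounded), R2: A -> B, R3: B -> (export, the objective).
# Mass balance S v = 0 at steady state; linprog maximizes v3 (minimizes -v3).
import numpy as np
from scipy.optimize import linprog

S = np.array([[1.0, -1.0, 0.0],   # metabolite A: produced by R1, consumed by R2
              [0.0, 1.0, -1.0]])  # metabolite B: produced by R2, consumed by R3
c = [0.0, 0.0, -1.0]              # maximize export flux v3
bounds = [(0, 10), (0, 1000), (0, 1000)]  # uptake capped at 10 (an assumed constraint)
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
fluxes = res.x                    # optimal flux distribution
```

The -omics integration methods the review surveys typically act on exactly these bounds, for example shrinking a reaction's upper bound when its enzyme's transcript is lowly expressed, which is where the single-enzyme-kinetics assumptions the authors criticize come in.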


2018 ◽  
Vol 3 (3) ◽  
pp. 2473011418S0025
Author(s):  
Jeff Houck ◽  
Jillian Santer ◽  
Judith Baumhauer

Category: Other. Introduction/Purpose: The patient acceptable symptom state (PASS) is a validated question establishing whether patients' activity and symptoms are at a satisfactorily low level of pain and function. Surprisingly, ~20% of foot and ankle patients present for care at their initial visit in an acceptable symptom state (i.e., PASS Yes). These patients are important to identify in order to prevent overtreatment and avoid excessive cost. It is also unclear which health domains (Pain Interference (PI), Physical Function (PF), or Depression (Dep)) influence a patient's judgement of their PASS state (i.e., why they are seeking treatment). The purpose of this analysis is to document the prevalence of the PASS state and determine the health domains that discriminate PASS patients and predict PASS state at the initiation of rehabilitation. Methods: Patient-Reported Outcomes Measurement Information System (PROMIS) computer adaptive test (CAT) scales (PF, PI, and Dep) and PASS ratings were routinely collected for patient care starting in summer 2017. Of 746 unique patients in this data set, 114 had ICD-10 codes specific to the foot and ankle. Average age was 51 years (±18) and 54.4% were female. Patients were seen an average of 19.8 (±15.9) days from their referral and were billed as low (51.7%), moderate (44.7%), and high complexity (2.7%) evaluations per Current Procedural Terminology (CPT) codes. ANOVA models were used to evaluate differences in PROMIS scales by PASS state (Yes/No). The area under the receiver operating characteristic curve (AUC) was used to determine the predictive ability of each PROMIS scale for a PASS state. Thresholds for near 95% specificity were also calculated for a PASS Yes state for each PROMIS scale. Results: The prevalence of PASS Yes patients was 13.2% (15/114). PASS Yes patients were significantly better, by an average of 7.2 to 8.0 points across all PROMIS health domains, than PASS No patients (Table 1). ROC analysis suggested that Dep (AUC=0.73 (0.07), p=0.005) was the strongest predictor of PASS status, followed by PI (AUC=0.70 (0.08), p=0.012) and PF (AUC=0.69 (0.07), p=0.18). The threshold PROMIS t-score values for determining PASS Yes with nearest 95% specificity were PF = 51.9, PI = 50.6, and Dep = 34. Conclusion: Surprisingly, yet consistent with previous data, 13.2% of patients at their initial physical therapy consultation rated themselves at an acceptable level of activity and symptoms. The health domains of physical function, pain interference, and depression were better in these patients and showed moderate ability (AUC ~0.7) to identify them. The PROMIS thresholds suggest these patients are identified by pain and physical function equal to the average of the US population (PROMIS t-score ~50) and extremely low depression scores (34). Clinically, it is important to recognize these patients and purposefully provide treatments that reinforce their self-efficacy and prevent unnecessary, costly treatments.
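The AUC and 95%-specificity-threshold computations reported above can be reproduced in outline as follows; the scores are simulated, not the study's PROMIS data.

```python
# Sketch of the AUC / 95%-specificity threshold analysis with simulated scores.
# y = 1 marks PASS Yes (~13% prevalence, as in the abstract); scores mimic a
# PROMIS T-score where *lower* values indicate PASS Yes. Numbers are made up.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
y = np.r_[np.zeros(100), np.ones(15)]
scores = np.r_[rng.normal(60, 8, 100),   # PASS No: worse (higher) T-scores
               rng.normal(50, 8, 15)]    # PASS Yes: near the US-population mean
auc = roc_auc_score(y, -scores)          # negate: higher value = more likely Yes
fpr, tpr, thr = roc_curve(y, -scores)
i = np.argmin(np.abs((1 - fpr) - 0.95))  # operating point nearest 95% specificity
threshold = -thr[i]                      # back on the original T-score scale
```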


2017 ◽  
Author(s):  
Florian Rohart ◽  
Benoît Gautier ◽  
Amrit Singh ◽  
Kim-Anh Lê Cao

Abstract The advent of high-throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.


2019 ◽  
Vol 21 (2) ◽  
pp. 541-552 ◽  
Author(s):  
Cécile Chauvel ◽  
Alexei Novoloaca ◽  
Pierre Veyre ◽  
Frédéric Reynier ◽  
Jérémie Becker

Abstract Recent advances in sequencing, mass spectrometry and cytometry technologies have enabled researchers to collect large-scale omics data from the same set of biological samples. The joint analysis of multiple omics offers the opportunity to uncover coordinated cellular processes acting across different omic layers. In this work, we present a thorough comparison of a selection of recent integrative clustering approaches, including Bayesian (BCC and MDI) and matrix factorization approaches (iCluster, moCluster, JIVE and iNMF). Based on simulations, the methods were evaluated on their sensitivity and their ability to recover both the correct number of clusters and the simulated clustering at the common and data-specific levels. Standard non-integrative approaches were also included to quantify the added value of integrative methods. For most matrix factorization methods and one Bayesian approach (BCC), the shared and specific structures were successfully recovered with high and moderate accuracy, respectively. An opposite behavior was observed on non-integrative approaches, i.e. high performances on specific structures only. Finally, we applied the methods on the Cancer Genome Atlas breast cancer data set to check whether results based on experimental data were consistent with those obtained in the simulations.
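A minimal illustration of what integrative clustering exploits: when the same cluster structure spans two omic layers, even naive clustering of the concatenated (joint) matrix can recover it. This uses plain k-means on simulated data, not any of the compared methods (BCC, MDI, iCluster, moCluster, JIVE, iNMF).

```python
# Simulated shared cluster structure across two omic layers; plain k-means on
# the concatenated matrix stands in for the integrative methods compared here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
labels = np.repeat([0, 1, 2], 50)                   # ground-truth clusters
centers1 = rng.normal(scale=3, size=(3, 20))        # layer 1 cluster centers
centers2 = rng.normal(scale=3, size=(3, 15))        # layer 2 cluster centers
X1 = centers1[labels] + rng.normal(size=(150, 20))  # e.g. expression layer
X2 = centers2[labels] + rng.normal(size=(150, 15))  # e.g. methylation layer

joint = np.hstack([X1, X2])                         # naive "integration"
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(joint)
ari = adjusted_rand_score(labels, pred)             # 1.0 = perfect recovery
```

The methods the paper compares go further than this sketch by separating the shared structure from data-specific structure in each layer, which is where their relative performance differs.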


2021 ◽  
Author(s):  
Michael F. Adamer ◽  
Sarah C. Brueningk ◽  
Alejandro Tejada-Arranz ◽  
Fabienne Estermann ◽  
Marek Basler ◽  
...  

With the steadily increasing abundance of omics data produced all over the world, sometimes decades apart and under vastly different experimental conditions, residing in public databases, a crucial step in many data-driven bioinformatics applications is data integration. The challenge of batch effect removal for entire databases lies in the large number of batches and their confounding with the desired biological variation, which results in a singular design matrix. This problem cannot currently be solved by any common batch correction algorithm. In this study, we present reComBat, a regularised version of the empirical Bayes method, to overcome this limitation. We demonstrate our approach for the harmonisation of public gene expression data of the human opportunistic pathogen Pseudomonas aeruginosa and study several metrics to empirically demonstrate that batch effects are successfully mitigated while biologically meaningful gene expression variation is retained. reComBat fills the gap in batch correction approaches applicable to large-scale public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study.
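reComBat's regularised empirical Bayes adjustment is more involved, but the basic idea of batch effect removal, aligning per-batch distributions while preserving overall expression levels, can be sketched with a location-only (mean-centering) correction. This is a deliberate simplification, not reComBat itself.

```python
# Location-only batch correction: subtract each batch's per-gene mean, then
# restore the grand mean. A deliberate simplification of ComBat/reComBat.
import numpy as np

rng = np.random.default_rng(4)
batch = np.r_[np.zeros(40, dtype=int), np.ones(40, dtype=int)]
expr = rng.normal(size=(80, 100))        # samples x genes
expr[batch == 1] += 2.0                  # additive batch effect on batch 1

grand_mean = expr.mean(axis=0)
corrected = expr.copy()
for b in (0, 1):
    mask = batch == b
    corrected[mask] -= corrected[mask].mean(axis=0)  # remove per-batch location
corrected += grand_mean                              # keep overall expression level

gap = abs(corrected[batch == 0].mean() - corrected[batch == 1].mean())
```

The design-matrix singularity the abstract describes arises when batch labels and biological conditions coincide; a purely per-batch correction like this one would then remove the biology along with the batch effect, which is the failure mode reComBat's regularisation addresses.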


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN

Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.
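IDA pools individual participant-level data in a single model; its classical two-stage alternative pools per-study summary estimates instead. The inverse-variance fixed-effect pooling below uses made-up study estimates purely to illustrate that contrast.

```python
# Two-stage research synthesis: inverse-variance fixed-effect combination of
# per-study effect estimates. Estimates and standard errors below are made up.
import numpy as np

est = np.array([0.30, 0.45, 0.10, 0.25])   # per-study effect estimates
se = np.array([0.10, 0.15, 0.20, 0.12])    # their standard errors
w = 1.0 / se**2                            # inverse-variance weights
pooled = (w * est).sum() / w.sum()         # pooled effect
pooled_se = (1.0 / w.sum()) ** 0.5         # its standard error
```

Unlike this summary-level pooling, IDA with participant-level data can model subgroups and infrequent behaviors directly, which is the advantage the chapter emphasizes, at the cost of the harmonization barriers it also describes.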

