A multi-omics data simulator for complex disease studies and its application to evaluate multi-omics data analysis methods for disease classification

2018 ◽  
Author(s):  
Ren-Hua Chung ◽  
Chen-Yu Kang

An integrative multi-omics analysis approach, which combines multiple types of omics data including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics, has become increasingly popular for understanding the pathophysiology of complex diseases. Although many multi-omics analysis methods have been developed for complex disease studies, no existing simulation tool simulates multiple types of omics data and models their relationships with disease status. Without such a tool, it is difficult to evaluate multi-omics analysis methods on the same scale, or to estimate the sample size or power when planning a new multi-omics disease study. We developed a multi-omics data simulator, OmicsSIMLA, which simulates genomics (i.e., SNPs and copy number variations), epigenomics (i.e., whole-genome bisulphite sequencing), transcriptomics (i.e., RNA-seq), and proteomics (i.e., normalized reverse phase protein array) data at the whole-genome level. Furthermore, it models the relationships between different types of omics data, such as meQTLs (SNPs influencing methylation), eQTLs (SNPs influencing gene expression), and eQTMs (methylation influencing gene expression). More importantly, it also models the relationships between these multi-omics data and disease status. We used OmicsSIMLA to simulate a multi-omics dataset for breast cancer under a hypothetical disease model, and used the data to compare existing multi-omics analysis methods in terms of disease classification accuracy and run time. Our results demonstrated that OmicsSIMLA can simulate complex disease mechanisms, and that a random forest-based method showed the highest prediction accuracy when the multi-omics data were properly normalized.
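The chain of relationships described above (SNP to methylation via meQTLs, SNP and methylation to expression via eQTLs and eQTMs, and omics to disease status) can be illustrated with a toy generative model. The sketch below is a minimal illustration under assumed linear effects and a logistic disease model for one SNP, one CpG site, and one gene; the function name, effect sizes, and distributions are hypothetical and are not OmicsSIMLA's actual models or parameters.

```python
import numpy as np

def simulate_toy_omics(n, maf=0.3, b_meqtl=0.5, b_eqtl=0.4, b_eqtm=-0.6,
                       b_disease=1.0, intercept=-0.5, seed=0):
    """Toy chain SNP -> CpG methylation -> gene expression -> disease status.

    Every effect size and distribution here is an illustrative assumption,
    not an OmicsSIMLA default.
    """
    rng = np.random.default_rng(seed)
    # Genotype as minor-allele count (0, 1, 2) under Hardy-Weinberg equilibrium
    snp = rng.binomial(2, maf, size=n)
    # meQTL: the SNP shifts methylation on the logit (M-value) scale
    m_value = b_meqtl * snp + rng.normal(0.0, 1.0, size=n)
    methylation = 1.0 / (1.0 + np.exp(-m_value))  # proportion methylated, in (0, 1)
    # eQTL (SNP effect) plus eQTM (methylation effect) on expression
    expression = b_eqtl * snp + b_eqtm * m_value + rng.normal(0.0, 1.0, size=n)
    # Disease status drawn from a logistic model on expression
    p_disease = 1.0 / (1.0 + np.exp(-(intercept + b_disease * expression)))
    status = rng.binomial(1, p_disease)
    return snp, methylation, expression, status

snp, meth, expr, status = simulate_toy_omics(5000)
```

Setting an effect to zero (for example `b_eqtm=0.0`) removes one link in the chain, which is the kind of knob one would vary when using simulated data to estimate power.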

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Hong-Yan Liu ◽  
Liyuan Zhou ◽  
Meng-Yue Zheng ◽  
Jia Huang ◽  
Shu Wan ◽  
...  

Rare diseases are usually chronically debilitating or even life-threatening, and they pose diagnostic and therapeutic challenges in current clinical practice. An estimated 80% of rare diseases are genetic in origin, so genome sequencing-based diagnosis offers a promising alternative for rare-disease management. In this study, 79 individuals from 16 independent families underwent whole-genome sequencing (WGS) in an effort to identify the causative mutations for 16 distinct rare diseases that are largely clinically intractable. A comprehensive analysis of variation, including simple nucleotide variants (SNVs), copy-number variations (CNVs), and structural variations (SVs), was performed on the WGS data. To facilitate the identification of causative variants, we developed a flexible analysis pipeline that tolerates a certain degree of misclassification of disease status. As a result, disease-causing variants were identified in 10 of the 16 investigated diseases, a diagnostic rate of 62.5%. Additionally, new potentially pathogenic variants were discovered for two disorders: IGF2/INS-IGF2 in mitochondrial disease and FBN3 in Klippel–Trenaunay–Weber syndrome. Our WGS analysis not only detected a CNV associated with 3p deletion syndrome but also captured a simple sequence repeat (SSR) variation associated with Machado–Joseph disease. To our knowledge, this is the first time clinical WGS analysis of short-read sequences has been used successfully to identify a causative SSR variation that perfectly segregates with a repeat expansion disorder. The WGS analysis confirmed the initial diagnosis for three of the 10 established disorders and modified or corrected the initial diagnosis for the remaining seven. In summary, clinical WGS is a powerful tool for the diagnosis of rare diseases, and the diagnostic clarity it provides at the molecular level offers important benefits to the participating families.
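One way to picture a pipeline that "allows a certain degree of misclassification of disease status" is a family segregation filter that tolerates a few discordant members. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes a dominant model in which carriers should be affected and non-carriers unaffected, and it keeps variants with at most `max_mismatch` discordant individuals. The family, variant IDs, and the function name are invented; the last line restates the reported diagnostic rate (10 of 16 diseases).

```python
from typing import Dict, List

def segregating_variants(carriers: Dict[str, List[bool]], affected: List[bool],
                         max_mismatch: int = 1) -> List[str]:
    """Keep variants whose carrier status agrees with affection status in all but
    at most `max_mismatch` family members (hypothetical dominant-model filter)."""
    kept = []
    for variant_id, is_carrier in carriers.items():
        if len(is_carrier) != len(affected):
            raise ValueError(f"{variant_id}: genotype/phenotype length mismatch")
        mismatches = sum(c != a for c, a in zip(is_carrier, affected))
        if mismatches <= max_mismatch:
            kept.append(variant_id)
    return kept

# Hypothetical five-member family: True = affected (phenotype) or carrier (variant)
affected = [True, True, False, False, True]
carriers = {
    "var_A": [True, True, False, False, True],   # segregates perfectly
    "var_B": [True, True, True, False, True],    # one discordant member
    "var_C": [False, True, True, True, False],   # four discordant members
}
strict = segregating_variants(carriers, affected, max_mismatch=0)
tolerant = segregating_variants(carriers, affected, max_mismatch=1)

# Diagnostic yield reported in the abstract: 10 of 16 diseases solved
diagnostic_rate = 10 / 16
```

Here `strict` keeps only `var_A`, while `tolerant` also keeps `var_B`, whose single discordant member could reflect a misclassified phenotype.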


2016 ◽  
Vol 14 (04) ◽  
pp. 1650015 ◽  
Author(s):  
Worrawat Engchuan ◽  
Asawin Meechai ◽  
Sissades Tongsima ◽  
Narumol Doungpan ◽  
Jonathan H. Chan

Cancer is a complex disease that cannot be diagnosed reliably using single-gene expression analysis alone. Gene-set analysis of high-throughput gene expression profiles, which are influenced by various environmental factors, is a technique commonly adopted by the cancer research community. This work develops a comprehensive gene expression analysis tool, the gene-set activity toolbox (GAT), which implements a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization, and data mining tools. The gene-set analysis methods identify subsets of phenotype-relevant genes that are then used to build a classification model. To evaluate GAT's performance, we performed a cross-dataset validation study on three common cancers: colorectal, breast, and lung cancer. The results show that GAT can be used to build a reasonable disease diagnostic model and that the predicted markers have biological relevance. GAT is available at http://gat.sit.kmutt.ac.th, where GAT's Java library for gene-set analysis, simple classification, and a database of three cancer benchmark datasets can be downloaded.
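As a rough illustration of how gene-set activities can feed a classifier, the sketch below scores each gene set per sample as the mean z-score of its member genes and then classifies samples with a nearest-centroid rule. Both choices are assumptions made for illustration; GAT offers several gene-set analysis methods, and this is not its Java library or API. The gene names, gene sets, and data are hypothetical.

```python
import numpy as np

def gene_set_activity(expr, gene_index, gene_sets):
    """Score each gene set per sample as the mean z-score of its member genes.
    expr: samples x genes; gene_index: gene name -> column; gene_sets: set -> genes."""
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-12)  # per-gene z-scores
    names = sorted(gene_sets)
    columns = []
    for name in names:
        idx = [gene_index[g] for g in gene_sets[name] if g in gene_index]
        if not idx:
            raise ValueError(f"no measured genes in gene set {name!r}")
        columns.append(z[:, idx].mean(axis=1))
    return np.column_stack(columns), names

def fit_centroids(features, labels):
    """Mean gene-set activity profile of each class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features, centroids):
    """Assign each sample to the class with the nearest centroid (Euclidean)."""
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(features - centroids[c], axis=1)
                      for c in classes], axis=1)
    return np.array(classes)[dists.argmin(axis=1)]

# Hypothetical toy data: genes g1 and g2 (gene set S1) are up-regulated in class 1
rng = np.random.default_rng(1)
genes = ["g1", "g2", "g3", "g4"]
gene_index = {g: i for i, g in enumerate(genes)}
labels = np.array([0] * 20 + [1] * 20)
expr = rng.normal(size=(40, 4))
expr[labels == 1, :2] += 4.0
gene_sets = {"S1": ["g1", "g2"], "S2": ["g3", "g4"]}
activity, set_names = gene_set_activity(expr, gene_index, gene_sets)
pred = predict(activity, fit_centroids(activity, labels))
accuracy = (pred == labels).mean()
```

For a cross-dataset validation like the one described, the z-score parameters and class centroids would be estimated on the training dataset only and then applied to the held-out dataset; the toy example reuses one dataset for brevity.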


BioTech ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 3
Author(s):  
Yinhao Du ◽  
Kun Fan ◽  
Xi Lu ◽  
Cen Wu

Gene-environment (G×E) interactions are critical for understanding the genetic basis of complex diseases beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting G×E interactions. Despite this success, variable selection is limited in its ability to account for multidimensional measurements: published variable selection methods cannot accommodate structured sparsity in the framework of integrating multi-omics data for disease outcomes. In this paper, we developed a novel variable selection method that integrates multi-omics measurements in G×E interaction studies. Extensive studies have already revealed that analyzing omics data across multiple platforms is not only biologically sensible but also results in improved identification and prediction performance. Our integrative model can efficiently pinpoint important regulators of gene expression through sparse dimensionality reduction, and it links disease outcomes to multiple effects in integrative G×E studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model identifies G×E interactions and regulators better than alternative methods. In two G×E lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and to findings with important biological implications.
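A "sparse bi-level structure" means selection at two levels: whole groups of effects (for example, a gene's main effect together with its G×E interaction) and individual effects within a selected group. A standard penalty with this property is the sparse group lasso. The sketch below fits a linear sparse group lasso by proximal gradient descent; it is a generic illustration under an assumed Gaussian outcome, assumed tuning parameters (`lam`, `alpha`), and simulated data, and it is not the authors' integrative model, which additionally performs sparse dimensionality reduction over multi-omics regulators.

```python
import numpy as np

def soft_threshold(v, thr):
    """Elementwise lasso shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def sparse_group_lasso(X, y, groups, lam=0.2, alpha=0.5, n_iter=500):
    """Minimize (1/2n)||y - X b||^2
       + lam * (alpha * ||b||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2)
    by proximal gradient descent. groups[j] is the group id of column j."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        # Individual level: lasso shrinkage of each coefficient
        u = soft_threshold(beta - step * grad, step * lam * alpha)
        # Group level: shrink (or zero out) each group as a whole
        for g in np.unique(groups):
            idx = groups == g
            thr = step * lam * (1 - alpha) * np.sqrt(idx.sum())
            norm = np.linalg.norm(u[idx])
            u[idx] = 0.0 if norm <= thr else (1.0 - thr / norm) * u[idx]
        beta = u
    return beta

def gxe_design(G, E):
    """Columns: E, then (G_j, G_j * E) for each j; one group per gene G_j."""
    cols, groups = [E[:, None]], [0]  # environment main effect is group 0
    for j in range(G.shape[1]):
        cols += [G[:, [j]], (G[:, j] * E)[:, None]]
        groups += [j + 1, j + 1]
    return np.hstack(cols), np.array(groups)

# Hypothetical simulation: only G_0 has a main effect and a G x E interaction
rng = np.random.default_rng(42)
n, q = 200, 10
G, E = rng.normal(size=(n, q)), rng.normal(size=n)
y = 1.0 * E + 1.5 * G[:, 0] + 1.0 * G[:, 0] * E + rng.normal(size=n)
X, groups = gxe_design(G, E)
beta = sparse_group_lasso(X, y, groups)
selected = sorted({int(g) for g, b in zip(groups, beta) if b != 0.0})
```

`alpha` trades off the two levels: `alpha=1` reduces to the ordinary lasso and `alpha=0` to the group lasso, so intermediate values give the bi-level behaviour.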


Author(s):  
Jingyi Li ◽  
Mi-Ok Lee ◽  
Brian W Davis ◽  
Ping Wu ◽  
Shu-Man Hsieh-Li ◽  
...  

The Crest mutation in chicken shows incomplete dominance and causes a spectacular phenotype in which the small feathers normally present on the head are replaced by much larger feathers normally present only in dorsal skin. Using whole-genome sequencing, we show that the crest phenotype is caused by a 197 bp duplication of an evolutionarily conserved sequence located in an intron of HOXC10 on chromosome 33. A diagnostic test showed that the duplication was present in all 54 crested chickens, representing eight breeds, and absent from all 433 non-crested chickens, representing 214 populations. The mutation causes ectopic expression of at least five closely linked HOXC genes, including HOXC10, in the cranial skin of crested chickens. This result is consistent with the interpretation that crest feathers are caused by an altered body-region identity. The upregulated HOXC gene expression extends to skull tissue in Polish chickens, which show a large crest often associated with cerebral hernia, but not in Silkie chickens, which are characterized by a small crest; both breeds are homozygous for the duplication. Thus, the 197 bp duplication is required for the development of a large crest and for susceptibility to cerebral hernia, because only crested chickens show this malformation. However, the mutation is not sufficient to cause herniation, because the malformation is not present in breeds with a small crest, such as Silkie chickens.
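The diagnostic test amounts to a 2×2 comparison of duplication status against phenotype. The short calculation below is not reported in the abstract: it restates the given counts (54 crested carriers, 433 non-crested non-carriers) as sensitivity and specificity, and adds the standard exact one-sided upper confidence bound for a proportion when zero exceptions are observed, to show how much uncertainty remains at these sample sizes. The function names are illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def zero_event_upper_bound(n, alpha=0.05):
    """Exact one-sided (1 - alpha) upper confidence bound on a proportion when
    0 events are seen in n trials: solves (1 - p) ** n = alpha (about 3 / n)."""
    return 1.0 - alpha ** (1.0 / n)

# Counts from the abstract: duplication in all 54 crested birds
# and in none of the 433 non-crested birds
sens, spec = sensitivity_specificity(tp=54, fn=0, tn=433, fp=0)
miss_rate_upper = zero_event_upper_bound(54)      # crested birds lacking the duplication
carrier_rate_upper = zero_event_upper_bound(433)  # non-crested birds carrying it
```

With no exceptions among the 433 non-crested birds, the one-sided 95% bound on the duplication frequency in that group is about 0.7%, close to the "rule of three" approximation 3/n.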


Author(s):  
Yifan Zhang ◽  
Weiwei Jiang ◽  
Jun Xu ◽  
Na Wu ◽  
Yang Wang ◽  
...  

Objective: The gut microbiota is associated with nonalcoholic fatty liver disease (NAFLD). We isolated the Escherichia coli strain NF73-1 from the intestines of a patient with nonalcoholic steatohepatitis (NASH) and investigated its effect and underlying mechanism.
Methods: 16S ribosomal RNA (16S rRNA) amplicon sequencing was used to profile the bacteria of healthy controls, NAFLD patients, and NASH patients. Highly enriched E. coli strains were cultured and isolated from NASH patients. Whole-genome sequencing and comparative genomics were performed to investigate gene expression. Male C57BL/6J mice were divided by diet into normal diet (ND) and high-fat diet (HFD) groups. To avoid interference from the resident gut microbiota, some of the ND and HFD mice were made "bacteria-depleted" by treatment with a broad-spectrum antibiotic cocktail (ABX) from the 8th to the 10th week. E. coli NF73-1, the strain isolated from NASH patients, was then administered intragastrically for 6 weeks to investigate its effect and mechanism in the pathogenic progression of NAFLD.
Results: The relative abundance of Escherichia increased significantly in the mucosa of NAFLD patients, especially NASH patients. Whole-genome sequencing and comparative genomics revealed a specific gene expression profile in E. coli NF73-1, which was isolated from the intestinal mucosa of NASH patients. E. coli NF73-1 independently accelerated NAFLD. EGFP-labeled E. coli NF73-1 was detected in the liver and intestine only in the HFD-NF73-1 and HFD-ABX-NF73-1 groups. Subsequently, translocation of E. coli NF73-1 into the liver led to an increase in hepatic M1 macrophages via the TLR2/NLRP3 pathway. Hepatic M1 macrophages induced by E. coli NF73-1 activated mTOR-S6K1-SREBP-1/PPAR-α signaling, causing a metabolic switch from triglyceride oxidation toward triglyceride synthesis in NAFLD mice.
Conclusions: E. coli NF73-1 is a critical trigger in the progression of NAFLD and might be a specific strain for NAFLD patients.
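In 16S rRNA amplicon data, relative abundance is typically each taxon's read count divided by the total read count of its sample. The sketch below computes genus-level relative abundances from a hypothetical count table (the genera, counts, and group sizes are invented for illustration and are not data from the study) and compares the mean Escherichia abundance between two groups.

```python
import numpy as np

def relative_abundance(counts):
    """Convert a samples x taxa read-count table into per-sample proportions."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every sample needs at least one read")
    return counts / totals

genera = ["Escherichia", "Bacteroides", "Faecalibacterium"]
# Hypothetical read counts; rows are samples, columns follow `genera`
controls = [[10, 600, 390], [20, 580, 400]]
patients = [[150, 500, 350], [250, 450, 300]]
esc = genera.index("Escherichia")
control_mean = relative_abundance(controls)[:, esc].mean()
patient_mean = relative_abundance(patients)[:, esc].mean()
```

Here the mean Escherichia share is 1.5% in the hypothetical controls and 20% in the hypothetical patients; claiming a significant increase would additionally require a statistical test across many samples (for example, a rank-based test on the proportions), which the sketch omits.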


Biomolecules ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 565
Author(s):  
Satoshi Takahashi ◽  
Masamichi Takahashi ◽  
Shota Tanaka ◽  
Shunsaku Takayanagi ◽  
Hirokazu Takami ◽  
...  

Although central nervous system (CNS) cancers have a low incidence, they significantly reduce patients' quality of life and have high mortality rates. A low incidence also means few cases and therefore little available information. To compensate, researchers have tried to increase the amount of information obtained from a single test by using high-throughput technologies. This approach, referred to as single-omics analysis, has been only partially successful, because one type of data may not adequately describe all the characteristics of a tumor. It is currently unclear which type of data best describes a particular clinical situation. One way to address this problem is to use multi-omics data: when many types of data are available, a selected data type or a combination of types may effectively resolve a clinical question. We therefore conducted a comprehensive survey of papers in neuro-oncology that used multi-omics data and found that most of them used machine learning techniques. This indicates that machine learning techniques are useful in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in neuro-oncology and the importance of machine learning techniques.
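A common first step when feeding several omics types into a single machine learning model is "early integration": standardize each data block and concatenate the blocks into one feature matrix. The sketch below illustrates only that step, under the assumption that every block is measured on the same samples in the same order; weighting each block by 1/sqrt(number of features), so that wide blocks do not dominate, is one of several possible choices. The block names, sizes, and values are hypothetical and not drawn from the reviewed papers.

```python
import numpy as np

def integrate_blocks(blocks):
    """Early integration: z-score every feature within its omics block, scale the
    block by 1/sqrt(n_features) so each block carries the same total variance,
    then concatenate the blocks column-wise."""
    n_samples = {np.asarray(X).shape[0] for X in blocks.values()}
    if len(n_samples) != 1:
        raise ValueError("all blocks must contain the same samples")
    scaled, spans, start = [], {}, 0
    for name, X in blocks.items():
        X = np.asarray(X, dtype=float)
        Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
        scaled.append(Z / np.sqrt(X.shape[1]))
        spans[name] = slice(start, start + X.shape[1])  # where this block ends up
        start += X.shape[1]
    return np.hstack(scaled), spans

# Hypothetical blocks for 30 tumours with very different numbers of features
rng = np.random.default_rng(0)
blocks = {
    "methylation": rng.normal(0.5, 0.2, size=(30, 500)),
    "expression": rng.normal(8.0, 2.0, size=(30, 2000)),
    "copy_number": rng.integers(0, 5, size=(30, 40)),
}
X, spans = integrate_blocks(blocks)
```

The integrated matrix `X` can then be passed to any classifier; with block weighting, the 40 copy-number features carry the same total variance as the 2,000 expression features.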


2012 ◽  
Vol 10 (01) ◽  
pp. 1240007 ◽  
Author(s):  
Chengcheng Shen ◽  
Ying Liu

Alteration of gene expression in response to regulatory molecules or mutations can lead to different diseases. MicroRNAs (miRNAs) have been found to be involved in the regulation of gene expression and in a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes, and the diseases caused by altered expression of these genes, co-clustering miRNAs, target genes, and diseases simultaneously can reveal valuable knowledge about the pathogenicity of miRNAs, the genes involved, and related disease classes. Tripartite co-clustering can yield more informative results than traditional co-clustering with only two kinds of members, and by considering multiple member types it can pass hidden relational information along the relation chain. Here we report a spectral co-clustering algorithm for k-partite graphs that finds clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes, and diseases. The clusters obtained by the algorithm have significantly higher density than randomly selected clusters, which means that members of the same cluster are more likely to share connections. The results also show that miRNAs in the same family, defined by hairpin sequence, tend to belong to the same cluster. We also validate the clustering results by checking the correlation between enriched gene functions and disease classes in the same cluster. Finally, as a case study, we analyze the widely studied miR-17-92 and its paralogs and show that the genes and diseases co-clustered with these miRNAs are consistent with current research findings.
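The spectral idea can be illustrated on the simpler bipartite case (for example, miRNAs × target genes) using the classic bipartite spectral co-clustering recipe: normalize the association matrix by row and column degrees, take the second left and right singular vectors, and split both node types by sign. This is a minimal two-cluster sketch offered as an assumption about the general flavour of the approach, not the authors' k-partite algorithm, which handles three node types and more than two clusters. The matrix below is hypothetical.

```python
import numpy as np

def bipartite_spectral_split(A):
    """Two-way spectral co-clustering of a bipartite association matrix A
    (rows = one node type, e.g. miRNAs; columns = another, e.g. genes)."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row degrees
    d2 = A.sum(axis=0)  # column degrees
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every node needs at least one edge")
    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, S, Vt = np.linalg.svd(An)
    # The leading singular pair is trivial; the second one carries the split
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1] / np.sqrt(d2)
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)

# Hypothetical network: miRNAs 0-1 target genes 0-2, miRNAs 2-3 target genes 3-4,
# plus one cross link (miRNA 1 -> gene 3)
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]])
row_labels, col_labels = bipartite_spectral_split(A)
```

Each resulting co-cluster groups miRNAs with the genes they mostly target; the overall 0/1 labelling may be swapped because singular vectors are defined only up to sign. Extending this to a tripartite miRNA-gene-disease network requires coupling several such matrices, which the sketch does not attempt.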

