Pattern Differentiations and Formulations for Heterogeneous Genomic Data through Hybrid Approaches

Author(s):  
Arpad Kelemen ◽  
Yulan Liang

Pattern differentiation and pattern formulation are the two main research tracks in pattern analysis of heterogeneous genomic data. In this chapter, we develop hybrid methods to address the major challenges of power and reproducibility in detecting dynamic differential temporal gene expression patterns. Significant differentially expressed genes are selected not only through statistical significance analysis of microarrays but also through supergenes obtained by singular value decomposition, which extracts the gene components that maximize the total predictor variability. Furthermore, hybrid clustering methods are developed based on the profiles produced by several clustering methods. We demonstrate the hybrid analysis through an application to time course gene expression data from interferon-β-1a treated multiple sclerosis patients. The resulting integrated, condensed clusters and overrepresented gene lists demonstrate that the hybrid methods can be applied successfully. The post-analysis includes function analysis and pathway discovery to validate the findings of the hybrid methods.
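The supergene idea can be illustrated with a minimal sketch. It assumes a genes x timepoints expression matrix and uses a plain SVD of the row-centered data; the function name `supergenes` and the toy data are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
import numpy as np

def supergenes(expr, k):
    """Return the top-k temporal patterns of a genes x timepoints matrix."""
    # Center each gene (row) so the SVD captures variation around its mean.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Thin SVD: centered = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Fraction of total variance carried by each component.
    explained = s**2 / np.sum(s**2)
    # Rows of vt are temporal patterns; columns of u are per-gene loadings.
    return vt[:k], u[:, :k], explained[:k]

# Hypothetical toy data: 20 genes following one rising trend plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 6)
toy = np.outer(rng.normal(size=20), t) + 0.01 * rng.normal(size=(20, 6))
patterns, loadings, explained = supergenes(toy, k=2)
```

Genes with large absolute loadings on the leading components would then be candidates to add to the list selected by the significance analysis.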

2018 ◽  
Author(s):  
Yeping Lina Qiu ◽  
Hong Zheng ◽  
Olivier Gevaert

Abstract. Motivation: The presence of missing values is a frequent problem encountered in genomic data analysis. Lost data can be an obstacle to downstream analyses that require complete data matrices. State-of-the-art imputation techniques including Singular Value Decomposition (SVD) and K-Nearest Neighbors (KNN) based methods usually achieve good performances, but are computationally expensive especially for large datasets such as those involved in pan-cancer analysis. Results: This study describes a new method: a denoising autoencoder with partial loss (DAPL) as a deep learning based alternative for data imputation. Results on pan-cancer gene expression data and DNA methylation data from over 11,000 samples demonstrate significant improvement over standard denoising autoencoder for both data missing-at-random cases with a range of missing percentages, and missing-not-at-random cases based on expression level and GC-content. We discuss the advantages of DAPL over traditional imputation methods and show that it achieves comparable or better performance with less computational burden. Availability: https://github.com/gevaertlab/[email protected]
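One plausible reading of "partial loss" is a reconstruction error restricted to a subset of matrix entries selected by a mask. The abstract does not say which entries are included, so the mask below is an assumption, and `partial_mse` is a hypothetical helper rather than the DAPL implementation.

```python
def partial_mse(target, reconstruction, mask):
    """Mean squared error over the entries where mask is True."""
    total, count = 0.0, 0
    for t_row, r_row, m_row in zip(target, reconstruction, mask):
        for t, r, m in zip(t_row, r_row, m_row):
            if m:  # only masked-in entries contribute to the loss
                total += (t - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no entries")
    return total / count

# Hypothetical 2 x 2 example: the entry at row 0, column 1 is excluded.
target = [[1.0, 2.0], [3.0, 4.0]]
recon = [[1.0, 0.0], [3.0, 5.0]]
mask = [[True, False], [True, True]]
loss = partial_mse(target, recon, mask)
full_loss = partial_mse(target, recon, [[True, True], [True, True]])
```

Excluding an entry from the mask means its reconstruction error does not influence training, which is the general motivation for computing a loss on only part of the matrix.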


Author(s):  
Hongchuan Cheng ◽  
Yimin Zhang ◽  
Wenjia Lu ◽  
Zhou Yang

To obtain the fault features of the bearing, a method based on variational mode decomposition (VMD) and singular value decomposition (SVD) is proposed, with fault diagnosis performed by Gath–Geva (G–G) fuzzy clustering. Firstly, the original signals are decomposed into mode components by VMD accurately and adaptively, and the spatial condition matrix (SCM) is obtained. The SCM, used as the reconstruction matrix of SVD, inherits the time delay parameter and embedded dimension automatically, and the first three singular values of the SCM are then used as fault eigenvalues to decrease the feature dimension and improve the computational efficiency. G–G clustering, an unsupervised fuzzy clustering technique, is employed to obtain the clustering centers and membership matrices under various bearing faults. Finally, the Hamming approach degree between the test samples and the known cluster centers is calculated to identify the bearing fault. In contrast to EEMD and EMD, which are based on a recursive decomposition algorithm, VMD adopts a completely nonrecursive method that avoids mode mixing and end effects. Furthermore, the IMF components calculated by VMD contain large amounts of fault information. Unlike other clustering methods, G–G clustering is not limited by the shapes, sizes and densities of the clusters. VMD and G–G clustering are therefore well suited to fault diagnosis of the bearing system, and the results of experiment and engineering analysis show that the proposed method can diagnose bearing faults accurately and effectively.
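The final identification step can be sketched as follows, assuming the common fuzzy-set definition of the Hamming approach degree, N(A, B) = 1 - (1/n) Σ |A_i - B_i|, with features scaled to [0, 1]. The cluster centers, fault labels, and sample values are hypothetical, and the abstract does not confirm that the authors use this exact formula.

```python
def hamming_approach_degree(a, b):
    """Closeness of two equal-length feature vectors with values in [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def classify(sample, centers):
    """Return the label of the cluster center closest to the sample."""
    return max(centers, key=lambda name: hamming_approach_degree(sample, centers[name]))

# Hypothetical cluster centers built from three normalized singular values.
centers = {
    "normal": [0.90, 0.10, 0.05],
    "inner_race": [0.50, 0.40, 0.30],
}
sample = [0.55, 0.35, 0.30]
label = classify(sample, centers)
```

A degree of 1 means the sample coincides with a center; the test sample is assigned to the fault class whose center yields the largest degree.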


mBio ◽  
2017 ◽  
Vol 8 (6) ◽  
Author(s):  
Jake V. Bailey ◽  
Beverly E. Flood ◽  
Elizabeth Ricci ◽  
Nathalie Delherbe

ABSTRACT The largest known bacteria, Thiomargarita spp., have yet to be isolated in pure culture, but their large size allows for individual cells to be monitored in time course experiments or to be individually sorted for omics-based investigations. Here we investigated the metabolism of individual cells of Thiomargarita spp. by using a novel application of a tetrazolium-based dye that measures oxidoreductase activity. When coupled with microscopy, staining of the cells with a tetrazolium-formazan dye allows metabolic responses in Thiomargarita spp. to be tracked in the absence of observable cell division. Additionally, the metabolic activity of Thiomargarita sp. cells can be differentiated from the metabolism of other microbes in specimens that contain adherent bacteria. The results of our redox dye-based assay suggest that Thiomargarita is most metabolically versatile under anoxic conditions, where it appears to express cellular oxidoreductase activity in response to the electron donors succinate, acetate, citrate, formate, thiosulfate, H2, and H2S. Under hypoxic conditions, formazan staining results suggest the metabolism of succinate and likely acetate, citrate, and H2S. Cells incubated under oxic conditions showed the weakest formazan staining response, and then only to H2S, citrate, and perhaps succinate. These results provide experimental validation of recent genomic studies of Candidatus Thiomargarita nelsonii that suggest metabolic plasticity and mixotrophic metabolism. The cellular oxidoreductase response of bacteria attached to the exterior of Thiomargarita also supports the possibility of trophic interactions between these largest of known bacteria and attached epibionts.

IMPORTANCE The metabolic potential of many microorganisms that cannot be grown in the laboratory is known only from genomic data. Genomes of Thiomargarita spp. suggest that these largest of known bacteria are mixotrophs, combining lithotrophic metabolism with organic carbon degradation. Our use of a redox-sensitive tetrazolium dye to query the metabolism of these bacteria provides an independent line of evidence that corroborates the apparent metabolic plasticity of Thiomargarita observed in recently produced genomes. Finding new cultivation-independent means of testing genomic results is critical to testing genome-derived hypotheses on the metabolic potentials of uncultivated microorganisms.


Genes ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 410
Author(s):  
Katia Cappelli ◽  
Samanta Mecocci ◽  
Silvia Gioiosa ◽  
Andrea Giontella ◽  
Maurizio Silvestrelli ◽  
...  

Physical exercise is universally recognized as stressful. Among the “sport species”, the horse is probably the most appropriate model for investigating the genomic response to stress due to the homogeneity of its genetic background. The aim of this work is to dissect the whole transcription modulation in Peripheral Blood Mononuclear Cells (PBMCs) after exercise within a time course framework, focusing on unexplored regions related to introns and intergenic portions. NGS was performed on PBMCs collected from five 3-year-old Sardinian Anglo-Arab racehorses at rest and after a 2000 m race. Apart from ascertaining differential gene expression between the two time points, we characterized the complexity of transcription for alternative transcripts. Interestingly, we noted a transcription shift from the coding to the non-coding regions. We further investigated the possible causes of this phenomenon by focusing on genomic repeats, using a differential expression approach, and found a strong general up-regulation of repetitive elements such as LINE. Since their modulation is also associated with “exonization”, the recruitment of repeats that act with regulatory functions, there might be an active regulation of this transcriptional shift. Thanks to an innovative bioinformatic approach, our study could represent a model for the transcriptomic investigation of stress.


2004 ◽  
Vol 27 (4) ◽  
pp. 623-631 ◽  
Author(s):  
Ivan G. Costa ◽  
Francisco de A. T. de Carvalho ◽  
Marcílio C. P. de Souto

Cancers ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1434 ◽  
Author(s):  
Max Pfeffer ◽  
André Uschmajew ◽  
Adriana Amaro ◽  
Ulrich Pfeffer

Uveal melanoma (UM) is a rare cancer that is well characterized at the molecular level. Two to four classes have been identified by the analyses of gene expression (mRNA, ncRNA), DNA copy number, DNA methylation and somatic mutations, yet no factual integration of these data has been reported. We therefore applied novel algorithms for data fusion, joint Singular Value Decomposition (jSVD) and joint Constrained Matrix Factorization (jCMF), as well as similarity network fusion (SNF), to integrate gene expression, methylation and copy number data from the Cancer Genome Atlas (TCGA) UM dataset. Variant features that most strongly influence the definition of classes were extracted for biological interpretation of the classes. Data fusion allows for the identification of the two to four classes previously described. Not all of these classes are evident at all levels, indicating that integrative analyses add to genomic discrimination power. The classes are also characterized by different frequencies of somatic mutations in putative driver genes (GNAQ, GNA11, SF3B1, BAP1). Innovative data fusion techniques confirm, as expected, the existence of two main types of uveal melanoma mainly characterized by copy number alterations. Subtypes were also confirmed but are somewhat less defined. Data fusion allows for real integration of multi-domain genomic data.
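The general idea of extracting shared sample factors from several data types can be sketched with a simple stand-in: scale each features x samples block, stack the blocks, and take one SVD. This stacked SVD is an assumption chosen for illustration and is not the jSVD or jCMF algorithm of the paper; the function name and toy data are hypothetical.

```python
import numpy as np

def stacked_svd(blocks, rank):
    """Shared sample factors from several features x samples blocks."""
    scaled = []
    for block in blocks:
        # Center each feature across samples.
        centered = block - block.mean(axis=1, keepdims=True)
        # Scale by the Frobenius norm so no block dominates by size alone.
        norm = np.linalg.norm(centered)
        scaled.append(centered / norm if norm > 0 else centered)
    stacked = np.vstack(scaled)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:rank].T  # samples x rank

# Hypothetical toy data: 10 samples in two classes, two data types.
rng = np.random.default_rng(1)
signal = np.array([-1.0] * 5 + [1.0] * 5)
expr = np.outer(rng.normal(size=30), signal) + 0.1 * rng.normal(size=(30, 10))
meth = np.outer(rng.normal(size=40), signal) + 0.1 * rng.normal(size=(40, 10))
factors = stacked_svd([expr, meth], rank=2)
groups = (factors[:, 0] > 0).astype(int)
```

In this toy setting the sign of the first shared factor separates the two sample classes; real multi-omics data would need a clustering step on the factors and careful per-platform normalization.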


2007 ◽  
Vol 13 (9) ◽  
pp. 1138-1145 ◽  
Author(s):  
T. Kümpfel ◽  
M. Schwan ◽  
Th. Pollmächer ◽  
A. Yassouridis ◽  
M. Uhr ◽  
...  

During initiation of interferon-beta (IFN-β) therapy, many multiple sclerosis (MS) patients experience systemic side effects which may depend on the time point of IFN-β injection. We investigated the time course of plasma hormone, cytokine and cytokine-receptor concentrations after the first injection of IFN-β either at 8.00 a.m. (group A) or at 6.00 p.m. (group B) and quantified clinical side effects within the first 9 h in 16 medication-free patients with relapsing-remitting MS. This investigation was repeated after 6 months of IFN-β therapy. Plasma ACTH and cortisol concentrations followed their physiological rhythms, with lower levels in the evening compared to the morning, but rose earlier and more strongly in group B after IFN-β administration. IFN-β injection in the evening led to a prompter increase of plasma IL-6 concentrations and temperature during the first hours and correlated with more intense clinical side effects compared to group A. Plasma IL-10 concentrations increased more in group A than in group B, but sTNF-RI and sTNF-RII concentrations rose 7 h after IFN-β injection only in group B. Acute effects on plasma hormone and cytokine concentrations adapted after 6 months of IFN-β treatment, while diurnal variations were still present. Baseline sTNF-RII concentrations were elevated after 6 months of IFN-β therapy only in group A. Our results show that the time point of IFN-β injection has differential effects on acute changes of plasma hormone and cytokine concentrations and is related to systemic side effects. This may have implications for the tolerability and effectiveness of IFN-β therapy. Multiple Sclerosis 2007; 13: 1138-1145. http://msj.sagepub.com


Author(s):  
Olivier Gascuel ◽  
Bernadette Bouchon-Meunier ◽  
Gilles Caraux ◽  
Patrick Gallinari ◽  
Alain Guénoche ◽  
...  

Supervised classification has already been the subject of numerous studies in the fields of Statistics, Pattern Recognition and Artificial Intelligence under various appellations which include discriminant analysis, discrimination and concept learning. Many practical applications relating to this field have been developed. New methods have appeared in recent years, due to developments concerning Neural Networks and Machine Learning. These "hybrid" approaches share one common factor in that they combine symbolic and numerical aspects. The former are characterized by the representation of knowledge, the latter by the introduction of frequencies and probabilistic criteria. In the present study, we shall present a certain number of hybrid methods, conceived (or improved) by members of the SYMENU research group. These methods issue mainly from Machine Learning and from research on Classification Trees done in Statistics, and they may also be qualified as "rule-based". They shall be compared with other more classical approaches. This comparison will be based on a detailed description of each of the twelve methods envisaged, and on the results obtained concerning the "Waveform Recognition Problem" proposed by Breiman et al.,4 which is difficult for rule based approaches.

