Functional Modeling of High-Dimensional Data: A Manifold Learning Approach

Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 406
Author(s):  
Harold A. Hernández-Roig ◽  
M. Carmen Aguilera-Morillo ◽  
Rosa E. Lillo

This paper introduces stringing via Manifold Learning (ML-stringing), an alternative to the original stringing based on Unidimensional Scaling (UDS). Our proposal is framed within a wider class of methods that map high-dimensional observations to the infinite space of functions, allowing the use of Functional Data Analysis (FDA). Stringing handles general high-dimensional data as scrambled realizations of an unknown stochastic process. Therefore, the essential feature of the method is a rearrangement of the observed values. Motivated by the linear nature of UDS and the increasing number of applications to biosciences (e.g., functional modeling of gene expression arrays and single nucleotide polymorphisms, or the classification of neuroimages) we aim to recover more complex relations between predictors through ML. In simulation studies, it is shown that ML-stringing achieves higher-quality orderings and that, in general, this leads to improvements in the functional representation and modeling of the data. The versatility of our method is also illustrated with an application to a colon cancer study that deals with high-dimensional gene expression arrays. This paper shows that ML-stringing is a feasible alternative to the UDS-based version. Also, it opens a window to new contributions to the field of FDA and the study of high-dimensional data.
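The rearrangement idea behind stringing can be illustrated with a minimal sketch. It is not the authors' ML-stringing, and it does not solve UDS exactly: as a stand-in it uses one-dimensional classical scaling (the leading eigenvector of the double-centered squared distances, found by power iteration) on Euclidean distances between predictor columns. The function names and the toy data are assumptions made for illustration only.

```python
import math


def pairwise_distances(columns):
    """Euclidean distance between every pair of predictor columns."""
    p = len(columns)
    return [[math.dist(columns[j], columns[k]) for k in range(p)] for j in range(p)]


def leading_coordinate(dist, iters=200):
    """One-dimensional classical scaling coordinates via power iteration."""
    p = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / p for row in sq]
    total_mean = sum(row_mean) / p
    # Double-centered matrix B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[j][k] - row_mean[j] - row_mean[k] + total_mean)
          for k in range(p)] for j in range(p)]
    # Heuristic start: the column of B with the largest diagonal entry
    j_star = max(range(p), key=lambda j: b[j][j])
    v = b[j_star][:]
    for _ in range(iters):
        w = [sum(b[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def string_order(columns):
    """Indices of the predictor columns, sorted by their 1-D coordinate."""
    coords = leading_coordinate(pairwise_distances(columns))
    return sorted(range(len(columns)), key=coords.__getitem__)


# Toy data (hypothetical): 5 samples of a linear process observed at 10 points,
# with the predictor order scrambled before the analysis.
slopes = [1.0, 2.0, -1.0, 0.5, 3.0]
offsets = [0.0, 1.0, 2.0, -1.0, 0.5]
scramble = [3, 7, 0, 9, 4, 1, 8, 5, 2, 6]
observed = [[s * (j / 10) + o for s, o in zip(slopes, offsets)] for j in scramble]
print([scramble[k] for k in string_order(observed)])
```

The recovered order is defined only up to reversal, since a one-dimensional configuration has no preferred direction. A manifold learning method would replace the linear scaling step with a nonlinear embedding of the same distance matrix.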

2015 ◽  
Vol 23 (3) ◽  
pp. 617-626 ◽  
Author(s):  
Nophar Geifman ◽  
Sanchita Bhattacharya ◽  
Atul J Butte

Abstract Objective: Cytokines play a central role in both health and disease, modulating immune responses and acting as diagnostic markers and therapeutic targets. This work takes a systems-level approach for integration and examination of immune patterns, such as cytokine gene expression with information from biomedical literature, and applies it in the context of disease, with the objective of identifying potentially useful relationships and areas for future research. Results: We present herein the integration and analysis of immune-related knowledge, namely, information derived from biomedical literature and gene expression arrays. Cytokine-disease associations were captured from over 2.4 million PubMed records, in the form of Medical Subject Headings descriptor co-occurrences, as well as from gene expression arrays. Clustering of cytokine-disease co-occurrences from biomedical literature is shown to reflect current medical knowledge as well as potentially novel relationships between diseases. A correlation analysis of cytokine gene expression in a variety of diseases revealed compelling relationships. Finally, a novel analysis comparing cytokine gene expression in different diseases to parallel associations captured from the biomedical literature was used to examine which associations are interesting for further investigation. Discussion: We demonstrate the usefulness of capturing Medical Subject Headings descriptor co-occurrences from biomedical publications in the generation of valid and potentially useful hypotheses. Furthermore, integrating and comparing descriptor co-occurrences with gene expression data was shown to be useful in detecting new, potentially fruitful, and unaddressed areas of research. Conclusion: Using integrated large-scale data captured from the scientific literature and experimental data, a better understanding of the immune mechanisms underlying disease can be achieved and applied to research.
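Counting descriptor co-occurrences, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each record is assumed to be already reduced to a set of descriptor strings, and the function name, term lists, and toy records are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(records, cytokines, diseases):
    """Count records in which a cytokine term and a disease term appear together."""
    counts = Counter()
    for descriptors in records:
        found_cytokines = sorted(cytokines & descriptors)
        found_diseases = sorted(diseases & descriptors)
        # Every (cytokine, disease) pair in the same record is one co-occurrence
        counts.update(product(found_cytokines, found_diseases))
    return counts


# Toy records (hypothetical), each a set of descriptors attached to one publication
records = [
    {"Interleukin-6", "Arthritis, Rheumatoid"},
    {"Interleukin-6", "Arthritis, Rheumatoid", "Tumor Necrosis Factor-alpha"},
    {"Interleukin-10", "Asthma"},
    {"Asthma"},
]
cytokines = {"Interleukin-6", "Interleukin-10", "Tumor Necrosis Factor-alpha"}
diseases = {"Arthritis, Rheumatoid", "Asthma"}

for pair, n in sorted(cooccurrence_counts(records, cytokines, diseases).items()):
    print(pair, n)
```

The resulting cytokine-by-disease count table is the kind of input that a clustering or correlation step could then work from.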


2008 ◽  
Vol 18 (9) ◽  
pp. 1509-1517 ◽  
Author(s):  
J. C. Marioni ◽  
C. E. Mason ◽  
S. M. Mane ◽  
M. Stephens ◽  
Y. Gilad

2014 ◽  
Vol 89 (5) ◽  
pp. 2469-2482 ◽  
Author(s):  
Jacqueline Smith ◽  
Jean-Remy Sadeyen ◽  
Colin Butter ◽  
Pete Kaiser ◽  
David W. Burt

ABSTRACT Chicken whole-genome gene expression arrays were used to analyze the host response to infection by infectious bursal disease virus (IBDV). Spleen and bursal tissue were examined from control and infected birds at 2, 3, and 4 days postinfection from two lines that differ in their resistance to IBDV infection. The host response was evaluated over this period, and differences between susceptible and resistant chicken lines were examined. Antiviral genes, including IFNA, IFNG, MX1, IFITM1, IFITM3, and IFITM5, were upregulated in response to infection. Evaluation of this gene expression data allowed us to predict several genes as candidates for involvement in resistance to IBDV. IMPORTANCE Infectious bursal disease (IBD) is of economic importance to the poultry industry and thus is also important for food security. Vaccines are available, but field strains of the virus are of increasing virulence. There is thus an urgent need to explore new control solutions, one of which would be to breed birds with greater resistance to IBD. This goal is perhaps uniquely achievable with poultry, of all farm animal species, since the genetics of 85% of the 60 billion chickens produced worldwide each year is under the control of essentially two breeding companies. In a comprehensive study, we attempt here to identify global transcriptomic differences in the target organ of the virus between chicken lines that differ in resistance and to predict candidate resistance genes.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 3360-3360
Author(s):  
Erik Wendlandt ◽  
Guido J. Tricot ◽  
Benjamin Darbro ◽  
Fenghuang Zhan

Abstract Background: Multiple myeloma is the second most common blood-borne neoplasia, accounting for nearly 10% of all diagnosed hematologic malignancies, and has a disproportionately high incidence in elderly populations. Here we explored copy number variations using the high-fidelity CytoScan HD arrays to develop a detailed map of copy number variations and identify novel mediators of disease progression. The results from CytoScan HD microarrays provide a detailed view of the entire genome with a resolution up to 25kb. Furthermore, 750,000 single-nucleotide polymorphisms are included and the array provides information about loss of heterozygosity and uniparental disomy. Materials and methods: CytoScan HD arrays were performed on 97 myeloma patient samples to identify cytogenetic regions important to the development and progression of the disease. Gene expression profiles from 351 patients were analyzed to identify genes with a change in gene expression of 1.5-fold or more. Data from CytoScan and gene expression arrays were combined to perform chromosomal positional enrichment analysis to identify cytogenetic driver lesions, or lesions that provide a small but significant growth and survival advantage to the cell. Furthermore, Kaplan-Meier, log-rank test and hazard ratio analyses were performed to identify genes within the driver lesions that have a significant impact on survival when dysregulated. Results: The results from the CytoScan HD analysis closely mirrored what has been shown by FISH and SNP arrays, with gains to the odd-numbered chromosomes, specifically 3, 5, 7, 9, 11, 15 and 17, as well as losses to chromosomes 1p and 13. Interestingly, we identified gains to a small region within chromosome 8p, contrary to published reports demonstrating a large-scale loss of this region. We identified numerous genes within this region that are important for survival, and their overexpression resulted in a decreased progression-free survival. For example, Cathepsin B (CTSB), among others, is encoded in chromosome 8p22-p21 and has an increased gene expression of at least 1.5-fold over normal controls. Furthermore, Cathepsin B, a cysteine protease, has been linked to cancer of the ileum, suggesting that a similar role may be present within myeloma. We then integrated the 97 copy number profile results with 351 myeloma gene expression profiles to identify cytogenetic driver lesions in myeloma important for disease development, progression and poor clinical outcome. Chromosomal positional enrichment analysis was employed to identify global myeloma cytogenetic driver aneuploidies as well as develop unique cytogenetic copy number profiles. Our results identified portions of chromosomes 1q, 3, 8p, 9, 13q and 16q, among others, as important driver lesions, with changes to these regions providing growth advantages to the cell. Furthermore, our analysis identified five unique cytogenetic classifications based on common cytogenetic lesions. We continue to explore these driver regions to identify lesions important for the oncogenic properties of the larger regions. Conclusion: The data presented here represent a novel and highly sensitive approach for the identification of novel copy number variations and driver lesions. Furthermore, correlations between copy number variations and gene expression arrays identified novel targets important for disease progression and patient survival. CytoScan HD arrays in conjunction with gene expression analysis provided a high-resolution image of important cytogenetic lesions in myeloma and identified potentially important therapeutic targets for drug development. Further work is needed to validate our findings and determine the therapeutic efficacy of the identified targets. Disclosures: No relevant conflicts of interest to declare.
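The Kaplan-Meier analysis mentioned in the methods can be illustrated with a minimal sketch of the product-limit estimator. It is a generic textbook version under stated assumptions, not the authors' analysis code: the function name and the toy follow-up data are hypothetical, and subjects censored at an event time are treated as still at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time of each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy follow-up data (hypothetical): five subjects, two of them censored
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Comparing such curves between patients with and without a dysregulated gene is what a log-rank test would then formalize.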

