Complex Genetic Interactions/Data Mining/Dimensionality Reduction

2021 ◽  
pp. 265-277
Author(s):  
William S. Bush ◽  
Stephen D. Turner
2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jennifer Luyapan ◽  
Xuemei Ji ◽  
Siting Li ◽  
Xiangjun Xiao ◽  
Dakai Zhu ◽  
...  

Abstract Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses martingale residuals as the outcome parameter to estimate survival outcomes and implements the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease onset. Methods: To demonstrate efficacy, we evaluated this method on two simulated data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with a lower computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker for predicting age of lung cancer onset. Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm for identifying genetic interactions to include in models that predict survival outcomes.
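The two steps described above (martingale residuals as a survival outcome, then a QMDR-style search over SNP pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ES-MDR implementation: the residuals come from a Cox model with no covariates (Nelson-Aalen cumulative hazard), so covariate adjustment is omitted; the pair score is a Welch t-statistic between pooled "high" and "low" genotype cells; and the cross-validation and permutation testing a real QMDR analysis would use are left out. The function names `martingale_residuals`, `qmdr_score` and `best_pair` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def martingale_residuals(times, events):
    """Martingale residuals M_i = delta_i - H(t_i) for a Cox model with no
    covariates, using the Nelson-Aalen estimate of the cumulative hazard H."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    cum_hazard, h = [], 0.0
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(1 for s, d in zip(times, events) if d and s == t)
        h += deaths / at_risk
        cum_hazard.append(h)

    def H(t):
        # Cumulative hazard at the largest event time <= t (0 before the first event).
        value = 0.0
        for et, ch in zip(event_times, cum_hazard):
            if et > t:
                break
            value = ch
        return value

    return [d - H(t) for t, d in zip(times, events)]


def qmdr_score(geno_a, geno_b, y):
    """QMDR-style score for one SNP pair: label each two-locus genotype cell
    'high' if its mean outcome exceeds the overall mean, then return the Welch
    t-statistic comparing the pooled high and low groups."""
    overall = mean(y)
    cells = defaultdict(list)
    for a, b, v in zip(geno_a, geno_b, y):
        cells[(a, b)].append(v)
    high, low = [], []
    for vals in cells.values():
        (high if mean(vals) > overall else low).extend(vals)
    if len(high) < 2 or len(low) < 2:
        return 0.0  # not enough data to form both groups
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se if se > 0 else float("inf")


def best_pair(genotypes, y):
    """Exhaustively score every two-way SNP combination and return the best."""
    scores = {
        (s1, s2): qmdr_score(genotypes[s1], genotypes[s2], y)
        for s1, s2 in combinations(sorted(genotypes), 2)
    }
    return max(scores.items(), key=lambda kv: kv[1])
```

In use, `best_pair(genotypes, martingale_residuals(ages, events))` would scan all pairs, where `genotypes` maps a SNP name to 0/1/2 genotype codes per subject. Using the residual as a quantitative outcome is what lets the survival problem reuse a quantitative MDR scan, which appears to be the efficiency argument made in the abstract.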


2016 ◽  
Vol 64 ◽  
pp. 247-260 ◽  
Author(s):  
Rima Houari ◽  
Ahcène Bounceur ◽  
M-Tahar Kechadi ◽  
A-Kamel Tari ◽  
Reinhardt Euler

Author(s):  
Lambodar Jena ◽  
Ramakrushna Swain ◽  
N.K. Kamila

This paper proposes a layered modular architecture to adaptively perform data mining tasks in large sensor networks. The architecture consists of a lower layer, which performs data aggregation in a modular fashion, and an upper layer, which employs an adaptive local learning technique to extract a prediction model from the aggregated information. The rationale of the approach is that modular aggregation of sensor data can serve two purposes jointly: first, organizing sensors into clusters, thereby reducing the communication effort; second, reducing the dimensionality of the data mining task, thereby improving the accuracy of the sensing task. Here we show that some of the algorithms developed within the artificial neural-network tradition can be easily adapted to wireless sensor-network platforms and meet several of the constraints on data mining in sensor networks, such as limited communication bandwidth, limited computing resources, limited power supply, and the need for fault tolerance. Analysis of the dimensionality reduction obtained from the outputs of the neural-network clustering algorithms shows that the communication costs of the proposed approach are significantly smaller, which is an important consideration in sensor networks because of their limited power supply. In this paper we present two possible implementations of the ART and FuzzyART neural-network algorithms, which are unsupervised learning methods for categorizing sensory inputs. They are tested on data obtained from a set of several nodes, each equipped with several sensors.
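The FuzzyART categorization mentioned above can be sketched with the standard Fuzzy ART update rules: complement coding, a choice function, a vigilance test and a weight update. This is a minimal single-node illustration, not the authors' sensor-network implementation. The parameter values (`rho`, `alpha`, `beta`) and the readings are hypothetical, and inputs are assumed to be pre-scaled to [0, 1].

```python
def complement_code(x):
    """Complement coding [x, 1 - x]; inputs are assumed to lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and_norm(a, b):
    """|a ^ b|: L1 norm of the element-wise minimum (fuzzy AND)."""
    return sum(min(p, q) for p, q in zip(a, b))


class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values give more, tighter categories
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per committed category

    def _choice(self, inp, j):
        """Choice function T_j = |inp ^ w_j| / (alpha + |w_j|)."""
        w = self.weights[j]
        return fuzzy_and_norm(inp, w) / (self.alpha + sum(w))

    def learn(self, x):
        """Present one input vector and return the index of its category."""
        inp = complement_code(x)
        # Try committed categories in decreasing order of the choice function.
        order = sorted(range(len(self.weights)), key=lambda j: -self._choice(inp, j))
        for j in order:
            w = self.weights[j]
            # Vigilance (match) test; with complement coding |inp| = len(x).
            if fuzzy_and_norm(inp, w) / sum(inp) >= self.rho:
                self.weights[j] = [self.beta * min(p, q) + (1 - self.beta) * q
                                   for p, q in zip(inp, w)]
                return j
        # No category passed the test: commit a new one initialised to the input.
        self.weights.append(inp)
        return len(self.weights) - 1


net = FuzzyART(rho=0.75)
readings = [[0.1, 0.2], [0.12, 0.18], [0.9, 0.8], [0.88, 0.82]]  # made-up sensor readings
print([net.learn(r) for r in readings])  # [0, 0, 1, 1]
```

The link to communication cost is that a node running such a clusterer could transmit a single category index per reading instead of the full sensor vector. That is the kind of dimensionality reduction the abstract credits with lowering communication costs, although the exact encoding used in the paper is not stated here.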


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Michael W. Dorrity ◽  
Lauren M. Saunders ◽  
Christine Queitsch ◽  
Stanley Fields ◽  
Cole Trapnell

2021 ◽  
Author(s):  
Neeraj Kumar ◽  
Upendra Kumar

Abstract Information and Communication Technologies have, to a large extent, a major influence on our social life, the economy, and worldwide security. Computer networks are at the heart of Information Technology as a whole. However, the world is never free of people with malicious intent, i.e. cyber criminals, network intruders, etc. To counter them, Intrusion Detection Systems (IDS) play a very significant role in identifying network intrusions by performing various data analysis tasks. To develop robust IDS that detect intrusions accurately, many papers have been published over the years using different classification techniques in hybrid approaches based on Data Mining (DM) and Machine Learning (ML). The present paper is an in-depth analysis of two focal aspects of Network Intrusion Detection Systems: various pre-processing methods in the form of dimensionality reduction, and an assortment of classification techniques. This paper also includes a comparative algorithmic analysis of the DM and ML techniques applied to design an intelligent IDS. An experimental comparative analysis was carried out in support of the findings of this work, using the ‘Python’ language on the ‘kddcup99’ dataset as a benchmark. The experiments showed that dimensionality reduction had a greater impact and that the MLP performed well in correctly classifying traffic to establish a secure network. The motive behind this effort is to detect different kinds of malware as early and as accurately as possible, and to provide better insight into various existing techniques that may help interested researchers in future work.
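The two-stage pipeline the abstract describes (a dimensionality-reduction pre-processing step followed by a classifier) can be sketched in plain Python. This sketch makes some substitutions, so it is not the paper's method. It uses a simple variance-threshold feature filter as the reduction step and a nearest-centroid classifier as a stand-in for the MLP. The records are toy, made-up numeric vectors rather than kddcup99 rows, whose categorical fields would first need numeric encoding. The function names are hypothetical.

```python
from statistics import mean, pvariance


def select_features(rows, threshold=1e-3):
    """Keep only the column indices whose variance exceeds `threshold`."""
    return [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > threshold]


def project(rows, keep):
    """Reduce each record to the selected columns."""
    return [[r[j] for j in keep] for r in rows]


def fit_centroids(rows, labels):
    """Mean feature vector for each class label."""
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(y, []).append(r)
    return {y: [mean(col) for col in zip(*rs)] for y, rs in groups.items()}


def predict(centroids, row):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))


# Toy, made-up records: column 0 is constant and is dropped by the filter.
train = [[0.0, 1.0, 0.1], [0.0, 1.0, 0.2], [0.0, 9.0, 0.9], [0.0, 8.0, 0.8]]
labels = ["normal", "normal", "attack", "attack"]
keep = select_features(train)
model = fit_centroids(project(train, keep), labels)
print(keep, predict(model, project([[0.0, 8.7, 0.85]], keep)[0]))  # [1, 2] attack
```

Any classifier, including an MLP, could replace `fit_centroids`/`predict` without changing the reduction step. The point is only that the classifier is trained on the projected records.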


2019 ◽  
Author(s):  
Michael W. Dorrity ◽  
Lauren M. Saunders ◽  
Christine Queitsch ◽  
Stanley Fields ◽  
Cole Trapnell

Dimensionality reduction is often used to visualize complex expression profiling data. Here, we use the Uniform Manifold Approximation and Projection (UMAP) method on published transcript profiles of 1484 single gene deletions of Saccharomyces cerevisiae. Proximity in low-dimensional UMAP space identifies clusters of genes that correspond to protein complexes and pathways, and finds novel protein interactions even within well-characterized complexes. This approach is more sensitive than previous methods and should be broadly useful as additional transcriptome datasets become available for other organisms.
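UMAP begins by building a k-nearest-neighbour graph of the points in the original high-dimensional space, then optimizes a low-dimensional layout that preserves that graph. So "proximity" between deletion profiles is already encoded in this first step. The sketch below shows only that neighbour-graph step, using correlation distance (1 minus Pearson r) on hypothetical profiles. It is not the full UMAP algorithm. For the complete embedding, the umap-learn package provides a `UMAP` class (for example with `metric="correlation"` and `fit_transform`); whether the study used those settings is not stated here.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den else 0.0


def knn_graph(profiles, k=2):
    """Map each profile name to its k nearest neighbours by 1 - Pearson r."""
    graph = {}
    for g in profiles:
        dists = sorted((1 - pearson(profiles[g], profiles[h]), h)
                       for h in profiles if h != g)
        graph[g] = [h for _, h in dists[:k]]
    return graph


profiles = {  # hypothetical deletion-strain expression profiles
    "delA": [1, 2, 3, 4],
    "delB": [2, 4, 6, 8.5],
    "delC": [4, 3, 2, 1],
    "delD": [8, 6, 4, 1],
}
print(knn_graph(profiles, k=1))
# {'delA': ['delB'], 'delB': ['delA'], 'delC': ['delD'], 'delD': ['delC']}
```

Deletions whose transcript profiles move together end up as mutual neighbours. In an embedding built from such a graph, they would be expected to sit close together, which is the property the abstract uses to group genes into complexes and pathways.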

