CHOOSING SNPs USING FEATURE SELECTION

2006 ◽  
Vol 04 (02) ◽  
pp. 241-257 ◽  
Author(s):  
TU MINH PHUONG ◽  
ZHEN LIN ◽  
RUSS B. ALTMAN

A major challenge for genomewide disease association studies is the high cost of genotyping a large number of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as "tagging" SNPs, able to capture most variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve a computationally intensive search for SNP subsets but discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here does not limit itself to using only correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that the number of tagging SNPs selected by our method is smaller than the number selected by block-based methods. Supplementary website: .
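
The idea of discarding redundant SNPs by correlation, without restricting comparisons to local groups, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a simple greedy filter, and the function names, the 0/1/2 genotype coding, the 0.8 threshold and the toy data are all assumptions made for illustration.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a SNP with no variation is treated as uncorrelated
    return sxy * sxy / (sxx * syy)

def select_tag_snps(genotypes, threshold=0.8):
    """Keep a SNP only if no already-kept SNP has r^2 >= threshold with it.

    genotypes: dict mapping a (hypothetical) SNP name to a list of
    0/1/2 allele counts, one entry per individual. Every candidate is
    compared with every kept SNP, so the filter is global rather than
    limited to neighbouring SNPs.
    """
    kept = []
    for name, values in genotypes.items():
        if all(r_squared(values, genotypes[k]) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy data (invented): snp_b duplicates snp_a, and snp_d is its mirror image.
genotypes = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],
    "snp_c": [2, 0, 1, 0, 2, 1],
    "snp_d": [2, 1, 0, 1, 2, 0],
}
tags = select_tag_snps(genotypes)
print(tags)
```

A greedy pass like this depends on the order in which SNPs are visited, so it only shows the principle of redundancy removal and makes no claim about the size of the selected set.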

Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1238
Author(s):  
Supanat Chamchuen ◽  
Apirat Siritaratiwat ◽  
Pradit Fuangfoo ◽  
Puripong Suthisopapan ◽  
Pirat Khunkitti

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification system that uses a novel feature selection algorithm, called "adaptive ABC-PSO", which combines the artificial bee colony (ABC) and particle swarm optimization (PSO) algorithms. The proposed adaptive technique is applied to the combined ABC and PSO algorithms, and the result is used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved with nine optimally selected features out of the 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as in a real distribution system. Compared with previous studies, the classification accuracy obtained using adaptive ABC-PSO as the feature selection algorithm is in the high range; therefore, the adaptive ABC-PSO algorithm can be used to classify PQDs in a practical electrical distribution system.


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
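
Fitting all SNP effects simultaneously under a prior can be sketched as follows. The abstract does not say which prior is used, so this example assumes a zero-mean Gaussian prior, which reduces to ridge regression. The function name, the shrinkage values and the toy data are hypothetical.

```python
def ridge_snp_effects(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y by Gaussian elimination.

    X: list of rows (individuals) of centred genotype codes.
    y: list of centred phenotypes.
    lam > 0 is the shrinkage strength implied by the assumed Gaussian prior.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))]
         for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(A[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (A[j][p] - tail) / A[j][j]
    return b

def predict(x_new, effects):
    """Predicted (centred) phenotype: sum of genotype code times effect."""
    return sum(g * e for g, e in zip(x_new, effects))

# Toy data (invented): 3 individuals, 4 SNPs, so more SNPs than individuals.
X = [[-1, 0, 1, 1],
     [0, 1, -1, 0],
     [1, -1, 0, -1]]
y = [1.0, -0.5, -0.5]

weak = ridge_snp_effects(X, y, lam=0.1)     # weak shrinkage
strong = ridge_snp_effects(X, y, lam=10.0)  # strong shrinkage
print(weak, strong, predict([1, 0, -1, 0], strong))
```

Because the prior makes the system solvable even with more SNPs than individuals, every SNP receives an estimated effect, and a larger `lam` pulls all of the estimates toward zero.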


2010 ◽  
Vol 30 (6) ◽  
pp. 1411-1420 ◽  
Author(s):  
Jason B. Wright ◽  
Seth J. Brown ◽  
Michael D. Cole

Genome-wide association studies have mapped many single-nucleotide polymorphisms (SNPs) that are linked to cancer risk, but the mechanism by which most SNPs promote cancer remains undefined. The rs6983267 SNP at 8q24 has been associated with many cancers, yet the SNP falls 335 kb from the nearest gene, c-MYC. We show that the beta-catenin-TCF4 transcription factor complex binds preferentially to the cancer risk-associated rs6983267(G) allele in colon cancer cells. We also show that the rs6983267 SNP has enhancer-related histone marks and can form a 335-kb chromatin loop to interact with the c-MYC promoter. Finally, we show that the SNP has no effect on the efficiency of chromatin looping to the c-MYC promoter but that the cancer risk-associated SNP enhances the expression of the linked c-MYC allele. Thus, cancer risk is a direct consequence of elevated c-MYC expression from increased distal enhancer activity and not from reorganization/creation of the large chromatin loop. The findings of these studies support a mechanism for intergenic SNPs that can promote cancer through the regulation of distal genes by utilizing preexisting large chromatin loops.

