ExTree—Explainable Genetic Feature Coupling Tree Using Fuzzy Mapping for Dimensionality Reduction with Application to NACA 0012 Airfoils Self-Noise Data Set

Author(s):  
Javier Viaña ◽  
Kelly Cohen
2018 ◽  
Vol 14 (4) ◽  
pp. 20-37 ◽  
Author(s):  
Yinglei Song ◽  
Yongzhong Li ◽  
Junfeng Qu

This article develops a new approach for supervised dimensionality reduction. The approach considers both the global and the local structure of a labelled data set and maximizes a new objective that combines the effects of both. The objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated on several benchmark data sets and image databases, and its performance is compared with that of several existing approaches for dimensionality reduction. Testing results show that, on average, the new approach achieves more accurate dimensionality reduction than existing approaches.
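The abstract does not give the exact objective, but the general recipe (a global between-class criterion blended with a local, neighbourhood-preserving criterion, optimized via a generalized eigenvalue problem) can be illustrated with a minimal sketch. The specific objective, the same-class k-NN graph and the mixing weight `alpha` below are assumptions for illustration, not the paper's formulation.

```python
# Minimal sketch of supervised dimensionality reduction that blends a global
# LDA-style criterion with a local neighbourhood-preserving criterion and
# solves it as a generalized eigenvalue problem. Not the paper's exact objective.
import numpy as np
from scipy.linalg import eigh

def supervised_embedding(X, y, n_components=2, k=5, alpha=0.5):
    """X: (n, d) data, y: (n,) labels. alpha balances global vs. local terms."""
    n, d = X.shape
    mean = X.mean(axis=0)

    # Global term: between-class and within-class scatter (classes pushed apart).
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
        Sw += (Xc - mc).T @ (Xc - mc)

    # Local term: Laplacian of a same-class k-NN graph (neighbours kept close).
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[y != y[i]] = np.inf      # only same-class neighbours
        dists[i] = np.inf
        for j in np.argsort(dists)[:k]:
            if np.isfinite(dists[j]):
                W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    S_local = X.T @ L @ X

    # Generalized eigenproblem: Sb v = lambda (alpha*Sw + (1-alpha)*S_local) v.
    denom = alpha * Sw + (1 - alpha) * S_local + 1e-6 * np.eye(d)
    vals, vecs = eigh(Sb, denom)
    return vecs[:, np.argsort(vals)[::-1][:n_components]]   # top eigenvectors

# Usage: V = supervised_embedding(X, y); X_low = X @ V
```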


Author(s):  
R. Kiran Kumar ◽  
B. Saichandana ◽  
K. Srinivas

This paper presents genetic-algorithm-based band selection and classification of a hyperspectral image data set. Hyperspectral remote sensors collect image data for a large number of narrow, adjacent spectral bands. Every pixel in a hyperspectral image carries a continuous spectrum that can be used to classify objects with great detail and precision. In this paper, filtering based on 2-D empirical mode decomposition is first used to remove noisy components from each band of the hyperspectral data. After filtering, band selection is performed using a genetic algorithm in order to remove bands that convey little information. This dimensionality reduction lowers the storage space, computational load, communication bandwidth and other requirements imposed on the unsupervised classification algorithms. Next, image fusion is performed on the selected hyperspectral bands to merge the maximum possible features from the selected images into a single image. This fused image is classified using a genetic algorithm, with cluster validity indices such as the K-means Index (KMI) and the Jm measure used as objective functions. This method improves classification accuracy and performance on hyperspectral images compared with classification without dimensionality reduction.
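As an illustration of the band-selection step, here is a minimal genetic-algorithm sketch in which each chromosome is a binary mask over spectral bands. The fitness below (variance minus redundancy minus a band-count penalty) is a stand-in assumption, not the paper's KMI/Jm objectives, and the bands are assumed to be standardized so the terms are comparable.

```python
# Genetic-algorithm band selection sketch: binary masks over bands, truncation
# selection, one-point crossover and bit-flip mutation. Assumed settings only.
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, cube):
    """cube: (pixels, bands), assumed standardized. Reward variance, penalize redundancy."""
    if mask.sum() < 2:
        return -np.inf
    sel = cube[:, mask.astype(bool)]
    var_score = sel.var(axis=0).mean()
    corr = np.corrcoef(sel, rowvar=False)
    redundancy = np.abs(corr[np.triu_indices_from(corr, k=1)]).mean()
    return var_score - redundancy - 0.01 * mask.sum()

def ga_band_selection(cube, pop_size=30, generations=50, p_mut=0.02):
    n_bands = cube.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_bands))
    for _ in range(generations):
        scores = np.array([fitness(ind, cube) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_bands)                         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_bands) < p_mut                     # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents] + children)
    best = pop[np.argmax([fitness(ind, cube) for ind in pop])]
    return np.flatnonzero(best)

# Usage (hypothetical names): selected_bands = ga_band_selection(image.reshape(-1, n_bands))
```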


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jennifer Luyapan ◽  
Xuemei Ji ◽  
Siting Li ◽  
Xiangjun Xiao ◽  
Dakai Zhu ◽  
...  

Abstract
Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease onset. To confront this problem, we developed a novel algorithm, the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses Martingale residuals as the outcome parameter to estimate survival outcomes and implements the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease onset.
Methods: To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with less computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data, with 14,935 cases and 12,787 controls for lung cancer (108,254 SNPs), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data.
Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker to predict age of lung cancer onset.
Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identifies genetic interactions to include in models that predict survival outcomes.
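The following is a simplified sketch of the two-step idea, not the authors' ES-MDR implementation: survival outcomes are summarized as martingale residuals under a null model, and a QMDR-style exhaustive scan labels each genotype cell as high or low risk by its mean residual and scores each SNP pair with a t-statistic. Covariate adjustment and the engineering needed to scan 108,254 SNPs are omitted.

```python
# Simplified ES-MDR-style sketch: martingale residuals + QMDR pairwise scan.
import itertools
import numpy as np
from scipy import stats

def martingale_residuals(time, event):
    """Null-model residuals: event indicator minus Nelson-Aalen cumulative hazard."""
    order = np.argsort(time)
    e_sorted = event[order]
    n = len(time)
    at_risk = n - np.arange(n)                 # subjects still at risk at each event time
    cumhaz_sorted = np.cumsum(e_sorted / at_risk)
    cumhaz = np.empty(n)
    cumhaz[order] = cumhaz_sorted
    return event - cumhaz

def qmdr_scan(genotypes, residuals, top_k=5):
    """genotypes: (n, p) array of 0/1/2; returns the best SNP pairs by |t| score."""
    n, p = genotypes.shape
    overall = residuals.mean()
    results = []
    for i, j in itertools.combinations(range(p), 2):
        cell = genotypes[:, i] * 3 + genotypes[:, j]     # 9 genotype combinations
        high = np.zeros(n, dtype=bool)
        for c in np.unique(cell):
            idx = cell == c
            if residuals[idx].mean() > overall:          # "high-risk" cell
                high[idx] = True
        if 0 < high.sum() < n:
            t, _ = stats.ttest_ind(residuals[high], residuals[~high])
            results.append((abs(t), i, j))
    return sorted(results, reverse=True)[:top_k]

# Usage (hypothetical arrays): res = martingale_residuals(age_at_onset, event_indicator)
#                              best_pairs = qmdr_scan(snp_matrix, res)
```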


2012 ◽  
Vol 188 (3) ◽  
pp. 1173-1187 ◽  
Author(s):  
J. Verbeke ◽  
L. Boschi ◽  
L. Stehly ◽  
E. Kissling ◽  
A. Michelini

2017 ◽  
Vol 10 (13) ◽  
pp. 355 ◽  
Author(s):  
Reshma Remesh ◽  
Pattabiraman. V

Dimensionality reduction techniques are used to reduce the complexity of analysing high-dimensional data sets. The raw input data set may have many dimensions, and the analysis may waste time and produce wrong predictions if unnecessary data attributes are considered. Using dimensionality reduction techniques, one can reduce the dimensions of the input data and obtain accurate predictions at lower cost. In this paper, different machine learning approaches used for dimensionality reduction, such as PCA, SVD, LDA, kernel principal component analysis and artificial neural networks, are studied.
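A minimal sketch, assuming scikit-learn, of how several of the surveyed techniques can be applied to the same data set and compared via the cross-validated accuracy of a downstream classifier (an autoencoder would play the corresponding role for the artificial-neural-network approach):

```python
# Compare PCA, SVD, LDA and Kernel PCA as preprocessing steps for a classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

reducers = {
    "PCA":        PCA(n_components=9),
    "SVD":        TruncatedSVD(n_components=9),
    "LDA":        LinearDiscriminantAnalysis(n_components=9),   # at most n_classes - 1
    "Kernel PCA": KernelPCA(n_components=9, kernel="rbf"),
}

for name, reducer in reducers.items():
    clf = make_pipeline(reducer, LogisticRegression(max_iter=2000))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:10s} -> mean CV accuracy {acc:.3f}")
```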


2019 ◽  
Vol 21 (3) ◽  
pp. 1047-1057 ◽  
Author(s):  
Zhen Chen ◽  
Pei Zhao ◽  
Fuyi Li ◽  
Tatiana T Marquez-Lago ◽  
André Leier ◽  
...  

Abstract With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users who only need to upload their data set and select the functions to be calculated from it; all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
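iLearn itself is driven through its web server and command-line toolkit; purely as an illustration of the kind of pipeline it automates (descriptor calculation, normalization, dimensionality reduction, predictor construction), here is a hand-rolled sketch using a simple amino-acid-composition descriptor. None of this code uses the iLearn API, and the toy sequences and labels are invented.

```python
# Hand-rolled sequence-feature pipeline: descriptor -> scaling -> PCA -> classifier.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac_descriptor(seq):
    """Amino-acid composition: a 20-dimensional frequency vector."""
    seq = seq.upper()
    return np.array([seq.count(a) / max(len(seq), 1) for a in AMINO_ACIDS])

# Toy labelled sequences; real data would come from the user's FASTA files.
sequences = ["MKTAYIAKQR", "GAVLIPFMWY", "MKKLLPTAAA", "WYFPGAVLIM"]
labels = [1, 0, 1, 0]

X = np.vstack([aac_descriptor(s) for s in sequences])
model = make_pipeline(StandardScaler(), PCA(n_components=3),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(X, labels)
print(model.predict(X))
```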


Atmosphere ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 142 ◽  
Author(s):  
Ana del Águila ◽  
Dmitry Efremenko ◽  
Víctor Molina García ◽  
Jian Xu

The new generation of atmospheric composition sensors, such as TROPOMI, is capable of providing spectra of high spatial and spectral resolution. To process this vast amount of spectral information, fast radiative transfer models (RTMs) are required. In this regard, we analyzed the efficiency of two acceleration techniques based on principal component analysis (PCA) for simulating Hartley-Huggins band spectra. In the first technique, PCA is used to map the data set of optical properties of the atmosphere to a lower-dimensional subspace, in which the correction function for an approximate but fast RTM is derived. The second technique is based on dimensionality reduction of the data set of spectral radiances: once the empirical orthogonal functions are found, the whole spectrum can be reconstructed by performing radiative transfer computations only for a small subset of spectral points. We considered a clear-sky atmosphere in which the optical properties are defined by Rayleigh scattering and trace gas absorption. Clouds can be integrated into the model as Lambertian reflectors. High computational performance is achieved by combining both techniques without losing accuracy. We found that for the Hartley-Huggins band, the combined use of these techniques yields an accuracy better than 0.05% with a speedup factor of about 20. This combination of both PCA-based techniques can be applied in future work as an efficient approach for simulating spectral radiances in other spectral regions.
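The second technique can be illustrated with a toy sketch: empirical orthogonal functions (EOFs) are derived from a library of precomputed spectra, and a new spectrum is reconstructed from radiative transfer results at only a small subset of spectral points. The synthetic spectra, the number of EOFs and the subset size below are arbitrary stand-ins, not the authors' Hartley-Huggins setup.

```python
# EOF-based spectrum reconstruction from a sparse set of spectral points.
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a library of precomputed spectra (rows: cases, cols: wavelengths).
n_cases, n_lambda = 200, 400
training_spectra = np.cumsum(rng.normal(size=(n_cases, n_lambda)), axis=1)

mean = training_spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(training_spectra - mean, full_matrices=False)
n_eofs = 6
eofs = Vt[:n_eofs]                      # (n_eofs, n_lambda) empirical orthogonal functions

# Small subset of spectral points where the "exact" RTM would actually be run.
subset = np.linspace(0, n_lambda - 1, 25, dtype=int)

def reconstruct(radiance_at_subset):
    """Least-squares fit of EOF coefficients on the subset, then full-spectrum expansion."""
    A = eofs[:, subset].T                                           # (25, n_eofs)
    coeffs, *_ = np.linalg.lstsq(A, radiance_at_subset - mean[subset], rcond=None)
    return mean + coeffs @ eofs

# Toy "true" spectrum reconstructed from its 25 sampled points:
true_spectrum = np.cumsum(rng.normal(size=n_lambda))
approx = reconstruct(true_spectrum[subset])
print("max error (relative to spectrum range):",
      np.max(np.abs(approx - true_spectrum)) / np.ptp(true_spectrum))
```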


Geophysics ◽  
1995 ◽  
Vol 60 (4) ◽  
pp. 978-997 ◽  
Author(s):  
Jacob B. U. Haldorsen ◽  
Douglas E. Miller ◽  
John J. Walsh

We describe a method for extracting and deconvolving a signal generated by a drill bit and collected by an array of surface geophones. The drill‐noise signature is reduced to an effective impulse by means of a multichannel Wiener deconvolution technique, producing a walk‐away reverse vertical seismic profile (VSP) sampled almost continuously in depth. We show how the multichannel technique accounts for noise and for internal drill‐string reflections, automatically limiting the deconvolved data to frequencies containing significant energy. We have acquired and processed a data set from a well in Germany while drilling at a depth of almost 4000 m. The subsurface image derived from these data compares well with corresponding images from a 3-D surface seismic survey, a zero‐offset VSP survey, and a walk‐away VSP survey acquired using conventional wireline techniques. The effective bandwidth of the deconvolved drill‐noise data is comparable to the bandwidth of surface seismic data but significantly smaller than what can be achieved with wireline VSP techniques. Although the processing algorithm does not require the use of sensors mounted on the drill string, these sensors provide a very economic way to compress the data. The sensors on the drill string were also used for accurate timing of the deconvolved drill‐noise data.
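The paper's scheme is multichannel and accounts for noise and internal drill-string reflections; as a much-simplified single-channel illustration of the Wiener step, the sketch below divides by the source spectrum with a stabilization term, which naturally damps frequency bands carrying little signal energy.

```python
# Single-channel frequency-domain Wiener deconvolution of a drill-noise trace.
import numpy as np

def wiener_deconvolve(trace, source_signature, noise_level=1e-2):
    """Deconvolve one geophone trace with an estimate of the drill-bit signature."""
    n = len(trace)
    T = np.fft.rfft(trace, n)
    S = np.fft.rfft(source_signature, n)
    power = np.abs(S) ** 2
    # Wiener filter: conjugate of source spectrum over (power + stabilization term).
    W = np.conj(S) / (power + noise_level * power.max())
    return np.fft.irfft(T * W, n)

# Usage (hypothetical arrays): impulse_like = wiener_deconvolve(geophone_trace, pilot_signal)
```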


Author(s):  
Hidir Selcuk Nogay ◽  
Tahir Cetin Akinci ◽  
Musa Yilmaz

Abstract Ceramic materials are an indispensable part of our lives. Today, ceramic materials are mainly used in construction and kitchenware production. Because some deformations cannot be seen with the naked eye, the ceramic industry loses time detecting deformations in its products. Delays in eliminating deformations and in planning the production process lead to an excessive number of deformed products, which adversely affects quality. In this study, a deep learning model based on acoustic noise data and transfer learning techniques was designed to detect cracks in ceramic plates. To create a data set, noise curves were obtained by applying impacts of the same magnitude to the ceramic test plates with an impact pendulum. For the experimental application, ceramic plates with three invisible cracks and one undamaged ceramic plate were used. The deep learning model was trained and tested for crack detection in ceramic plates using the data set obtained from the noise graphs. As a result, 99.50% accuracy was achieved with the deep learning model based on acoustic noise.
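The abstract does not specify the network, so the following is only a generic transfer-learning sketch: impact-noise recordings are converted to spectrogram images and a pretrained CNN backbone (here ResNet-18, an assumption) is frozen while a new two-class output layer is trained.

```python
# Generic transfer-learning setup for cracked/undamaged classification of spectrograms.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained backbone; freeze its weights and retrain only the final layer.
model = resnet18(weights=ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # cracked / undamaged

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(spectrogram_batch, labels):
    """spectrogram_batch: (N, 3, 224, 224) tensor of spectrogram images."""
    optimizer.zero_grad()
    loss = criterion(model(spectrogram_batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data: train_step(torch.randn(8, 3, 224, 224),
#                                             torch.randint(0, 2, (8,)))
```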


Author(s):  
Hugo D. Rebelo ◽  
Lucas A. F. de Oliveira ◽  
Gustavo M. Almeida ◽  
César A. M. Sotomayor ◽  
Geraldo L. Rochocz ◽  
...  

Customer satisfaction is crucial for companies worldwide. An integrated strategy comprises omnichannel communication systems, in which chatbots are widely used. Such systems are supervised, and the key issue is that the required training data are originally unlabelled. Labelling data manually is unfeasible nowadays, mainly because of the considerable volume, and customer behaviour is often hidden in the data even for experts. This work proposes a methodology to find unknown entities and intents automatically using unsupervised learning, based on natural language processing (NLP) for text data preparation and on machine learning (ML) for clustering model identification. Several combinations of preprocessing, vectorisation, dimensionality reduction and clustering techniques were investigated. The case study refers to a Brazilian electric energy company, with a data set of failed customer queries, that is, queries not met by the company for any reason; they correspond to about 30% (4,044 queries) of the original data set. The best identified intent model employed stemming for preprocessing, word frequency analysis for vectorisation, latent Dirichlet allocation (LDA) for dimensionality reduction, and mini-batch k-means for clustering, and it was able to allocate 62% of the failed queries to one of the seven intents found. This newly labelled data can be used, for instance, to train NLP-based chatbots, contributing to greater generalisation capacity and, ultimately, to increased customer satisfaction.
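A minimal sketch of the best-performing pipeline reported above, with assumed parameter choices and toy, already-stemmed English queries standing in for the (Portuguese) customer data:

```python
# Word-frequency vectorisation -> LDA topic space -> mini-batch k-means intents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import make_pipeline

queries = [
    "second copy of the bill",            # toy stand-ins for stemmed customer queries
    "request duplicate invoice",
    "power outage in my street",
    "no electricity since morning",
    "update registered address",
    "change account holder name",
]

pipeline = make_pipeline(
    CountVectorizer(),                                            # word-frequency vectorisation
    LatentDirichletAllocation(n_components=3, random_state=0),    # topic-space reduction
    MiniBatchKMeans(n_clusters=3, random_state=0, n_init=10),     # candidate intent clusters
)
intents = pipeline.fit_predict(queries)
print(intents)
```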

