Neural ADMIXTURE: rapid population clustering with autoencoders

2021 ◽  
Author(s):  
Albert Dominguez Mantes ◽  
Daniel Mas Montserrat ◽  
Carlos Bustamante ◽  
Xavier Giró-i-Nietó ◽  
Alexander G Ioannidis

Characterizing the genetic substructure of large cohorts has become increasingly important as genetic association and prediction studies are extended to massive, increasingly diverse biobanks. ADMIXTURE and STRUCTURE are widely used unsupervised clustering algorithms for characterizing such ancestral genetic structure. These methods decompose individual genomes into fractional cluster assignments, with each cluster representing a vector of DNA marker frequencies. The assignments and clusters provide an interpretable representation for geneticists to describe population substructure at the sample level. However, with the rapidly increasing size of population biobanks and the growing number of variants genotyped (or sequenced) per sample, such traditional methods become computationally intractable. Furthermore, multiple runs with different hyperparameters are required to properly depict the population clustering using these traditional methods, increasing the computational burden; this can lead to days of compute. In this work we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as ADMIXTURE, providing similar (or better) clustering while reducing the compute time by orders of magnitude. In addition, this network can include multiple outputs, providing results equivalent to running the original ADMIXTURE algorithm many times with different numbers of clusters. These models can also be stored, allowing later cluster assignment to be performed in linear computational time.
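The shared modeling assumption can be made concrete with a small sketch. The following is a hypothetical, untrained forward pass (not the authors' implementation): an encoder output is squashed by a softmax into fractional assignments Q on the simplex, and a sigmoid-constrained decoder matrix F holds per-cluster marker frequencies, so each genome is reconstructed as Q @ F. All names and dimensions are illustrative.

```python
import numpy as np

# Hypothetical minimal sketch of the ADMIXTURE-style decoder constraint:
# each genome is reconstructed as Q @ F, where rows of Q are fractional
# cluster assignments (on the simplex) and rows of F are per-cluster
# allele frequencies in [0, 1]. Untrained random weights, for shape only.

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

n_samples, n_markers, k = 8, 50, 3

# encoder output (logits) -> admixture fractions Q (n_samples x k)
logits = rng.normal(size=(n_samples, k))
Q = softmax(logits, axis=1)

# decoder weights squashed to [0, 1] -> cluster allele frequencies F (k x n_markers)
F = 1.0 / (1.0 + np.exp(-rng.normal(size=(k, n_markers))))

X_hat = Q @ F  # reconstructed genotype dosages, each entry in [0, 1]
```

Training would then minimize a reconstruction loss between X_hat and the observed genotypes, which is what gives the network ADMIXTURE-like interpretability.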


Web Services ◽  
2019 ◽  
pp. 413-430
Author(s):  
Usman Akhtar ◽  
Mehdi Hassan

The availability of huge amounts of heterogeneous data from different sources on the Internet has been termed the problem of Big Data. Clustering is widely used as a knowledge discovery tool that separates data into manageable parts, and there is a need for clustering algorithms that scale to big databases. In this chapter we explore various schemes that have been used to tackle big databases. Statistical features are extracted from the given dataset; redundant and irrelevant features are eliminated, and the most important and relevant features are selected by genetic algorithms (GA). Clustering with reduced feature sets requires less computational time and fewer resources. Experiments performed on standard datasets indicate that the proposed scheme-based clustering offers high clustering accuracy. Various quality measures were computed to check clustering quality, and it was observed that the proposed methodology improves the results significantly, offering high-quality clustering.
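As a rough illustration of GA-based feature selection of the kind described above (not the chapter's actual implementation), the sketch below evolves bit masks over features, rewarding correlation with the target and penalizing subset size. The fitness function, all parameters, and the synthetic data are assumptions for the example.

```python
import numpy as np

# Hypothetical GA feature-selection sketch: individuals are bit masks
# over features; fitness rewards features correlated with the label and
# penalizes subset size. Selection is elitist (top half survives).
rng = np.random.default_rng(1)

n_features, pop_size, generations = 12, 20, 30
X = rng.normal(size=(100, n_features))
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=100)  # only features 0 and 3 matter

corr = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)]))

def fitness(mask):
    # reward relevant features, penalize subset size (weight is illustrative)
    return corr[mask.astype(bool)].sum() - 0.15 * mask.sum()

pop = rng.integers(0, 2, size=(pop_size, n_features))
for _ in range(generations):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep top half
    # one-point crossover + bit-flip mutation
    cut = rng.integers(1, n_features, size=pop_size // 2)
    children = np.array([np.concatenate([parents[i][:c], parents[-i - 1][c:]])
                         for i, c in enumerate(cut)])
    flips = rng.random(children.shape) < 0.05
    children = np.where(flips, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
```

Because the dominant predictor (feature 0) contributes far more correlation than the size penalty costs, any run that preserves the elite quickly locks it into the best mask.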


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1381 ◽  
Author(s):  
Zain Ul Abiden Akhtar ◽  
Hongyu Wang

Driver distraction and fatigue are among the leading contributing factors in many fatal accidents, and driver activity monitoring can effectively reduce the number of roadway accidents. Besides traditional methods that rely on cameras or wearable devices, wireless technology for monitoring a driver's activity has attracted remarkable attention. With substantial progress in WiFi-based device-free localization and activity recognition, radio-image features have achieved better recognition performance by exploiting the proficiency of image descriptors. The major drawback of image features is their computational complexity, which grows with the amount of irrelevant information in an image, and it remains unresolved how to choose appropriate radio-image features to alleviate this computational burden. This paper explores a computationally efficient wireless technique that recognizes the attentive and inattentive status of a driver by leveraging the Channel State Information (CSI) of WiFi signals. We demonstrate an efficient scheme to extract representative features from the discriminant components of radio-images, reducing the computational cost while significantly improving recognition accuracy. Specifically, we address the computational burden through the efficacious use of Gabor filters with gray-level statistical features. The presented low-cost solution requires neither sophisticated camera support to capture images nor any special hardware carried by the user. The framework is evaluated in terms of activity recognition accuracy, and to ensure the reliability of the suggested scheme we analyzed the results using several evaluation metrics. Experimental results show that the presented prototype outperforms traditional methods, with an average recognition accuracy of 93.1% in promising application scenarios. This ubiquitous model significantly improves system performance across a diverse range of applications. In the realm of intelligent vehicles and assisted driving systems, the proposed wireless solution can effectively characterize driving maneuvers, primary tasks, driver distraction, and fatigue by exploiting radio-image descriptors.
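As a loose illustration of the Gabor-plus-gray-level-statistics idea (not the paper's pipeline or parameters), the sketch below builds a single Gabor kernel, convolves it with a stand-in "radio image", and summarizes the response with a handful of cheap statistics instead of a full image descriptor.

```python
import numpy as np

# Hypothetical sketch: filter a stand-in radio image with one Gabor
# kernel, then summarize the response with cheap gray-level statistics.
# Kernel parameters and the random image are illustrative.

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    # Gaussian envelope modulated by a cosine carrier along xr
    return np.exp(-(xr**2 + y**2 - y**2 + (x * np.sin(theta) - x * np.sin(theta))**2
                    + (xr**2 * 0)) / (2 * sigma**2) - (y * np.cos(theta) - x * np.sin(theta))**2
                  / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam) if False else \
           np.exp(-(xr**2 + (-x * np.sin(theta) + y * np.cos(theta))**2) / (2 * sigma**2)) \
           * np.cos(2 * np.pi * xr / lam)

def convolve2d_valid(img, k):
    # plain 'valid' 2D convolution, kept explicit for readability
    kh, kw = k.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))                  # stand-in radio image
response = convolve2d_valid(image, gabor_kernel(theta=np.pi / 4))

# gray-level statistical features of the filter response
features = np.array([response.mean(), response.std(),
                     response.min(), response.max()])
```

The point of the statistical summary is that four numbers per filter response are far cheaper to classify than the response image itself.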


2020 ◽  
pp. 002029402091986
Author(s):  
Xiaocui Yuan ◽  
Huawei Chen ◽  
Baoling Liu

Clustering analysis is one of the most important techniques in point cloud processing tasks such as registration, segmentation, and outlier detection. However, most existing clustering algorithms exhibit low computational efficiency and a high demand for computational resources, especially for large data processing. Sometimes clusters and outliers are inseparable, especially for point clouds contaminated by outliers; most cluster-based algorithms can identify cluster outliers well, but not sparse outliers. We develop a novel clustering method, called spatial neighborhood connected region labeling. The method defines a spatial connectivity criterion, finds connections between points within the k-nearest neighborhood region based on that criterion, and assigns connected points to the same cluster. Our method can accurately and quickly classify datasets using only one parameter, k. Compared with K-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN), our method provides better accuracy with less computational time for data clustering. For outlier detection in point clouds, our method can identify not only cluster outliers but also sparse outliers, achieving more accurate detection results than state-of-the-art outlier detection methods such as local outlier factor and DBSCAN.
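A minimal sketch of the labeling idea as we read it from the abstract (not the authors' code): connect each point to its k nearest neighbours that satisfy a distance criterion, symmetrize the connections, and label connected components via breadth-first search. The parameter values and toy data are illustrative.

```python
import numpy as np
from collections import deque

# Hypothetical sketch: k-nearest-neighbour connectivity + connected
# component labeling. Isolated points end up in their own (outlier)
# components. k and max_dist are illustrative choices.

def knn_connected_labels(points, k=4, max_dist=1.0):
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest, excluding self
    adj = np.zeros((n, n), dtype=bool)
    for p in range(n):
        for q in nbrs[p]:
            if d[p, q] <= max_dist:                   # spatial connectivity criterion
                adj[p, q] = adj[q, p] = True          # symmetric connection
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):                            # BFS region labeling
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in np.nonzero(adj[p])[0]:
                if labels[q] == -1:
                    labels[q] = current
                    queue.append(q)
        current += 1
    return labels

# two well-separated 5x4 grids of points plus one distant sparse outlier
grid = 0.3 * np.stack(np.meshgrid(np.arange(5), np.arange(4)), -1).reshape(-1, 2)
pts = np.vstack([grid, grid + 10.0, [[50.0, 50.0]]])
labels = knn_connected_labels(pts)
```

The distant point fails the distance criterion for all of its k nearest neighbours, so it receives its own label, which is exactly how such a scheme can flag sparse outliers.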


Author(s):  
Debby Cintia Ganesha Putri ◽  
Jenq-Shiou Leu ◽  
Pavel Seda

This research aims to determine similarities between groups of people in order to build a film recommender system for users. Users often have difficulty finding suitable movies due to the increasing amount of movie information, and a recommender system is very useful for helping customers choose a preferred movie from the existing features. In this study, the recommender system is developed using several clustering algorithms, namely the K-Means, BIRCH, mini-batch K-Means, mean-shift, affinity propagation, agglomerative clustering, and spectral clustering algorithms. We propose methods for optimizing K so that each cluster does not significantly increase variance. We are limited to groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), cluster validity indices such as the Dunn Index, and social network analysis (SNA) measures such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used Average Similarity, Computational Time, Association Rules with the Apriori algorithm, and clustering performance evaluation measures, namely the Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index, to compare the performance of the recommender system methods.
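Of the evaluation measures listed, the Silhouette Coefficient follows directly from its definition: s(i) = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from point i to its own cluster and b_i the mean distance to the nearest other cluster. A small self-contained sketch on toy data (not the study's movie dataset):

```python
import numpy as np

# Direct implementation of the Silhouette Coefficient from its
# definition; toy data, illustrative only.

def silhouette_score(X, labels):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = (labels == labels[i])
        same[i] = False                       # exclude the point itself
        if not same.any():
            scores.append(0.0)                # singleton cluster convention
            continue
        a = d[i][same].mean()                 # mean intra-cluster distance
        b = min(d[i][labels == c].mean()      # nearest other cluster
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0, 0], [0, 1], [1, 0],          # tight cluster A
              [10, 10], [10, 11], [11, 10]])   # tight cluster B
good = silhouette_score(X, np.array([0, 0, 0, 1, 1, 1]))
bad = silhouette_score(X, np.array([0, 1, 0, 1, 0, 1]))
```

A well-separated labeling scores near 1, while an interleaved labeling scores much lower, which is what makes the measure useful for comparing clusterings of the same data.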


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Frumen Olivas ◽  
Ivan Amaya ◽  
José Carlos Ortiz-Bayliss ◽  
Santiago E. Conant-Pablos ◽  
Hugo Terashima-Marín

Hyperheuristics have emerged as powerful techniques that obtain good results in less computational time than exact methods such as dynamic programming or branch and bound. These exact methods guarantee the global best solution, but at a high computational cost; hyperheuristics do not guarantee the global best solution, but they promise a good solution in far less computational time. Fuzzy logic, for its part, provides the tools to model complex problems in a more natural way. With this in mind, this paper proposes a fuzzy hyperheuristic approach, which combines a fuzzy inference system with a selection hyperheuristic. The fuzzy system needs its fuzzy rules optimized due to the lack of expert knowledge; indeed, traditional hyperheuristics also need their rules optimized. The fuzzy rules are optimized by genetic algorithms, and for the rules of the traditional methods we use particle swarm optimization. The genetic algorithm also reduces the number of fuzzy rules in order to find the best minimal rule set, whereas traditional methods already use very few rules. Experimental results show the advantage of using our approach over a traditional selection hyperheuristic on 3200 instances of the 0/1 knapsack problem.
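To make the notion of a selection hyperheuristic concrete, here is a deliberately simple sketch for the 0/1 knapsack problem: a hand-written rule (standing in for the fuzzy/GA-tuned rules of the paper, which are not reproduced here) chooses between two low-level heuristics depending on how full the knapsack is. The instance and threshold are illustrative.

```python
# Hypothetical selection-hyperheuristic sketch for 0/1 knapsack: the
# selection rule picks a low-level heuristic (max profit vs. max
# profit/weight ratio) based on remaining capacity.

def solve(items, capacity):
    # items: list of (profit, weight)
    remaining = list(items)
    load, profit = 0, 0
    while remaining:
        slack = (capacity - load) / capacity
        if slack > 0.5:
            # plenty of room: greedily grab the most profitable item
            pick = max(remaining, key=lambda it: it[0])
        else:
            # knapsack filling up: prefer profit density
            pick = max(remaining, key=lambda it: it[0] / it[1])
        remaining.remove(pick)                 # each item is considered once
        if load + pick[1] <= capacity:
            load += pick[1]
            profit += pick[0]
    return profit

items = [(10, 5), (40, 4), (30, 6), (50, 3), (7, 1)]
best = solve(items, capacity=10)  # -> 97
```

On this toy instance the rule happens to reach the optimal profit of 97; a learned (fuzzy or otherwise) rule base generalizes exactly this kind of threshold logic across many instances instead of hard-coding it.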


Author(s):  
Dhanalakshmi Samiappan ◽  
S. Latha ◽  
T. Rama Rao ◽  
Deepak Verma ◽  
CSA Sriharsha

Enhancing an image to remove noise while preserving its useful features and edges is one of the most important tasks in image analysis. In this paper, Significant Cluster Identification for Maximum Edge Preservation (SCI-MEP), which works in parallel with clustering algorithms and improves their efficiency, is proposed. Affinity propagation (AP) is used as the base method to obtain clusters from a learnt dictionary with adaptive window selection; these clusters are then refined using SCI-MEP to preserve the semantic components of the image. Since only the significant clusters are worked upon, the computational time is drastically reduced. The flexibility of SCI-MEP allows it to be integrated with any clustering algorithm to improve its efficiency. The method is tested and verified on removing Gaussian noise, rain noise, and speckle noise from images. Our results show that SCI-MEP considerably improves upon the existing algorithms in terms of performance evaluation metrics.


Author(s):  
Massimo Donelli

In this chapter, a methodology for the unsupervised design of microwave devices, circuits, and systems is considered. More specifically, the application of the Particle Swarm Optimizer and its integration with electromagnetic simulators is discussed in the framework of microwave circuit and device design and optimization. The idea is to automatically modify the characteristics of the device in an unsupervised way, with the goal of improving the device performance. Such CAD tools could be a solution for reducing time to market and maintaining commercial predominance, since they do not require expert microwave engineers and can reduce the computational time typical of standard design methodologies. To assess the potentialities of the proposed method, a selected set of examples concerning the design of planar microwave devices, such as filters, splitters, and other components, under various operative conditions and frequency bands is reported and discussed. The chapter also includes a brief discussion of different strategies, such as parallel computation, to reduce the computational burden and elaboration time. The obtained results seem to confirm the capabilities of the proposed method as an effective microwave CAD tool for the unsupervised design of microwave devices, circuits, and systems. The chapter ends with some conclusions and considerations related to ideas for future work.
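The optimizer itself is compact. Below is a generic, minimal PSO sketch; in the chapter's setting, each call to the fitness function would instead invoke an electromagnetic simulator on a candidate device geometry. Here a simple sphere function stands in for that simulation, and the coefficients are common textbook choices, not the chapter's settings.

```python
import numpy as np

# Minimal Particle Swarm Optimizer sketch. The sphere function is a
# stand-in for an electromagnetic-simulator fitness evaluation.

rng = np.random.default_rng(0)

def sphere(x):
    return float((x**2).sum())

dim, n_particles, iters = 4, 30, 200
w, c1, c2 = 0.7, 1.5, 1.5            # inertia and acceleration coefficients

pos = rng.uniform(-5, 5, size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()                   # per-particle best positions
pbest_val = np.array([sphere(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    # velocity update: inertia + cognitive pull + social pull
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([sphere(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

best_val = sphere(gbest)
```

Because every fitness evaluation is independent within an iteration, the per-particle simulator calls parallelize naturally, which is the parallel-computation strategy the chapter alludes to.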


Author(s):  
Florian Dupuy ◽  
Olivier Mestre ◽  
Mathieu Serrurier ◽  
Valentin Kivachuk Burdá ◽  
Michaël Zamo ◽  
...  

Cloud cover provides crucial information for many applications, such as planning land observation missions from space. It remains, however, a challenging variable to forecast, and Numerical Weather Prediction (NWP) models suffer from significant biases, hence justifying the use of statistical post-processing techniques. In this study, ARPEGE (Météo-France's global NWP model) cloud cover is post-processed using a convolutional neural network (CNN), the most popular machine learning tool for dealing with images. In our case, the CNN allows the integration of the spatial information contained in NWP outputs. We use a gridded cloud cover product derived from satellite observations over Europe as ground truth, and the predictors are spatial fields of various variables produced by ARPEGE at the corresponding lead time. We show that a simple U-Net architecture (a particular type of CNN) produces significant improvements over Europe. Moreover, the U-Net outclasses more traditional machine learning methods used operationally, such as a random forest and a logistic quantile regression. When using a large number of predictors, a first step toward interpretation is to produce a ranking of predictors by importance, but traditional ranking methods (permutation importance, sequential selection, etc.) need considerable computational resources. We therefore introduce a weighting predictor layer prior to the traditional U-Net architecture in order to produce such a ranking. The small number of additional weights to train (the same as the number of predictors) does not impact the computational time, representing a huge advantage compared to traditional methods.
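The weighting-predictor idea can be sketched in a few lines: one trainable scalar per predictor channel is multiplied onto the inputs before the network, and after training the magnitudes of those scalars give the importance ranking. The toy below stands in for the paper's U-Net setting; the frozen "network" is trivially linear, and the data are illustrative.

```python
import numpy as np

# Hypothetical sketch of a weighting predictor layer: one scalar weight
# per predictor, trained by gradient descent on MSE; |weight| then
# serves as the importance ranking. The "network" here is just a sum.

rng = np.random.default_rng(0)
n_samples, n_predictors = 200, 5

X = rng.normal(size=(n_samples, n_predictors))
true_coef = np.array([3.0, 0.0, 1.0, 0.0, 0.0])   # only predictors 0 and 2 matter
y = X @ true_coef

w = np.ones(n_predictors)                          # the weighting layer
lr = 0.05
for _ in range(500):
    pred = (X * w).sum(axis=1)                     # weighted channels -> frozen sum
    grad = 2 * (X * (pred - y)[:, None]).mean(axis=0)  # d(MSE)/dw
    w -= lr * grad

ranking = np.argsort(-np.abs(w))                   # predictors ranked by importance
```

After training, the ranking recovers the informative predictors first, at the cost of only n_predictors extra weights, which mirrors the computational argument made above.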


Author(s):  
Vasu Deep ◽  
Himanshu Sharma

This work concerns the K-means clustering algorithm; a classifier (SVM) is used alongside it to classify the data, and the Min-Max normalization technique is used to improve the results over the plain K-means algorithm. K-means is a clustering algorithm used to discover clusters within a dataset. A cancer dataset is used for this research work, and after execution of the implemented algorithm with SVM and the normalization technique, the data are classified into two categories: Cancer and Non-Cancer. The choice of initial points affects the results of the algorithm, both in the number of clusters found and in their centroids. In this work, methods for enhancing the K-means clustering algorithm are discussed; these enhancements help improve efficiency, accuracy, performance, and computational time. The main aim of all the methods is to decrease the number of iterations, which reduces computational time. K-means is among the most popular and widely used clustering techniques in data mining. Various enhancements to K-means are collected, so that by combining them one can build a new proposed algorithm that is more efficient, more accurate, and less time-consuming than previous work. The focus of this study is, first, to decrease the number of iterations, which reduces computation time, and, second, to gain more accuracy using the normalization technique, improving both time and accuracy over previous studies.
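The two ingredients emphasized above, Min-Max normalization followed by K-means, can be sketched as below (synthetic two-class data standing in for the cancer dataset; the SVM step is omitted). Note how the large-scale feature would dominate Euclidean distances without normalization.

```python
import numpy as np

# Hypothetical sketch: Min-Max normalization to [0, 1] followed by
# plain two-cluster K-means (Lloyd's algorithm). Data are synthetic.

def min_max(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def kmeans_2(X, iters=50):
    centers = X[[0, -1]].astype(float)   # deterministic init for the sketch
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(2)])
        if np.allclose(new, centers):    # converged: centroids stopped moving
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(0)
# second feature is on a much larger scale, which Min-Max normalization fixes
benign = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 100, 50)])
malign = np.column_stack([rng.normal(5, 1, 50), rng.normal(800, 100, 50)])
X = min_max(np.vstack([benign, malign]))
labels, centers = kmeans_2(X)
```

With both features rescaled to [0, 1], the two groups separate cleanly in very few Lloyd iterations, illustrating why normalization plus a good initialization cuts both the iteration count and the error rate.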

