A deep learning framework for characterization of genotype data

Population Structure ◽

Deep Learning ◽

Principal Component ◽

Data Transformation ◽

Component Analysis ◽

Classification Model ◽

Genotype Data

Dimensionality reduction is a data transformation technique widely used in various fields of genomics research, with principal component analysis one of the most frequently employed methods. Application of principal component analysis to genotype data is known to capture genetic similarity between individuals, and is used for visualization of genetic variation, identification of population structure as well as ancestry mapping. However, the method is based on a linear model that is sensitive to characteristics of data such as correlation of single-nucleotide polymorphisms due to linkage disequilibrium, resulting in limitations in its ability to capture complex population structure. Deep learning models are a type of nonlinear machine learning method in which the features used in data transformation are decided by the model in a data-driven manner, rather than by the researcher, and have been shown to present a promising alternative to traditional statistical methods for various applications in omics research. In this paper, we propose a deep learning model based on a convolutional autoencoder architecture for dimensionality reduction of genotype data. Using a highly diverse cohort of human samples, we demonstrate that the model can identify population clusters and provide richer visual information in comparison to principal component analysis, and also yield a more accurate population classification model. We also discuss the use of the methodology for more general characterization of genotype data, showing that models of a similar architecture can be used as a genetic clustering method, comparing results to the ADMIXTURE software frequently used in population genetic studies.
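
As a rough illustration of the linear baseline this abstract refers to, the sketch below applies principal component analysis to a toy genotype matrix coded as 0/1/2 allele counts. It is a minimal sketch under stated assumptions, not the paper's pipeline: the function name `genotype_pca`, the simulated allele frequencies, and the choice to only center each SNP column (without the frequency-based scaling that population-genetics tools often add) are all hypothetical choices made for this example.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples (rows) onto the top principal axes of a 0/1/2 genotype matrix.

    Assumption: SNP columns are only mean-centered, not variance-scaled.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    # SVD of the centered matrix: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, Vt[:n_components], explained_variance

# Toy data: two hypothetical populations with different allele frequencies
rng = np.random.default_rng(0)
n_per_pop, n_snps = 20, 50
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])

scores, axes, variances = genotype_pca(genotypes)
print(scores.shape, variances)
```

Plotting the two columns of `scores` against each other gives the kind of scatter plot typically used to inspect population structure. Because the projection is linear, it cannot represent the nonlinear structure that the abstract says motivates an autoencoder.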

Intelligent Computing and Applications - Advances in Intelligent Systems and Computing ◽

Probabilistic Principal Component Analysis (PPCA) Based Dimensionality Reduction and Deep Learning for Cancer Classification

10.1007/978-981-15-5566-4_31 ◽

2020 ◽

pp. 353-368

Author(s):

D. Menaga ◽

S. Revathi

Keyword(s):

Deep Learning ◽

Probabilistic Principal Component Analysis

Principal Component ◽

Component Analysis ◽

Cancer Classification ◽

A Face Categorization Algorithm Based on Convolutional Neural Networks and Principal Component Analysis

Успехи кибернетики / Russian Journal of Cybernetics ◽

10.51790/2712-9942-2020-1-3-1 ◽

2021 ◽

pp. 6-14

Author(s):

А.О. Алексанян ◽

С.О. Старков ◽

К.В. Моисеев

Keyword(s):

Neural Network ◽

Deep Learning ◽

Principal Component ◽

Component Analysis ◽

Study Objective ◽

Open Set ◽

Deep Learning Neural Network

This study addresses face recognition for identification, where the inputs to classification are feature vectors produced by a deep learning neural network. Few existing algorithms can perform open-set classification with sufficiently high reliability. The common approach is to classify against a threshold value. This approach has several significant drawbacks, which explain the low quality of open-set classification. First, there is no fixed threshold: it is impossible to choose a universal threshold that suits every face. Second, raising the threshold lowers classification quality. Third, under threshold classification a single face can match many classes at once. For these reasons, we propose using principal component analysis as an additional dimensionality reduction step, alongside the extraction of key facial features by the deep learning network, before classifying the feature vectors. Geometrically, applying principal component analysis to the feature vectors and then classifying them is equivalent to searching for a lower-dimensional space in which the projections of the original vectors are well separated. The idea of dimensionality reduction follows from the assumption that not all components of the N-dimensional feature vectors contribute meaningfully to describing a human face, and that only some components account for most of the variance. Selecting only the significant components of the feature vectors therefore makes it possible to separate classes using the most variable features, without examining less informative data and without comparing vectors in a high-dimensional space.
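
The geometric idea in this abstract, projecting network-produced vectors onto a few principal axes and classifying in that smaller space, can be sketched as follows. This is a minimal sketch under stated assumptions: the vectors are synthetic stand-ins for face embeddings, and the nearest-centroid rule is a hypothetical classifier chosen for brevity, since the abstract does not say which classifier the authors use. The helper names `fit_pca`, `project`, and `nearest_centroid` are likewise made up for this example.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal axes (rows)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, axes):
    """Map vectors into the k-dimensional principal subspace."""
    return (X - mean) @ axes.T

def nearest_centroid(train_z, train_y, query_z):
    """Assign each query to the class whose projected centroid is closest."""
    labels = np.unique(train_y)
    centroids = np.stack([train_z[train_y == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(query_z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

# Synthetic 8-dimensional "embeddings": two identities separated along one axis
rng = np.random.default_rng(1)
dim = 8
offset = np.zeros(dim)
offset[0] = 5.0
X = np.vstack([rng.normal(0.0, 0.1, (10, dim)) - offset,
               rng.normal(0.0, 0.1, (10, dim)) + offset])
y = np.array([0] * 10 + [1] * 10)

mean, axes = fit_pca(X, k=2)
Z = project(X, mean, axes)                      # 20 x 2 reduced vectors
queries = np.array([-offset, offset])           # one clean query per identity
pred = nearest_centroid(Z, y, project(queries, mean, axes))
print(pred)
```

Distances are computed in two dimensions instead of eight, which is the saving the abstract describes. This closed-set toy does not address open-set rejection of unknown faces.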

Analysis of bath motion in MM-SQC dynamics via dimensionality reduction approach: Principal component analysis

The Journal of Chemical Physics ◽

10.1063/5.0039743 ◽

2021 ◽

Vol 154 (9) ◽

pp. 094122

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Principal Component ◽

Component Analysis ◽

Reduction Approach

Characterization of PCBs by principal component analysis (PCA of PCB)

Marine Pollution Bulletin ◽

10.1016/0025-326x(89)90273-7 ◽

1989 ◽

Vol 20 (1) ◽

pp. 26-27 ◽

Cited By ~ 19

Author(s):

V. Zitko

Keyword(s):

Principal Component ◽

Component Analysis

Characterization of sequentially grafted polysaccharide coatings using time-of-flight secondary ion mass spectrometry (ToF-SIMS) and principal component analysis (PCA)

Surface and Interface Analysis ◽

10.1002/sia.1446 ◽

2002 ◽

Vol 33 (12) ◽

pp. 924-931 ◽

Cited By ~ 13

Author(s):

Sally L. McArthur ◽

Matthew S. Wagner ◽

Patrick G. Hartley ◽

Keith M. McLean ◽

Hans J. Griesser ◽

...

Keyword(s):

Mass Spectrometry ◽

Secondary Ion Mass Spectrometry ◽

Time Of Flight ◽

Principal Component ◽

Component Analysis ◽

Tof Sims ◽

Ion Mass Spectrometry ◽

Secondary Ion

Characterization of Adsorbed Protein Films by Time-of-Flight Secondary Ion Mass Spectrometry with Principal Component Analysis

Langmuir ◽

10.1021/la001209t ◽

2001 ◽

Vol 17 (15) ◽

pp. 4649-4660 ◽

Cited By ~ 312

Author(s):

M. S. Wagner ◽

David G. Castner

Keyword(s):

Mass Spectrometry ◽

Secondary Ion Mass Spectrometry ◽

Time Of Flight ◽

Principal Component ◽

Component Analysis ◽

Protein Films ◽

Adsorbed Protein ◽

Secondary Ion

Proceedings of the Institution of Mechanical Engineers Part J Journal of Engineering Tribology ◽

Characterization of fretting wear experiments on spline couplings by principal component analysis

10.1177/1350650116682162 ◽

2016 ◽

Vol 231 (7) ◽

pp. 860-868 ◽

Cited By ~ 2

Author(s):

Waqar Qureshi ◽

Francesca Cura ◽

Andrea Mura

Keyword(s):

Experimental Data ◽

Principal Component ◽

Component Analysis ◽

Fretting Wear ◽

Aircraft Industry ◽

Surface Movement ◽

Noisy Measurements ◽

Quasi Static Process

Fretting wear is a quasi-static process in which repeated relative surface movement of components results in wear and fatigue. Fretting wear is significant in spline couplings, which are frequently used in the aircraft industry to transfer torque and power. It depends on materials, pressure distribution, torque, rotational speeds, lubrication, surface finish, misalignment between spline shafts, etc. With so many factors, it is difficult to design experiments that yield better models of fretting wear, as is the case whenever a mathematical model is sought from experimental data prone to noisy measurements, outliers, and redundant variables. This work develops a principal component analysis based method, using a criterion that is insensitive to outliers, to better design and interpret experiments on fretting wear. The proposed method can be extended to other cases as well.

2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) ◽

Video Shot Boundary Detection Using Principal Component Analysis (PCA) and Deep Learning

10.1109/ecti-con51831.2021.9454775 ◽

2021 ◽

Author(s):

Dipanita Chakraborty ◽

Werapon Chiracharit ◽

Kosin Chamnongthai

Keyword(s):

Deep Learning ◽

Principal Component ◽

Boundary Detection ◽

Component Analysis ◽

Shot Boundary Detection ◽

Video Shot ◽

Shot Boundary

Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression

International Journal of Engine Research ◽

10.1177/14680874211070736 ◽

2022 ◽

pp. 146808742110707

Author(s):

Aran Mohammad ◽

Reza Rezaei ◽

Christopher Hayduk ◽

Thaddaeus Delebinski ◽

Saeid Shahpouri ◽

...

Keyword(s):

Support Vector Machine ◽

Factor Analysis ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Support Vector ◽

Emission Models ◽

Emission Modeling

The development of internal combustion engines is affected by exhaust gas emissions legislation and the drive to increase performance. This calls for engine-out emission models that can be used for engine optimization under real driving emission controls. The prediction capability of physical and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user, and accuracy can improve as the number of inputs increases. At the same time, irrelevant inputs become more probable; these have a weak functional relation to the emissions and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSETe) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for modeling NOx, CO, HC, and soot emissions, respectively.
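
To make the distinction between feature selection and feature extraction in this abstract concrete, the sketch below shows how a lasso penalty drives the coefficients of irrelevant inputs exactly to zero, so that the surviving inputs form the selected set. It is a minimal sketch under stated assumptions and not the authors' implementation: the data are synthetic, the design matrix is given orthonormal columns so the result is easy to verify by hand, and the names `soft_threshold` and `lasso_coordinate_descent` are hypothetical.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink a scalar toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq_norms = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual
            w[j] = soft_threshold(rho, lam) / col_sq_norms[j]
    return w

# Synthetic inputs: 5 candidate features, only the first two drive the target
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(30, 5)))   # 30 x 5, orthonormal columns
y = 3.0 * Q[:, 0] - 2.0 * Q[:, 1]

w = lasso_coordinate_descent(Q, y, lam=0.5)
selected = np.flatnonzero(w != 0)               # indices of retained inputs
print(w, selected)
```

With orthonormal columns, each coefficient is the least-squares value shrunk by `lam`, so the two relevant inputs keep nonzero weights and the other three are dropped. Principal component analysis, by contrast, would return linear combinations of all five inputs and would not discard any of them.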

Dimensionality Reduction with Principal Component Analysis

Mathematics for Machine Learning ◽

10.1017/9781108679930.012 ◽

2020 ◽

pp. 286-313

Keyword(s):

Principal Component ◽

Component Analysis