Analysis of 46,046 SARS-CoV-2 whole-genomes leveraging principal component analysis (PCA)

2020 ◽  
Author(s):  
Christiane Scherer ◽  
James Grover ◽  
Darby Kammeraad ◽  
Gabe Rudy ◽  
Andreas Scherer

Abstract: Since the beginning of the global SARS-CoV-2 pandemic, there have been a number of efforts to understand the mutations and the clusters of genetic lineages of the SARS-CoV-2 virus. Until now, phylogenetic analysis methods have been used for this purpose. Here we show that Principal Component Analysis (PCA), which is widely used in population genetics, can not only help us understand existing findings about the mutation processes of the virus, but can also provide deeper insights into these processes while being less sensitive to sequencing gaps. We describe a comprehensive analysis of a dataset of 46,046 SARS-CoV-2 genome sequences downloaded from the GISAID database in June 2020.

Summary: PCA provides deep insights into the analysis of large data sets of SARS-CoV-2 genomes, revealing virus lineages that have thus far gone unnoticed.
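As a rough illustration of the idea in this abstract, the sketch below applies PCA to a handful of toy sequences. It is not the authors' pipeline: the 10-base reference, the six sequences, the 0/1 "differs from reference" encoding, and the function names `encode_variants` and `pca_scores` are all assumptions made for this example.

```python
import numpy as np

def encode_variants(sequences, reference):
    """Return a 0/1 matrix: rows are genomes, columns are positions,
    1 where the genome differs from the reference (assumed encoding)."""
    return np.array(
        [[int(base != ref) for base, ref in zip(seq, reference)] for seq in sequences],
        dtype=float,
    )

def pca_scores(X, n_components=2):
    """Project rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical toy data: two made-up "lineages" with distinct shared mutations.
reference = "ACGTACGTAC"
sequences = [
    "TTGTACGTAC",  # lineage A: mutations at positions 0 and 1
    "TTGTCCGTAC",  # lineage A plus a private mutation at position 4
    "TTGTACGTAC",  # lineage A
    "ACGTACGTGG",  # lineage B: mutations at positions 8 and 9
    "ACGTACGTGG",  # lineage B
    "ACGTAGGTGG",  # lineage B plus a private mutation at position 5
]

X = encode_variants(sequences, reference)
scores, explained = pca_scores(X)
print(scores[:, 0])   # first-component score per genome
print(explained)      # variance fraction of the first two components
```

In this toy case the first component separates the two made-up lineages by the sign of their scores. A real analysis would also need a policy for ambiguous or missing bases, which this sketch leaves out.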

2011 ◽  
Vol 33 (5) ◽  
pp. 2580-2594 ◽  
Author(s):  
Nathan Halko ◽  
Per-Gunnar Martinsson ◽  
Yoel Shkolnisky ◽  
Mark Tygert

1993 ◽  
Vol 13 (1) ◽  
pp. 5-14 ◽  
Author(s):  
K. J. Friston ◽  
C. D. Frith ◽  
P. F. Liddle ◽  
R. S. J. Frackowiak

The distributed brain systems associated with performance of a verbal fluency task were identified in a nondirected correlational analysis of neurophysiological data obtained with positron tomography. This analysis used a recursive principal-component analysis developed specifically for large data sets. This analysis is interpreted in terms of functional connectivity, defined as the temporal correlation of a neurophysiological index measured in different brain areas. The results suggest that the variance in neurophysiological measurements, introduced experimentally, was accounted for by two independent principal components. The first, and considerably larger, highlighted an intentional brain system seen in previous studies of verbal fluency. The second identified a distributed brain system including the anterior cingulate and Wernicke's area that reflected monotonic time effects. We propose that this system has an attentional bias.
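The definition of functional connectivity as temporal correlation between brain areas can be illustrated with a small numerical sketch. This is not the recursive PCA the abstract refers to: it uses an ordinary eigendecomposition of a correlation matrix, and the synthetic time series, the region count, and the loading vectors are assumptions invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans, n_regions = 60, 5

# Hypothetical signals: an alternating task effect and a monotonic time trend.
task = np.tile([1.0, -1.0], n_scans // 2)
time_trend = np.linspace(-1.0, 1.0, n_scans)

# Assumed loadings: regions 0-2 follow the task, regions 3-4 follow time.
loadings_task = np.array([1.0, 0.8, 0.6, 0.0, 0.0])
loadings_time = np.array([0.0, 0.0, 0.0, 1.0, 0.7])

data = (
    np.outer(task, loadings_task)
    + np.outer(time_trend, loadings_time)
    + 0.3 * rng.standard_normal((n_scans, n_regions))
)

# Functional connectivity: temporal correlation between every pair of regions.
connectivity = np.corrcoef(data, rowvar=False)

# Principal components of the connectivity matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(connectivity)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()

print(np.round(connectivity, 2))
print(np.round(explained, 2))
```

Each column of `eigvecs` gives the regional weights of one component, so regions that load together on a component are the ones whose signals covary over time.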


Kursor ◽  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Annisa Eka Haryati ◽  
Sugiyarto Sugiyarto ◽  
Rizki Desi Arindra Putri

Multivariate statistical analysis becomes difficult when the data have many dimensions. One method for addressing this is principal component analysis (PCA), a technique that reduces the dimensionality of data consisting of several interdependent variables while retaining the variance in the data. PCA can be used to stabilize measurements in statistical analyses such as cluster analysis. Fuzzy clustering is a grouping method based on membership values, using fuzzy sets as the weighting basis for grouping. In this study, the fuzzy clustering methods used are Fuzzy Subtractive Clustering (FSC) and Fuzzy C-Means (FCM), with a combined Minkowski-Chebyshev distance. The purpose of the study was to compare the cluster results obtained from FSC and FCM using the Davies-Bouldin Index (DBI) as the validity index. The results indicate that FCM produces better clusters than FSC.
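A minimal sketch of Fuzzy C-Means followed by a Davies-Bouldin score may help make the comparison concrete. It assumes plain Euclidean distance rather than the combined Minkowski-Chebyshev distance used in the study, omits FSC and the PCA step entirely, and runs on two synthetic blobs; the function names and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Basic FCM with Euclidean distance (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True) # membership update
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # center update
    return centers, U

def davies_bouldin(X, labels):
    """Davies-Bouldin index: lower values mean tighter, better separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Hypothetical data: two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])

centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)                        # harden memberships for scoring
dbi = davies_bouldin(X, labels)
print(centers)
print(dbi)
```

Comparing two clustering methods in this way amounts to computing `davies_bouldin` on the hard labels from each method and preferring the lower score.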


Author(s):  
Petr Praus

In this chapter, the principles and applications of principal component analysis (PCA) applied to hydrological data are presented. Four case studies show the ability of PCA to extract information about a wastewater treatment process and about drinking water quality in a city network, and to find similarities in data sets of ground water quality results and water-related images. In the first case study, the composition of raw and cleaned wastewater was characterised and its temporal changes were displayed. In the second case study, drinking water samples were divided into clusters consistent with their sampling localities. In case study III, similar samples of ground water were recognised by calculating the cosine similarity and the Euclidean and Manhattan distances. In case study IV, 32 water-related images were transformed into a large image matrix whose dimensionality was reduced by PCA. The images were clustered using the PCA scatter plots.
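The three similarity measures named for case study III can be written out in a few lines. The sketch below uses made-up three-element vectors rather than real ground water measurements, and the function names are chosen for this example only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two sample vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two sample vectors."""
    return float(np.linalg.norm(a - b))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.abs(a - b).sum())

# Hypothetical samples: b is a scaled copy of a, c is orthogonal to a.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
c = np.array([3.0, 0.0, -1.0])

print(cosine_similarity(a, b), euclidean_distance(a, b), manhattan_distance(a, b))
print(cosine_similarity(a, c))
```

Cosine similarity ignores overall magnitude, so `a` and `b` count as identical in direction even though their Euclidean and Manhattan distances are nonzero. That difference is one reason to report more than one measure.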


1995 ◽  
Vol 69 (S42) ◽  
pp. 1-19 ◽  
Author(s):  
Pierre J. Lespérance ◽  
Sylvain Desbiens

The thorax of Hypodicranotus has ten segments and a spine on the eighth. The ages of Erratencrinurus s.l. spicatus and Erratencrinurus (Erratencrinurus?) vigilans in the Lake St. John district do not confirm their temporal roles leading to subgenera of Erratencrinurus, as has been recently suggested. Phylogenetic analyses of large data sets of species previously referred to Encrinuroides and Physemataspis yield a minimal length cladogram containing 18 species. Encrinuroides is restricted to four species, two of which have biogeographic affinities with Iapetus. These results lead to three clades, named the Walencrinuroides n. gen. clade, Frencrinuroides n. gen. clade, and finally the Physemataspis clade, with an enlarged concept of the genus with the erection of Physemataspis (Prophysemataspis) n. subgen. These last three clades are restricted to North America and Scotland, with alternating predominance of one region. Walencrinuroides s.l. gelaisi n. gen. n. sp. is described. New morphological data on Erratencrinurus s.l. spicatus confirm its close relationship with the clades discussed above. Data are insufficient for phylogenetic analysis of selected cheirurine species here surveyed. Eye position, glabellar segmentation, and pygidial shape differentiate the genera Ceraurus and Gabriceraurus; emended diagnoses of these genera are presented. Ceraurus globulobatus and C. matranseris are distinct, but morphologically close to one another. The status of Gabriceraurus dentatus can be stabilized on its extant types.

