pairwise distance matrix
Recently Published Documents


Total documents: 6 (five years: 2)

H-index: 3 (five years: 1)

2020 ◽  
Vol 8 (4) ◽  
pp. 309-324
Author(s):  
Zhihua Yan ◽  
Xijin Tang

Online media have brought tremendous changes to civic life, public opinion, and government administration. Compared with traditional media, online media not only allow individuals to browse news and express their views more freely, but also accelerate the transmission of opinions and expand their influence. Because public opinion may arouse societal unrest, detecting the primary topics and uncovering how public opinion evolves is valuable for societal administration. Various algorithms have been developed to deal with the huge volume of unstructured online media data. In this study, a dynamic topic model is employed to explore the evolution of topic content and topic prevalence, using the original posts published from 2013 to 2017 on the Tianya Zatan Board of Tianya Club, one of the most popular BBSs in China. Based on semantic similarities, topics are grouped into three themes: family life, societal affairs, and government administration. The evolution of topic prevalence and content is affected by emergent incidents. Topics on family life become popular, while the themes “societal affairs” and “government administration”, which have bigger standard deviations, are more likely to be influenced by emergent hot events. Representing content evolution as a monthly pairwise distance matrix makes change points in topic content easy to find.
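
As a rough illustration of how a monthly pairwise distance matrix can expose change points, the sketch below compares one topic's word distribution across months. The abstract does not state which distance the authors use, so the Jensen-Shannon divergence, the threshold, the function names, and the toy data are all assumptions made for this example.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def monthly_distance_matrix(months):
    """Symmetric matrix of distances between every pair of months."""
    n = len(months)
    return [[js_divergence(months[i], months[j]) for j in range(n)]
            for i in range(n)]


def change_points(dist, threshold):
    """Indices t where month t differs from month t-1 by more than threshold."""
    return [t for t in range(1, len(dist)) if dist[t - 1][t] > threshold]


# Hypothetical word distributions for one topic over four months.
months = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
dist = monthly_distance_matrix(months)
print(change_points(dist, threshold=0.5))
```

Only the entries next to the diagonal are inspected here; the full matrix would also show whether a topic later drifts back toward earlier content.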


2020 ◽  
Vol 21 (3) ◽  
pp. 944 ◽  
Author(s):  
Valery V. Panyukov ◽  
Sergey S. Kiselev ◽  
Olga N. Ozoline

The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn’s disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree obtained with concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific “barcodes” for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. Pointing to the intraspecies heterogeneity of E. coli in the natural microflora, this also indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
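
A minimal sketch of an alignment-free pairwise distance matrix built from k-mer sets follows. The abstract does not specify the distance measure, so the Jaccard distance used here is an assumption, as are the function names and the toy sequences; real genomes and k = 18 or 22 would replace them.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distance_matrix(seqs, k):
    """Symmetric matrix of Jaccard distances between k-mer sets."""
    sets = [kmers(s, k) for s in seqs]
    n = len(sets)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jaccard_distance(sets[i], sets[j])
    return dist


# Toy sequences standing in for genomes (hypothetical data).
genomes = ["ACGTAC", "ACGTTT", "TTTTTT"]
dist = pairwise_distance_matrix(genomes, k=3)
for row in dist:
    print([round(d, 3) for d in row])
```

A matrix of this kind is the usual input to distance-based tree building such as neighbour joining or UPGMA.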


2016 ◽  
Author(s):  
Brendan Halpin

Analysts doing cluster analysis sometimes want the data to tell them the optimum number of clusters. Common "stopping rules" use the Calinski-Harabasz pseudo-F statistic and Duda-Hart indices, which are based on squared Euclidean distances between cases. Cluster analysis operates on a pairwise matrix of distances between the objects being clustered, which is usually created from the observed variables. However, approaches such as expert judgement or algorithmic pattern-recognition (as used for instance in sequence analysis) often output matrices of pairwise similarity or difference whose relationship to the observed variables is much less direct. Built-in Stata utilities allow calculation of the CH and DH indices when cluster analysis starts from variables, but not with cluster analysis that starts from a pairwise distance matrix (unless the distances are squared Euclidean distances defined on variables which are still available). In this note I present two small Stata utilities that will calculate the CH and DH statistics from the distance matrix, if the distances are squared Euclidean. If the distances have another metric, these utilities can be seen as calculating a pseudo-CH pseudo-F or pseudo-DH statistic, potentially extending their use to new applications. (Brendan Halpin, Head, Department of Sociology, University of Limerick, Ireland.)
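
The idea of computing the Calinski-Harabasz statistic from distances alone can be sketched as follows. This is not the author's Stata code; it is a Python illustration that relies on the standard identity that the sum of squared deviations from a group centroid equals the sum of squared Euclidean distances over all pairs in the group divided by the group size. Function names and the toy data are assumptions.

```python
def sum_of_squares(d2, members):
    """Sum of squared deviations from the centroid of `members`,
    computed only from pairwise squared Euclidean distances."""
    pair_total = sum(d2[i][j]
                     for a, i in enumerate(members)
                     for j in members[a + 1:])
    return pair_total / len(members)


def calinski_harabasz(d2, labels):
    """CH pseudo-F from a squared-distance matrix and cluster labels.
    Raises ZeroDivisionError if every cluster has zero within-cluster spread."""
    n = len(labels)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than cases")
    total_ss = sum_of_squares(d2, list(range(n)))
    within_ss = sum(sum_of_squares(d2, m) for m in clusters.values())
    between_ss = total_ss - within_ss
    return (between_ss / (k - 1)) / (within_ss / (n - k))


# Four points on a line (hypothetical data), given only as squared distances.
points = [0.0, 1.0, 10.0, 11.0]
d2 = [[(a - b) ** 2 for b in points] for a in points]
ch = calinski_harabasz(d2, [0, 0, 1, 1])
print(ch)
```

If the matrix holds some other kind of dissimilarity, the same arithmetic still runs, which corresponds to the "pseudo" statistics the abstract mentions.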


2005 ◽  
Vol 56 (12) ◽  
pp. 1339 ◽  
Author(s):  
H. Yuan ◽  
G. Yan ◽  
K. H. M. Siddique ◽  
H. Yang

Narrow-leafed lupin is a major winter grain legume crop in the Australian farming system and a number of commercial cultivars are currently available to growers. A significant level of polymorphism was detected in narrow-leafed lupin cultivars by the randomly amplified microsatellite polymorphism (RAMP) approach, suggesting that cultivars harbour considerable DNA variation. Seventy-seven cultivar-specific markers were found among the 23 lupin cultivars examined and a dichotomous fingerprinting key was developed for the molecular identification of lupin cultivars. Cluster analysis of the pairwise distance matrix computed from RAMP profiles grouped the 23 cultivars into 4–5 clusters, which generally agreed with their pedigree relationships.
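
Clustering from a pairwise distance matrix of binary marker profiles might look like the sketch below. The abstract names neither the distance coefficient nor the linkage rule, so the simple mismatch distance and average linkage used here are assumptions, as are the function names and the toy presence/absence profiles.

```python
def marker_distance(p, q):
    """Simple mismatch distance between two binary band profiles."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def distance_matrix(profiles):
    """Pairwise distance matrix over all profiles."""
    n = len(profiles)
    return [[marker_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def average_linkage(dist, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                mean = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical presence/absence marker profiles for four cultivars.
profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
dist = distance_matrix(profiles)
print(average_linkage(dist, k=2))
```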

