CLUSTER ANALYSIS IDENTIFIES VARIABLES RELATED TO PROGNOSIS OF BREAST CANCER DISEASE

2021 ◽  
Vol 39 (4) ◽  
Author(s):  
Neyva Maria Lopes Romeiro ◽  
Mara Caroline Torres dos SANTOS ◽  
Carolina PANIS ◽  
Tiago Viana Flor de SANTANA ◽  
Paulo Laerte NATTI ◽  
...  

This work presents a cluster analysis approach that aims to identify distinct groups based on clinicopathological data from patients with breast cancer (BC). For this purpose, the following clinical variables were considered: age at diagnosis, weight, height, lymph nodal invasion (LN), tumor-node-metastasis (TNM) staging and body mass index (BMI). Ward's hierarchical clustering algorithm was used to form the groups, separating the BC patients into four clusters. The Kruskal-Wallis test was performed to assess the differences among the clusters. The strength of the influence of the variables on BC prognosis was also evaluated by calculating Spearman's correlation. Positive correlations were obtained between weight and BMI, and between TNM and LN invasion, in all analyses. Negative correlations between BMI and height were obtained in some of the analyses. Finally, this approach revealed a new correlation between weight and TNM, demonstrating that the trophic-adipose status of BC patients can be directly related to disease staging.
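As a rough illustration of the correlation step described in the abstract above, Spearman's correlation can be computed by ranking each variable (tied values share the mean of their ranks) and then taking the Pearson correlation of the ranks. The sketch below uses only the Python standard library; the function names and the weight/BMI values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical illustration data (not from the study).
weight_kg = [58.0, 64.5, 71.0, 79.5, 90.0]
bmi = [22.1, 24.0, 26.3, 29.8, 33.5]
rho = spearman(weight_kg, bmi)
```

Because the measure depends only on ranks, any strictly increasing relationship between two variables gives a value of 1, whether or not it is linear.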

Author(s):  
Ailong Fan ◽  
Xinping Yan ◽  
Qizhi Yin ◽  
Xing Sun ◽  
Di Zhang

This article examines the distribution characteristics of the navigational environment in the Yangtze River trunk line using several information collection sensors installed on ships that navigate this line. Through experiments on these ships, data on energy consumption and the navigational environment are collected. Spearman's correlation analysis shows that water flow and waterway depth are the main factors influencing ship energy consumption. Next, water velocity and waterway depth data covering the entire trunk line are graded using the k-means clustering algorithm. To build an evaluation matrix of the navigational environment, the frequency distribution of each grade in the different Yangtze River legs is counted, and on this basis similar legs are clustered using the hierarchical clustering algorithm. In this way, the waterway partition of the Yangtze River trunk line is completed. Furthermore, the distribution of ship energy consumption in the different legs is also calculated. The results indicate that both the navigational environment of the Yangtze River trunk line and the energy consumption level of ships show distinctive regional differences. Finally, the regularities of the Yangtze River navigational environment are analyzed, and corresponding energy-saving navigation strategies are proposed, which can help crews operate their ships in energy-efficient and safe conditions.
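The grading step mentioned in the abstract above can be pictured as one-dimensional k-means: measurements are grouped around k centers, and each center defines a grade. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the deterministic initialization, the number of grades and the velocity values are all hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar data; returns the k centers in ascending order."""
    data = sorted(values)
    # Assumed deterministic start: pick k points spread evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def grade(value, centers):
    """Grade of a measurement: 1-based index of the nearest center."""
    return min(range(len(centers)), key=lambda c: abs(value - centers[c])) + 1


# Hypothetical water velocities in m/s (not from the study).
velocities = [0.5, 0.6, 0.55, 1.4, 1.5, 1.6, 2.9, 3.0, 3.1]
centers = kmeans_1d(velocities, k=3)
```

Counting how often each grade occurs in a river leg would then give one row of the kind of frequency matrix the abstract describes.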


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity via the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To disambiguate words, sense identification is performed for the adjectives and the results are compared. SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
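Of the two similarity measures named in the abstract above, cosine similarity is easy to sketch without external resources (the Jiang-Conrath measure additionally needs WordNet and corpus-based information content, so it is not shown). The example below assumes a plain bag-of-words representation with whitespace tokenization, which is a simplification chosen here and not necessarily what the paper uses.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    a = Counter(sentence_a.lower().split())
    b = Counter(sentence_b.lower().split())
    # Only words present in both sentences contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical example sentences.
score = cosine_similarity("the bank of the river", "the river bank")
```

A score near 1 means the two sentences use nearly the same words in the same proportions, and a score of 0 means they share no words.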


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
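For readers unfamiliar with the k-NN graph that the abstract above builds on, the sketch below constructs one: each point is linked to its k closest points. It shows only the graph construction, not the "upward" clustering method the article proposes. The Euclidean distance, the function name and the sample points are assumptions made for illustration.

```python
from math import dist


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        graph[i] = others[:k]
    return graph


# Hypothetical sample points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_graph(points, k=1)
```

The resulting graph is directed: point j being among the nearest neighbours of point i does not guarantee the reverse.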


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill this research gap, we extend the data type of clustering to hesitant fuzzy linguistic information. A hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, where they provide decision support for the executive judge in determining the focus of the investigation and the control. A clustering example verifies the effectiveness of the clustering algorithms in the context of hesitant fuzzy linguistic decision information.


2021 ◽  
Vol 13 (3) ◽  
pp. 1089
Author(s):  
Hailin Zheng ◽  
Qinyou Hu ◽  
Chun Yang ◽  
Jinhai Chen ◽  
Qiang Mei

Since the spread of the coronavirus disease 2019 (COVID-19) pandemic, the transportation of cargo by ship has been seriously impacted. To prevent and control maritime COVID-19 transmission, it is of great significance to track and predict ship sailing behavior. As the nodes of cargo ship transportation networks, ports of call can reflect the sailing behavior of cargo ships. An accurate hierarchical division of ports of call can help to clarify the navigation patterns of ships of different types and scales. For typical cargo ships, ships with a deadweight tonnage over 10,000 account for 95.77% of total deadweight, and 592,244 ship berthing records were mined from the automatic identification system (AIS) from January to October 2020. Considering ship type and ship scale, port hierarchy classification models are constructed to divide these ports into three kinds of specialized ports: bulk, container, and tanker ports. For all types of specialized ports (considering ship scale), the port call probability for the corresponding ship type is higher than for other ships; it is positively correlated with ship deadweight if the port scale is larger than the ship scale, and negatively correlated with ship deadweight if the port scale is smaller than the ship scale. Moreover, the port call probability for the corresponding ship type is positively correlated with ship deadweight, while the port call probability for other ship types is negatively correlated with ship deadweight. The results indicate that a specialized port hierarchical clustering algorithm can divide the hierarchical structure of typical cargo ship calling ports, and is an effective method to track the maritime transmission path of the COVID-19 pandemic.


Author(s):  
N. P. Szabó ◽  
B. A. Braun ◽  
M. M. G. Abdelrahman ◽  
M. Dobróka

The identification of lithology, fluid types, and total organic carbon content is of great priority in the exploration of unconventional hydrocarbons. As a new alternative, a further developed K-means type clustering method is suggested for the evaluation of shale gas formations. The traditional approach to cluster analysis is mainly based on the use of the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the L2 norm to non-Gaussian measurement noise is well known, and it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method, a robust statistical estimator, is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are calculated as a weighted average (using the Most Frequent Value method) instead of the arithmetic mean. The suggested statistical method is tested using synthetic datasets as well as observed wireline logs, mud-logging data and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well log analysis. It is also shown that the Cauchy-Steiner weighted cluster analysis is less affected by outliers, which allows more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.
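To give a feel for why a weighted centroid resists outliers better than the arithmetic mean, the sketch below iteratively reweights observations with a Cauchy-type weight, eps^2 / (eps^2 + residual^2). This weight form is an assumption made here for illustration, since the abstract does not state the formula. The scale eps is also held fixed, whereas the Most Frequent Value procedure estimates it from the data. The function name and the sample values are hypothetical.

```python
def robust_center(values, eps, iterations=50):
    """Iteratively reweighted average with Cauchy-type weights (fixed scale eps)."""
    center = sorted(values)[len(values) // 2]  # start from the (upper) median
    for _ in range(iterations):
        # Observations far from the current center receive weights close to zero.
        weights = [eps ** 2 / (eps ** 2 + (v - center) ** 2) for v in values]
        center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return center


# Hypothetical log readings with one gross outlier.
readings = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
plain_mean = sum(readings) / len(readings)
robust = robust_center(readings, eps=0.5)
```

In this example the arithmetic mean is pulled above 9 by the single outlier, while the reweighted center stays close to 1. A K-means variant along the lines the abstract describes would use such a weighted average in its centroid update.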

