Development of novel combustion risk index for flammable liquids based on unsupervised clustering algorithms

Abstract Hantaviruses belong to the Bunyaviridae family and are hosted by small mammals. Humans are infected either by inhaling virus-containing aerosols or through contact with animal droppings. Rodents host the pathogenic species, and humans, although dead-end hosts, are still accidentally infected. The Andes Orthohantavirus (ANDV) seems to be the only species with documented person-to-person transmission. Hemorrhagic fever with renal syndrome (HFRS) and Hantavirus cardiopulmonary syndrome (HCPS) are both serious syndromes associated with hantavirus infections. For both syndromes, the mortality rate is near 40%. Decades of studies have already highlighted CpG repression in RNA viruses, and both the estimation of the CpG odds ratio and its correlation with their genome polarity were dominant factors in characterizing the CpG bias. We conducted a differential analysis of the CpG odds ratio for all the orthohantaviruses on the full segmented genomes (L, M, S). The results suggested that the differences among the three groups were statistically significant. The “Small” genomes were more informative from the CpG odds ratio point of view. We calculated the CpG odds ratio for all the Orthohantaviruses within these segments and furthermore estimated the correlation coefficient with the relative coding sequences (CDS). Preliminary results first confirmed the CpG odds ratio as the lowest among all the dinucleotides. Second, the Andes virus was highlighted as the one with the highest CpG odds ratio within CDS. Using these two measures as features for unsupervised clustering algorithms allowed us to identify four different sub-groups within the Orthohantaviridae family. The evidence is that the Andes Hantavirus exhibits a peculiar CpG odds ratio distribution, probably linked to its unique characteristic of passing from person to person.
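
The CpG odds ratio mentioned in this abstract is commonly defined as the observed CG dinucleotide frequency divided by the product of the C and G mononucleotide frequencies. The abstract does not state the exact normalization the authors used, so the following is only a minimal sketch under that common definition; the function name `cpg_odds_ratio` is hypothetical.

```python
def cpg_odds_ratio(seq: str) -> float:
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Assumption: the widely used dinucleotide odds-ratio definition;
    the abstract does not say which normalization the authors applied.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        # No C or no G means no CG dinucleotide is possible.
        return 0.0

    # Count overlapping CG dinucleotides over the n - 1 adjacent pairs.
    cg_pairs = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cg = cg_pairs / (n - 1)
    return f_cg / (f_c * f_g)
```

A value below 1 indicates fewer CG dinucleotides than expected from base composition alone, which is the sense in which CpG is described as repressed.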

Exploring and Comparing Unsupervised Clustering Algorithms

Journal of Open Research Software ◽

10.5334/jors.269 ◽

2020 ◽

Vol 8 ◽

Author(s):

Marc Lavielle ◽

Philip D. Waggoner

Keyword(s):

Clustering Algorithms ◽

Unsupervised Clustering

Authorship Attribution using Unsupervised Clustering Algorithms on English C50 News Articles

IARJSET ◽

10.17148/iarjset.2017.4747 ◽

2017 ◽

Vol 4 (7) ◽

pp. 272-276

Author(s):

Dr. O Srinivasa Rao ◽

Ganapathi Raju Dr. N V ◽

Srilalitha Dr. Y

Keyword(s):

Clustering Algorithms ◽

Unsupervised Clustering ◽

Authorship Attribution

Using Machine-Learning Techniques to Identify Responders vs. Non-responders in Randomized Clinical Trials.

10.1101/2020.11.21.20232041 ◽

2020 ◽

Author(s):

Vasiliki Nikolodimou ◽

Paul Agapow

Keyword(s):

Machine Learning ◽

Randomized Clinical Trials ◽

Clustering Algorithms ◽

Human Monoclonal Antibody ◽

Clinical History ◽

Differential Response ◽

Unsupervised Clustering ◽

Machine Learning Techniques ◽

Genetic Characteristics ◽

Learning Techniques

Despite the expectation of heterogeneity in therapy outcomes, especially for complex diseases like cancer, differential response to experimental therapies in a randomized clinical trial (RCT) setting is typically analyzed by dividing patients into responders and non-responders, usually based on a single endpoint. Given the existence of biological and patho-physiological differences among metastatic colorectal cancer (mCRC) patients, we hypothesized that a data-driven analysis of an RCT population's outcomes can identify sub-types of patients based on differential response to Panitumumab, a fully human monoclonal antibody directed against the epidermal growth factor receptor. Outcome and response data of the RCT population were mined with heuristic, distance-based and model-based unsupervised clustering algorithms. The population sub-groups obtained by the best-performing clustering approach were then examined in terms of molecular and clinical characteristics. The utility of this characterization was compared against that of the sub-groups obtained by the conventional responders' analysis and then contrasted with aetiological evidence around mCRC heterogeneity and biological functioning. The Partitioning Around Medoids clustering method resulted in the identification of seven sub-types of patients, statistically distinct from each other in survival outcomes, prognostic biomarkers and genetic characteristics. Conventional responders' analysis proved inferior in uncovering relationships between physical, clinical history, genetic attributes and differential treatment resistance mechanisms. Combined with improved characterization of the molecular subtypes of CRC, applying Machine Learning techniques, like unsupervised clustering, to the wealth of data already collected by previous RCTs can support the design of further targeted, more efficient RCTs and better identification of patient groups who will respond to a given intervention.
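
To illustrate the medoid-based clustering mentioned above, here is a minimal sketch of the swap phase of a PAM-style search. It is not the authors' pipeline: the function names, the naive initialization (the first `k` points) and the toy distance in the usage note are all assumptions made for illustration.

```python
from itertools import product

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist):
    """Greedy medoid-swap search (PAM-style swap phase).

    Assumption: medoids start as the first k points rather than
    using a dedicated BUILD step.
    Returns (medoid indices, cluster label per point).
    """
    medoids = list(range(k))
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid slot with each non-medoid point.
        for slot, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:slot] + [cand] + medoids[slot + 1:]
            cost = total_cost(points, trial, dist)
            if cost < best:  # keep only strictly improving swaps
                medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels
```

For example, `pam([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b))` settles on the points 1 and 11 as medoids. Because medoids are actual observations and only pairwise distances are needed, this family of methods works with any dissimilarity measure, which is one plausible reason to prefer it for mixed clinical data.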

COMPUTATIONAL INTELLIGENCE METHODS FOR FINANCIAL TIME SERIES MODELING

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127406015891 ◽

2006 ◽

Vol 16 (07) ◽

pp. 2053-2062 ◽

Cited By ~ 11

Author(s):

N. G. PAVLIDIS ◽

D. K. TASOULIS ◽

V. P. PLAGIANAKOS ◽

M. N. VRAHATIS

Keyword(s):

Neural Networks ◽

Time Series ◽

Exchange Rate ◽

Ad Hoc ◽

Clustering Algorithms ◽

Real Life ◽

Feedforward Neural Networks ◽

Unsupervised Clustering ◽

Space Partitioning ◽

Input Space

In this paper, the combination of unsupervised clustering algorithms with feedforward neural networks in exchange rate time series forecasting is studied. Unsupervised clustering algorithms have the desirable property of deciding on the number of partitions required to accurately segment the input space during the clustering process, thus relieving the user from making this ad hoc choice. Combining this input space partitioning methodology with feedforward neural networks acting as local predictors for each identified cluster helps alleviate the problem of nonstationarity frequently encountered in real-life applications. An improvement in the one-step-ahead forecasting accuracy was achieved compared to a global feedforward neural network model for the time series of the exchange rate of the German Mark to the US Dollar.

Unsupervised clustering algorithms for flow/mass cytometry data

Computational Methods with Applications in Bioinformatics Analysis ◽

10.1142/9789813207981_0010 ◽

2017 ◽

pp. 193-206 ◽

Cited By ~ 1

Author(s):

Jinmiao Chen ◽

Feng Lin

Keyword(s):

Clustering Algorithms ◽

Unsupervised Clustering ◽

Mass Cytometry

UNSUPERVISED CLUSTERING USING FRACTAL DIMENSION

International Journal of Bifurcation and Chaos ◽

10.1142/s021812740601591x ◽

2006 ◽

Vol 16 (07) ◽

pp. 2073-2079 ◽

Cited By ~ 7

Author(s):

D. K. TASOULIS ◽

M. N. VRAHATIS

Keyword(s):

Fractal Dimension ◽

Clustering Algorithm ◽

Exploratory Data Analysis ◽

Clustering Algorithms ◽

Unsupervised Clustering ◽

Qualitative Information ◽

Number Of Clusters ◽

Clustering Problem ◽

Exploratory Data ◽

Real World Problems

Clustering can be defined as the process of "grouping" a collection of objects into subsets or clusters. The clustering problem has been addressed in numerous contexts and by researchers in different disciplines. This reflects its broad appeal and usefulness as an exploratory data analysis approach. Unsupervised clustering algorithms have been developed to address real world problems in which the number of clusters present in the dataset is unknown. These algorithms approximate the number of clusters while performing the clustering procedure. This paper is a first step towards the development of unsupervised clustering algorithms capable of identifying clusters within clusters. To this end, an unsupervised clustering algorithm is modified so as to take into consideration the fractal dimension of the data. The experimental results indicate that this approach can provide further qualitative information compared to the unsupervised clustering algorithm.

A visual data-mining methodology for seismic facies analysis: Part 1 — Testing and comparison with other unsupervised clustering methods

Geophysics ◽

10.1190/1.3046455 ◽

2009 ◽

Vol 74 (1) ◽

pp. P1-P11 ◽

Cited By ~ 38

Author(s):

Iván Dimitri Marroquín ◽

Jean-Jules Brault ◽

Bruce S. Hart

Keyword(s):

Data Mining ◽

Pattern Recognition ◽

Seismic Data ◽

Facies Analysis ◽

Clustering Algorithms ◽

Unsupervised Clustering ◽

Seismic Facies ◽

Visual Data ◽

Visual Data Mining ◽

Seismic Facies Analysis

Seismic facies analysis aims to identify clusters (groups) of similar seismic trace shapes, where each cluster can be considered to represent variability in lithology, rock properties, and/or fluid content of the strata being imaged. Unfortunately, it is not always clear whether the seismic data has a natural clustering structure. Cluster analysis consists of a family of approaches that have significant potential for classifying seismic trace shapes into meaningful clusters. The clustering can be performed using a supervised process (assigning a pattern to a predefined cluster) or an unsupervised process (partitioning a collection of patterns into groups without predefined clusters). We evaluate and compare different unsupervised clustering algorithms (e.g., partition, hierarchical, probabilistic, and soft competitive models) for pattern recognition based entirely on the characteristics of the seismic response. From validation results on simple data sets, we demonstrate that a self-organizing maps algorithm implemented in a visual data-mining approach outperforms all other clustering algorithms for interpreting the cluster structure. We apply this approach to 2D seismic models generated using a discrete, known number of different stratigraphic geometries. The visual strategy recovers the correct number of end-member seismic facies in the model tests, showing that it is suitable for pattern recognition in highly correlated and continuous seismic data.
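
As a rough illustration of the self-organizing map idea referred to above, the sketch below trains a one-dimensional map on scalar inputs. It is a toy under stated assumptions, not the visual data-mining method of the paper: the function name `train_som`, the evenly spaced initialization, and the linear decay schedules for the learning rate and neighborhood width are all hypothetical choices.

```python
import math

def train_som(data, n_nodes, epochs=50, lr0=0.5, sigma0=None):
    """Train a 1-D self-organizing map on scalar data.

    Assumptions: n_nodes >= 2, weights start evenly spaced over the
    data range, and both learning rate and neighborhood width decay
    linearly with the epoch.
    """
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    lo, hi = min(data), max(data)
    weights = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    if sigma0 is None:
        sigma0 = n_nodes / 2

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for x in data:
            # Best-matching unit: the node whose weight is closest to x.
            bmu = min(range(n_nodes), key=lambda j: abs(x - weights[j]))
            for j in range(n_nodes):
                # Gaussian neighborhood on the map grid around the BMU.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                weights[j] += lr * h * (x - weights[j])
    return weights
```

After training, each input can be assigned to its best-matching node, and nodes that sit close together on the map tend to represent similar inputs, which is the property a visual inspection of the map relies on.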
