Exploring a Bioinformatics Clustering Algorithm

2021 ◽  
Author(s):  
◽  
Mukhlis Matti

<p>This thesis explores and evaluates MAXCCLUS, a bioinformatics clustering algorithm designed to cluster genes from microarray experimental data. MAXCCLUS clusters genes according to the textual data that describe them. It creates candidate clusters and keeps only those that pass a statistical significance test. It then attempts to generalise these clusters using a simple greedy generalisation algorithm. We explore the behaviour of MAXCCLUS by running several clustering experiments that investigate various modifications to MAXCCLUS and its data. The thesis shows that (a) the simple generalisation algorithm of MAXCCLUS gives better results than an exhaustive search algorithm for generalisation, (b) the significance test that MAXCCLUS uses needs to be modified to take into account the functional dependency of some genes on other genes, (c) it is advantageous to delete the non-domain-relevant textual data that describe the genes but disadvantageous to add more textual data to describe them, and (d) MAXCCLUS behaves poorly when it attempts to cluster genes that have adjacent categories instead of only two distinct categories.</p>
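To illustrate the kind of significance test the abstract refers to, the sketch below computes a one-sided hypergeometric p-value for a candidate cluster. The abstract does not say which test MAXCCLUS actually uses, so the hypergeometric choice, the function name `cluster_p_value`, and the framing in terms of one gene category are all assumptions made for illustration. The test treats genes as independent draws, which is the assumption that finding (b) calls into question.

```python
from math import comb


def cluster_p_value(total_genes, total_in_category, cluster_size, cluster_in_category):
    """One-sided hypergeometric tail probability (hypothetical helper).

    Probability that a random cluster of `cluster_size` genes, drawn without
    replacement from `total_genes`, contains at least `cluster_in_category`
    genes from a category that has `total_in_category` members overall.
    """
    upper = min(cluster_size, total_in_category)
    denominator = comb(total_genes, cluster_size)
    tail = sum(
        comb(total_in_category, k)
        * comb(total_genes - total_in_category, cluster_size - k)
        for k in range(cluster_in_category, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # 10 genes, 5 in the category; a cluster of 5 that hits all 5 of them.
    print(cluster_p_value(10, 5, 5, 5))
```

A cluster would be kept only if this value falls below a chosen threshold; the threshold itself is not given in the abstract.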


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and is enhanced by an acceleration technique. Furthermore, the documents are represented semantically using the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
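Two of the evaluation measures named above, purity and entropy, can be sketched in a few lines. The definitions below are the common textbook ones (majority-class fraction, and size-weighted unnormalised Shannon entropy in bits); the abstract does not state which variants the authors used, so treat the exact formulas and the function names as assumptions.

```python
from collections import Counter
from math import log2


def _group_by_cluster(labels_true, labels_pred):
    """Map each predicted cluster id to the list of true labels it contains."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def clustering_entropy(labels_true, labels_pred):
    """Size-weighted average of the class entropy inside each cluster (bits)."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    print(purity(truth, [0, 0, 1, 1]), clustering_entropy(truth, [0, 0, 1, 1]))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes.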


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3791
Author(s):  
Tianli Ma ◽  
Song Gao ◽  
Chaobo Chen ◽  
Xiaoru Song

To deal with the problem of multitarget tracking with measurement origin uncertainty, the paper presents a multitarget tracking algorithm based on Adaptive Network Graph Segmentation (ANGS). Multitarget tracking is first formulated as an Integer Programming problem for finding the maximum a posteriori probability in a cost flow network. Then, the network structure is partitioned using an Adaptive Spectral Clustering algorithm based on the Nyström Method. In order to obtain the global optimal solution, the parallel A* search algorithm is used to process each sub-network. Moreover, the trajectory set is extracted by the Track Mosaic technique and the Rauch–Tung–Striebel (RTS) smoother. Finally, simulation results for different clutter intensities indicate that the proposed algorithm has better tracking accuracy and robustness than the A* search algorithm, the successive shortest-path (SSP) algorithm and the shortest path faster algorithm (SPFA).


2016 ◽  
Vol 69 (5) ◽  
pp. 1143-1153 ◽  
Author(s):  
Marta Wlodarczyk–Sielicka ◽  
Andrzej Stateczny

An electronic navigational chart is a major source of information for the navigator. The component that contributes most significantly to the safety of navigation on water is the information on the depth of an area. For the purposes of this article, the authors use data obtained by the interferometric sonar GeoSwath Plus. The data were collected in the area of the Port of Szczecin. The samples constitute large sets of data. Data reduction is a procedure that reduces the size of a data set to make it easier and more effective to analyse. The main objective of the authors is the compilation of a new reduction algorithm for bathymetric data. The clustering of data is the first part of the search algorithm. The next step consists of generalisation of bathymetric data. This article presents a comparison and analysis of results of clustering bathymetric data using the following selected methods: the K-means clustering algorithm, traditional hierarchical clustering algorithms and the self-organising map (using artificial neural networks).
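The cluster-then-generalise idea can be sketched with plain K-means on sounding positions, followed by keeping one representative sounding per cluster. This is not the authors' reduction algorithm: the function names, the caller-supplied initial centroids, and the rule of keeping the shallowest depth in each cluster are assumptions chosen only to make the two-step structure concrete.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm on (x, y) points with caller-supplied initial centroids."""
    centroids = list(initial_centroids)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids


def reduce_soundings(soundings, initial_centroids):
    """Keep one (x, y, depth) per non-empty cluster: the shallowest (assumed rule)."""
    positions = [(x, y) for x, y, _ in soundings]
    assignment, centroids = kmeans(positions, initial_centroids)
    reduced = []
    for j in range(len(centroids)):
        members = [s for s, a in zip(soundings, assignment) if a == j]
        if members:
            reduced.append(min(members, key=lambda s: s[2]))
    return reduced


if __name__ == "__main__":
    data = [(0, 0, 5.0), (0, 1, 4.0), (10, 10, 9.0), (10, 11, 8.5)]
    print(reduce_soundings(data, [(0, 0), (10, 10)]))
```

Swapping `kmeans` for a hierarchical method or a self-organising map would leave `reduce_soundings` unchanged, which is roughly the comparison the abstract describes.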


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 39-40
Author(s):  
Pattarapol Sumreddee ◽  
Sajjad Toghiani ◽  
Andrew J Roberts ◽  
El H Hay ◽  
Samuel E Aggrey ◽  
...  

Pedigree information was traditionally used to assess inbreeding. Availability of high-density marker panels provides an alternative to assess inbreeding, particularly in the presence of incomplete and error-prone pedigrees. Assessment of autozygosity across chromosomal segments using runs of homozygosity (ROH) is emerging as a valuable tool to estimate inbreeding due to its general flexibility and ability to quantify chromosomal contribution to genome-wide inbreeding. Unfortunately, identifying ROH segments is sensitive to the parameters used during the search process. These parameters are heuristically set, leading to significant variation in the results. The minimum length required to identify a ROH segment has major effects on the estimation of inbreeding, yet it is arbitrarily set. Understanding the rise, purging, and the effects of deleterious mutations requires the ability to discriminate between ancient and recent inbreeding. However, thresholds to discriminate between short and long ROH segments are largely unknown. To address these questions, an inbred Hereford cattle population of 785 animals genotyped for 30,220 SNPs was used. A search algorithm to approximate mutation loads was used to determine the minimum length of ROH segments. It consisted of finding genome segments with significant differences in trait means between animals with high and low autozygosity intervals at certain threshold values. The minimum length was around 1 Mb for weaning and yearling weights and ADG, and 2.5 Mb for birth weight. Using a model-based clustering algorithm, a mixture of three Gaussian distributions was clearly separable, resulting in three classes of short (< 6.16 Mb), medium (6.16–12.57 Mb), and long (> 12.27 Mb) ROH segments, representing ancient, intermediate, and recent inbreeding. Contribution of ancient, intermediate and recent to genome-wide inbreeding was 37.4%, 40.1% and 22.5%, respectively. Inbreeding depression analyses showed a greater damaging effect of recent inbreeding, likely due to purging of old highly deleterious haplotypes.
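The role of the minimum-length parameter can be seen in a deliberately simplified ROH scan. The sketch below assumes genotypes coded 0/1/2 with 1 meaning heterozygous, allows no heterozygous or missing calls inside a run, and computes the usual ROH-based inbreeding coefficient as the covered fraction of the genome. It is not the search algorithm used in the study, and the function names and coding scheme are illustrative assumptions.

```python
def find_roh(positions_bp, genotypes, min_length_mb=1.0):
    """Return (start_bp, end_bp) for each homozygous run of at least min_length_mb.

    positions_bp: SNP positions on one chromosome, sorted ascending.
    genotypes:    0/1/2 allele counts; 1 is heterozygous and ends a run.
    """
    segments = []
    start = None
    last = None
    for pos, genotype in zip(positions_bp, genotypes):
        if genotype != 1:
            if start is None:
                start = pos
            last = pos
        else:
            if start is not None and (last - start) / 1e6 >= min_length_mb:
                segments.append((start, last))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and (last - start) / 1e6 >= min_length_mb:
        segments.append((start, last))
    return segments


def f_roh(segments, genome_length_bp):
    """Inbreeding coefficient estimated as the fraction of the genome inside ROH."""
    return sum(end - start for start, end in segments) / genome_length_bp


if __name__ == "__main__":
    positions = [0, 500_000, 1_500_000, 2_000_000, 2_200_000, 4_000_000]
    calls = [0, 0, 2, 1, 0, 0]
    for threshold in (1.0, 1.6):
        print(threshold, find_roh(positions, calls, min_length_mb=threshold))
```

Raising the threshold from 1.0 Mb to 1.6 Mb drops the first segment in this toy example, which is the sensitivity the abstract describes.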


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4920
Author(s):  
Lin Cao ◽  
Xinyi Zhang ◽  
Tao Wang ◽  
Kangning Du ◽  
Chong Fu

In the multi-target traffic radar scene, the clustering accuracy for vehicles driving close together is relatively low. In response to this problem, this paper proposes a new clustering algorithm, namely an adaptive ellipse distance density peak fuzzy (AEDDPF) clustering algorithm. Firstly, the Euclidean distance is replaced by an adaptive ellipse distance, which can more accurately describe the structure of data obtained from radar measurements of vehicles. Secondly, an adaptive exponential function curve is introduced in the decision graph of the fast density peak search algorithm to accurately select the density peak points, which completes the initialization of the AEDDPF algorithm. Finally, the membership matrix and the clustering centers are calculated through successive iterations to obtain the clustering result. The time complexity of the AEDDPF algorithm is analyzed. Compared with the density-based spatial clustering of applications with noise (DBSCAN), k-means, fuzzy c-means (FCM), Gustafson-Kessel (GK), and adaptive Euclidean distance density peak fuzzy (Euclid-ADDPF) algorithms, the AEDDPF algorithm has higher clustering accuracy for real measurement data sets in certain scenarios. The experimental results also show that the proposed algorithm has a better clustering effect in some close-range vehicle scene applications. The generalization ability of the proposed AEDDPF algorithm applied to other types of data is also analyzed.
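The decision graph mentioned above comes from density peak clustering, where each point gets a local density and a distance to the nearest point of higher density. The sketch below computes both with a plain Euclidean distance and a hard cutoff kernel. It does not implement the adaptive ellipse distance or the adaptive exponential selection curve that the paper proposes, and the function name and `cutoff` parameter are assumptions.

```python
from math import dist


def decision_graph(points, cutoff):
    """Return (rho, delta) for density peak clustering.

    rho[i]:   number of other points closer than `cutoff` to point i.
    delta[i]: distance to the nearest point with strictly higher rho, or the
              largest distance from point i if no such point exists.
    """
    n = len(points)
    rho = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(dist(points[i], points[j]) for j in range(n)))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (-1, 0), (10, 0), (11, 0)]
    rho, delta = decision_graph(pts, cutoff=1.5)
    print(rho, delta)
```

Points with both large `rho` and large `delta` are candidate cluster centres. With a hard cutoff, ties in `rho` are common, which is one reason a selection rule on the decision graph matters.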

