Exploring a Bioinformatics Clustering Algorithm

10.26686/wgtn.16915627.v1 ◽

2021 ◽

Author(s):

Mukhlis Matti

Keyword(s):

Experimental Data ◽

Clustering Algorithm ◽

Search Algorithm ◽

Significance Test ◽

Exhaustive Search ◽

Textual Data

This thesis explores and evaluates MAXCCLUS, a bioinformatics clustering algorithm designed to cluster genes from microarray experimental data. MAXCCLUS clusters genes according to the textual data that describe them. It creates candidate clusters, keeps only those that pass a statistical significance test, and then attempts to generalise the retained clusters using a simple greedy generalisation algorithm. We explore the behaviour of MAXCCLUS by running several clustering experiments that investigate various modifications to MAXCCLUS and its data. The thesis shows (a) that the simple generalisation algorithm of MAXCCLUS gives better results than an exhaustive search algorithm for generalisation, (b) that the significance test MAXCCLUS uses needs to be modified to take into account the functional dependency of some genes on other genes, (c) that it is advantageous to delete the non-domain-relevant textual data that describe the genes but disadvantageous to add more textual data to describe them, and (d) that MAXCCLUS behaves poorly when it attempts to cluster genes that have adjacent categories instead of only two distinct categories.
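The abstract does not say which significance test MAXCCLUS runs. As a rough illustration of what testing a candidate cluster can look like, the sketch below uses a one-sided hypergeometric tail probability. The choice of test, the function name `hypergeom_tail`, the toy counts, and the 0.05 threshold are all assumptions made for this example, not details taken from the thesis.

```python
from math import comb


def hypergeom_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) when `draws` genes are taken without replacement
    from `population` genes, `successes` of which belong to one category."""
    upper = min(draws, successes)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical toy counts: 20 genes, 8 of them in category A; a candidate
# cluster of 5 genes contains 4 category-A genes.
p_value = hypergeom_tail(population=20, successes=8, draws=5, observed=4)
is_significant = p_value < 0.05  # 0.05 is an assumed threshold
```

A test of this kind treats genes as independent draws, which is the sort of assumption that finding (b) calls into question when genes depend on one another functionally.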


On the error-prone substructures for the binary-input ternary-output channel and its corresponding exhaustive search algorithm

2012 IEEE International Conference on Communications (ICC) ◽

10.1109/icc.2012.6363940 ◽

2012 ◽

Cited By ~ 1

Author(s):

Gyu Bum Kyung ◽

Chih-Chun Wang

Keyword(s):

Search Algorithm ◽

Exhaustive Search ◽

Output Channel ◽

Binary Input


Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and rely on indexing the documents with a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
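Purity is one of the evaluation measures the abstract lists. The sketch below shows the standard definition: the fraction of documents that belong to the majority class of their cluster. The function name `purity` and the toy labels are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of documents that fall in the majority class of their cluster."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        raise ValueError("at least one document is required")
    per_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    # Sum the size of the largest class inside each cluster.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical toy labels: six documents, two clusters, two topics.
clusters = [0, 0, 0, 1, 1, 1]
topics = ["cancer", "cancer", "cardio", "cardio", "cardio", "cardio"]
score = purity(clusters, topics)
```

Here cluster 0 has a majority of two documents and cluster 1 a majority of three, so five of the six documents count towards the score.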


Multitarget Tracking Algorithm Based on Adaptive Network Graph Segmentation in the Presence of Measurement Origin Uncertainty

Sensors ◽

10.3390/s18113791 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3791

Author(s):

Tianli Ma ◽

Song Gao ◽

Chaobo Chen ◽

Xiaoru Song

Keyword(s):

Shortest Path ◽

Clustering Algorithm ◽

Search Algorithm ◽

Optimal Solution ◽

Tracking Algorithm ◽

Multitarget Tracking ◽

Integer Programming Problem ◽

Tracking Accuracy ◽

Network Graph ◽

Adaptive Network

To deal with the problem of multitarget tracking with measurement origin uncertainty, the paper presents a multitarget tracking algorithm based on Adaptive Network Graph Segmentation (ANGS). Multitarget tracking is first formulated as an integer programming problem of finding the maximum a posteriori probability in a cost flow network. Then, the network structure is partitioned using an adaptive spectral clustering algorithm based on the Nyström method. To obtain the global optimal solution, a parallel A* search algorithm is used to process each sub-network. Moreover, the trajectory set is extracted by the Track Mosaic technique and a Rauch–Tung–Striebel (RTS) smoother. Finally, simulation results for different clutter intensities indicate that the proposed algorithm has better tracking accuracy and robustness than the A* search algorithm, the successive shortest-path (SSP) algorithm, and the shortest path faster algorithm (SPFA).


Clustering Bathymetric Data for Electronic Navigational Charts

Journal of Navigation ◽

10.1017/s0373463316000035 ◽

2016 ◽

Vol 69 (5) ◽

pp. 1143-1153 ◽

Cited By ~ 24

Author(s):

Marta Wlodarczyk–Sielicka ◽

Andrzej Stateczny

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Data Set ◽

Bathymetric Data ◽

Large Sets ◽

Analysis Of Results ◽

Comparison And Analysis ◽

Self Organising Map ◽

Source Of Information

An electronic navigational chart is a major source of information for the navigator. The component that contributes most significantly to the safety of navigation on water is the information on the depth of an area. For the purposes of this article, the authors use data obtained by the interferometric sonar GeoSwath Plus. The data were collected in the area of the Port of Szczecin. The samples constitute large sets of data. Data reduction is a procedure that reduces the size of a data set to make it easier and more effective to analyse. The main objective of the authors is to develop a new reduction algorithm for bathymetric data. The clustering of data is the first part of the search algorithm. The next step consists of generalisation of bathymetric data. This article presents a comparison and analysis of the results of clustering bathymetric data using the following selected methods: the K-means clustering algorithm, traditional hierarchical clustering algorithms, and a self-organising map (using artificial neural networks).
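K-means is one of the clustering methods the abstract compares. The sketch below is a minimal pure-Python version of Lloyd's algorithm applied to sounding positions, followed by one possible reduction step. The sample soundings, the initial centroids, and the rule of keeping the shallowest sounding per cluster are assumptions made for illustration. The abstract does not describe the authors' generalisation step in detail.

```python
from math import dist


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on (x, y) positions; returns (centroids, assignment)."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


# Hypothetical soundings: (x, y, depth in metres).
soundings = [(0.0, 0.0, 5.2), (1.0, 0.0, 4.8), (0.0, 1.0, 5.0),
             (10.0, 10.0, 9.1), (11.0, 10.0, 8.7), (10.0, 11.0, 9.4)]
positions = [(x, y) for x, y, _ in soundings]
centroids, assignment = kmeans(positions, centroids=[positions[0], positions[3]])

# Assumed reduction rule: keep the shallowest sounding of each cluster.
reduced = {}
for sounding, cluster in zip(soundings, assignment):
    if cluster not in reduced or sounding[2] < reduced[cluster][2]:
        reduced[cluster] = sounding
```

Passing the initial centroids explicitly keeps the example deterministic. A production implementation would normally use a seeded random or k-means++ initialisation.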


205 Runs of homozygosity and analysis of inbreeding depression

Journal of Animal Science ◽

10.1093/jas/skz258.078 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 39-40

Author(s):

Pattarapol Sumreddee ◽

Sajjad Toghiani ◽

Andrew J Roberts ◽

El H Hay ◽

Samuel E Aggrey ◽

...

Keyword(s):

Inbreeding Depression ◽

Clustering Algorithm ◽

Search Algorithm ◽

Pedigree Information ◽

Minimum Length ◽

Runs Of Homozygosity ◽

Model Based Clustering ◽

Genome Wide ◽

Hereford Cattle ◽

Density Marker

Pedigree information was traditionally used to assess inbreeding. Availability of high-density marker panels provides an alternative to assess inbreeding, particularly in the presence of incomplete and error-prone pedigrees. Assessment of autozygosity across chromosomal segments using runs of homozygosity (ROH) is emerging as a valuable tool to estimate inbreeding due to its general flexibility and ability to quantify chromosomal contribution to genome-wide inbreeding. Unfortunately, identifying ROH segments is sensitive to the parameters used during the search process. These parameters are heuristically set, leading to significant variation in the results. The minimum length required to identify an ROH segment has major effects on the estimation of inbreeding, yet it is arbitrarily set. Understanding the rise, purging, and the effects of deleterious mutations requires the ability to discriminate between ancient and recent inbreeding. However, thresholds to discriminate between short and long ROH segments are largely unknown. To address these questions, an inbred Hereford cattle population of 785 animals genotyped for 30,220 SNPs was used. A search algorithm to approximate mutation loads was used to determine the minimum length of ROH segments. It consisted of finding genome segments with significant differences in trait means between animals with high and low autozygosity intervals at certain threshold values. The minimum length was around 1 Mb for weaning and yearling weights and ADG, and 2.5 Mb for birth weight. Using a model-based clustering algorithm, a mixture of three Gaussian distributions was clearly separable, resulting in three classes of short (< 6.16 Mb), medium (6.16–12.57 Mb), and long (>12.27 Mb) ROH segments, representing ancient, intermediate, and recent inbreeding. Contribution of ancient, intermediate and recent to genome-wide inbreeding was 37.4%, 40.1% and 22.5%, respectively. Inbreeding depression analyses showed a greater damaging effect of recent inbreeding, likely due to purging of old highly deleterious haplotypes.
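To make the role of the minimum-length parameter concrete, the sketch below scans one chromosome for runs of consecutive homozygous SNPs and keeps only the runs that span at least a given length. It is a simplified illustration, not the authors' search algorithm. The function name `find_roh`, the input layout, and the toy genotypes are hypothetical, and unlike practical ROH tools it tolerates no heterozygous or missing calls inside a run.

```python
def find_roh(snps, min_length_bp):
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs
    that span at least `min_length_bp`.

    `snps` is a list of (position_bp, is_homozygous) pairs sorted by
    position on a single chromosome.
    """
    segments = []
    run_start = run_end = None
    for position, is_homozygous in snps:
        if is_homozygous:
            if run_start is None:
                run_start = position
            run_end = position
        else:
            # A heterozygous SNP closes the current run, if any.
            if run_start is not None and run_end - run_start >= min_length_bp:
                segments.append((run_start, run_end))
            run_start = run_end = None
    # Close a run that extends to the last SNP.
    if run_start is not None and run_end - run_start >= min_length_bp:
        segments.append((run_start, run_end))
    return segments


# Hypothetical genotypes on one chromosome (positions in base pairs).
snps = [
    (100_000, True), (600_000, True), (1_300_000, True),      # 1.2 Mb run
    (1_500_000, False),
    (1_700_000, True), (2_000_000, True),                     # 0.3 Mb run
    (2_400_000, False),
    (2_600_000, True), (3_900_000, True), (5_200_000, True),  # 2.6 Mb run
]
one_mb = find_roh(snps, min_length_bp=1_000_000)
two_and_half_mb = find_roh(snps, min_length_bp=2_500_000)
```

With the 1 Mb threshold two segments are reported, while the 2.5 Mb threshold keeps only the longest one, which shows how strongly the detected segment set depends on this parameter.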


On Using Gray Codes to Improve the Efficiency of the Parallel Exhaustive Search Algorithm for the Knapsack Problem

2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus) ◽

10.1109/eiconrus.2019.8657131 ◽

2019 ◽

Author(s):

Kupriyashina Natalia ◽

Kupriyashin Mikhail ◽

Borzunov Georgii

Keyword(s):

Knapsack Problem ◽

Search Algorithm ◽

Exhaustive Search ◽

Gray Codes


An Adaptive Ellipse Distance Density Peak Fuzzy Clustering Algorithm Based on the Multi-target Traffic Radar

Sensors ◽

10.3390/s20174920 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4920

Author(s):

Lin Cao ◽

Xinyi Zhang ◽

Tao Wang ◽

Kangning Du ◽

Chong Fu

Keyword(s):

Euclidean Distance ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Search Algorithm ◽

Measurement Data ◽

Data Sets ◽

Density Peak ◽

Close Range ◽

Decision Graph ◽

Membership Matrix

In the multi-target traffic radar scene, the clustering accuracy for vehicles driving close to one another is relatively low. In response to this problem, this paper proposes a new clustering algorithm, namely an adaptive ellipse distance density peak fuzzy (AEDDPF) clustering algorithm. Firstly, the Euclidean distance is replaced by an adaptive ellipse distance, which can more accurately describe the structure of the data obtained from radar measurements of vehicles. Secondly, an adaptive exponential function curve is introduced in the decision graph of the fast density peak search algorithm to accurately select the density peak points, which completes the initialization of the AEDDPF algorithm. Finally, the membership matrix and the clustering centers are calculated through successive iterations to obtain the clustering result. The time complexity of the AEDDPF algorithm is analyzed. Compared with the density-based spatial clustering of applications with noise (DBSCAN), k-means, fuzzy c-means (FCM), Gustafson-Kessel (GK), and adaptive Euclidean distance density peak fuzzy (Euclid-ADDPF) algorithms, the AEDDPF algorithm has higher clustering accuracy for real measurement data sets in certain scenarios. The experimental results also show that the proposed algorithm has a better clustering effect in some close-range vehicle scene applications. The generalization ability of the proposed AEDDPF algorithm applied to other types of data is also analyzed.
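The decision graph mentioned in the abstract comes from density peak clustering, where each point gets a local density (rho) and a distance to the nearest point of higher density (delta). The sketch below computes both with a simple axis-scaled ellipse distance. The fixed scales `a` and `b`, the cutoff, the toy points, and the rule of picking the two largest rho × delta values are assumptions for illustration. The paper's adaptive ellipse distance and adaptive exponential curve are not specified in the abstract and are not reproduced here.

```python
from math import sqrt


def ellipse_distance(p, q, a=1.0, b=1.0):
    """Distance with a separate scale per axis; a == b == 1 is Euclidean."""
    return sqrt(((p[0] - q[0]) / a) ** 2 + ((p[1] - q[1]) / b) ** 2)


def decision_graph(points, cutoff, a=1.0, b=1.0):
    """Return (rho, delta): local density and distance to the nearest
    point of higher density, as used in density peak clustering."""
    n = len(points)
    d = [[ellipse_distance(points[i], points[j], a, b) for j in range(n)]
         for i in range(n)]
    # rho: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff)
           for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point: largest distance by convention
        else:
            delta[i] = min(d[i][j] for j in order[:rank])
    return rho, delta


# Hypothetical detections from two vehicles in adjacent lanes: each vehicle
# is stretched along y (direction of travel) and the lanes are 1.5 apart in x.
points = [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0),
          (1.5, 0.0), (1.5, 2.0), (1.5, 4.0)]
rho, delta = decision_graph(points, cutoff=1.5, a=0.5, b=2.0)

# Assumed selection rule: the two points with the largest rho * delta.
gamma = [r * dl for r, dl in zip(rho, delta)]
peaks = sorted(sorted(range(len(points)), key=lambda i: -gamma[i])[:2])
```

In this toy layout the plain Euclidean distance between the lanes (1.5) is smaller than the spacing of points along a vehicle (2.0). Scaling the axes reverses that relation, so the middle point of each vehicle stands out as a density peak.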
