Clustering applications of IFDBSCAN algorithm with comparative analysis

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely preferred density-based clustering algorithms in unsupervised machine learning. It uses an epsilon-neighborhood construction strategy to discover arbitrarily shaped clusters. DBSCAN separates dense regions from low-density regions and simultaneously labels isolated points as outliers, revealing the hidden cluster patterns in a dataset. DBSCAN identifies dense regions by means of a core point definition, and the detection of core points depends strictly on two input parameters: ε, the neighborhood distance (the radius of a hypersphere), and MinPts, the minimum density constraint inside the hypersphere of radius ε. In contrast to classical DBSCAN’s crisp core point definition, our preliminary work proposed an intuitionistic fuzzy core point definition that lets DBSCAN detect different density patterns through two different combinations of input parameters, which is a necessity for large datasets of varying density in a multidimensional feature space. This study examines that previously proposed DBSCAN extension, IFDBSCAN. The extension is tested in computational experiments on several real-time machine learning repository datasets. The results show that IFDBSCAN is superior to classical DBSCAN with respect to external and internal performance indices (purity index, adjusted Rand index, Fowlkes-Mallows score, silhouette coefficient, and Calinski-Harabasz index) and with respect to the resulting clustering structure, without a large increase in computational time. It also lets users try two different density patterns in the same run and explore intermediate density values by manipulating the α margin.
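
The crisp core point test of classical DBSCAN, which the abstract contrasts with the intuitionistic fuzzy definition, can be sketched as follows. This is a minimal illustration only: the function name `core_points` and the sample points are assumptions made here, the neighborhood count includes the point itself (a common convention), and the intuitionistic fuzzy definition and the α margin of IFDBSCAN are not specified in the abstract and are not reproduced.

```python
import math


def core_points(points, eps, min_pts):
    """Return indices of points whose eps-neighborhood holds at least min_pts points."""
    cores = []
    for i, p in enumerate(points):
        # Count every point within distance eps of p (p itself is counted).
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            cores.append(i)
    return cores


# Illustrative data: four points in a tight square plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(core_points(points, eps=0.2, min_pts=4))  # [0, 1, 2, 3]
```

With these values, the four clustered points are core points and the isolated point is not, which is the crisp yes/no decision that a fuzzy definition would soften.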

Density Based Clustering with Integrated One-Class SVM for Noise Reduction

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v6i3.pp199-208 ◽

2017 ◽

Vol 6 (3) ◽

pp. 199

Author(s):

K. Nafees Ahmed ◽

T. Abdul Razak

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Data Analysis ◽

Noise Reduction ◽

Spatial Clustering ◽

Support Vector ◽

Machine Learning Technique ◽

Learning Classifier ◽

Density Based Clustering ◽

Learning Technique

Information extraction from data is one of the key necessities of data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents Noise Reduced DBSCAN (NRDBSCAN), a modified variant of DBSCAN that integrates density-based spatial clustering with a one-class Support Vector Machine (SVM), a machine learning technique, for noise reduction. Analysis of DBSCAN shows that it requires accurate thresholds, without which it yields suboptimal results. However, identifying accurate threshold settings is unattainable. Noise is one of the major side effects of the threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operational structure of DBSCAN. The experimental results indicate high homogeneity levels in the clustering process.

Density Based Spatial Clustering Application with Noise by Varying Densities

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8757.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 5886-5891

Keyword(s):

Machine Learning ◽

Spatial Clustering ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cluster Algorithms ◽

Model Based ◽

Dbscan Algorithm ◽

Density Values ◽

Original Size

Cluster algorithms group similar points together to form clusters and are used mostly in machine learning. The most popular density-based algorithm is DBSCAN. DBSCAN can find clusters irrespective of their shapes and sizes, and it can easily detect noise in a dataset. We developed a model based on the existing DBSCAN algorithm that focuses mainly on the epsilon parameter. Whenever DBSCAN fails to form a cluster, we increase the epsilon value by half of its original size, and we repeat this step until a cluster is formed. Whenever a new cluster is formed, we increase epsilon by 10 percent of the previously used epsilon value. Because epsilon controls the density of a cluster, this lets DBSCAN build clusters with varying density values. We applied this algorithm to various datasets.
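
The epsilon update rule described above can be sketched as a small schedule. This is an interpretation, not the authors' implementation: "half of its original size" is assumed to mean half of the initial epsilon, the helper names `next_epsilon` and `epsilon_schedule` are hypothetical, and the clustering step itself is replaced by a list of made-up outcomes.

```python
def next_epsilon(eps, eps_initial, cluster_formed):
    """One step of the epsilon update (hypothetical helper)."""
    if cluster_formed:
        # A new cluster was formed: grow epsilon by 10% of the value just used.
        return eps + 0.10 * eps
    # No cluster was formed: grow epsilon by half of the initial value (assumption).
    return eps + 0.5 * eps_initial


def epsilon_schedule(eps_initial, outcomes):
    """Return the successive epsilon values for a sequence of clustering outcomes."""
    eps = eps_initial
    history = [eps]
    for formed in outcomes:
        eps = next_epsilon(eps, eps_initial, formed)
        history.append(eps)
    return history


# Illustrative outcomes only: two failed attempts, then a cluster is formed.
schedule = epsilon_schedule(1.0, [False, False, True])
print(schedule)
```

In a real run, each outcome would come from attempting to grow a cluster with the current epsilon rather than from a fixed list.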

Knee Muscle Force Estimating Model Using Machine Learning Approach

The Computer Journal ◽

10.1093/comjnl/bxaa160 ◽

2020 ◽

Author(s):

Anurag Sohane ◽

Ravinder Agarwal

Keyword(s):

Machine Learning ◽

Random Forest ◽

Muscle Force ◽

Vastus Lateralis ◽

Input Parameter ◽

Research Work ◽

Cost Effective ◽

Coefficient Of Determination ◽

Muscle Forces ◽

Knee Muscle

Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may be good for clinical use but have drawbacks such as long computational times, muscle redundancy, and lower cost-effectiveness. Recently, there has been interest in developing supervised-learning-based prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models that predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal models was used to train and test four different ML models for predicting knee muscle force. The dataset was obtained from the AnyBody modeling software using AnyPyTools, with the human musculoskeletal model performing a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, and decision tree) in terms of mean square error (MSE), coefficient of determination (R²), and correlation (r). For the test dataset, the MSE of predicted versus actual muscle forces obtained from the random forest model for Biceps Femoris, Rectus Femoris, Vastus Medialis, and Vastus Lateralis is 19.92, 9.06, 5.97, and 5.46, the correlation is 0.94, 0.92, 0.92, and 0.94, and R² is 0.88, 0.84, 0.84, and 0.89, respectively.
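
The three evaluation metrics named in the abstract (MSE, R², and Pearson correlation r) can be computed as in the sketch below. The function names and the sample force values are illustrative assumptions and are not data from the study.

```python
import math


def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up muscle forces, for illustration only.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 143.0, 157.0]
print(mse(actual, predicted), r_squared(actual, predicted), pearson_r(actual, predicted))
```

R² and r measure different things: r captures linear association only, while R² also penalizes systematic offsets between predicted and actual values, which is why the abstract reports both.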

A machine-learning-based alloy design platform that enables both forward and inverse predictions for thermo-mechanically controlled processed (TMCP) steel alloys

Scientific Reports ◽

10.1038/s41598-021-90237-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jin-Woong Lee ◽

Chaewon Park ◽

Byung Do Lee ◽

Joonseo Park ◽

Nam Hoon Goo ◽

...

Keyword(s):

Machine Learning ◽

Feature Space ◽

Research Strategy ◽

Nsga Ii ◽

Steel Alloys ◽

Non Linear ◽

Data Driven Approach ◽

Processing Information ◽

Set Up ◽

World Industry

Predicting mechanical properties such as yield strength (YS) and ultimate tensile strength (UTS) is an intricate undertaking in practice, notwithstanding a plethora of well-established theoretical and empirical models. A data-driven approach should be a fundamental exercise when making YS/UTS predictions. For this study, we collected 16 descriptors (attributes) that implicate the compositional and processing information and the corresponding YS/UTS values for 5473 thermo-mechanically controlled processed (TMCP) steel alloys. We set up an integrated machine-learning (ML) platform consisting of 16 ML algorithms to predict the YS/UTS based on the descriptors. The integrated ML platform involved regularization-based linear regression algorithms, ensemble ML algorithms, and some non-linear ML algorithms. Despite the dirty nature of most real-world industry data, we obtained acceptable holdout dataset test results such as R² > 0.6 and MSE < 0.01 for seven non-linear ML algorithms. The seven fully trained non-linear ML models were used for the ensuing ‘inverse design (prediction)’ based on an elitist-reinforced, non-dominated sorting genetic algorithm (NSGA-II). The NSGA-II enabled us to predict solutions that exhibit desirable YS/UTS values for each ML algorithm. In addition, the NSGA-II-driven solutions in the 16-dimensional input feature space were visualized using holographic research strategy (HRS) in order to systematically compare and analyze the inverse-predicted solutions for each ML algorithm.
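
The non-dominated sorting at the heart of NSGA-II rests on Pareto dominance, which the sketch below illustrates for two objectives. It assumes that both YS and UTS are to be maximized, which the abstract does not state, and it covers only the extraction of the first non-dominated front, not the full NSGA-II loop (crowding distance, selection, crossover, mutation). The function names and candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions that no other solution dominates (the first Pareto front)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]


# Made-up (YS, UTS) pairs, for illustration only.
candidates = [(500, 650), (520, 600), (480, 580), (510, 590)]
print(non_dominated(candidates))  # [(500, 650), (520, 600)]
```

The two surviving candidates trade one objective against the other, so neither can be preferred without further criteria.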

The prediction of aquifer groundwater level based on spatial clustering approach using machine learning

Environmental Monitoring and Assessment ◽

10.1007/s10661-021-08961-y ◽

2021 ◽

Vol 193 (4) ◽

Author(s):

Hamid Kardan Moghaddam ◽

Sami Ghordoyee Milan ◽

Zahra Kayhomayoon ◽

Zahra Rahimzadeh kivi ◽

Naser Arya Azar

Keyword(s):

Machine Learning ◽

Groundwater Level ◽

Spatial Clustering ◽

Clustering Approach

Improved Visible Light-Based Indoor Positioning System Using Machine Learning Classification and Regression

Applied Sciences ◽

10.3390/app9061048 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1048 ◽

Cited By ~ 8

Author(s):

Huy Tran ◽

Cheolkeun Ha

Keyword(s):

Machine Learning ◽

Visible Light ◽

Noise Reduction ◽

Indoor Positioning ◽

Computational Time ◽

Dual Function ◽

Positioning System ◽

Positioning Accuracy ◽

Positioning Systems ◽

Machine Learning Classification

Recently, indoor positioning systems have attracted a great deal of research attention, as they have a variety of applications in the fields of science and industry. In this study, we propose an innovative and easily implemented solution for indoor positioning. The solution is based on an indoor visible light positioning system and dual-function machine learning (ML) algorithms. Our solution increases positioning accuracy under the negative effect of multipath reflections and decreases the computational time for ML algorithms. Initially, we perform a noise reduction process to eliminate low-intensity reflective signals and minimize noise. Then, we divide the floor of the room into two separate areas using the ML classification function. This significantly reduces the computational time and partially improves the positioning accuracy of our system. Finally, the regression function of those ML algorithms is applied to predict the location of the optical receiver. By using extensive computer simulations, we have demonstrated that the execution time required by certain dual-function algorithms to determine indoor positioning is decreased after area division and noise reduction have been applied. In the best case, the proposed solution took 78.26% less time and provided a 52.55% improvement in positioning accuracy.

Forecasting System of Computational Time of DFT/TDDFT Calculations under the Multiverse Ansatz via Machine Learning and Cheminformatics

ACS Omega ◽

10.1021/acsomega.0c04981 ◽

2021 ◽

Vol 6 (3) ◽

pp. 2001-2024

Author(s):

Shuo Ma ◽

Yingjin Ma ◽

Baohua Zhang ◽

Yingqi Tian ◽

Zhong Jin

Keyword(s):

Machine Learning ◽

Computational Time ◽

Tddft Calculations ◽

Forecasting System

A Machine Learning Approach to Reveal the NeuroPhenotypes of Autisms

International Journal of Neural Systems ◽

10.1142/s0129065718500582 ◽

2019 ◽

Vol 29 (07) ◽

pp. 1850058 ◽

Cited By ~ 8

Author(s):

Juan M. Górriz ◽

Javier Ramírez ◽

F. Segovia ◽

Francisco J. Martínez ◽

Meng-Chuan Lai ◽

...

Keyword(s):

Machine Learning ◽

Brain Structure ◽

Feature Space ◽

Classification Problem ◽

Small Sample ◽

Biological Sex ◽

Machine Learning Approach ◽

Learning Machine ◽

Small Sample Sizes ◽

Low Dimensional

Although much research has been undertaken, the spatial patterns, developmental course, and sexual dimorphism of brain structure associated with autism remain enigmatic. One of the difficulties in investigating differences between the sexes in autism is the small sample sizes of available imaging datasets with mixed sex. Thus, the majority of the investigations have involved male samples, with females somewhat overlooked. This paper deploys machine learning on partial least squares feature extraction to reveal differences in regional brain structure between individuals with autism and typically developing participants. A four-class classification problem (sex and condition) is specified, with theoretical restrictions based on the evaluation of a novel upper bound in the resubstitution estimate. These conditions were imposed on the classifier complexity and feature space dimension to assure generalizable results from the training set to test samples. Accuracies above [Formula: see text] on gray and white matter tissues estimated from voxel-based morphometry (VBM) features are obtained in a sample of equal-sized high-functioning male and female adults with and without autism ([Formula: see text], [Formula: see text]/group). The proposed learning machine revealed how autism is modulated by biological sex using a low-dimensional feature space extracted from VBM. In addition, a spatial overlap analysis on reference maps partially corroborated predictions of the “extreme male brain” theory of autism, in sexual dimorphic areas.

Exploiting node metadata to predict interactions in large networks using graph embedding and neural networks

10.1101/2021.06.10.447991 ◽

2021 ◽

Author(s):

Rogini Runghen ◽

Daniel B Stouffer ◽

Giulio Valentino Dalla Riva

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Link Prediction ◽

Graph Embedding ◽

Feature Space ◽

Machine Learning Techniques ◽

Large Networks ◽

Data Set ◽

Learning Techniques ◽

Low Dimensional

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large-scale data and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn in order to predict interactions in a network grows quickly with the number of nodes. Therefore, it becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping of the nodes' metadata to their position in a latent feature space corresponds to a classic, low-dimensional machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques for mapping the nodes onto their latent positions make it possible to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.
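
In a Random Dot Product Graph, the probability of an interaction between two nodes is the dot product of their latent positions, which is what makes predicted positions sufficient for link prediction. The sketch below shows only that final step for an undirected network. The latent positions are made up, the function name is hypothetical, the clipping to [0, 1] is a safeguard added here, and neither the embedding estimation nor the neural network that maps metadata to positions is shown.

```python
def interaction_probability(x, y):
    """Dot product of two latent positions, clipped to the valid probability range."""
    p = sum(a * b for a, b in zip(x, y))
    return min(1.0, max(0.0, p))


# Made-up two-dimensional latent positions, for illustration only.
existing_node = (0.8, 0.2)
new_node = (0.5, 0.5)  # in the procedure above, this would be predicted from metadata
print(interaction_probability(existing_node, new_node))
```

Because the probability depends only on positions, a node that was never observed in the network can still be scored against every existing node once its position has been predicted.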

treeheatr: an R package for interpretable decision tree visualizations

10.1101/2020.07.10.196352 ◽

2020 ◽

Author(s):

Trang T. Le ◽

Jason H. Moore

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Feature Space ◽

R Package ◽

Tree Structure ◽

Decision Tree Model ◽

Teaching Tool ◽

Tree Model ◽

Machine Learning Methods ◽

Link Type

Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.
