A new method for unveiling open clusters in Gaia

Context. The publication of the Gaia Data Release 2 (Gaia DR2) opens a new era in astronomy. It includes precise astrometric data (positions, proper motions, and parallaxes) for more than 1.3 billion sources, mostly stars. To analyse such a vast amount of new data, the use of data-mining techniques and machine-learning algorithms is mandatory. Aims. A great example of the application of such techniques and algorithms is the search for open clusters (OCs), groups of stars that were born and move together, located in the disc. Our aim is to develop a method to automatically explore the data space, requiring minimal manual intervention. Methods. We explore the performance of a density-based clustering algorithm, DBSCAN, to find clusters in the data together with a supervised learning method such as an artificial neural network (ANN) to automatically distinguish between real OCs and statistical clusters. Results. The development and implementation of this method in a five-dimensional space (l, b, ϖ, μα*, μδ) with the Tycho-Gaia Astrometric Solution (TGAS) data, and a posterior validation using Gaia DR2 data, lead to the proposal of a set of new nearby OCs. Conclusions. We have developed a method to find OCs in astrometric data, designed to be applied to the full Gaia DR2 archive.

Hunting for open clusters in Gaia DR2: the Galactic anticentre

Astronomy and Astrophysics ◽

10.1051/0004-6361/201935531 ◽

2019 ◽

Vol 627 ◽

pp. A35 ◽

Cited By ~ 15

Author(s):

A. Castro-Ginard ◽

C. Jordi ◽

X. Luri ◽

T. Cantat-Gaudin ◽

L. Balaguer-Núñez

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Open Clusters ◽

Fine Tuning ◽

Learning Methods ◽

Density Based Clustering ◽

Single Method ◽

Blind Search ◽

Physical Clusters ◽

Perseus Arm

Context. The Gaia Data Release 2 (DR2) provided an unprecedented volume of precise astrometric and excellent photometric data. For mining the Gaia catalogue, machine learning methods have proven to be a powerful tool, for instance in the search for unknown stellar structures. In particular, combining supervised and unsupervised learning methods significantly improves the detection rate of open clusters. Aims. We systematically scan Gaia DR2 in a region covering the Galactic anticentre and the Perseus arm (120° ≤ l ≤ 205° and −10° ≤ b ≤ 10°), with the goal of finding any open clusters that may exist in this region, and of fine-tuning a methodology previously proposed and successfully applied to TGAS data, adapting it to regions of different density. Methods. Our methodology uses an unsupervised, density-based clustering algorithm, DBSCAN, that identifies overdensities in the five-dimensional astrometric parameter space (l, b, ϖ, μα*, μδ) that may correspond to physical clusters. The overdensities are separated into physical clusters (open clusters) or random statistical clusters using an artificial neural network that recognises the isochrone pattern open clusters show in a colour-magnitude diagram. Results. The method is able to recover more than 75% of the open clusters confirmed in the search area. Moreover, we detected 53 open clusters unknown prior to Gaia DR2, which represents an increase of more than 22% with respect to the clusters already catalogued in this region. Conclusions. We find that the census of nearby open clusters is not complete. Different machine learning methodologies for a blind search of open clusters are complementary to each other; no single method is able to detect 100% of the existing groups. Our methodology has proven to be a reliable tool for the automatic detection of open clusters, designed to be applied to the full Gaia DR2 catalogue.
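
The density-based step described in this abstract can be illustrated with a minimal, pure-Python DBSCAN sketch. It is not the authors' pipeline: the synthetic points, the `eps` and `min_pts` values, and the assumption that the five parameters (l, b, ϖ, μα*, μδ) have already been rescaled to comparable ranges are all hypothetical choices made for illustration.

```python
import math
import random

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels

# Hypothetical, already-rescaled 5-D data: one compact clump plus a uniform field.
rng = random.Random(42)
clump = [tuple(rng.gauss(0.5, 0.02) for _ in range(5)) for _ in range(30)]
field = [tuple(rng.random() for _ in range(5)) for _ in range(200)]
labels = dbscan(clump + field, eps=0.15, min_pts=8)
n_clusters = len({lab for lab in labels if lab != NOISE})
print("overdensities found:", n_clusters)
```

In the abstract, each overdensity found this way is only a candidate; a separate classifier decides whether it is a physical cluster. That second step is not sketched here.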

A Survey of Network Embedding for Drug Analysis and Prediction

Current Protein and Peptide Science ◽

10.2174/1389203721666200702145701 ◽

2020 ◽

Vol 21 ◽

Author(s):

Zhixian Liu ◽

Qingfeng Chen ◽

Wei Lan ◽

Jiahai Liang ◽

Yiping Pheobe Chen ◽

...

Keyword(s):

Deep Learning ◽

Protein Function ◽

Dimensional Space ◽

Auxiliary Information ◽

Matrix Decomposition ◽

Drug Analysis ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Network Embedding ◽

Similarity Estimation

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time-consuming and lack universality, and they make it difficult to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way of alleviating these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information, which facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years and has shown superior performance over traditional methods. However, there is no systematic review of this topic. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied to homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, the graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.
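
Of the embedding families named in this abstract, the random-walk approach is the easiest to sketch. The example below only generates uniform random walks over a toy graph; the resulting node sequences are what a random-walk embedding method would then feed to a sequence model. The graph, the node names, and the parameters are hypothetical and are not taken from the survey.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical undirected drug-target graph stored as adjacency lists.
graph = {
    "drug_A": ["target_X", "target_Y"],
    "drug_B": ["target_Y"],
    "target_X": ["drug_A"],
    "target_Y": ["drug_A", "drug_B"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(walks[0])
```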

On the Experimental, Numerical and Data-Driven Methods to Study Urban Flows

Energies ◽

10.3390/en14051310 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1310

Author(s):

Pablo Torres ◽

Soledad Le Clainche ◽

Ricardo Vinuesa

Keyword(s):

Physical Phenomenon ◽

Urban Pollution ◽

Pollutant Dispersion ◽

Urban Environments ◽

Machine Learning Algorithms ◽

Data Driven ◽

Thermal Fields ◽

Use Of Data ◽

Sustainability Challenges ◽

New Research

Understanding the flow in urban environments is an increasingly relevant problem due to its significant impact on air quality and thermal effects in cities worldwide. In this review we provide an overview of efforts based on experiments and simulations to gain insight into this complex physical phenomenon. We highlight the relevance of coherent structures in urban flows, which are responsible for the pollutant-dispersion and thermal fields in the city. We also suggest a more widespread use of data-driven methods to characterize flow structures as a way to further understand the dynamics of urban flows, with the aim of tackling the important sustainability challenges associated with them. Artificial intelligence and urban flows should be combined into a new research line, where classical data-driven tools and machine-learning algorithms can shed light on the physical mechanisms associated with urban pollution.

Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network Ann

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete, to assess the extent of the cracking, and to predict the extent of healing. In research on self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a large number of instances may be needed to explore reliable concrete designs. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. These algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and a random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, measured by the coefficient of determination (R²) and the root mean square error (RMSE), is evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R² = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance in the design of autogenous healing concrete can be achieved.
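
The two metrics quoted in this abstract follow their standard definitions, which the short sketch below spells out. The sample values are made up for illustration and have no connection to the study's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted healing-performance values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print("RMSE:", rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

A model that always predicts the mean of the observations scores R² = 0, and a perfect model scores R² = 1 with RMSE = 0, which is why a higher R² and a lower RMSE are both read as better.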

Adaptive Density-Based Clustering Algorithm with Shared KNN Conflict Game

Information Sciences ◽

10.1016/j.ins.2021.02.017 ◽

2021 ◽

Author(s):

Rui Zhang ◽

Tao Du ◽

Shouning Qu ◽

Hongwei Sun

Keyword(s):

Clustering Algorithm ◽

Density Based Clustering

Distance Density Based Clustering Algorithm in Wireless Sensor Network

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.291-294.344 ◽

2011 ◽

Vol 291-294 ◽

pp. 344-348

Author(s):

Lin Lin ◽

Shu Yan ◽

Yi Nian

Keyword(s):

Clustering Algorithm ◽

Distribution Density ◽

Simulation Experiment ◽

Clustering Algorithms ◽

Wireless Sensor ◽

Energy Usage ◽

Cluster Heads ◽

Hierarchical Topology ◽

Energy Factors ◽

Density Based Clustering

The hierarchical topology of wireless sensor networks can effectively reduce the consumption in communication. Clustering algorithms are the foundation for realizing a hierarchical structure and have therefore been extensively researched. On the basis of the LEACH algorithm, a distance density based clustering algorithm (DDBC) is proposed, which jointly considers the distribution density of surrounding nodes and the remaining energy of each node to dynamically balance the energy usage of nodes when selecting cluster heads. We analyzed the performance of DDBC by comparing it with other existing clustering algorithms in a simulation experiment. Results show that the proposed method can generate a stable number of cluster heads and balance the energy load effectively.

Emotion recognition using time–frequency ridges of EEG signals based on multivariate synchrosqueezing transform

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2020-0295 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Ahmet Mert ◽

Hasan Huseyin Celik

Keyword(s):

Emotion Recognition ◽

Feature Vector ◽

Dimensional Space ◽

Computational Cost ◽

Feature Space ◽

Maximum Energy ◽

Machine Learning Algorithms ◽

Eeg Signals ◽

Time Frequency ◽

Synchrosqueezing Transform

The feasibility of using time–frequency (TF) ridge estimation is investigated on multi-channel electroencephalogram (EEG) signals for emotion recognition. Informative component extraction with low computational cost, without decreasing the accuracy of valence/arousal recognition, is examined using multivariate ridge estimation. The advanced TF representation technique called multivariate synchrosqueezing transform (MSST) is used to obtain well-localized components of multi-channel EEG signals. Maximum-energy components in the 2D TF distribution are determined using TF-ridge estimation to extract instantaneous frequency and instantaneous amplitude. The statistical values of the estimated ridges are used as a feature vector for the inputs of machine learning algorithms. Thus, component information in multi-channel EEG signals can be captured and compressed into a low-dimensional space for emotion recognition. Mean and variance values of the five maximum-energy ridges in the MSST-based TF distribution are adopted as the feature vector. Properties of the five TF ridges in the frequency and energy planes (e.g., mean frequency, frequency deviation, mean energy, and energy deviation over time) are computed to obtain a 20-dimensional feature space. The proposed method is evaluated on the DEAP emotional EEG recordings for benchmarking, and the recognition rates reach up to 71.55% and 70.02% for high/low arousal and high/low valence, respectively.
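
The arithmetic behind the 20-dimensional feature space (five ridges times four statistics) can be sketched as follows. The ridge values are invented, the MSST and ridge-extraction steps are not implemented, and taking "deviation" to mean the population standard deviation is an assumption, since the abstract does not define it.

```python
from statistics import mean, pstdev

def ridge_features(ridges):
    """Build one feature vector from (frequencies, energies) pairs, one per ridge.

    Each ridge contributes four values: mean frequency, frequency deviation,
    mean energy, and energy deviation over time.
    """
    features = []
    for freqs, energies in ridges:
        features.extend(
            [mean(freqs), pstdev(freqs), mean(energies), pstdev(energies)]
        )
    return features

# Five hypothetical ridges, each sampled at two time instants.
toy_ridges = [([4.0 + k, 6.0 + k], [1.0, 3.0]) for k in range(5)]
features = ridge_features(toy_ridges)
print(len(features))  # 5 ridges x 4 statistics
```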

THE USE OF DATA MINING IN DETECTING CREDIT CARD FRAUD

10.31234/osf.io/uhqcs ◽

2022 ◽

Author(s):

Kingsley Austin

Keyword(s):

Machine Learning ◽

Credit Card ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

High Detection Rate ◽

Credit Card Fraud ◽

Real Time Processing ◽

Detection Systems ◽

Hybrid Approaches ◽

Use Of Data

Credit card fraud is a serious problem for e-commerce retailers, with UK merchants reporting losses of $574.2M in 2020. As a result, effective fraud detection systems must be in place to ensure that payments are processed securely in an online environment. According to the literature, the detection of credit card fraud is challenging due to dataset imbalance (genuine versus fraudulent transactions), real-time processing requirements, and the dynamic behavior of fraudsters and customers. This paper proposes that the use of machine learning could be an effective solution for combating credit card fraud. According to research, machine learning techniques can play a role in overcoming the identified challenges while ensuring a high detection rate of fraudulent transactions, both directly and indirectly. Even though both supervised and unsupervised machine learning algorithms have been suggested, the flaws in both methods point to the necessity for hybrid approaches.

An Efficient Density-based Clustering Algorithm for Face Groping

Neurocomputing ◽

10.1016/j.neucom.2021.07.074 ◽

2021 ◽

Author(s):

Shenfei Pei ◽

Feiping Nie ◽

Rong Wang ◽

Xuelong Li

Keyword(s):

Clustering Algorithm ◽

Density Based Clustering

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis ◽

10.3233/ida-205497 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1453-1471

Author(s):

Chunhua Tang ◽

Han Wang ◽

Zhiwen Wang ◽

Xiangkun Zeng ◽

Huaran Yan ◽

...

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Substantial Improvement ◽

Experimental Results ◽

High Time ◽

Parameter Setting ◽

K Nearest Neighbor ◽

Density Based Clustering

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering for datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance of the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity, and outperforms other algorithms in parameter setting and noise recognition.
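
The quantity this abstract relies on, the distance from each point to its k-th nearest neighbor, can be computed with the brute-force sketch below. This is not FOP-OPTICS and does not reproduce its complexity claims; the function name and the sample points are hypothetical, and the quadratic loop is chosen only for readability.

```python
import math

def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

# Hypothetical 2-D points: a dense pair and one more isolated point.
sample = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(k_distance(sample, k=1))
```

Points in dense regions have small k-distances and isolated points have large ones, which is why this value is a common ingredient when choosing a neighborhood radius for density-based methods.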
