Intelligent Data Analysis

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1453-1471. DOI: 10.3233/ida-205497

Author(s): Chunhua Tang, Han Wang, Zhiwen Wang, Xiangkun Zeng, Huaran Yan, ...

Keyword(s): Time Complexity, Clustering Algorithm, Nearest Neighbor, Clustering Algorithms, Substantial Improvement, Experimental Results, High Time, Parameter Setting, K Nearest Neighbor, Density Based Clustering

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement on OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster. This overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
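
The k-nearest-neighbor distance mentioned in the abstract can be illustrated with a short sketch. This is not the FOP-OPTICS implementation: the function name `kth_neighbor_distances`, the brute-force search, and the sample points are assumptions made purely for illustration.

```python
import math


def kth_neighbor_distances(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbor.

    Brute-force O(n^2 log n) illustration; hypothetical helper, not the paper's code.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Three points close together and one far away (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
knn = kth_neighbor_distances(points, k=2)
```

In this toy data the isolated point has a much larger 2-nearest-neighbor distance than the other three, which is the kind of density signal such a distance provides.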


An improved YOLOv3 model for detecting location information of ovarian cancer from CT images

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1565-1578. DOI: 10.3233/ida-205542

Author(s): Xun Wang, Hanlin Li, Lisheng Wang, Yongzhi Yu, Hao Zhou, ...

Keyword(s): Ovarian Cancer, Cancer Cells, Ovarian Cancer Cells, Location Information, Learning Technology, Medical College, Geometric Deformation, Women's Lives, Aided Diagnosis, Better Than

Ovarian cancer is a malignant tumor that poses a serious threat to women’s lives. Computer-aided diagnosis (CAD) systems can classify the type of ovarian tumors, but few of them can provide the exact location of ovarian cancer cells. Recently, deep learning has become popular for automatic detection of cancer cells, particularly for detecting their locations. In this work, we propose a novel end-to-end network, the YOLO-OC (Ovarian cancer) model, which can extract the characteristics of ovarian cancer more efficiently. In our method, deformable convolution is used to enhance the model’s ability to learn geometric deformation in space. A Squeeze-and-Excitation (SE) module is introduced to automatically learn the importance of different channel features. Experiments are conducted on datasets collected from The Affiliated Hospital of Qingdao University Medical College, China. Experimental results show that our YOLO-OC model achieves 91.83%, 85.66% and 73.82% on mean average precision mAP@.5, mAP@.75 and mAP@[.5,.95], respectively, outperforming Faster R-CNN, SSD and RetinaNet in both accuracy and efficiency.


MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1507-1524. DOI: 10.3233/ida-205340

Author(s): Chunying Zhang, Ruiyan Gao, Jiahao Wang, Song Chen, Fengchun Liu, ...

Keyword(s): Measurement Method, Clustering Algorithm, Average Distance, Boundary Region, Data Sets, Calculation Formula, Information Granule, Clustering Problem, Definition Of, Multiple Clusters

To solve the clustering problem for incomplete categorical matrix data sets, and considering the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. First, the correlation theory of set pair information granules is introduced into k-modes clustering. By improving the distance formula of the traditional k-modes algorithm, a set pair distance measurement method between incomplete matrix samples is defined. Second, considering the uncertain relationship between a sample and a cluster, the definition of the intra-cluster average distance and the threshold calculation formula for determining whether a sample belongs to multiple clusters are given. The result of set pair clustering is then formed, which includes a positive region, a boundary region and a negative region. Finally, an experimental evaluation on three selected data sets against four contrast algorithms shows that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets and has good clustering performance in terms of Accuracy, Recall, ARI and NMI.


Attention mechanism based LSTM in classification of stressed speech under workload

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1603-1627. DOI: 10.3233/ida-205429

Author(s): Xiao Yao, Zhengyan Sheng, Min Gu, Haibin Wang, Ning Xu, ...

Keyword(s): Feature Fusion, Small Sample, Attention Mechanism, Fusion Model, Speech Corpus, Traditional Methods, Transient Nature, Stress Classification, Proposed Model, Small Sample Sizes

To improve the robustness of speech recognition systems, this study attempts to classify stressed speech caused by psychological stress under multitasking workloads. Because stressed speech is transient and ambiguous, stress characteristics are not present in every segment of speech labeled as stressed. In this paper, we propose a multi-feature fusion model based on the attention mechanism to measure the importance of segments for stress classification. Through the attention mechanism, each speech frame is weighted to reflect its correlation with the actual stressed state, and multi-channel fusion of the features characterizing stressed speech is used to classify speech under stress. The proposed model further applies SpecAugment to the feature spectrum for data augmentation, to address the small sample sizes of stressed speech. In the experiments, we compared the proposed model with traditional methods on the CASIA Chinese emotion corpus and the Fujitsu stressed speech corpus, and the results show that the proposed model performs better in speaker-independent stress classification. Transfer learning is also applied to speaker-dependent classification of stressed speech, and the performance is improved. Compared with traditional methods, the attention mechanism shows an advantage for continuous speech under stress in an authentic context.
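
The idea of weighting each speech frame by its relevance can be sketched as generic softmax attention pooling. This is an illustration under stated assumptions, not the model from the paper: the function `attention_pool`, the way the scores are obtained, and the sample values are all hypothetical.

```python
import math


def attention_pool(frames, scores):
    """Weight frame feature vectors by softmax(scores) and sum them.

    frames: list of equal-length feature vectors, one per speech frame.
    scores: one relevance score per frame (assumed to come from some scoring layer).
    Returns (weights, pooled_vector).
    """
    if not frames or len(frames) != len(scores):
        raise ValueError("frames and scores must be non-empty and of equal length")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


# Two frames with equal scores get equal weight, so the result is their mean.
weights, pooled = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
```

Frames with higher scores dominate the pooled representation, so segments that carry little stress information contribute less to the final classification input.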


Forecasting emergency department admissions

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1579-1601. DOI: 10.3233/ida-205390

Author(s): Carlos Narciso Rocha, Fátima Rodrigues

Keyword(s): Neural Network, Emergency Department, Recurrent Neural Network, Patient Flow, Clinical Staff, Quality Service, Emergency Admissions, Machine Learning Methods, Recurrent Neuronal Network, High Quality Service

The emergency department of a hospital plays an extremely important role in the healthcare of patients. To maintain a high-quality service, clinical professionals need information on how patient flow will evolve in the immediate future. With accurate emergency department forecasts it is possible to better manage available human resources by allocating clinical staff before peak periods, thus preventing service congestion, or releasing clinical staff at less busy times. This paper describes a solution developed for forecasting the hourly, four-hour, eight-hour and daily number of admissions to a hospital’s emergency department. A 10-year history (2009–2018) of the number of emergency admissions in a Portuguese hospital was used. To create the models, several methods were tested, including exponential smoothing, SARIMA, autoregressive and recurrent neural networks, XGBoost and ensemble learning. The models that generated the most accurate hourly predictions were the recurrent neural network with one layer (sMAPE = 23.26%) and with three layers (sMAPE = 23.12%) and XGBoost (sMAPE = 23.70%). In terms of efficiency, the XGBoost method far outperformed all others. The success of the recurrent neural network and XGBoost machine learning methods in predicting the number of emergency department admissions has been demonstrated here, with an accuracy that surpasses the models found in the literature.
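
For readers unfamiliar with the sMAPE figures quoted above, the sketch below computes one common form of the symmetric mean absolute percentage error. Several sMAPE variants exist and the abstract does not say which one the authors used, so the formula, the function name `smape` and the handling of zero denominators are assumptions.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: 100/n * sum(2*|f - a| / (|a| + |f|)).

    One common definition (range 0 to 200); assumed here, not taken from the paper.
    Pairs where both values are zero are counted as zero error.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        if denom:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


# Made-up hourly admission counts and forecasts, for illustration only.
example_error = smape([12, 8, 15, 0], [10, 8, 18, 0])
```

Because the denominator uses both the actual and the forecast value, the measure stays defined for hours with very few admissions, where a plain percentage error would blow up.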


Analyzing mixed-type data by using word embedding for handling categorical features

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1349-1368. DOI: 10.3233/ida-205453

Author(s): Chung-Chian Hsu, Wei-Cyun Tsao, Arthur Chang, Chuan-Yu Chang

Keyword(s): Real World, Mixed Type, Transformation Method, The Other, Data Analyses, Categorical Attributes, Real World Datasets, Type Data, Numeric Representation, Degree Of Similarity

Most real-world datasets are of mixed type, including both numeric and categorical attributes. Unlike numbers, categorical values support only limited operations, and the degree of similarity between distinct values cannot be measured directly. To analyze mixed-type data properly, dedicated methods for handling the categorical values in the datasets are needed. The limitation of most existing methods is the lack of appropriate numeric representations of categorical values. Consequently, some analysis algorithms cannot be applied. In this paper, we address this deficiency by transforming categorical values into numeric representations so as to facilitate various analyses of mixed-type data. In particular, the proposed transformation method preserves the semantics of categorical values with respect to the other values in the dataset, resulting in better performance in data analyses including classification and clustering. The proposed method is verified and compared with other methods on extensive real-world datasets.


Editorial

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1345-1347. DOI: 10.3233/ida-210004

Author(s): A. Famili


Multidimensional indexing technique for medical images retrieval

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1629-1666. DOI: 10.3233/ida-205495

Author(s): Ali Asghar Safaei, Saeede Habibi-Asl

Keyword(s): Information Retrieval, Search Engines, Medical Image, Retrieval System, Medical Images, Information Retrieval System, Memory Usage, Multidimensional Index, Multidimensional Indexing, Indexing Technique

Retrieving required medical images from a huge collection is one of the most widely used features of medical information systems, including medical imaging search engines. For example, diagnostic decision making has traditionally relied on patient data (image or non-image) and previous medical experience from similar cases. Indexing, as part of a search engine (or retrieval system), increases the speed of a search. The goal of this study is to provide an effective and efficient indexing technique for medical image search engines. To achieve this goal, a multidimensional indexing technique for medical images is designed using the normalization technique that is used to reduce redundancy in relational database design. The data structure of the proposed multidimensional index and the operations required to create and handle it are designed. The time complexity of each operation is analyzed, and the average memory space required to store a medical image (along with its related metadata) is calculated as the space complexity analysis of the proposed indexing technique. The results show that the proposed indexing technique performs well in terms of memory usage as well as execution time for the usual operations. Moreover, and perhaps more importantly, the proposed indexing technique improves the precision and recall of the information retrieval system (i.e., search engine) that uses it for indexing medical images. In addition, a user of such a search engine can retrieve medical images by specifying their attributes along several different aspects (dimensions), e.g., tissue, image modality and format, sickness and trauma, etc.
The proposed multidimensional indexing technique can therefore improve the effectiveness of a medical image information retrieval system (in terms of precision and recall) while remaining efficient (in terms of execution time and memory usage), and can improve the information retrieval process for healthcare search engines.
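
Retrieval by several attribute dimensions at once can be illustrated with a very small in-memory sketch that keeps one lookup table per dimension and intersects the matches. This is not the data structure designed in the paper: the class `AttributeIndex`, its methods and the sample attribute names are hypothetical.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy index: one value-to-image-ids table per attribute dimension (illustrative only)."""

    def __init__(self):
        # dimension name -> attribute value -> set of image ids
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, image_id, **attributes):
        """Register an image under each of its attribute values."""
        for dimension, value in attributes.items():
            self._index[dimension][value].add(image_id)

    def search(self, **criteria):
        """Return ids of images matching every given dimension=value pair."""
        result = None
        for dimension, value in criteria.items():
            ids = self._index.get(dimension, {}).get(value, set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()


index = AttributeIndex()
index.add("img1", tissue="lung", modality="CT")
index.add("img2", tissue="lung", modality="MRI")
index.add("img3", tissue="brain", modality="CT")
lung_ct = index.search(tissue="lung", modality="CT")
```

Each lookup touches only the tables for the dimensions named in the query, so adding a new dimension does not slow down searches that ignore it.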


Mutual information-based multi-output tree learning algorithm

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1525-1545. DOI: 10.3233/ida-205367

Author(s): Hyun-Seok Kang, Chi-Hyuck Jun

Keyword(s): Mutual Information, Variable Selection, Time Complexity, Learning Algorithm, Regression Tree, Classification And Regression Tree, Tree Model, Industrial Systems, Output Dimension, Cart Algorithm

A tree model with low time complexity can support the application of artificial intelligence to industrial systems. Variable selection-based tree learning algorithms are more time-efficient than existing Classification and Regression Tree (CART) algorithms. To the best of our knowledge, no previous work has dealt with categorical input variables in variable selection-based multi-output tree learning. Also, in the case of multi-output regression trees, a conventional variable selection-based algorithm is not suitable for large datasets. We propose a mutual information-based multi-output tree learning algorithm that consists of variable selection and split optimization. The proposed method discretizes each variable into 2–4 clusters using k-means and selects the variable for splitting from the discretized variables using mutual information. This variable selection component has relatively low time complexity and can be applied regardless of output dimension and type. The proposed split optimization component is more efficient than an exhaustive search. The performance of the proposed tree learning algorithm is similar to or better than that of a multi-output version of the CART algorithm on a specific dataset. In addition, with a large dataset, the time complexity of the proposed algorithm is significantly reduced compared to a CART algorithm.
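
The variable selection step described above, choosing the split variable by mutual information over discretized values, can be sketched as follows. The sketch assumes the variables have already been discretized (the k-means step is omitted) and handles a single discrete output only; the function names and the toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length sequences of discrete labels."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def select_split_variable(columns, target):
    """Return the name of the discretized column sharing the most information with target."""
    return max(columns, key=lambda name: mutual_information(columns[name], target))


# Toy data: x1 determines the target completely, x2 is unrelated to it.
columns = {"x1": [0, 0, 1, 1], "x2": [0, 1, 0, 1]}
target = [0, 0, 1, 1]
chosen = select_split_variable(columns, target)
```

Scoring a variable needs only one pass over the data to build the count tables, which is why selecting the variable first can be cheaper than evaluating every candidate split of every variable.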


Differential evolution algorithm-based multiple-factor optimization methods for data assimilation

Intelligent Data Analysis, 2021, Vol 25 (6), pp. 1473-1486. DOI: 10.3233/ida-205471

Author(s): Yulong Bai, Di Wang, Yizhao Wang, Mingheng Chang

Keyword(s): Data Assimilation, Differential Evolution, Land Surface, Differential Evolution Algorithm, Forecast Accuracy, Optimization Methods, De Algorithm, Filter Divergence, Lorenz 96, Multiple Factor

The methods used to search for optimized parameters have substantial effects on the forecast accuracy of ensemble data assimilation systems. These parameters are usually selected by trial and error, and poor parameterizations may lead to filter divergence. Combined with the local ensemble transform Kalman filtering method (LETKF), a technique for automatically searching for the best configuration (parameters) of a data assimilation system is proposed. To obtain better assimilation results, a differential evolution (DE) algorithm-based multiple-factor parameterization method is applied under the corresponding circumstances. By combining it with fast-searching DE algorithms, the most suitable parameter combinations can be retrieved. Several numerical experiments performed with the Lorenz-96 model show that the new methods perform better than the original one-parameter optimization methods. On the basis of DE methods, the best combinations of the local radius and the covariance inflation parameter, which can guarantee the best DA performance in the corresponding circumstances, are retrieved. The new method is found to outperform previous search algorithms under both perfect and imperfect model scenarios, and its calculation cost in the Lorenz-96 model is lower. However, how to apply the newly proposed method to more complex atmospheric or land surface models requires further verification.
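
A basic differential evolution search over two parameters can be sketched as below. This is a generic DE/rand/1/bin loop, not the authors' method: the cost function `toy_cost` is a made-up stand-in for an assimilation error, and its optimum, the parameter bounds, and all settings are assumptions chosen only for illustration.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize cost over box bounds with a plain DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = cost(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_cost(params):
    """Hypothetical error surface with its minimum at radius 3.0, inflation 1.05."""
    radius, inflation = params
    return (radius - 3.0) ** 2 + (inflation - 1.05) ** 2


bounds = [(1.0, 10.0), (1.0, 1.5)]  # (local radius, covariance inflation), assumed ranges
best_params, best_score = differential_evolution(toy_cost, bounds)
```

In a real assimilation setting the cost of a parameter pair would be obtained by running the filter and measuring forecast error, which is far more expensive than this toy function, so population size and generation count would need to be chosen with that budget in mind.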


Published by IOS Press
