Big Data Summarization Using Novel Clustering Algorithm and Semantic Feature Approach

This paper proposes a method for summarizing big data (i.e., documents and texts) using a novel clustering algorithm and semantic features. The proposed system works in four phases and provides a modular implementation of multi-document summarization. Experimental results on the Iris dataset show that the proposed clustering algorithm performs better than the K-means and K-medoids algorithms. Summarization performance is evaluated on Australian legal cases from the Federal Court of Australia (FCA) database, and the experimental results demonstrate that the proposed method summarizes big data documents better than existing systems.

Big Data Summarization Using Modified Fuzzy Clustering Algorithm, Semantic Feature, and Data Compression Approach

Applied Machine Learning for Smart Data Analysis ◽

10.1201/9780429440953-6 ◽

2019 ◽

pp. 117-134

Author(s):

Shilpa G. Kolte ◽

Jagdish W. Bakal

Keyword(s):

Big Data ◽

Data Compression ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Semantic Feature ◽

Data Summarization ◽

Fuzzy Clustering Algorithm

Improving the K-Means Clustering Algorithm Oriented to Big Data Environments

Handbook of Research on Natural Language Processing and Smart Service Systems - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-4730-4.ch013 ◽

2021 ◽

pp. 289-308

Author(s):

Joaquín Pérez Ortega ◽

Nelva Nely Almanza Ortega ◽

Andrea Vega Villalobos ◽

Marco A. Aguirre L. ◽

Crispín Zavala Díaz ◽

...

Keyword(s):

Big Data ◽

Text Mining ◽

Large Volume ◽

Execution Time ◽

Clustering Algorithm ◽

Efficient Algorithms ◽

Experimental Results ◽

Digital Format ◽

Basic Approaches ◽

Previous Iteration

In recent years, the amount of natural-language text in digital format has increased dramatically. Obtaining useful information from such a large volume of data requires new specialized techniques and efficient algorithms. Text mining consists of extracting meaningful patterns from texts, and one of its basic approaches is clustering. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of k-means: the process stops whenever the number of objects that change their assigned cluster in the current iteration is greater than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. Notably, results generally improve as the volume of text increases, particularly for texts in big data environments.
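
The modified convergence rule can be sketched as follows. This is a minimal sketch, not the chapter's implementation: it assumes points given as numeric tuples, Euclidean distance, randomly sampled initial centroids, and a hypothetical function name `kmeans_early_stop`. It also keeps the usual k-means stop when no object changes cluster.

```python
import math
import random

def kmeans_early_stop(points, k, max_iter=100, seed=0):
    """k-means whose loop stops when reassignments stop decreasing.

    Returns (labels, centroids, iterations)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    labels = [None] * len(points)
    prev_changes = None
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        # Assignment step: move each point to its nearest centroid.
        changes = 0
        for i, p in enumerate(points):
            best = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if best != labels[i]:
                labels[i] = best
                changes += 1
        # Modified convergence test: stop when more objects moved in this
        # iteration than in the previous one (or when nothing moved at all).
        if changes == 0 or (prev_changes is not None and changes > prev_changes):
            break
        prev_changes = changes
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids, iterations
```

Because the loop ends as soon as the number of reassignments grows, it may stop before full convergence; that is the trade of some clustering quality for execution time that the rule is designed to make.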

Short Text Clustering Algorithm with Feature Keyword Expansion

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1716 ◽

2012 ◽

Vol 532-533 ◽

pp. 1716-1720 ◽

Cited By ~ 3

Author(s):

Chun Xia Jin ◽

Hai Yan Zhou ◽

Qiu Chan Bai

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Experimental Results ◽

Semantic Features ◽

Short Text ◽

Clustering Quality ◽

Short Text Clustering

To address keyword sparsity and similarity drift in short text segments, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords using HowNet and combines the K-means algorithm with a density-based algorithm to cluster short texts. Feature keyword expansion increases the number of keywords per text and enriches its semantic features, which makes short text clustering feasible. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
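
The effect of feature keyword expansion on sparse short texts can be illustrated with a small sketch. HowNet is not queried here: the hypothetical `LEXICON` dictionary stands in for it, and Jaccard similarity over keyword sets is an assumed measure, not necessarily the one STCAFKE uses.

```python
def expand_keywords(keywords, lexicon):
    """Return the keyword set enlarged with related terms from `lexicon`."""
    expanded = set(keywords)
    for word in keywords:
        expanded.update(lexicon.get(word, ()))
    return expanded

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical stand-in for a HowNet lookup: word -> semantically related words.
LEXICON = {
    "car": ["vehicle", "automobile"],
    "automobile": ["vehicle", "car"],
    "phone": ["mobile", "handset"],
}

t1, t2 = {"car", "price"}, {"automobile", "price"}
raw = jaccard(t1, t2)
expanded = jaccard(expand_keywords(t1, LEXICON), expand_keywords(t2, LEXICON))
```

Here the two texts share only "price" before expansion (similarity 1/3), but their expanded keyword sets coincide (similarity 1.0), which is the sparsity problem the expansion is meant to ease.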

The Analysis and Implementation of the K - Means Algorithm Based on Hadoop Platform

Computer and Information Science ◽

10.5539/cis.v11n1p98 ◽

2018 ◽

Vol 11 (1) ◽

pp. 98

Author(s):

Liu Xiang Wei

Keyword(s):

Big Data ◽

Data Storage ◽

Clustering Algorithm ◽

Experimental Results ◽

Mode Of Operation ◽

Cluster Configuration ◽

Hadoop Platform ◽

Kmeans Algorithm

Society has entered the era of big data. The diversity and growing volume of data pose great challenges for data storage and processing, and Hadoop HDFS and MapReduce address these two problems well. The classical K-means algorithm is the most widely used partition-based clustering algorithm. After completing the cluster configuration, this paper studies how the k-means algorithm operates in cluster mode, implements k-means in cluster mode, analyzes the experimental results, and summarizes the strengths and limitations of running k-means on the Hadoop platform.
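
How one k-means iteration maps onto the map, shuffle, and reduce phases can be sketched in plain Python. This is a single-process simulation under stated assumptions, not Hadoop code: the function names are hypothetical, points are numeric tuples, and the shuffle is modeled with a dictionary that groups mapper output by centroid id.

```python
import math
from collections import defaultdict

def kmeans_map(point, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1))."""
    cid = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return cid, (point, 1)

def kmeans_reduce(cid, values):
    """Reducer: sum coordinates and counts, return (cid, new centroid)."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return cid, tuple(t / count for t in total)

def mapreduce_iteration(points, centroids):
    """One k-means iteration expressed as map, shuffle, reduce."""
    groups = defaultdict(list)
    for p in points:                      # map phase
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)         # shuffle: group values by centroid id
    new = list(centroids)                 # empty clusters keep their old centroid
    for cid, values in groups.items():    # reduce phase
        _, centroid = kmeans_reduce(cid, values)
        new[cid] = centroid
    return new
```

A driver would call `mapreduce_iteration` repeatedly until the centroids stop moving; in Hadoop, each such iteration would typically run as a separate MapReduce job.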

An Improved Spectral Clustering Algorithm Using Minimum Maximum Principle

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.182-183.1881 ◽

2012 ◽

Vol 182-183 ◽

pp. 1881-1884

Author(s):

Xiu Fang Xu ◽

Sen Xu ◽

Tian Zhou

Keyword(s):

Maximum Principle ◽

Spectral Clustering ◽

Clustering Algorithm ◽

Document Clustering ◽

Experimental Results ◽

Eigenvalue Decomposition ◽

Spectral Algorithm ◽

Low Dimensional ◽

Spectral Clustering Algorithm ◽

Better Than

This paper proposes a novel spectral algorithm for document clustering that uses a minimum maximum principle. First, a low-dimensional embedding of the documents is obtained by eigenvalue decomposition; then the minimum maximum principle is used to select the initial seeds for the k-means algorithm. Finally, k-means is run to obtain the clustering results. Experimental results show that the clusters found by this method are better than those of traditional clustering algorithms.
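
The abstract does not define the minimum maximum principle. Assuming it refers to the common max-min distance rule for seed selection (each new seed is the point farthest from its nearest existing seed), the seeding step might look like this. The input points would be rows of the low-dimensional spectral embedding, which is not computed here, and `maxmin_seeds` is a hypothetical name.

```python
import math

def maxmin_seeds(points, k, first=0):
    """Pick k seeds: start from points[first], then repeatedly take the point
    whose distance to its nearest already-chosen seed is largest."""
    seeds = [points[first]]
    # Distance from every point to its nearest seed chosen so far.
    nearest = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        seeds.append(points[idx])
        # Refresh nearest-seed distances with the newly added seed.
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return seeds
```

The resulting seeds spread across the embedding, which avoids the poor starts that purely random k-means initialization can produce.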

Better Effectiveness of Multi-Integrated Neural Networks: Take Stock Big Data as an Example

Wireless Communications and Mobile Computing ◽

10.1155/2021/3938409 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

HangLin Lu ◽

XiuYun Peng

Keyword(s):

Neural Network ◽

Big Data ◽

Deep Learning ◽

High Latitude ◽

Integrated Model ◽

Experimental Results ◽

Single Model ◽

Integration Model ◽

Model Based ◽

Better Than

With the development of big data, stock price prediction in financial markets can be approached from many research directions. Classical time series prediction models cannot adapt to the high-dimensional information of stock data in the big data era, and the development of deep learning provides a new approach to predicting such high-dimensional stock data. Four neural network models and three ensemble learning models form different strategy sets, and the opening price at the next timestamp is predicted from the past 15 days of history using 12 stock indicators as features. The experimental results show that ensemble models based on the average weighting policy and the stacking policy predict better than a single neural network, and that the stacking-based ensemble is expected to have the highest prediction accuracy and the lowest expected error. Its accuracy was 80.2% and its mean squared error was 0.024; compared with the single models, accuracy increased by 2% to 7% and the error decreased by 0.01 to 0.03. The innovation of this article lies in applying traditional ensemble machine learning thinking to deep learning: a variety of neural networks are treated as individual learners and fused into an integrated model through ensemble learning strategies. The experimental results show that the integrated model outperforms a single model, improving robustness and accuracy, and its performance is more stable. For exploiting big data resources, the integrated neural network model therefore gives better predictions.
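
As an illustration of the average weighting policy, the sketch below combines several models' predictions with equal weights and scores them with mean squared error. The function names and the equal-weight assumption are illustrative; a stacking policy would instead train a meta-model on the base models' outputs, which is not shown.

```python
def average_ensemble(predictions):
    """Equal-weight ensemble: element-wise mean of several models' predictions.

    `predictions` is a list of per-model prediction lists of equal length."""
    n_models = len(predictions)
    return [sum(vals) / n_models for vals in zip(*predictions)]

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Two hypothetical models predicting the next opening price for two timestamps.
combined = average_ensemble([[1.0, 2.0], [3.0, 4.0]])
```

Averaging cancels part of the individual models' uncorrelated errors, which is one reason an ensemble's error can fall below that of each single model.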

Exploring performance of clustering methods on document sentiment analysis

Journal of Information Science ◽

10.1177/0165551515617374 ◽

2016 ◽

Vol 43 (1) ◽

pp. 54-74 ◽

Cited By ~ 14

Author(s):

Baojun Ma ◽

Hua Yuan ◽

Ye Wu

Keyword(s):

Sentiment Analysis ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Studies ◽

Experimental Results ◽

Clustering Methods ◽

Term Weighting ◽

Weighting Method ◽

Clustering Techniques ◽

Better Than

Clustering is a powerful unsupervised tool for sentiment analysis of text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model, and the clustering algorithm. This paper presents the results of an experimental study of several common clustering techniques for the task of sentiment analysis. Unlike previous studies, we specifically investigate the combined effects of these factors through a series of comprehensive experiments. The experimental results indicate, first, that K-means-type clustering algorithms show clear advantages in clustering accuracy on balanced review datasets but perform rather poorly on unbalanced ones. Second, the comparatively new weighting models outperform the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, extracting adjectives and adverbs clearly improves clustering performance, whereas stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.
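
For reference, a traditional Vector Space Model term weighting can be sketched as TF-IDF with raw term counts and `log(N / df)`. The abstract does not say which traditional or newer weighting models were compared, so this particular variant, and the function name `tfidf`, are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight each tokenized document with tf * log(N / df).

    Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

reviews = [["good", "movie"], ["bad", "movie"]]
weights = tfidf(reviews)
```

A term that appears in every review, such as "movie" here, receives zero weight, while sentiment-bearing words like "good" and "bad" keep positive weights.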

N-Net: 3D Fully Convolution Network-Based Vertebrae Segmentation from CT Spinal Images

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419570039 ◽

2019 ◽

Vol 33 (06) ◽

pp. 1957003 ◽

Cited By ~ 1

Author(s):

Wenhui Zhou ◽

Lili Lin ◽

Guangtao Ge

Keyword(s):

Surgical Planning ◽

Global Structure ◽

Experimental Results ◽

Semantic Features ◽

Structure Information ◽

Residual Structure ◽

Vertebrae Segmentation ◽

High Level ◽

Operative Assessment ◽

Better Than

Accurate vertebrae segmentation from CT spinal images is crucial for the clinical tasks of diagnosis, surgical planning, and post-operative assessment. This paper describes an N-shaped 3D fully convolutional network (FCN) for vertebrae segmentation: N-Net. In this network, a global structure guidance pathway is designed to fuse high-level semantic features with global structure information. Moreover, residual structures and skip connections are introduced into the traditional 3D FCN framework. These schemes significantly improve the accuracy of vertebrae segmentation. Experimental results demonstrate the effectiveness and robustness of the method: an average Dice score of 0.9499 ± 0.02 is obtained, which is better than those of existing methods.
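
The Dice score used to evaluate segmentation measures the overlap between a predicted mask A and the ground truth B as 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary masks flattened to 0/1 sequences follows; treating two empty masks as a perfect match is an assumed convention.

```python
def dice(pred, truth):
    """Dice coefficient 2|A & B| / (|A| + |B|) for two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * inter / total if total else 1.0
```

A 3D CT volume would first be flattened voxel by voxel in the same order for both masks.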

A Network Decomposition-Based Text Clustering Algorithm for Topic Detection

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.239-240.1318 ◽

2012 ◽

Vol 239-240 ◽

pp. 1318-1323

Author(s):

Zu Qiang Meng ◽

Shi Mo Shen ◽

Qiu Lian Chen

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Experimental Results ◽

Original Text ◽

Topic Detection ◽

Network Decomposition ◽

Detection Techniques ◽

Text Document ◽

Average Cluster ◽

Better Than

Text clustering is one of the most popular topic detection techniques. However, existing text clustering approaches require each document to be assigned to exactly one cluster. This is unreasonable in some cases, because some documents should not be used to constitute topics. This paper first models a text document set as a network and designs a method for decomposing such a network, and then proposes an original text clustering algorithm for topic detection, called the network decomposition-based text clustering algorithm for topic detection (NDTCATD). The proposed algorithm ensures that meaningless documents cannot be used to constitute topics. Experimental results show that NDTCATD is much better than the bisecting k-means algorithm in terms of overall similarity and average cluster similarity. The proposed algorithm is therefore reasonable and effective, and is especially suitable for topic detection.

An Efficient Trajectory Clustering Framework Based Relative Distance

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.3209 ◽

2012 ◽

Vol 241-244 ◽

pp. 3209-3212

Author(s):

Guan Bo ◽

Liang Xu Liu ◽

Jian Bo Fan ◽

Jin Yang Chen

Keyword(s):

Hausdorff Distance ◽

Clustering Algorithm ◽

Experimental Results ◽

Relative Distance ◽

Mobile Object ◽

Trajectory Clustering ◽

On Line ◽

Application Servers ◽

Better Than

As more and more trajectory datasets are collected on application servers, trajectory clustering has become an increasingly important research topic. This paper proposes a new clustering algorithm for mobile object trajectories, Trajectory Clustering based on Improved Minimum Hausdorff Distance under Translation (TraClustMHD). In this framework, an improved Minimum Hausdorff Distance under Translation is presented to measure the similarity between sub-segments. In addition, an R-tree is employed to improve efficiency. The experimental results show that this algorithm achieves better trajectory clustering performance than methods based on the Hausdorff distance and on the line Hausdorff distance.
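
The baseline similarity against which TraClustMHD is compared, the Hausdorff distance between two trajectories treated as point sets, can be sketched as follows. This is not the paper's improved Minimum Hausdorff Distance under Translation, which would additionally minimize over translations of one trajectory; the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

traj_a = [(0, 0), (1, 0)]
traj_b = [(0, 1), (1, 1), (2, 1)]
d = hausdorff(traj_a, traj_b)
```

The symmetric form matters: every point of `traj_a` lies within 1 of `traj_b`, but the end point (2, 1) of `traj_b` lies √2 from `traj_a`, so the distance is √2. This all-pairs computation is quadratic in trajectory length, which is why an index such as an R-tree helps at scale.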
