Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur'an Translation Clustering with K-Means Algorithm

The index of the Al-Quran translation issued by the Ministry of Religion can be used in text mining to search for similar patterns in the translation. This study groups sentences using the K-Means clustering algorithm and compares three weighting schemes of the TF-IDF algorithm to find the one that performs best. Of the three schemes, the traditional TF-IDF weighting gave the highest percentage, 62.16%, with an average percentage of 36.12% and a standard deviation of 12.77%. The TF-IDF normalization 1 scheme gave the lowest results: a highest percentage of 48.65%, an average percentage of 25.65%, and a standard deviation of 10.16%. The TF-IDF normalization 2 scheme had the smallest standard deviation, 8.27%, with an average percentage of 28.15% and a highest percentage of 48.65%, the same as the normalization 1 scheme.
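
As a rough illustration of the weighting step described above, the following minimal sketch computes traditional TF-IDF weights (raw term frequency times log(N/df)) for tokenized sentences. The abstract does not define its three schemes, so the formula, the L2 normalization helper, and the toy sentences are all assumptions and may not match the paper's "normalization 1" or "normalization 2" variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula: weight = tf(t, d) * log(N / df(t)).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def l2_normalize(vec):
    """Scale a weight vector to unit length (one possible normalization, assumed)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

# Hypothetical toy sentences, not taken from the study's dataset.
docs = [["mercy", "lord", "mercy"], ["lord", "guidance"], ["guidance", "path"]]
weights = tf_idf(docs)
unit_weights = [l2_normalize(w) for w in weights]
```

The resulting vectors could then be passed to any K-Means implementation; comparing weighting schemes amounts to swapping the function that builds the vectors.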


An Optimal and Stable Algorithm for Clustering Numerical Data

Algorithms ◽

10.3390/a14070197 ◽

2021 ◽

Vol 14 (7) ◽

pp. 197

Author(s):

Ali Seman ◽

Azizian Mohd Sapawi

Keyword(s):

Standard Deviation ◽

Real World ◽

Clustering Algorithm ◽

Numerical Data ◽

Zero Point ◽

The Other ◽

Suitable Alternative ◽

Stable Algorithm ◽

Real World Applications

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. Random seeding raises two main issues: the clustering results may be less than optimal, and different clustering results may be obtained on every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm called the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH provides cluster optimality and stability, thereby resolving both issues. Notably, the Zk-AMH algorithm yielded mean scores identical to its maximum and minimum scores over 100 runs, giving a standard deviation of zero and demonstrating its stability. Additionally, when the Zk-AMH algorithm was applied to eight datasets, it achieved the highest mean scores for four datasets, produced an approximately equal score for one dataset, and yielded marginally lower scores for the other three datasets. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools.
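
The link between seeding and run-to-run stability can be seen in a small sketch. This is not the Zk-AMH algorithm or its zero-point seeding, which the abstract does not specify; it is plain 1-D k-means on hypothetical data, run 100 times with random seeds and 100 times with one fixed, arbitrarily chosen seeding, to show why deterministic seeding yields a standard deviation of zero.

```python
import random
import statistics

def kmeans_1d(points, centers, iters=20):
    """Plain Lloyd iterations on 1-D data; returns the final sum of squared errors."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            groups[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# Hypothetical data with three visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.2, 8.7]

# Random seeding: a different draw of initial centers on each run.
rng = random.Random(0)
random_scores = [kmeans_1d(points, rng.sample(points, 3)) for _ in range(100)]

# Deterministic seeding (an arbitrary stand-in rule: min, mean, max).
fixed_seed = [min(points), sum(points) / len(points), max(points)]
fixed_scores = [kmeans_1d(points, fixed_seed) for _ in range(100)]

random_spread = statistics.pstdev(random_scores)
fixed_spread = statistics.pstdev(fixed_scores)  # identical runs give zero spread
```

With a fixed seeding rule every run follows the same path, so the mean, minimum, and maximum scores coincide; whether that single result is also optimal depends on the seeding rule itself.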


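Implementasi Graph Clustering Algorithm Modification Maximum Standard Deviation Reduction (MMSDR) dalam Clustering Provinsi di Indonesia Menurut Indikator Kesejahteraan Rakyat [Implementation of the Modification Maximum Standard Deviation Reduction (MMSDR) Graph Clustering Algorithm in Clustering Indonesian Provinces by Public Welfare Indicators]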

Faktor Exacta ◽

10.30998/faktorexacta.v13i2.5863 ◽

2020 ◽

Vol 13 (2) ◽

pp. 73

Author(s):

Nurfidah Dwitiyanti ◽

Septian Wulandari ◽

Noni Selvia

Keyword(s):

Economic Growth ◽

Standard Deviation ◽

Clustering Algorithm ◽

Graph Clustering ◽

Public Welfare ◽

Poor People ◽

Equitable Distribution ◽

Maximum Standard Deviation ◽

Central Statistics

Indonesia's population has increased from year to year. Population growth must be accompanied by economic growth, which is marked by a reduction in the number of poor people and reflected in the equitable distribution of public income across the country. Nevertheless, many Indonesians are still not economically prosperous. To address this, the 34 provinces of Indonesia are clustered and characterized by implementing the Modification Maximum Standard Deviation Reduction (MMSDR) graph clustering algorithm. The data are indicators of public welfare in 2017 obtained from the Central Statistics Agency; nine indicators are used in this research. The MMSDR algorithm has four stages: the "MST", "Subdivide", "Biggest Stepping", and "Create Clusters" processes. Based on the distances between nodes, that is, between one province and another, the algorithm produced 22 clusters. Many of the clusters obtained from the welfare data contain at most two nodes (provinces).
Keywords: MMSDR, Clustering, Welfare of People
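
Of the four stages named above, only the first ("MST", a minimum spanning tree) is a standard construction that can be sketched without further detail from the paper. The following minimal example builds a minimum spanning tree with Prim's algorithm over points in an indicator space; the coordinates are hypothetical, Euclidean distance is an assumption, and the "Subdivide", "Biggest Stepping", and "Create Clusters" stages are not reproduced.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, distance) edges, where i and j index into points.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the tree to a node outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: math.dist(points[e[0]], points[e[1]]),
        )
        edges.append((i, j, math.dist(points[i], points[j])))
        in_tree.add(j)
    return edges

# Hypothetical two-indicator coordinates for three provinces.
provinces = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
mst = minimum_spanning_tree(provinces)
total_length = sum(d for _, _, d in mst)
```

Graph clustering methods of this family typically cut selected long edges of the tree to form clusters; how MMSDR chooses those edges is described in the full paper, not in this abstract.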


A clustering algorithm for asymmetrically related data with applications to text mining

Proceedings of the tenth international conference on Information and knowledge management - CIKM'01 ◽

10.1145/502585.502694 ◽

2001 ◽

Cited By ~ 6

Author(s):

K. Krishna ◽

Raghu Krishnapuram

Keyword(s):

Text Mining ◽

Clustering Algorithm ◽

Related Data


Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm

Advanced Computing and Communication Technologies - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-4603-2_3 ◽

2017 ◽

pp. 23-30 ◽

Cited By ~ 2

Author(s):

Neha Garg ◽

R. K. Gupta

Keyword(s):

Performance Evaluation ◽

Text Mining ◽

Clustering Algorithm ◽

Mining Method


Improving the K-Means Clustering Algorithm Oriented to Big Data Environments

Handbook of Research on Natural Language Processing and Smart Service Systems - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-4730-4.ch013 ◽

2021 ◽

pp. 289-308

Author(s):

Joaquín Pérez Ortega ◽

Nelva Nely Almanza Ortega ◽

Andrea Vega Villalobos ◽

Marco A. Aguirre L. ◽

Crispín Zavala Díaz ◽

...

Keyword(s):

Big Data ◽

Text Mining ◽

Large Volume ◽

Execution Time ◽

Clustering Algorithm ◽

Efficient Algorithms ◽

Experimental Results ◽

Digital Format ◽

Basic Approaches ◽

Previous Iteration

In recent years, the amount of natural-language text in digital format has increased impressively. To obtain useful information from a large volume of data, new specialized techniques and efficient algorithms are required. Text mining consists of extracting meaningful patterns from texts; one of the basic approaches is clustering. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of the k-means algorithm: the process stops whenever the number of objects that change their assigned cluster in the current iteration is greater than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. Notably, better results are generally obtained as the volume of text increases, particularly for texts in big data environments.
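
The stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: Euclidean distance, the treatment of the first iteration (every object counted as changed beforehand), the additional classic stop at zero changes, and all function names are hypothetical choices.

```python
import math

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

def update(points, labels, centers):
    """Recompute each center as the mean of its members (empty clusters keep theirs)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def kmeans_early_stop(points, centers, max_iter=100):
    labels = assign(points, centers)
    prev_changes = len(points)  # assumption: all objects count as changed at the start
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        changes = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Stop on classic convergence, or when more objects moved than last time.
        if changes == 0 or changes > prev_changes:
            break
        prev_changes = changes
    return labels, centers, iterations

# Hypothetical 2-D data with two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, iterations = kmeans_early_stop(data, [(0, 0), (10, 10)])
```

The rule trades a possibly slightly worse final partition for fewer iterations, which is where the reported execution-time reduction would come from.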


Determination of the Iodine Content of Some Commonly Consumed Foods in Zaria Metropolis, Nigeria, Using PCNAA and Sandell-Kolthoff Reaction

Journal of Nuclear Chemistry ◽

10.1155/2014/780640 ◽

2014 ◽

Vol 2014 ◽

pp. 1-4

Author(s):

T. Muhammad ◽

A. Uzairu ◽

M. S. Sallau ◽

M. O. A. Oladipo

Keyword(s):

Neutron Activation Analysis ◽

Standard Deviation ◽

Test Score ◽

Research Reactor ◽

Iodine Content ◽

Food Samples ◽

Average Percentage ◽

Average Percentage Deviation ◽

Percentage Standard Deviation

The Nigerian Research Reactor-1 was employed in the analysis of iodine in local food samples at an operating flux of 5.0×10¹¹ n cm⁻² s⁻¹. Preconcentration neutron activation analysis (PCNAA) was compared against the most common spectroscopic technique, the Sandell-Kolthoff reaction, giving concentration ranges of 0.295 to 2.960 mg/kg and 0.264 to 2.725 mg/kg, respectively, with an average percentage deviation of 11.34% and a positive correlation of 0.89 between the methods. For NIST 1548a, PCNAA and Sandell-Kolthoff spectroscopy reported values of 0.759±0.06 mg/kg and 0.751±0.05 mg/kg, with Student’s t-test scores of 1 and 0.95 and percentage standard deviations of 0 and 1.12%, respectively.
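
The two comparison statistics quoted above can be computed as in the sketch below. The abstract does not give the exact definition of "average percentage deviation", so the formula used here (mean absolute difference relative to the first method) is an assumption, and the paired readings are hypothetical values, not the paper's data.

```python
import math

def average_percentage_deviation(reference, other):
    """Mean of |other - reference| / reference, in percent (assumed definition)."""
    return 100 * sum(abs(o - r) / r for r, o in zip(reference, other)) / len(reference)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired readings in mg/kg; each pair is built to differ by 10%.
method_a = [0.30, 1.00, 2.00, 2.90]
method_b = [0.27, 0.90, 1.80, 2.61]

deviation = average_percentage_deviation(method_a, method_b)
correlation = pearson(method_a, method_b)
```

A high correlation alongside a nonzero percentage deviation, as in the abstract, indicates that the two methods rank samples consistently while differing in absolute level.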


Identifying the perspectives for sustainability enhancement

Journal of Advances in Management Research ◽

10.1108/jamr-02-2016-0012 ◽

2016 ◽

Vol 13 (3) ◽

pp. 244-270 ◽

Cited By ~ 5

Author(s):

Neeraj Bhanot ◽

P. Venkateswara Rao ◽

S.G. Deshmukh

Keyword(s):

Text Mining ◽

Clustering Algorithm ◽

Business Processes ◽

Manufacturing Sector ◽

Principal Component ◽

Machining Process ◽

Machining Processes ◽

Content Type ◽

Word Clouds ◽

Industry Professionals

Purpose: Integrating sustainability strategies with business processes is the most challenging task for industry professionals due to the lack of a proper understanding of sustainability concepts. At the same time, a lack of proper guidance restricts them from pursuing such activities. As far as the aspects of implementation are concerned, it is very tough to analyse and pick up key points to start with. The purpose of this paper is to utilize a text mining approach to analyse qualitative data and identify the critical issues for implementing sustainability in the manufacturing sector by focussing on turning processes based on the survey responses of researchers and industry professionals.
Design/methodology/approach: An integrated method employing principal component analysis (PCA) and the k-means clustering algorithm has been applied to extract useful information from a set of various suggestions provided by both the groups surveyed. The textual data has also been visualized using word clouds and, thus, it has been compared with the results of the text mining approach.
Findings: The results of the study indicate the importance of the role of government organizations and the need for a skilled workforce, which are crucial for enhancing aspects of sustainability in the manufacturing sector, as supported by both researchers and industry professionals. Besides this, researchers have highlighted the need to focus more on environmentally related issues, whereas industry professionals have raised performance-related issues.
Practical implications: The findings of the study present the important concerns of both the groups towards sustainability initiatives and, thus, will help to enhance the understanding of the underlying possibilities of negotiating jointly to enhance the performance of machining processes.
Originality/value: The novelty of this paper lies in its identification of important initiatives that are having a direct impact on the sustainable aspects of the machining process, based on the views of researchers and industry professionals.


Development Grouping of Synonym Set Thesaurus Vocabulary The Qur’an in English Using Hierarchical Clustering Algorithm

JURNAL INFOTEL ◽

10.20895/infotel.v12i3.477 ◽

2020 ◽

Vol 12 (3) ◽

Author(s):

Salma Fauziah ◽

Moch Arif Bijaksana

Keyword(s):

Text Mining ◽

Hierarchical Clustering ◽

English Translation ◽

Clustering Algorithm ◽

Research Development ◽

Research System ◽

Hierarchical Grouping ◽

Grouping Method ◽

Hierarchical Clustering Algorithm ◽

F Measure

Research in text mining that processes entries or words from the Qur'an is very beneficial for Muslims. This study aims to establish sets of synonyms for a thesaurus of the words of the Qur'an. It is motivated by the still-limited sources of knowledge about the science of the Qur'an. The dataset is the Corpus Qur'an and its English translation. The study extends a previously published article, "The Development of Al-Qur'an Vocabulary Set Synonyms with WordNet Approach" by Laras Gupitasari. The system takes as input nouns from the English translation of words in the Qur'an. Its output is several groups of words with the same level of closeness of meaning; words in the first group are close in meaning to one another. To produce this output, the study groups words with a hierarchical clustering method, calculates distances using common paths, and then groups the results according to the closeness of meaning of the word entries. The evaluation produced an F-Measure value of 76%; the F-Measure is an evaluation metric for the accuracy of the predictions made by the system.
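
The F-Measure reported above is conventionally the weighted harmonic mean of precision and recall. The sketch below shows that standard computation; the abstract gives neither its precision and recall values nor the weighting it used, so the counts in the example are hypothetical and the balanced setting (beta = 1) is an assumption.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 8 correct synonym pairs, 2 spurious, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot reach a high F-Measure by maximizing precision or recall alone.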


A Naive Clustering Algorithm for Text Mining

International Journal of Computer Applications ◽

10.5120/ijca2015906717 ◽

2015 ◽

Vol 127 (17) ◽

pp. 20-24

Author(s):

Aishwarya Kappala ◽

Sudhakar Godi

Keyword(s):

Text Mining ◽

Clustering Algorithm


Gate Sizing Methodology with a Novel Accurate Metric to Improve Circuit Timing Performance under Process Variations

Technologies ◽

10.3390/technologies8020025 ◽

2020 ◽

Vol 8 (2) ◽

pp. 25 ◽

Cited By ~ 1

Author(s):

Zahira Perez-Rivera ◽

Esteban Tlelo-Cuautle ◽

Victor Champac

Keyword(s):

Standard Deviation ◽

Integrated Circuits ◽

Computing Time ◽

Process Variations ◽

Path Selection ◽

Economic Losses ◽

Gate Sizing ◽

Circuit Performance ◽

Average Percentage ◽

The Impact

The impact of process variations on circuit performance has become more critical with technology scaling and the increasing level of integration of integrated circuits. Degradation of circuit performance means economic losses. In this paper, we propose an efficient statistical gate-sizing methodology for improving circuit speed in the presence of independent intra-die process variations. The proposed methodology comprises a path selection method, a heuristic, two coarse selection metrics, and one fine selection metric. The fine metric includes essential concepts such as the derivative of the standard deviation of delay, a path segment analysis, the criticality, the slack time, and the area. The proposed methodology is applied to ISCAS benchmark circuits. The average percentage of optimization in the delay is 12%, the average percentage of optimization in the delay standard deviation is 27.8%, the average percentage increase in area is less than 5%, and the computing time is up to ten times less than that of analytical methods such as Lagrange multipliers.
