Evaluation of TF-IDF Algorithm Weighting Scheme in The Qur'an Translation Clustering with K-Means Algorithm

2021 ◽  
Vol 6 (2) ◽  
pp. 117-129
Author(s):  
M Didik R Wahyudi

The Al-Quran translation index issued by the Ministry of Religion can be used in text mining to search for similar patterns in the Al-Quran translation. This study groups sentences using the K-Means clustering algorithm and compares three weighting schemes of the TF-IDF algorithm to find the best-performing one. Of the three schemes, the traditional TF-IDF weighting scheme gave the highest percentage, 62.16%, with an average percentage of 36.12% and a standard deviation of 12.77%. The TF-IDF normalization 1 weighting scheme gave the smallest results: a highest percentage of 48.65%, an average percentage of 25.65%, and a standard deviation of 10.16%. The TF-IDF normalization 2 weighting scheme gave the smallest standard deviation, 8.27%, with an average percentage of 28.15% and a highest percentage of 48.65%, the same as the TF-IDF normalization 1 scheme.
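As a rough illustration of what a TF-IDF weighting scheme computes, the sketch below weights each term by its raw frequency times log(N / df). The abstract does not define its two normalized schemes, so the `normalize` flag here is an assumption: it simply scales each vector to unit length. The function name `tfidf` and the toy documents are hypothetical and are not taken from the translation index.

```python
import math
from collections import Counter

def tfidf(docs, normalize=False):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Traditional weighting: raw term frequency times log(N / df).
        vec = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        if normalize:
            # Assumed variant (not from the paper): unit Euclidean length.
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors

# Hypothetical toy corpus, already tokenized.
docs = [
    "mercy forgiveness mercy".split(),
    "prayer charity".split(),
    "mercy prayer".split(),
]
plain = tfidf(docs)
unit = tfidf(docs, normalize=True)
```

The resulting vectors could then be passed to any K-Means implementation; normalization changes the relative distances between documents, which is one plausible reason different schemes cluster differently.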

Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 197
Author(s):  
Ali Seman ◽  
Azizian Mohd Sapawi

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. In random seeding, two main issues arise: the clustering results may be less than optimal, and different clustering results may be obtained for every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm called the zero k-approximate modal haplotype (Zk-AMH) algorithm that uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH provides cluster optimality and stability, thereby resolving the aforementioned issues. Notably, the mean scores of the Zk-AMH algorithm were identical to its maximum and minimum scores over 100 runs, giving a standard deviation of zero that demonstrates its stability. Additionally, when the Zk-AMH algorithm was applied to eight datasets, it achieved the highest mean scores for four datasets, produced an approximately equal score for one dataset, and yielded marginally lower scores for the other three datasets. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools.
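The instability attributed to random seeding can be checked empirically by repeating a run with different seeds and summarizing the scores. The sketch below does this for plain Lloyd's k-means on one-dimensional data; it is not the Zk-AMH algorithm, whose seeding mechanism the abstract does not specify. Using the sum of squared errors as the score, the function name `kmeans_sse`, and the toy points are all assumptions made for illustration.

```python
import random
import statistics

def kmeans_sse(points, k, seed, iters=50):
    """Lloyd's k-means on 1-D points with random seeding; returns final SSE."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random seeding step
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# Hypothetical data with three loose groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.4, 8.7]
scores = [kmeans_sse(points, k=3, seed=s) for s in range(100)]
summary = (
    min(scores),
    statistics.mean(scores),
    max(scores),
    statistics.pstdev(scores),
)
```

A standard deviation of zero across the 100 runs, with the minimum, mean, and maximum coinciding, is the kind of stability the abstract reports; any spread between them reflects sensitivity to the seeds.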


Faktor Exacta ◽  
2020 ◽  
Vol 13 (2) ◽  
pp. 73
Author(s):  
Nurfidah Dwitiyanti ◽  
Septian Wulandari ◽  
Noni Selvia

The population of Indonesia has increased from year to year. This increase in population must be accompanied by economic growth, which is marked by a reduction in the number of poor people in Indonesia and reflected in the equitable distribution of public income in the country. Even so, many Indonesian people are still not economically prosperous. To address this, the 34 provinces of Indonesia need to be clustered and characterized, which this study does by implementing the Modification Maximum Standard Deviation Reduction (MMSDR) graph clustering algorithm. The data used are indicators of public welfare in 2017 obtained from the Central Statistics Agency. There are 9 indicators of community welfare used in this research. The MMSDR algorithm has four stages, namely the "MST", "Subdivide", "Biggest Stepping" and "Create Clusters" processes. Based on the distances between nodes, that is, between one province and another, the algorithm produced 22 clusters. Many of the clusters formed on the welfare data have at most two nodes (provinces) as members.

Keywords: MMSDR, Clustering, Welfare of People
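Of the four stages named above, only the first, "MST", is a standard construction that can be sketched without further detail from the paper. The example below builds a minimum spanning tree with Prim's algorithm over the complete Euclidean graph of a few points; the later "Subdivide", "Biggest Stepping" and "Create Clusters" stages are not shown because the abstract does not describe them. The function name and the two-dimensional toy nodes are hypothetical (the study uses nine indicators per province).

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, weight) edges connecting all points.
    """
    n = len(points)
    in_tree = {0}  # start growing the tree from the first node
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        w, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, w))
        in_tree.add(j)
    return edges

# Hypothetical nodes: two tight pairs that are far from each other.
nodes = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
mst = minimum_spanning_tree(nodes)
```

In MST-based clustering generally, clusters are obtained by removing selected long edges from this tree; here the single long edge between the two pairs would be the natural candidate.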


Author(s):  
Joaquín Pérez Ortega ◽  
Nelva Nely Almanza Ortega ◽  
Andrea Vega Villalobos ◽  
Marco A. Aguirre L. ◽  
Crispín Zavala Díaz ◽  
...  

In recent years, the amount of natural-language text in digital format has increased impressively. To obtain useful information from a large volume of data, new specialized techniques and efficient algorithms are required. Text mining consists of extracting meaningful patterns from texts; one of the basic approaches is clustering. The most used clustering algorithm is k-means. This chapter proposes an improvement of the k-means algorithm in the convergence step: the process stops whenever the number of objects that change their assigned cluster in the current iteration is bigger than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. It is remarkable that, in general, better results are obtained when the volume of text increases, particularly for texts within big data environments.
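The stopping rule described above can be sketched on one-dimensional data as follows. This is a minimal illustration rather than the chapter's implementation: the function name `kmeans_early_stop` and the toy data are hypothetical, and the additional stop when no object changes cluster is the usual k-means criterion, kept here as an assumption.

```python
def kmeans_early_stop(points, centers, max_iters=100):
    """1-D k-means that stops once reassignments increase between iterations."""
    centers = list(centers)
    labels = [None] * len(points)
    prev_changes = None
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            for p in points
        ]
        changes = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Stop when nothing moved (standard rule, assumed) or when more
        # objects moved than in the previous iteration (rule from the text).
        if changes == 0 or (prev_changes is not None and changes > prev_changes):
            break
        prev_changes = changes
        # Update step: move each center to the mean of its members.
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = sum(members) / len(members)
    return labels, centers, iteration

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]
labels, centers, n_iters = kmeans_early_stop(points, centers=[0.0, 10.0])
```

On this small example the usual zero-change rule fires first; the early-stop rule only matters on larger data, where late iterations can move a few objects back and forth without improving the clustering much.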


2014 ◽  
Vol 2014 ◽  
pp. 1-4
Author(s):  
T. Muhammad ◽  
A. Uzairu ◽  
M. S. Sallau ◽  
M. O. A. Oladipo

The Nigerian Research Reactor-1 was employed in the analysis of iodine in local food samples at an operating flux of 5.0×10¹¹ n cm⁻² s⁻¹. Preconcentration neutron activation analysis (PCNAA) was compared against the most common spectroscopic (Sandell-Kolthoff reaction) technique, giving a concentration range of 0.295 to 2.960 mg/Kg and 0.264 to 2.725 mg/Kg, respectively, with an average percentage deviation of 11.34% and a positive correlation between the methods at 0.89. PCNAA and Sandell-Kolthoff spectroscopy of NIST 1548a reported values of 0.759±0.06 mg/Kg and 0.751±0.05 with Student’s t-test score of 1 and 0.95 and percentage standard deviation of 0 and 1.12%, respectively.


2016 ◽  
Vol 13 (3) ◽  
pp. 244-270 ◽  
Author(s):  
Neeraj Bhanot ◽  
P. Venkateswara Rao ◽  
S.G. Deshmukh

Purpose Integrating sustainability strategies with business processes is the most challenging task for industry professionals due to the lack of a proper understanding of sustainability concepts. At the same time, a lack of proper guidance restricts them from pursuing such activities. As far as the aspects of implementation are concerned, it is very tough to analyse and pick up key points to start with. The purpose of this paper is to utilize a text mining approach to analyse qualitative data and identify the critical issues for implementing sustainability in the manufacturing sector by focussing on turning processes based on the survey responses of researchers and industry professionals.
Design/methodology/approach An integrated method employing principal component analysis (PCA) and the k-means clustering algorithm has been applied to extract useful information from a set of various suggestions provided by both the groups surveyed. The textual data has also been visualized using word clouds and, thus, it has been compared with the results of the text mining approach.
Findings The results of the study indicate the importance of the role of government organizations and the need for a skilled workforce, which are crucial for enhancing aspects of sustainability in the manufacturing sector, as supported by both researchers and industry professionals. Besides this, researchers have highlighted the need to focus more on environmentally related issues, whereas industry professionals have raised performance-related issues.
Practical implications The findings of the study present the important concerns of both the groups towards sustainability initiatives and, thus, will help to enhance the understanding of the underlying possibilities of negotiating jointly to enhance the performance of machining processes.
Originality/value The novelty of this paper lies in its identification of important initiatives that are having a direct impact on the sustainable aspects of the machining process, based on the views of researchers and industry professionals.


2020 ◽  
Vol 12 (3) ◽  
Author(s):  
Salma Fauziah ◽  
Moch Arif Bijaksana

Text mining research that processes entries or words from the Qur'an is very beneficial for Muslims. This study aims to establish a set of synonyms for a thesaurus of the words of the Qur'an. It is motivated by the fact that sources of knowledge about the science of the Qur'an are still lacking. The dataset in this study is the Corpus Qur'an and English Translation. The research develops a previously published article, "The Development of Al-Qur'an Vocabulary Set Synonyms with WordNet Approach" by Laras Gupitasari. The system takes as input nouns from the English translation of the words of the Qur'an. It outputs several groups, each containing words with the same level of closeness of meaning; the first group contains the words that are close in meaning. To produce this output, the study groups words with a hierarchical grouping method, calculates distances using common paths, and then groups the results according to the closeness of meaning of the word entries. The evaluation produced an F-Measure value of 76%, where the F-Measure evaluates the accuracy of the predictions issued by the system.
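For reference, the F-Measure mentioned above is conventionally the harmonic mean of precision and recall. The sketch below computes it for a set of predicted synonym pairs against a reference set; how the study itself defines correct and predicted items is not stated in the abstract, so the pair-based setup, the function name `f_measure`, and the example pairs are all assumptions.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical synonym pairs, for illustration only.
predicted = {("garden", "paradise"), ("lord", "master"), ("fire", "light")}
gold = {("garden", "paradise"), ("lord", "master"), ("path", "way")}
score = f_measure(predicted, gold)
```

Here two of the three predicted pairs are in the reference set and two of the three reference pairs are recovered, so precision and recall are both 2/3 and so is the F-Measure.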


2015 ◽  
Vol 127 (17) ◽  
pp. 20-24
Author(s):  
Aishwarya Kappala ◽  
Sudhakar Godi

Technologies ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 25 ◽  
Author(s):  
Zahira Perez-Rivera ◽  
Esteban Tlelo-Cuautle ◽  
Victor Champac

The impact of process variations on circuit performance has become more critical with the technological scaling, and the increasing level of integration of integrated circuits. The degradation of the performance of the circuit means economic losses. In this paper, we propose an efficient statistical gate-sizing methodology for improving circuit speed in the presence of independent intra-die process variations. A path selection method, a heuristic, two coarse selection metrics, and one fine selection metric are part of the new proposed methodology. The fine metric includes essential concepts like the derivative of the standard deviation of delay, a path segment analysis, the criticality, the slack-time, and area. The proposed new methodology is applied to ISCAS Benchmark circuits. The average percentage of optimization in the delay is 12%, the average percentage of optimization in the delay standard deviation is 27.8%, the average percentage in the area increase is less than 5%, and computing time is up to ten times less than using analytical methods like Lagrange Multipliers.

