Some Efficient and Fast Approaches to Document Clustering

Author(s):  
P. Viswanth

Clustering is the process of finding natural groupings present in a dataset. Various clustering methods have been proposed for various types of data. Both the quality of the solution and the time taken to derive it are important when dealing with large datasets such as a typical document database. Recently, hybrid and ensemble-based clustering methods have been shown to yield better results than conventional methods. The chapter proposes two clustering methods: one based on a hybrid scheme and the other on an ensemble scheme. Both are experimentally verified and shown to yield better and faster results.

2016 ◽  
Vol 43 (2) ◽  
pp. 275-292 ◽  
Author(s):  
Aytug Onan ◽  
Hasan Bulut ◽  
Serdar Korukoglu

Document clustering can be applied in document organisation and browsing, document summarisation and classification. Identifying an appropriate representation for textual documents is extremely important for the performance of clustering and classification algorithms. Textual documents suffer from high dimensionality and irrelevant text features. In addition, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.


2018 ◽  
Vol 29 (1) ◽  
pp. 814-830 ◽  
Author(s):  
Hasan Rashaideh ◽  
Ahmad Sawaie ◽  
Mohammed Azmi Al-Betar ◽  
Laith Mohammad Abualigah ◽  
Mohammed M. Al-laham ◽  
...  

The text clustering problem (TCP) is a leading process in many key areas such as information retrieval, text mining, and natural language processing. This presents the need for a potent document clustering algorithm that can be used effectively to navigate, summarize, and arrange information to congregate large data sets. This paper presents an adaptation of the grey wolf optimizer (GWO) for TCP, referred to as TCP-GWO. The TCP demands a degree of accuracy beyond that which is possible with metaheuristic swarm-based algorithms. The main issue to be addressed is how to split text documents on the basis of GWO into homogeneous clusters that are sufficiently precise and functional. Specifically, TCP-GWO, also referred to as the document clustering algorithm, uses the average distance of documents to the cluster centroid (ADDC) as an objective function to repeatedly optimize the distance between the clusters of the documents. The accuracy and efficiency of the proposed TCP-GWO were demonstrated on a sufficiently large number of documents of variable sizes, randomly selected from a set of six publicly available data sets. Documents of high complexity were also included in the evaluation process to assess the recall detection rate of the document clustering algorithm. The experimental results for a test set of over a part of 1300 documents showed that failure to correctly cluster a document occurred in less than 20% of cases, with a recall rate of more than 65% for a highly complex data set. The high F-measure rate and the ability to cluster documents effectively are important advances resulting from this research. The proposed TCP-GWO method was compared with other well-established text clustering methods using randomly selected data sets. TCP-GWO outperforms the comparative methods in terms of precision, recall, and F-measure rates.
In a nutshell, the results illustrate that the proposed TCP-GWO is able to excel compared to the other comparative clustering methods in terms of measurement criteria, whereby more than 55% of the documents were correctly clustered with a high level of accuracy.
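The ADDC objective named in the abstract can be illustrated with a short sketch. The abstract does not give the formula, so the version below is an assumption: it uses cosine distance, takes each centroid as the mean of its member vectors, and averages the per-cluster mean distances. The function names `cosine_distance` and `addc` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def addc(docs, labels):
    """Assumed ADDC: mean over clusters of the mean document-to-centroid distance."""
    clusters = defaultdict(list)
    for vec, label in zip(docs, labels):
        clusters[label].append(vec)

    per_cluster = []
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = component-wise mean of the member vectors.
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        mean_dist = sum(cosine_distance(m, centroid) for m in members) / len(members)
        per_cluster.append(mean_dist)
    return sum(per_cluster) / len(per_cluster)


# Toy term-weight vectors (hypothetical): two documents per topic.
docs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
good = addc(docs, [0, 0, 1, 1])  # documents grouped by topic
bad = addc(docs, [0, 1, 0, 1])   # topics mixed within each cluster
```

Under these assumptions a lower value indicates tighter clusters, so an optimizer such as GWO would search for the assignment that minimizes it; here the topic-consistent grouping scores lower than the mixed one.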


Author(s):  
K. T. Tokuyasu

During past investigations of immunoferritin localization of intracellular antigens in ultrathin frozen sections, we found that the degree of negative staining required to delineate ultrastructural details was often too dense for the recognition of ferritin particles. The quality of positive staining of ultrathin frozen sections, on the other hand, has generally been far inferior to that attainable in conventional plastic-embedded sections, particularly in the definition of membranes. As we discussed before, a main cause of this difficulty seemed to be the vulnerability of frozen sections to the damaging effects of air-water surface tension at the time of drying of the sections. Indeed, we found that the quality of positive staining is greatly improved when positively stained frozen sections are protected against the effects of surface tension by embedding them in thin layers of mechanically stable materials at the time of drying (unpublished).


CCIT Journal ◽  
2014 ◽  
Vol 7 (3) ◽  
pp. 335-354
Author(s):  
Untung Rahardja ◽  
Muhamad Yusup ◽  
Ana Nurmaliana

Accuracy and reliability determine the quality of information: the more accurate and reliable the information, the better its quality. The same holds for a survey: the better the survey, the more accurate the information it provides. Measuring student satisfaction with teaching and learning activities is important for the quality of lectures, both to obtain feedback on the assessed variables and to guide future improvement. Higher Education Prog has likewise measured student satisfaction through a questionnaire distributed at the end of each semester for every class. However, seven problems were identified in the questionnaire distribution process. These problems can be resolved in three ways, one of which is the iLearning Survey (Isur) system, which provides students with an online survey that can be accessed anywhere and at any time. A prototype of Isur is shown in the implementation. It can be concluded that the Isur system can improve the decisions taken by Higher Education Prog. With the Isur system, questions and evaluation forms are submitted to students and to other colleges in order to assess how far the campus has grown and how well faculty perform in teaching their classes, and Isur can serve as a valid source of information for assessing activities throughout the college.


2017 ◽  
Vol 4 (1) ◽  
pp. 189-215
Author(s):  
Yoiz Shofwa Shafrani

The development of Islamic banking cannot be separated from the role of customers who trust the banks to hold their financial assets. Many groups of customers decide to bank with an Islamic bank because of their religiosity. Another factor that can influence a customer’s decision is product quality, a characteristic inherent in a product. It is also likely that most Islamic bank customers are still customers of conventional banks. The aim of this research was to determine the effect of product quality and level of customer religiosity on customers’ decision whether or not to keep their funds at Bank Syariah Mandiri (BSM), Purwokerto Branch. The analytical tool used was multiple linear regression analysis, with a sample of 100 customers. The resulting model is Y = 5.046 + 0.101X1 + 0.218X2. Based on the F test, product quality and religiosity jointly affect customers’ decision to keep their funds at BSM Purwokerto Branch. Based on the t test, product quality and religiosity each individually affect customers’ decision to keep their funds at BSM Purwokerto Branch.
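The reported regression equation can be read as a simple prediction rule. The sketch below is illustrative only: it assumes the decimal commas in the source denote decimal points, that X1 is product quality and X2 is religiosity (following the order in which the abstract names them), and that the inputs are scores on whatever scale the study used, which the abstract does not state. The function name and sample scores are hypothetical.

```python
# Coefficients as reported in the abstract (decimal commas read as decimal points).
INTERCEPT = 5.046
COEF_PRODUCT_QUALITY = 0.101  # assumed to multiply X1
COEF_RELIGIOSITY = 0.218      # assumed to multiply X2


def predict_decision(product_quality, religiosity):
    """Evaluate Y = 5.046 + 0.101*X1 + 0.218*X2 for given predictor scores."""
    return (INTERCEPT
            + COEF_PRODUCT_QUALITY * product_quality
            + COEF_RELIGIOSITY * religiosity)


# Hypothetical scores: raise each predictor by one unit, holding the other fixed.
baseline = predict_decision(20, 20)
gain_from_quality = predict_decision(21, 20) - baseline
gain_from_religiosity = predict_decision(20, 21) - baseline
```

Read this way, a one-unit increase in the religiosity score shifts the predicted decision score by about twice as much as a one-unit increase in the product quality score, assuming both are measured on comparable scales.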


2020 ◽  
Vol 7 (1) ◽  
pp. 99
Author(s):  
Yong Adilah Shamsul Harumain ◽  
Nur Farhana Azmi ◽  
Suhaini Yusoff

Transit stations are generally well known as nodes where the percentage of people walking is relatively high. The issue is whether enough planning is actually devoted to creating walkability. Creating walking-led transit stations involves planning walking distances and providing facilities such as pathways, toilets, seating and lighting. Creating walking-led transit stations for women, on the other hand, opens up a new perspective. Walking has become one of the most important forms of mobility for women in developing countries. Encouraging women to use public transportation is not just another effort to promote public transportation but also a significant endeavour to reduce the volume of traffic on the road. This in turn helps to control accident rates, reduce carbon emissions, improve health and, eventually, improve quality of life. Hence, in this paper, we first sought to identify the factors that motivate women to walk at transit stations in Malaysia. A questionnaire survey of 562 female users of Light Railway Transit (LRT) was conducted at LRT stations along the Kelana Jaya Line. Both built and non-built environment characteristics, particularly distance, safety and facilities, were found to be factors consistently associated with women's walkability. With these findings, the paper highlights the criteria needed to create and improve transit stations, not just for women but for walkability in general.


ARCHALP ◽  
2018 ◽  
pp. 66-75
Author(s):  
Antonio De Rossi ◽  
Roberto Dini

Contemporary architectural production in the Alps of Piedmont has to be studied in light of the contrasting phenomena of depopulation and tourism that affected the mountain areas of the region during the last century. In the fifties and sixties the abandonment rate of the high valleys reached as much as 80-90%. Entire communities moved to industrial urban centers in the cities on the plain. At the same time, we witness a strong polarization of the winter resorts, which became real “banlieues blanches” for city dwellers' leisure time and where the architecture of alpine modernism took shape in various forms. The paradox today is that the rarefaction of abandoned and depopulated territories is what forces new, innovative paths to be chosen. The contemporary situation has different shades: on one side, the well-established tourist territories that need projects to promote redevelopment and diversification; on the other, the marginal places where new visions and practices of territorial reactivation are emerging, in which architecture is fundamental. The question of the quality of the construction of physical space intersects with the regeneration of places on a cultural basis, new agriculture and the green economy, innovative development of heritage, and sustainable tourism, along paths of an inclusive and participatory nature, giving new meanings to places and building new economies and identities.


2010 ◽  
Vol 6 (1) ◽  
pp. 36
Author(s):  
Silvana Dinaintang Harikedua

The objective of this study was to investigate the effect of ginger extract addition and refrigerated storage on the sensory quality of tuna through panelists’ perception. Panelists (n=30) evaluated samples for overall appearance and flavor using a hedonic scale of 1–7. The sample most acceptable to panelists on flavor had 3% ginger extract and 3 days of storage. The least acceptable sample on flavor had 0% ginger extract and 9 days of storage. On the other hand, the sample most acceptable to panelists on overall appearance had 0% ginger extract and no storage treatment. The least acceptable sample on overall appearance had 3% ginger extract and 9 days of storage.


Author(s):  
Gustavo Rafael Escobar Delgado ◽  
Anicia Katherine Tarazona Meza ◽  
Andy Einstein García García

The research analyzes the relationship between factors of resilience and academic performance in disabled students studying at the Technical University of Manabí. It is a correlational descriptive study conducted with a population of 88 disabled students, of which two groups were selected, one with high academic performance and the other with low performance. A questionnaire was designed and applied to determine the level of quality of life and risk factors of adolescents. Resilience was measured with the SV-RES scale created for the Latin American population.


2018 ◽  
Vol 8 (2) ◽  
pp. 35-48
Author(s):  
Jiří Rybička ◽  
Petra Čačková

One of the tools for determining the recommended order in which courses are taught is to set prerequisites, that is, the conditions that have to be fulfilled before commencing the study of a course. The recommended sequence of courses should follow the logical links between their units, as the basic aim is to provide students with a coherent system according to Comenius' principle of continuity. Declared continuity may, on the other hand, create organizational complications during the course of study, as failure to complete one course may result in a whole sequence of forced deviations from the recommended curriculum and ultimately in an extension of the study period. This empirical study deals with the quantitative evaluation of the influence of the level of initial knowledge gained from previous study on the overall results in a certain follow-up course. In this evaluation, data were obtained that may slightly change the approach to determining prerequisites for higher education courses.

