A semi-hierarchical clustering method for constructing knowledge trees from stackoverflow

2020 ◽  
pp. 016555152096103
Author(s):  
Chun-Hsiung Tseng ◽  
Jia-Rou Lin

To help students learn how to programme, we have to give them a clear knowledge map and sufficient materials. Question-based websites, such as stackoverflow, are excellent information sources for this goal. However, for beginners, the process can be a little tricky since they may not know how to ask correct questions if they do not have sufficient background knowledge, and a knowledge tree is usually considered more helpful in such a scenario. In this research, a method to infer a knowledge tree automatically from this type of website and to group documents based on the resulting knowledge tree is proposed. The proposed method mainly addresses two issues: first, the quality of tags cannot be guaranteed, and second, clustering-based methods usually generate a flat schema. The occurrence count and the co-occurrence ratio were used together to identify important tags. Then, an algorithm was developed to infer the hierarchical relationship between tags. Using these tags as centres yields better clustering performance than applying k-means alone.
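The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an occurrence count and a co-occurrence ratio might be combined to pick important tags and guess a parent for each one. The function names, the thresholds `min_count` and `min_ratio`, and the rule "choose the least frequent qualifying tag as parent" are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(posts):
    """Count tag occurrences and unordered pair co-occurrences over all posts."""
    occurrence = Counter()
    co_occurrence = Counter()
    for tags in posts:
        unique = sorted(set(tags))
        occurrence.update(unique)
        co_occurrence.update(combinations(unique, 2))
    return occurrence, co_occurrence


def co_occurrence_ratio(child, parent, occurrence, co_occurrence):
    """Fraction of posts tagged `child` that are also tagged `parent`."""
    pair = tuple(sorted((child, parent)))
    return co_occurrence[pair] / occurrence[child]


def infer_parents(posts, min_count=2, min_ratio=0.8):
    """Map each important tag to a guessed parent tag (or None for a root).

    Assumed rule: a parent must occur more often than the child and appear
    in at least `min_ratio` of the child's posts; among such candidates the
    least frequent (most specific) one is chosen.
    """
    occurrence, co_occurrence = tag_statistics(posts)
    important = [tag for tag, count in occurrence.items() if count >= min_count]
    parents = {}
    for child in important:
        best = None
        for candidate in important:
            if candidate == child or occurrence[candidate] <= occurrence[child]:
                continue
            ratio = co_occurrence_ratio(child, candidate, occurrence, co_occurrence)
            if ratio >= min_ratio and (
                best is None or occurrence[candidate] < occurrence[best]
            ):
                best = candidate
        parents[child] = best
    return parents


# Hypothetical tagged posts, invented for this example.
posts = [
    ["python", "pandas", "dataframe"],
    ["python", "pandas"],
    ["python", "pandas", "dataframe"],
    ["python", "numpy"],
    ["python", "numpy"],
    ["python"],
]
tree = infer_parents(posts)
```

On this toy input, `tree` places `dataframe` under `pandas`, and both `pandas` and `numpy` under `python`. The inferred tags could then serve as initial cluster centres, as the abstract describes.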

2005 ◽  
Vol 02 (02) ◽  
pp. 167-180
Author(s):  
SEUNG-JOON OH ◽  
JAE-YEARN KIM

Clustering of sequences is relatively less explored but it is becoming increasingly important in data mining applications such as web usage mining and bioinformatics. The web user segmentation problem uses web access log files to partition a set of users into clusters such that users within one cluster are more similar to one another than to the users in other clusters. Similarly, grouping protein sequences that share a similar structure can help to identify sequences with similar functions. However, few clustering algorithms consider sequentiality. In this paper, we study how to cluster sequence datasets. Due to the high computational complexity of hierarchical clustering algorithms for clustering large datasets, a new clustering method is required. Therefore, we propose a new scalable clustering method using sampling and a k-nearest-neighbor method. Using a splice dataset and a synthetic dataset, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional algorithms.
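As a rough illustration of the sampling plus k-nearest-neighbor idea, the sketch below assumes that a small sample has already been clustered by some other method and then assigns each remaining sequence to the majority label among its k nearest sampled sequences. The Hamming-style distance and the names `hamming` and `knn_assign` are stand-ins chosen for this example; the paper's actual similarity measure and procedure are not given in the abstract.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions, plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def knn_assign(sequence, sample, sample_labels, k=3, distance=hamming):
    """Assign `sequence` the majority label among its k nearest sampled sequences."""
    nearest = sorted(
        range(len(sample)), key=lambda i: distance(sequence, sample[i])
    )[:k]
    votes = Counter(sample_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical sample that is assumed to be clustered already (labels 0 and 1).
sample = ["AAAA", "AAAT", "AATA", "GGGG", "GGGC", "GCGG"]
sample_labels = [0, 0, 0, 1, 1, 1]

label_a = knn_assign("AAAC", sample, sample_labels)
label_g = knn_assign("GGCG", sample, sample_labels)
```

Only the sample needs the expensive clustering step; every other sequence costs one pass over the sample, which is where the scalability would come from under these assumptions.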


2020 ◽  
Vol 42 ◽  
pp. e44378
Author(s):  
Manoel Rivelino Gomes de Oliveira ◽  
David Venâncio da Cruz ◽  
Moacyr Cunha Filho

This work uses non-hierarchical grouping methods to evaluate the quality of the groups formed by plate cisterns according to some water quality variables. These methods use a cluster validation criterion to determine the optimal partition, which provides the most homogeneous groups possible. The methods were tested on a sample of 100 cisterns located in the Pajeú region. Among them, the non-hierarchical ‘K-medoid’ clustering method formed more homogeneous groups than the others and thus showed the best performance according to the Silhouette statistic [s(i) = 0.64].


2011 ◽  
Vol 42 (8) ◽  
pp. 1753-1762 ◽  
Author(s):  
N. J. Reavley ◽  
A. J. Mackinnon ◽  
A. J. Morgan ◽  
M. Alvarez-Jimenez ◽  
S. E. Hetrick ◽  
...  

Background: Although mental health information on the internet is often of poor quality, relatively little is known about the quality of websites, such as Wikipedia, that involve participatory information sharing. The aim of this paper was to explore the quality of user-contributed mental health-related information on Wikipedia and compare this with centrally controlled information sources. Method: Content on 10 mental health-related topics was extracted from 14 frequently accessed websites (including Wikipedia) providing information about depression and schizophrenia, Encyclopaedia Britannica, and a psychiatry textbook. The content was rated by experts according to the following criteria: accuracy, up-to-dateness, breadth of coverage, referencing and readability. Results: Ratings varied significantly between resources according to topic. Across all topics, Wikipedia was the most highly rated in all domains except readability. Conclusions: The quality of information on depression and schizophrenia on Wikipedia is generally as good as, or better than, that provided by centrally controlled websites, Encyclopaedia Britannica and a psychiatry textbook.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Wenjia Chen ◽  
Jinlin Li

Background: To enhance teleconsultation management, demands can be classified into different patterns, and the service for each demand pattern can be improved. Methods: For effective teleconsultation classification, a novel ensemble hierarchical clustering method is proposed in this study. In the proposed method, individual clustering results are first obtained by different hierarchical clustering methods and then ensembled by one-hot encoding, the calculation and division of cosine similarity, and network graph representation. In the network graph built from the high cosine similarities, connected demand series are categorized into one pattern. For verification, 43 teleconsultation demand series are used as sample data, and the efficiency and quality of teleconsultation services are analyzed before and after the demand classification. Results: The teleconsultation demands are classified into three categories: erratic, lumpy, and slow. Under the fixed strategies, the service analysis after demand classification reveals the deficiencies of teleconsultation services, whereas the analysis before demand classification cannot. Conclusion: The proposed ensemble hierarchical clustering method can effectively categorize teleconsultation demands, and effective demand categorization can enhance teleconsultation management.
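The ensembling steps named in the Methods (one-hot encoding, cosine similarity, a graph over high-similarity pairs) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the `threshold` that separates high from low similarity, the function names, and the toy label vectors are all assumptions.

```python
import math


def one_hot_features(labelings):
    """Concatenate one-hot encodings of each item's label under every base clustering."""
    n_items = len(labelings[0])
    features = [[] for _ in range(n_items)]
    for labels in labelings:
        distinct = sorted(set(labels))
        for i, label in enumerate(labels):
            features[i].extend(1.0 if label == d else 0.0 for d in distinct)
    return features


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ensemble_clusters(labelings, threshold=0.6):
    """Link items whose cosine similarity is at least `threshold`; return connected components."""
    features = one_hot_features(labelings)
    n = len(features)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search over the similarity graph
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


# Hypothetical labels for five demand series from three base clusterings.
labelings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 2],
]
groups = ensemble_clusters(labelings)
```

Because every item has exactly one active entry per base clustering, the cosine similarity of two items equals the fraction of base clusterings that put them together, so a threshold of 0.6 with three clusterings means "at least two of three agree".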


Author(s):  
SEUNG-JOON OH ◽  
JAE-YEARN KIM

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few existing clustering algorithms consider sequentiality. In this paper, we study how to cluster these sequence datasets. We propose a new similarity measure to compute the similarity between two sequences. In the proposed measure, subsets of a sequence are considered, and the more identical subsets there are, the more similar the two sequences. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional clustering algorithms.
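To make the idea of "the more identical subsets, the more similar" concrete, here is a small sketch that compares two sequences by the contiguous subsequences they share. Treating "subsets" as contiguous subsequences of length up to `max_len` and normalizing with a Jaccard ratio are assumptions made for this example; the abstract does not define the measure precisely.

```python
def subsequences(seq, max_len=3):
    """Set of all contiguous subsequences of `seq` with length 1..max_len."""
    return {
        tuple(seq[i:i + k])
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }


def subset_similarity(a, b, max_len=3):
    """Shared subsequences divided by all distinct subsequences (Jaccard ratio)."""
    sa, sb = subsequences(a, max_len), subsequences(b, max_len)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


same = subset_similarity("ACGT", "ACGT")
close = subset_similarity("ACGT", "ACGA")
far = subset_similarity("AAAA", "TTTT")
```

A pairwise similarity of this kind could feed any standard agglomerative procedure, for example by repeatedly merging the two most similar clusters.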


Author(s):  
Nandi K Sukendar ◽  
Amran Laga ◽  
Try Permata Siade

Pineapple (Ananas comosus (L) Merr) is a plant that contains the enzyme bromelin in the fruit, leaf and skin, but more in the stem. In general, enzymes are produced and marketed in powder form. However, producing enzymes in powder form requires advanced process technology and is very expensive. The purpose of this research was to determine how to prepare bromelin enzyme in liquid form, how to preserve it, and how the quality of the enzyme changes during storage. The research was conducted in 3 stages: 1. extraction of bromelin from pineapple stem, 2. physical and chemical preservation, and 3. observation of the enzyme during storage. The results showed that bromelin preserved physically or chemically still showed enzyme activity on day 32, while the enzyme without preservation showed no activity on day 24. Bromelin enzyme with physical preservation performed better than with chemical preservation.


Liquidity ◽  
2018 ◽  
Vol 2 (2) ◽  
pp. 151-159
Author(s):  
Pitri Yandri

The purpose of this study is to (1) analyze public perception of urban services before and after the expansion of the region, (2) analyze the level of people's satisfaction with urban services, and (3) analyze the determinants of people's satisfaction with urban services. The study concluded, first, that the quality of urban services in South Tangerang City after the expansion is better than before. Second, however, public satisfaction with the services reached only 48.53% (poor scale). Third, according to a Cartesian diagram, the second-priority items that must be addressed are: (1) clarity of service personnel, (2) the discipline of service personnel, (3) the responsibility of service workers, (4) the speed of service, (5) the ability of service officers, (6) fairness in obtaining services, and (7) the courtesy and hospitality of workers.


2019 ◽  
Vol 9 (01) ◽  
pp. 47-54
Author(s):  
Rabbai San Arif ◽  
Yuli Fitrisia ◽  
Agus Urip Ari Wibowo

Voice over Internet Protocol (VoIP) is a telecommunications technology that carries communication services over Internet Protocol networks, allowing users in an IP network to communicate. However, VoIP technology still has weaknesses in Quality of Service (QoS). These weaknesses are affected by the choice of physical server. In this research, VoIP is configured on a Linux operating system with Asterisk as the VoIP application server, integrated on a Raspberry Pi, using wired and wireless networks as the transmission medium. Because the IPv4 address capacity available on the network is being depleted, the VoIP system needs to use the IPv6 network protocol with supporting devices. The test results obtained with the wired transmission medium are an average delay of 117.851 ms, jitter of 5.796 ms, packet loss of 0.38%, throughput of 962.861 kbps, 8.33% CPU usage and 59.33% memory usage. The analysis shows that the wired transmission medium performs better than the wireless and the wireless-wired transmission media.


2018 ◽  
Vol 5 (2) ◽  
pp. 102
Author(s):  
Enike Dwi Kusumawati ◽  
Selvinus Lawu Woli ◽  
Aju Tjatur Nugroho Krisnaningsih ◽  
Waluyo Edi Susanto ◽  
Syam Rahadi

Abstract (translated from the Indonesian original): This study was conducted to determine the motility and viability of native chicken spermatozoa at 5 °C using different diluents and storage times. The method was a laboratory experiment using a factorial Completely Randomized Design with Ringer's lactate solution, coconut water, and no diluent, and storage times of 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, and 30 hours, each repeated 10 times. The variables observed were the motility and viability of spermatozoa. The data were analyzed using analysis of variance. The results showed that the motility and viability of spermatozoa with Ringer's lactate solution were higher (P<0.01) and could be maintained up to a storage time of 24 hours, compared with coconut water and no diluent. The motility values for Ringer's lactate solution, coconut water, and no diluent at 24 hours of storage were 43.5±17.17%, 8±4.83%, and 6.5±2.4%, respectively, while the viability values were 83.2±7.25%, 64.6±3.20%, and 63.1±2.33%. The conclusion is that Ringer's lactate solution is better than coconut water and no diluent at maintaining the quality of native chicken semen at a storage temperature of 5 °C for up to 24 hours. Keywords: coconut water, native chicken, motility, spermatozoa, viability

Abstract: This study was conducted to determine the motility and viability of spermatozoa of native chickens at 5 °C using different diluents and storage times. The method used in this study was laboratory research using a Factorial Completely Randomized Design with Ringer's lactate solution, coconut water, and no diluent at 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, and 30 hours of storage time, each repeated 10 times. The variables observed were motility and viability of sperm. The data were analyzed using analysis of variance. The results of the data analysis showed that the motility and viability of spermatozoa using Ringer's lactate solution as diluent were higher (P<0.05) than with coconut water and without diluent. The motility values for Ringer's lactate solution, coconut water, and no diluent were 43.5±17.17%, 8±4.83%, and 6.5±2.4%, respectively, while the viability values were 83.2±7.25%, 64.6±3.20%, and 63.1±2.33%. The conclusion of this study is that Ringer's lactate solution is better than coconut water and no diluent at maintaining the quality of native chicken semen at a storage temperature of 5 °C for up to 24 hours. Keywords: coconut water, motility, native chicken, sperm, viability

