tree similarity
Recently Published Documents



2021 ◽  
Vol 5 (2) ◽  
pp. 106-114
Author(s):  
Muhamad Aldi Rifai ◽  
Indra Gita Anugrah

Academics at universities write scientific articles routinely, but they often struggle to find ideas, literature studies, and reference sources to cite. Search engines frequently fail to return the right document because the keywords being searched for often appear not in the title but in another part of the article's structure. Since most search engines only match titles, the other parts are usually excluded from matching, so the search results sometimes do not match what the user wants. In addition, a scientific article often uses more than one language across its structure, as in the abstract section. To detect similarity across the structure of scientific articles, the weighted tree similarity algorithm is used; the N-gram algorithm is used to detect language, and the cosine similarity algorithm is then used to measure how similar the keyword text is to the text of a scientific article.
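
As a rough illustration of the cosine similarity step mentioned in the abstract above, the following minimal sketch scores a keyword query against each section of an article and combines the per-section scores with weights. The section names, the weights, and the whitespace tokenizer are assumptions made for illustration; they are not taken from the paper, and the N-gram language detection step is not covered here.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term-frequency vector (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def score_article(query, sections, weights):
    """Weighted sum of per-section cosine scores.

    `sections` maps a section name to its text and `weights` maps the same
    names to weights that are assumed to sum to 1 (hypothetical values).
    """
    q = term_vector(query)
    return sum(
        weights[name] * cosine_similarity(q, term_vector(text))
        for name, text in sections.items()
    )


# Hypothetical article in which the keywords appear only in the abstract.
article = {"title": "graph matching", "abstract": "tree similarity"}
weights = {"title": 0.5, "abstract": 0.5}
score = score_article("tree similarity", article, weights)
```

Because every section contributes to the score, an article whose title does not contain the keywords can still be ranked if its abstract does, which is the behaviour the abstract argues title-only matching lacks.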


2021 ◽  
Author(s):  
Jianhua Wang ◽  
Jianye Yang ◽  
Wenjie Zhang

2021 ◽  
Vol 5 (1) ◽  
pp. 21-27
Author(s):  
Abdurrosyiid Amrullah ◽  
Indra Gita Anugrah

As the number of documents we manage grows, searching becomes more difficult and information retrieval becomes more important. An information retrieval system helps find documents that match the given keywords. Document searches usually look only at the name of the document (file) without considering its content or metadata, so they cannot meet users' information needs. Document search has several approaches, including full-text search, plain metadata search, and semantic search. This study uses the Weighted Tree Similarity algorithm together with the Cosine Sorensen Dice algorithm to calculate similarity for semantic search. Document metadata is represented as a tree with labeled nodes, labeled branches, and weighted branches. The similarity of subtree edge labels is calculated with Cosine Sorensen Dice, while the total similarity of a document is calculated with weighted tree similarity. The metadata structure of the document uses the taxonomy owner, description, title, disposition content, and type. The result of this research is a document search application with taxonomic weights on file storage.
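
The metadata tree described in the abstract above can be sketched as a one-level tree whose branches carry a weight and a leaf label. The sketch below compares leaf labels with the Sørensen-Dice coefficient over word sets and averages the two branch weights, which is one common formulation of weighted tree similarity. The exact "Cosine Sorensen Dice" variant, the weights, and the sample metadata are assumptions for illustration, not the paper's implementation.

```python
def dice(a, b):
    """Sørensen-Dice coefficient between the word sets of two strings."""
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0  # two empty labels are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))


def weighted_tree_similarity(query, doc):
    """Similarity of two one-level trees of the form {branch: (weight, label)}.

    Each branch contributes its label similarity scaled by the average of
    the two branch weights; weights within a tree are assumed to sum to 1.
    Branches present in only one tree contribute nothing.
    """
    total = 0.0
    for branch in query.keys() & doc.keys():
        w_query, label_query = query[branch]
        w_doc, label_doc = doc[branch]
        total += dice(label_query, label_doc) * (w_query + w_doc) / 2
    return total


# Hypothetical metadata trees using two of the taxonomy fields named above.
query_tree = {"title": (0.5, "tree similarity"), "owner": (0.5, "alice")}
doc_tree = {"title": (0.5, "weighted tree similarity"), "owner": (0.5, "alice")}
similarity = weighted_tree_similarity(query_tree, doc_tree)
```

Here the title labels share two of five words in total (Dice 0.8) and the owner labels match exactly (Dice 1.0), so with equal weights the overall similarity is 0.9.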


Author(s):  
Muhammad Alkaff ◽  
Husnul Khatimi ◽  
Andi Eriadi

The South Kalimantan Provincial Regional Library (Perpustakaan Daerah Provinsi Kalimantan Selatan) is one of the libraries and information service centers in South Kalimantan. However, library visitors have so far had difficulty finding books related to a book they previously selected, and finding alternatives when the book they want has already been borrowed. Recommendations or suggestions of other related books are expected to help visitors find suitable books that they want. In this study, the recommendation system applies the Content-Based Filtering method to recommend books; it works by examining the similarity of items, analyzed from the features they contain, using Weighted Tree Similarity. Based on testing carried out on 5 test scenarios, the resulting precision was 88%.


2020 ◽  
Vol 36 (20) ◽  
pp. 5007-5013 ◽  
Author(s):  
Martin R Smith

Abstract. Motivation: The Robinson–Foulds (RF) metric is widely used by biologists, linguists and chemists to quantify similarity between pairs of phylogenetic trees. The measure tallies the number of bipartition splits that occur in both trees, but this conservative approach ignores potential similarities between almost-identical splits, with undesirable consequences. ‘Generalized’ RF metrics address this shortcoming by pairing splits in one tree with similar splits in the other. Each pair is assigned a similarity score, the sum of which enumerates the similarity between two trees. The challenge lies in quantifying split similarity: existing definitions lack a principled statistical underpinning, resulting in misleading tree distances that are difficult to interpret. Here, I propose probabilistic measures of split similarity, which allow tree similarity to be measured in natural units (bits). Results: My new information-theoretic metrics outperform alternative measures of tree similarity when evaluated against a broad suite of criteria, even though they do not account for the non-independence of splits within a single tree. Mutual clustering information exhibits none of the undesirable properties that characterize other tree comparison metrics, and should be preferred to the RF metric. Availability and implementation: The methods discussed in this article are implemented in the R package ‘TreeDist’, archived at https://dx.doi.org/10.5281/zenodo.3528123. Supplementary information: Supplementary data are available at Bioinformatics online.
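
To make the split-counting idea in the abstract above concrete, here is a minimal sketch of the plain Robinson-Foulds comparison, not the information-theoretic metrics the article proposes and not the API of the 'TreeDist' package. Trees are assumed to be nested tuples of leaf names over the same leaf set; that representation and the function names are illustrative choices.

```python
def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names.

    Each split is stored as the side that does not contain the
    alphabetically smallest leaf, so the same bipartition always maps to
    the same frozenset regardless of how the tree is rooted or drawn.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(leaves_below(child) for child in node))
        found.add(clade)
        return clade

    all_leaves = leaves_below(tree)
    anchor = min(all_leaves)
    normalized = set()
    for clade in found:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 2 <= len(side) <= len(all_leaves) - 2:
            normalized.add(side)
    return normalized


def shared_splits(t1, t2):
    """Number of splits that occur in both trees."""
    return len(splits(t1) & splits(t2))


def robinson_foulds(t1, t2):
    """RF distance: number of splits found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


tree_1 = ((("a", "b"), "c"), ("d", "e"))
tree_2 = ((("a", "c"), "b"), ("d", "e"))
distance = robinson_foulds(tree_1, tree_2)
```

The two example trees share the split separating d and e from the rest, but the splits ab|cde and ac|bde each appear in only one tree, so the distance is 2. Those two unshared splits are nearly identical, yet plain RF gives them no credit, which is the limitation the generalized metrics are meant to address.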

