Information Retrieval using Jaccard Similarity Coefficient

2016 ◽  
Vol 36 (3) ◽  
pp. 140-143 ◽  
Author(s):  
Manoj Chahal ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 42
Author(s):  
Luke Michael Febriansyah ◽  
Shinta Estri Wahyuningrum

Plagiarism has been a prominent issue in recent years. To address it, this research builds a system that detects similarity between texts. A central part of the research is an analysis of plagiarism-detection algorithms: it evaluates the accuracy of one such algorithm, the winnowing algorithm, which detects plagiarism through document fingerprinting. To calculate the percentage similarity between document fingerprints, the research uses three similarity measures: the Jaccard similarity coefficient, the Sorensen-Dice similarity coefficient, and the Berg similarity coefficient.
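As a rough illustration of the pipeline this abstract describes, the Python sketch below builds winnowing fingerprints (hash every k-gram, then keep the minimum hash in each window of w consecutive hashes) and compares two fingerprints with the Jaccard and Sorensen-Dice coefficients. The parameters k = 5 and w = 4, the MD5-based 32-bit hash, and the text normalization are illustrative assumptions, not the authors' settings. The Berg coefficient is omitted because its definition is not given here.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text after lowercasing and dropping non-alphanumerics."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    hashes = []
    for i in range(len(cleaned) - k + 1):
        digest = hashlib.md5(cleaned[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))  # keep 32 bits
    return hashes


def winnow(hashes: list[int], w: int) -> set[int]:
    """Keep the minimum hash of every window of w consecutive k-gram hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Document fingerprint = set of hashes selected by winnowing (k, w are assumptions)."""
    return winnow(kgram_hashes(text, k), w)


def jaccard(a: set[int], b: set[int]) -> float:
    """|A & B| / |A | B|; two empty fingerprints are treated as 0.0 in this sketch."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a: set[int], b: set[int]) -> float:
    """2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


if __name__ == "__main__":
    doc_a = "The quick brown fox jumps over the lazy dog."
    doc_b = "The quick brown fox leaps over the lazy dog."
    fa, fb = fingerprint(doc_a), fingerprint(doc_b)
    print(f"Jaccard: {jaccard(fa, fb):.1%}")
    print(f"Sorensen-Dice: {sorensen_dice(fa, fb):.1%}")
```

For the same pair of sets, D = 2J / (1 + J), so the Sorensen-Dice score is never lower than the Jaccard score. Both rank document pairs identically, but the choice of coefficient changes the reported percentage.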


2017 ◽  
Vol 14 (2) ◽  
pp. 775-782 ◽  
Author(s):  
Sobhan Normohamadi ◽  
Mahmood Solouki ◽  
Forouzan Heidari

ABSTRACT: Biodiversity is one of the most important factors in the survival and improvement of any species; germplasm collection is therefore the first step in plant improvement. To investigate the genetic and morphological relationships among local cucumbers, 20 genotypes were evaluated for 10 morphological traits and analyzed with 9 SSR primers. High genetic variability was observed for the number of flowers per plant. Jaccard similarity coefficients ranged from 0.51 to 0.92, indicating high diversity among the genotypes. To evaluate genetic similarity among the genotypes, cluster analysis was performed with the UPGMA method based on the Jaccard similarity coefficient. The average genetic distance between genotypes (based on the Jaccard similarity coefficient) was 0.74, and the mean polymorphic information content (PIC) was 0.69. Primer SSR13251 had the highest PIC (0.8). The clustering pattern from the SSR markers did not match the groupings based on quantitative traits. The dendrogram from the cluster analysis of the molecular data showed high diversity among the studied genotypes. The highest genetic similarity was between genotypes 2 and 3 (0.94), and the lowest was between genotypes 6 and 12 (0.51). The results suggest that SSR markers are a suitable tool for studying genetic diversity and relationships among cucumber genotypes.
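For marker data like this, each genotype is usually scored as a 0/1 vector of band presence or absence. The Jaccard coefficient then counts only shared presences: J = a / (a + b + c), where a is the number of bands present in both genotypes and b and c are the bands present in only one. The sketch below also computes a simplified PIC, 1 - Σ p_i², from allele frequencies at a single locus. The scoring scheme, the toy data, and the choice of this PIC formula (rather than Botstein's fuller version) are assumptions; the abstract does not say which were used.

```python
def jaccard_binary(x: list[int], y: list[int]) -> float:
    """Jaccard similarity a / (a + b + c) for two presence/absence band profiles.

    a = bands present in both; b + c = bands present in only one profile.
    Bands absent from both profiles are ignored.
    """
    a = sum(1 for i, j in zip(x, y) if i and j)
    mismatches = sum(1 for i, j in zip(x, y) if bool(i) != bool(j))
    denom = a + mismatches
    return a / denom if denom else 0.0


def similarity_matrix(profiles: list[list[int]]) -> list[list[float]]:
    """Pairwise Jaccard similarities between all genotype profiles."""
    return [[jaccard_binary(p, q) for q in profiles] for p in profiles]


def pic(allele_counts: list[int]) -> float:
    """Simplified PIC = 1 - sum(p_i^2) over allele frequencies at one SSR locus."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


if __name__ == "__main__":
    # Hypothetical 0/1 band scores: 4 genotypes x 6 SSR bands
    profiles = [
        [1, 1, 0, 1, 0, 1],
        [1, 1, 0, 1, 1, 1],
        [0, 1, 1, 0, 1, 0],
        [1, 0, 1, 0, 0, 1],
    ]
    for row in similarity_matrix(profiles):
        print(" ".join(f"{s:.2f}" for s in row))
    # Hypothetical allele counts at two loci; mean PIC across loci
    loci = [[8, 6, 6], [10, 10]]
    print(sum(pic(c) for c in loci) / len(loci))
```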


Genetika ◽  
2014 ◽  
Vol 46 (3) ◽  
pp. 975-984 ◽  
Author(s):  
Ali Ghasemi ◽  
Ahmad Golparvar ◽  
Mehdi Isfahani

Plant breeding programs are built on diversity and on the selection of superior quantitative and qualitative traits, so assessing genetic diversity is the first step in every plant breeding program. New methods for studying genetic diversity are therefore needed. In the present study, the genetic diversity of thirty sugar beet genotypes was determined using Random Amplified Polymorphic DNA (RAPD) markers. After DNA extraction and optimization of the experimental conditions, 10 of the 40 primers studied, which were polymorphic and produced good, clear bands in the sugar beet genotypes, were randomly selected. Statistical calculations were based on the Jaccard similarity coefficient and UPGMA clustering in the NTSYS software (version 2.02). The sizes of the amplified bands ranged from 100 to 3000 base pairs. Overall polymorphism across all primers was 82.33%. The cophenetic correlation coefficient between the similarity matrix and the resulting dendrogram was r = 0.75. Genotypes 4 and 18 showed the highest similarity (similarity coefficient 0.91), while genotypes 21 and 15 showed the lowest (0.63). Among the primers used, OPB-18 produced the most bands (12) and OPA-09 the fewest (5). Cluster analysis confirmed both the genetic differentiation of the cultivars indicated by the band profiles and the relationships indicated by the Jaccard similarity coefficients. Based on these results and the resulting dendrogram, the genotypes were classified into 13 groups. Cluster analysis based on the Jaccard similarity coefficient revealed genetic diversity among the genotypes, which underlines the efficiency of selection among sugar beet genotypes.
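The cophenetic correlation measures how faithfully a UPGMA dendrogram preserves the original pairwise similarities. Below is a minimal pure-Python sketch, assuming Jaccard distance = 1 - similarity and a plain Pearson correlation over all genotype pairs. It is not NTSYS's implementation, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist: list[list[float]]) -> list[list[float]]:
    """Cluster with UPGMA (average linkage) and return cophenetic distances.

    The cophenetic distance of two genotypes is the linkage distance at which
    they first fall into the same cluster.
    """
    n = len(dist)
    coph = [[0.0] * n for _ in range(n)]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member genotypes
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}  # keys (lo, hi)
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        for p in members_a:
            for q in members_b:
                coph[p][q] = coph[q][p] = height
        size_a, size_b = len(members_a), len(members_b)
        # UPGMA update: size-weighted mean of distances from the two merged clusters
        updated = {
            k: (size_a * d[(min(a, k), max(a, k))] + size_b * d[(min(b, k), max(b, k))])
            / (size_a + size_b)
            for k in clusters
        }
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        for k, v in updated.items():
            d[(k, next_id)] = v  # every existing id is smaller than next_id
        clusters[next_id] = members_a + members_b
        next_id += 1
    return coph


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(sim: list[list[float]]) -> float:
    """Correlate Jaccard distances (1 - similarity) with UPGMA cophenetic distances."""
    dist = [[1.0 - s for s in row] for row in sim]
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(sim)), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])
```

Because 1 - s is a linear transform, correlating distances gives the same r as correlating the similarity matrix with cophenetic similarities. Values close to 1 indicate that the dendrogram represents the matrix closely.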


HortScience ◽  
2014 ◽  
Vol 49 (5) ◽  
pp. 524-530
Author(s):  
Lydia E. Wahba ◽  
Nor Hazlina ◽  
A. Fadelah ◽  
Wickneswari Ratnam

Dendrobium is one of the largest genera in the family Orchidaceae. Information on the genetic diversity of, and relationships among, species and hybrids is important for breeding and for species conservation. The objectives of this study were to assess genetic relatedness and to determine whether morphological, molecular, or combined analysis can discriminate among Dendrobium species, commercial hybrids, and interspecific hybrids. A total of 81 Dendrobium accessions were characterized with 12 amplified fragment length polymorphism (AFLP) primer pairs and 21 morphological characters. Mean genetic relatedness for the morphological, AFLP, and combined analyses was 0.61, 0.37, and 0.43, respectively. Dendrograms were generated with the unweighted pair group method with arithmetic averages (UPGMA) applied to a Jaccard similarity coefficient matrix. For the morphological characters, the Jaccard similarity coefficient ranged from 0.20 to 1.0, and the 81 accessions were grouped into four clusters. For the AFLP analysis, the number of polymorphic fragments per primer pair ranged from 80 to 284, with an average of 78% polymorphic loci; similarity coefficients ranged from 0.125 to 1.0, and the accessions were grouped into three clusters. Similarity coefficients from the combined analysis of morphological and AFLP data ranged from 0.21 to 1.0, and the accessions clustered into two groups. The three data sets showed some similarities. The combined data set was the most useful for discriminating Dendrobium accessions by species section and for resolving relationships among species and their hybrids. The correlation between the AFLP data and the combined data was highly significant (r = 0.98, P < 0.001), indicating that AFLP data are useful for species discrimination and hybrid identification in the absence of floral morphological characters.
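The abstract does not state how the AFLP-versus-combined correlation and its P-value were obtained. One common approach for comparing two similarity matrices is a Mantel-style permutation test, sketched below under these assumptions: morphological characters are already coded as 0/1 so that the Jaccard coefficient applies, the combined data set is simply each accession's morphological characters concatenated with its AFLP characters, and the P-value is one-sided. All data and function names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def jaccard(x: list[int], y: list[int]) -> float:
    """a / (a + b + c) for binary characters; shared absences are ignored."""
    shared = sum(1 for i, j in zip(x, y) if i and j)
    mismatches = sum(1 for i, j in zip(x, y) if bool(i) != bool(j))
    return shared / (shared + mismatches) if shared + mismatches else 0.0


def sim_matrix(rows: list[list[int]]) -> list[list[float]]:
    return [[jaccard(r, s) for s in rows] for r in rows]


def combine(morph: list[list[int]], aflp: list[list[int]]) -> list[list[int]]:
    """Combined analysis (assumed): concatenate each accession's characters."""
    return [m + a for m, a in zip(morph, aflp)]


def upper(mat: list[list[float]], order=None) -> list[float]:
    """Upper-triangle entries, optionally after permuting rows and columns together."""
    idx = order if order is not None else list(range(len(mat)))
    return [mat[idx[i]][idx[j]] for i, j in combinations(range(len(mat)), 2)]


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mantel(m1, m2, permutations: int = 999, seed: int = 0) -> tuple[float, float]:
    """Matrix correlation r and a one-sided permutation P-value."""
    rng = random.Random(seed)
    x = upper(m1)
    r_obs = pearson(x, upper(m2))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(len(m2)))
        rng.shuffle(perm)
        if pearson(x, upper(m2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical 0/1 data for 5 accessions
    morph = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
    aflp = [[1, 1, 0, 0, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1], [0, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
    print(mantel(sim_matrix(aflp), sim_matrix(combine(morph, aflp))))
```

The smallest P-value this test can report is 1 / (permutations + 1), so the number of permutations bounds how small a reported P can be.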


2006 ◽  
Author(s):  
W. Benjamin Porr ◽  
Dustin W. Scott ◽  
Thomas A. Stetz ◽  
Julie A. Cincotta ◽  
Scott B. Button

2020 ◽  
Vol 5 (6) ◽  
pp. 363-369
Author(s):  
Hao Tuan Huynh ◽  
Nghia Duong-Trung ◽  
Dinh Quoc Truong ◽  
Hiep Xuan Huynh

2019 ◽  
Vol 39 (4) ◽  
pp. 145-151
Author(s):  
Kalyan Sundar Samanta ◽  
Durga Sankar Rath

The concept of ‘social tagging’ has gained popularity with the emergence of Web 2.0 technologies, which gave rise to the practice of users collaboratively, or socially, associating metadata with digital resources for their own information retrieval. Many researchers have argued that social tags can enhance the use of library collections. The primary aim of the present study was to compare social tags collected from the LibraryThing website with Library of Congress Subject Headings (LCSH) descriptors from the Library of Congress Online Catalogue, as applied to one thousand book titles in the field of economics. The study also aimed to determine whether social tags can be applied in library databases. The findings show that users adopt descriptors as tags (47.39%) more often than experts use tags as descriptors (12.77%). Spearman's correlation suggests a 75 per cent chance that tags and descriptors can be used together where their terms overlap. The Jaccard similarity coefficient indicates that users and experts use different terminology to annotate the books, although users and experts share at least one common keyword for most of the book titles (908). Users mostly chose title-based keywords, whereas experts mostly used subject-based terminology. The study further indicates that social tags may be incorporated into library databases but cannot replace LCSH. Incorporating social tags into the library OPAC may enhance the accessibility and use of documents, especially in economics.
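A set-based Jaccard comparison of the two vocabularies can be sketched per title: normalize the user tags and the LCSH descriptors into term sets, score each title with |T ∩ D| / |T ∪ D|, and count the titles that share at least one term. The normalization (lowercasing, splitting subdivided headings on "--"), the directional `coverage` helper, and the sample records are assumptions for illustration; the abstract does not describe the study's exact matching rules.

```python
def tag_terms(tags: list[str]) -> set[str]:
    """Normalize user tags: trim whitespace and lowercase."""
    return {t.strip().lower() for t in tags if t.strip()}


def lcsh_terms(headings: list[str]) -> set[str]:
    """Split subdivided headings such as 'Money -- History' into lowercase terms."""
    return {part.strip().lower() for h in headings for part in h.split("--") if part.strip()}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(source: set[str], target: set[str]) -> float:
    """Share of the terms in `source` that also occur in `target` (hypothetical helper)."""
    return len(source & target) / len(source) if source else 0.0


def compare_titles(records: list[tuple[list[str], list[str]]]) -> tuple[float, int]:
    """Return the mean per-title Jaccard score and the number of titles sharing a term."""
    scores, shared_titles = [], 0
    for tags, headings in records:
        t, d = tag_terms(tags), lcsh_terms(headings)
        scores.append(jaccard(t, d))
        shared_titles += bool(t & d)
    return (sum(scores) / len(scores) if scores else 0.0), shared_titles


if __name__ == "__main__":
    # Hypothetical records: (LibraryThing-style tags, LCSH headings) per title
    records = [
        (["economics", "Money", "to-read"], ["Money -- History", "Economics"]),
        (["fiction"], ["Banks and banking"]),
    ]
    print(compare_titles(records))  # (0.25, 1)
```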

