The Document Similarity Index based on the Jaccard Distance for Mail Filtering

Author(s):  
Seiya Temma ◽  
Manabu Sugii ◽  
Hiroshi Matsuno
Author(s):  
Mardi Siswo Utomo ◽  
Edi Winarko

Abstract. Document similarity can be used as a reference for finding other, similar information, which reduces the time needed to search again after reading a related document. Document similarity search is usually implemented as a 'related articles' feature. Similarity between documents can be measured with the cosine measure, with preprocessing applied beforehand to the documents to be measured. The indexing process and the measurement take a relatively long execution time. The problem with carrying out indexing and similarity measurement in a web-based application is the limited execution time, so these processes need their own programming techniques. The purpose of this research is to design and build software that gives a web-based database management system for Indonesian-language medical journals the ability to find other documents similar to the document currently being read. The result of this research is a mechanism based on JavaScript auto-reload and session cookies that breaks the indexing and similarity measurement process into several small parts, so that the process can be performed in web-based applications and on a relatively large number of documents. The cosine similarity measure, in the case of the Indonesian-language medical journal “Media Medika Indonesiana”, has a fairly high accuracy of 90%. Keywords: document similarity, cosine measure, web-based application.
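As a rough illustration of the cosine measure and of splitting the work into small parts, the following is a minimal Python sketch, not the authors' implementation. It assumes whitespace tokenization with no stemming or stop-word removal (the abstract does not specify its preprocessing), and the helper names `term_frequencies`, `cosine_similarity`, and `batches` are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count terms after lowercasing and whitespace splitting (assumed preprocessing)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def batches(items, size):
    """Yield consecutive slices so each request handles only a small part of the work."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In a web setting, each batch could be processed in one short request, with the position of the next batch kept in the session between reloads; that detail is an assumption about how the described mechanism might be arranged.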


2020 ◽  
Vol 41 (4) ◽  
pp. 219-227 ◽  
Author(s):  
Bojana M. Dinić ◽  
Tara Bulut Allred ◽  
Boban Petrović ◽  
Anja Wertag

Abstract. The aim of this study was to evaluate psychometric properties of three sadism scales: Short Sadistic Impulse Scale (SSIS), Varieties of Sadistic Tendencies (VAST, which measures direct and vicarious sadism), and Assessment of Sadistic Personality (ASP). Sample included 443 participants (50.1% men) from the general population. Reliability based on internal consistency of all scales was good, and results of Confirmatory Factor Analysis (CFA) showed that all three scales had acceptable fit indices for the proposed structure. Results of Item Response Theory (IRT) analysis showed that all three scales had higher measurement precision (information) in above-average scores. Validity of the scales was supported through moderate to high positive correlations with the Dark Triad traits, especially psychopathy, as well as positive correlations with aggressiveness and negative with Honesty-Humility. Moreover, results of hierarchical regression analysis showed that all three measures of direct, but not vicarious sadism, contributed significantly above and beyond other Dark Triad traits to the prediction of increased positive attitudes toward dangerous social groups. The profile similarity index showed that the SSIS and the ASP were highly overlapping, while vicarious sadism seems distinct from other sadism scales.


2018 ◽  
Vol 8 (2) ◽  
pp. 348-353
Author(s):  
E. A. Kuchina ◽  
N. D. Ovcharenko ◽  
L. D. Vasileva

Anthropogenic impact on ground beetle populations changes their abundance, dominance structure, density, species composition, and spectrum of life forms. This makes ground beetles (Carabidae) a convenient and informative bioindicator of the ecological state of biocenoses. The material for this work came from Carabidae collections made in June-August 2016-2017 in park zones in different parts of Barnaul, which differ in location, area, hydrological regime, vegetation cover, purpose, and anthropogenic load. When processing the material, the quantitative, species, and generic composition of the Carabidae was determined, and the Berger-Parker dominance index, the Shannon species diversity index (Hs), and the Jaccard species similarity index were calculated. The ground beetle fauna (Coleoptera, Carabidae) of the park zone of Barnaul is represented by 55 species belonging to 20 genera. The dominant group is represented by species belonging to steppe, forest, and polyzonal groups. Forest-steppe species of ground beetles were not identified as dominants in any of the investigated territories. The greatest variety of ecological groups was noted in Yubileyny Park, which is explained by the presence of zones with various microclimatic conditions, a birch grove, the Pivovarka River, which flows through the park, and a wide ravine. The registered species belong to eight groups of life forms in two classes: zoophagous and mixophytophagous. Zoophages predominate in both numerical and species abundance. The spectrum of life forms corresponds to the zonal spectrum characteristic of the forest-steppe zone.
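The three indices named in this abstract have standard textbook definitions, sketched below in Python. This is an illustrative sketch only: the function names are hypothetical, the natural logarithm is assumed for the Shannon index (the abstract does not state the base), and the example counts and species labels in the usage note are made up rather than taken from the study.

```python
import math


def berger_parker(counts):
    """Dominance: share of the most abundant species among all individuals."""
    return max(counts) / sum(counts)


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def jaccard(species_a, species_b):
    """Species similarity: shared species divided by all species found at either site."""
    a, b = set(species_a), set(species_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty lists
    return len(a & b) / len(a | b)
```

For example, with hypothetical counts `[5, 3, 2]` the Berger-Parker index is 0.5, and two sites sharing one of three recorded species have a Jaccard index of 1/3.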


2014 ◽  
Vol 6 (2) ◽  
pp. 46-51
Author(s):  
Galang Amanda Dwi P. ◽  
Gregorius Edwadr ◽  
Agus Zainal Arifin

Nowadays, a large amount of information cannot be reached by readers because text-based documents are misclassified. Misclassified data can also lead readers to the wrong information. The method proposed in this paper aims to classify documents into the correct group. Each document has a membership value in several different classes. Semantic similarity is used to find the degree of similarity between two documents. In fact, no document is entirely unrelated to the others, but the relationship may be close to 0. The method calculates the similarity between two documents by taking into account the similarity of their words and of those words' synonyms. Once all inter-document similarity values are obtained, a matrix is created, which is then used as a semi-supervised factor. The output of the method is the membership value of each document in each class; the greatest membership value for a document indicates the group to which it is assigned. The classification result computed by the method is good, at 90%. Index Terms - Fuzzy co-clustering, Heuristic, Semantic Similarity, Semi-supervised learning.


2020 ◽  
Vol 25 (2) ◽  
pp. 86-97
Author(s):  
Sandy Suryo Prayogo ◽  
Tubagus Maulana Kusuma

DVB is the most widely used digital television transmission standard today. The most important element of a transmission process is the picture quality of the video received after transmission. Many factors can affect picture quality, one of which is the frame structure of the video. This paper tests the sensitivity of MPEG-4 video to frame structure in DVB-T transmission. The tests were carried out using MATLAB and Simulink simulations. ffmpeg was also used to prepare the format and settings of the video to be simulated. The video variables that were varied are the bitrate and the group of pictures (GOP), while the DVB-T transmission variable that was varied is the signal-to-noise ratio (SNR) on the AWGN channel between the transmitter (Tx) and the receiver (Rx). The result of the experiments is the average picture quality of the video, measured using the structural similarity index (SSIM). The bit error rate (BER) of the DVB-T bitstream was also measured. The experiments show how sensitive the video's bitrate and GOP are in DVB-T transmission, with the conclusion that the higher the bitrate, the worse the picture quality, and the smaller the GOP value, the better the quality. It is hoped that this research can be developed further using deep learning to obtain the appropriate frame structure for particular conditions in digital television transmission.
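To make the SSIM measurement concrete, here is a simplified Python sketch that evaluates the SSIM formula once over a whole frame, given as a flat list of pixel intensities. It is not the tool used in the paper: standard SSIM is computed over local windows and then averaged, whereas this single-window version with population variances and the commonly used constants 0.01 and 0.03 is an assumption made to keep the example short. The function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized lists of pixel intensities."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants; 0.01 and 0.03 are the commonly used defaults.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An average quality score for a video, as described in the abstract, could then be obtained by averaging this value over corresponding transmitted and received frames.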

