SQLiDDS: SQL injection detection using document similarity measure

2016 ◽  
Vol 24 (4) ◽  
pp. 507-539 ◽  
Author(s):  
Debabrata Kar ◽  
Suvasini Panigrahi ◽  
Srikanth Sundararajan

2018 ◽  
Vol 5 (1) ◽  
Author(s):  
Marzieh Oghbaie ◽  
Morteza Mohammadi Zanjireh

2003 ◽  
Vol 2 (3) ◽  
pp. 160-170 ◽  
Author(s):  
Steven Noel ◽  
Chee-Hung Henry Chu ◽  
Vijay Raghavan

Visualization of author or document influence networks as a two-dimensional image can provide key insights into the direct influence of authors or documents on each other in a document collection. The influence network is constructed based on the minimum spanning tree, in which the nodes are documents and an edge is the most direct influence between two documents. Influence network visualizations have typically relied on co-citation correlation as a measure of document similarity. That is, the similarity between two documents is computed by correlating the sets of citations to each of the two documents. In a different line of research, co-citation count (the number of times two documents are jointly cited) has been applied as a document similarity measure. In this work, we demonstrate the impact of each of these similarity measures on the document influence network. We provide examples, and analyze the significance of the choice of similarity measure. We show that correlation-based visualizations exhibit chaining effects (low average vertex degree), a manifestation of multiple minor variations in document similarities. These minor similarity variations are absent in count-based visualizations. The result is that count-based influence network visualizations are more consistent with the intuitive expectation of authoritative documents being hubs that directly influence large numbers of documents.
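The two similarity measures contrasted in this abstract can be sketched on toy data. The example below is a minimal illustration, not the authors' implementation. It assumes that co-citation correlation means the Pearson correlation of binary "cited by" indicator vectors, and that the influence network is the spanning tree joining the most similar documents (a minimum spanning tree over negated similarity). The function names and the five citing papers are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cocitation_counts(citing_sets, docs):
    """Number of citing papers that cite both documents of each pair."""
    counts = {a: {b: 0 for b in docs} for a in docs}
    for cited in citing_sets:
        for a, b in combinations(sorted(cited), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def pearson(x, y):
    """Pearson correlation; defined as 0.0 here when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def cocitation_correlation(citing_sets, docs):
    """Correlate the 0/1 'cited by paper i' vectors of each document pair."""
    vectors = {d: [1 if d in cited else 0 for cited in citing_sets] for d in docs}
    return {a: {b: pearson(vectors[a], vectors[b]) for b in docs} for a in docs}

def spanning_tree(docs, sim):
    """Kruskal's algorithm, taking the most similar pairs first."""
    parent = {d: d for d in docs}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    tree = []
    for a, b in sorted(combinations(docs, 2), key=lambda e: -sim[e[0]][e[1]]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree

def degrees(tree, docs):
    """Vertex degree of every document in the tree."""
    deg = {d: 0 for d in docs}
    for a, b in tree:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical data: each set lists the documents cited by one citing paper.
docs = ["A", "B", "C", "D"]
citing_sets = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "D"}]

counts = cocitation_counts(citing_sets, docs)
corr = cocitation_correlation(citing_sets, docs)
count_tree = spanning_tree(docs, counts)
corr_tree = spanning_tree(docs, corr)
print(degrees(count_tree, docs))
print(degrees(corr_tree, docs))
```

Comparing the two degree dictionaries shows how the choice of measure changes which documents act as hubs. A dataset this small says nothing about the chaining effect reported in the abstract.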


2012 ◽  
Vol 63 (8) ◽  
pp. 1593-1608 ◽  
Author(s):  
Lan Huang ◽  
David Milne ◽  
Eibe Frank ◽  
Ian H. Witten

Author(s):  
YONGLI LIU ◽  
YUANXIN OUYANG ◽  
ZHANG XIONG

Document clustering is one of the most effective techniques for organizing documents in an unsupervised manner. In this paper, an Incremental method for document Clustering based on Information Bottleneck theory (ICIB) is presented. ICIB is designed to improve the accuracy and efficiency of document clustering, and to resolve the issue that an arbitrary choice of document similarity measure can produce an inaccurate clustering result. In our approach, document similarity is calculated using information bottleneck theory and documents are grouped incrementally. A first document is selected randomly and assigned to its own cluster; each remaining document is then processed incrementally according to the mutual information loss introduced by merging the document with each existing cluster. If the minimum mutual information loss is below a certain threshold, the document is added to its closest cluster; otherwise it forms a new cluster. The incremental clustering process is low-precision and order-dependent, so it cannot guarantee accurate clustering results. Therefore, an improved sequential clustering algorithm (SIB) is proposed to adjust the intermediate clustering results. To test the effectiveness of the ICIB method, ten independent document subsets are constructed from the 20NewsGroup and Reuters-21578 corpora. Experimental results show that the ICIB method achieves higher accuracy and better time performance than the K-Means, AIB and SIB algorithms.
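The incremental pass described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the ICIB implementation. Documents are bags of words with a uniform prior. The mutual information loss of a merge uses the standard agglomerative information bottleneck cost, (p(a) + p(b)) times the weighted Jensen-Shannon divergence of the two word distributions. Documents are processed in list order rather than from a random start, and the SIB adjustment step is omitted. The threshold value and all names are hypothetical.

```python
from math import log2

def normalize(counts):
    """Turn raw word counts into a probability distribution p(word | doc)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl(p, q):
    """KL divergence in bits; q must be positive wherever p is."""
    return sum(pv * log2(pv / q[w]) for w, pv in p.items() if pv > 0)

def merge_cost(p_a, dist_a, p_b, dist_b):
    """Information lost by merging a and b, plus the merged word distribution."""
    total = p_a + p_b
    wa, wb = p_a / total, p_b / total
    words = set(dist_a) | set(dist_b)
    mix = {w: wa * dist_a.get(w, 0.0) + wb * dist_b.get(w, 0.0) for w in words}
    js = wa * kl(dist_a, mix) + wb * kl(dist_b, mix)
    return total * js, mix

def incremental_cluster(docs, threshold):
    """Single pass: join the cheapest cluster if the loss is below threshold."""
    prior = 1.0 / len(docs)  # assumed uniform document prior
    clusters = []
    for i, counts in enumerate(docs):
        dist = normalize(counts)
        best = None
        for c in clusters:
            cost, mix = merge_cost(c["prior"], c["dist"], prior, dist)
            if best is None or cost < best[0]:
                best = (cost, mix, c)
        if best is not None and best[0] < threshold:
            _, mix, c = best
            c["prior"] += prior
            c["dist"] = mix
            c["members"].append(i)
        else:
            clusters.append({"prior": prior, "dist": dist, "members": [i]})
    return clusters

# Hypothetical toy documents given as word counts.
docs = [
    {"sql": 3, "injection": 2},
    {"sql": 2, "injection": 3},
    {"cluster": 3, "document": 2},
    {"cluster": 2, "document": 3},
]
clusters = incremental_cluster(docs, threshold=0.1)
print([c["members"] for c in clusters])
```

Because each document is compared only with the clusters that exist when it arrives, reordering the input can change the result. This is the order dependence that the abstract's adjustment step is meant to correct.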


Author(s):  
Mardi Siswo Utomo ◽  
Edi Winarko

Abstract: Document similarity can serve as a reference when searching for other, similar information, reducing the time needed to locate further information once a relevant document has been found. Document similarity search is usually implemented as a 'related articles' feature. The similarity of documents can be measured with the cosine measure, with preprocessing applied beforehand to the documents being measured. The indexing and measurement processes take a relatively long execution time. A web-based application that performs indexing and similarity measurement has a limited execution time, so these processes need their own programming techniques in a web-based setting. The purpose of this research is to design and build software that lets a web-based database management system for Indonesian-language medical journals find other documents similar to the one currently being read. The result of this research is a mechanism using JavaScript auto-reload and session cookies that breaks the indexing and similarity measurement processes into several small sections, so that they can be performed in a web-based application on a relatively large number of documents. For the Indonesian-language medical journal “Media medika Indonesiana”, the cosine similarity measure achieves a fairly high accuracy of 90%. Keywords: document similarity, cosine measure, web-based application.
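The cosine measure, and the idea of splitting the measurement into small sections, can be illustrated with a short sketch. It is not the system described in the abstract. It assumes whitespace tokenization with no further preprocessing and raw term frequencies as vector weights. A plain dictionary stands in for the session state that the abstract keeps in cookies, and each call to the hypothetical `score_batch` function represents one short request. The English toy corpus is invented for illustration.

```python
from collections import Counter
from math import sqrt

def tokenize(text):
    """Lowercase and split on whitespace (stand-in for real preprocessing)."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_batch(state, current, vectors, batch_size):
    """Score at most batch_size documents, save progress, report completion."""
    start = state["cursor"]
    end = min(start + batch_size, len(vectors))
    for i in range(start, end):
        state["scores"][i] = cosine(current, vectors[i])
    state["cursor"] = end
    return end >= len(vectors)

# Hypothetical corpus and the document currently being read.
corpus = [
    "dengue fever in children",
    "treatment of dengue fever",
    "hypertension in the elderly",
]
vectors = [Counter(tokenize(t)) for t in corpus]
query_vec = Counter(tokenize("dengue fever treatment"))

state = {"cursor": 0, "scores": {}}  # would persist between requests
calls = 0
done = False
while not done:  # each iteration stands for one reloaded request
    done = score_batch(state, query_vec, vectors, batch_size=2)
    calls += 1

ranked = sorted(state["scores"], key=state["scores"].get, reverse=True)
print(ranked, calls)
```

Keeping the cursor and partial scores outside the function is what lets each call finish quickly: a request that hits its time limit loses at most one batch of work.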

