Efficient data reduction in multimedia data

2006, Vol 25 (3), pp. 359-374
Author(s): Surong Wang, Manoranjan Dash, Liang-Tien Chia, Min Xu

2017, Vol 238, pp. 234-244
Author(s): Jianpei Wang, Shihong Yue, Xiao Yu, Yaru Wang

Author(s): Joe Tekli

W3C’s XML (eXtensible Markup Language) has recently gained major importance as a fundamental standard for efficient data management and exchange. The use of XML covers data representation and storage, database information interchange, data filtering, and Web application interaction and interoperability. XML has been widely used in the multimedia field as an effective, standard means of indexing, storing, and retrieving complex multimedia objects. SVG, SMIL, X3D, and MPEG-7 are only a few examples of XML-based multimedia data representations. With the ever-increasing use of XML on the Web, there is an emerging need to automatically process XML documents and grammars for similarity classification and clustering, information extraction, and search. Since XML represents semi-structured data, all these applications require some notion of structural similarity. In this area, most work has focused on estimating the similarity between XML documents (i.e., the data layer), whereas few efforts have been dedicated to comparing XML grammars (i.e., the type layer). Computing the structural similarity between XML documents is relevant in several scenarios, such as change management (Chawathe, Rajaraman, Garcia-Molina, & Widom, 1996; Cobéna, Abiteboul, & Marian, 2002), XML structural query systems, which find and rank results according to their similarity (Schlieder, 2001; Zhang, Li, Cao, & Zhu, 2003), and the structural clustering of XML documents gathered from the Web (Dalamagas, Cheng, Winkel, & Sellis, 2006; Nierman & Jagadish, 2002). Estimating the similarity between XML grammars, on the other hand, is useful for data integration, in particular the integration of DTDs/schemas that contain nearly or exactly the same information but are constructed using different structures (Doan, Domingos, & Halevy, 2001; Melnik, Garcia-Molina, & Rahm, 2002).
XML grammar similarity is also exploited in data warehousing (mapping data sources to warehouse schemas), as well as in XML data maintenance and schema evolution, where differences and updates between versions of a given grammar/schema must be detected so that the corresponding XML documents can be revalidated (Rahm & Bernstein, 2001). The goal of this article is to briefly review XML grammar structural similarity approaches. We provide a unified view of the problem, assessing the different aspects and techniques related to XML grammar comparison. The remainder of this article is organized as follows. The second section presents an overview of XML grammar similarity, otherwise known as XML schema matching. The third section reviews the state of the art in XML grammar comparison methods. The fourth section discusses the main criteria characterizing the effectiveness of XML grammar similarity approaches. Conclusions and current research directions are covered in the last section.


Author(s): Veronika Strnadová-Neeley, Aydın Buluç, Jarrod Chapman, John R. Gilbert, Joseph Gonzalez, ...

2014, Vol 11 (2), pp. 665-678
Author(s): Stefanos Ougiaroglou, Georgios Evangelidis

Data reduction techniques improve the efficiency of k-Nearest Neighbour classification on large datasets, since they accelerate the classification process and reduce the storage requirements for the training data. IB2 is an effective prototype selection data reduction technique: it selects some items from the initial training dataset and uses them as representatives (prototypes). Unlike many other techniques, IB2 is a very fast, one-pass method that builds its reduced (condensing) set incrementally. New training data can update the condensing set without requiring the "old" removed items. This paper proposes a variation of IB2 that generates new prototypes instead of selecting them. The variation, called AIB2, attempts to improve the efficiency of IB2 by positioning the prototypes at the center of the data areas they represent. The experimental study conducted in the present work, together with the Wilcoxon signed ranks test, shows that AIB2 performs better than IB2.
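The one-pass, incremental condensing idea described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The IB2 branch follows the commonly described rule: the first item seeds the condensing set, and each later item is added only when 1-NN on the current set misclassifies it. The adaptive branch is an assumption about how AIB2 "positions the prototypes at the center of the data areas they represent": a correctly classified item is absorbed by its nearest prototype, which moves to the running mean of all items it now represents. Euclidean distance and the names `condense` and `classify` are hypothetical, illustrative choices.

```python
from typing import List, Sequence, Tuple


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; monotone in the true distance, so fine for 1-NN."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(protos: List[List[float]], x: Sequence[float]) -> int:
    """Index of the prototype closest to x."""
    return min(range(len(protos)), key=lambda j: _dist2(protos[j], x))


def condense(data, labels, adaptive: bool = False) -> Tuple[List[tuple], List[str]]:
    """Build a condensing set in a single pass over (data, labels).

    adaptive=False: IB2-style prototype selection (items kept only if misclassified).
    adaptive=True:  hypothetical AIB2-style sketch (assumption): a correctly
                    classified item shifts its nearest prototype to the mean of
                    all items that prototype represents.
    """
    protos: List[List[float]] = []
    proto_labels: List[str] = []
    weights: List[int] = []  # how many training items each prototype represents

    for x, y in zip(data, labels):
        if not protos:
            # The first item seeds the condensing set.
            protos.append([float(v) for v in x])
            proto_labels.append(y)
            weights.append(1)
            continue
        i = _nearest(protos, x)
        if proto_labels[i] != y:
            # Misclassified by the current set: keep the item as a new prototype.
            protos.append([float(v) for v in x])
            proto_labels.append(y)
            weights.append(1)
        elif adaptive:
            # Correctly classified: move the prototype to the running mean.
            w = weights[i]
            protos[i] = [(p * w + v) / (w + 1) for p, v in zip(protos[i], x)]
            weights[i] = w + 1
        # Plain IB2 simply discards correctly classified items.

    return [tuple(p) for p in protos], proto_labels


def classify(protos: List[tuple], proto_labels: List[str], x: Sequence[float]) -> str:
    """1-NN classification against the condensing set."""
    return proto_labels[_nearest([list(p) for p in protos], x)]
```

For example, with the one-dimensional items 0, 1, 10, 11, 2 labelled a, a, b, b, a, the IB2 branch keeps the prototypes (0.0,) and (10.0,), while the adaptive branch ends with (1.0,) and (10.5,), the means of the two groups. Keeping a per-prototype count makes the mean update exact without storing the absorbed items, which preserves the one-pass property the abstract emphasizes.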

