Discovering Spatio-Textual Association Rules in Document Images

2011 ◽  
pp. 176-197
Author(s):  
Donato Malerba ◽  
Margherita Berardi ◽  
Michelangelo Ceci

This chapter introduces a data mining method for the discovery of association rules from images of scanned paper documents. It argues that a document image is a multi-modal unit of analysis whose semantics is deduced from a combination of the textual content, the layout structure, and the logical structure. It therefore proposes a method in which the spatial information derived from a complex document image analysis process (layout analysis), the information extracted from the logical structure of the document (document image classification and understanding), and the textual information extracted by means of OCR are considered simultaneously to generate interesting patterns. The proposed method is based on an inductive logic programming approach, which is argued to be the most appropriate for analyzing data available in more than one modality. The chapter thereby illustrates a possible evolution of the unimodal knowledge discovery scheme, in which the different types of data describing the units of analysis are handled by preprocessing techniques that transform them into a single double-entry table.

10.1068/b3305 ◽  
2007 ◽  
Vol 34 (5) ◽  
pp. 767-784 ◽  
Author(s):  
Diansheng Guo ◽  
Ke Liao ◽  
Michael Morgan

The terrorism database includes more than 27000 terrorism incidents between 1968 and 2006. Each incident record has spatial information (country names for all records and city names for some records), a time stamp (ie year, month, and day), and several other fields (eg tactics, weapon types, target types, fatalities, and injuries). We introduce a unified visualization environment that is able to present various types of patterns and thus to facilitate explorations of the incident data from different perspectives. With the visualization environment one can visualize spatio-multivariate, spatiotemporal, temporal-multivariate, or spatiotemporal-multivariate patterns. For example, the analyst can examine the characteristics (in terms of target types, tactics, or other multivariate vectors) of aggregated incidents and at the same time perceive how multivariate characteristics change over time and vary spatially. Special attention is devoted to the application-specific data analysis process, from data compilation, geocoding, preprocessing, and transformation, through customization and configuration of visualization components, to the interpretation and presentation of discovered patterns.


2015 ◽  
Vol 15 (4-5) ◽  
pp. 481-494 ◽  
Author(s):  
CRAIG BLACKMORE ◽  
OLIVER RAY ◽  
KERSTIN EDER

This paper introduces a new logic-based method for optimising the selection of compiler flags on embedded architectures. In particular, we use Inductive Logic Programming (ILP) to learn logical rules that relate effective compiler flags to specific program features. Unlike earlier work, we aim to infer human-readable rules and we seek to develop a relational first-order approach which automatically discovers relevant features rather than relying on a vector of predetermined attributes. To this end we generated a data set by measuring execution times of 60 benchmarks on an embedded system development board and we developed an ILP prototype which outperforms the current state-of-the-art learning approach in 34 of the 60 benchmarks. Finally, we combined the strengths of the current state of the art and our ILP method in a hybrid approach which reduced execution times by an average of 8% and up to 50% in some cases.


2020 ◽  
Vol 39 (5) ◽  
pp. 7233-7246
Author(s):  
Fahed Yoseph ◽  
Markku Heikkilä

Market Intelligence is knowledge extracted from numerous data sources, both internal and external, to provide a holistic view of the market and to support decision-making. Association Rules Mining (ARM) provides powerful data mining techniques for identifying associations and co-occurrences in large databases. Market Basket Analysis (MBA) uses ARM to gain insights from heterogeneous consumer shopping patterns and examines the effects of marketing initiatives. As Artificial Intelligence (AI) increasingly finds its way into marketing, it entails fundamental changes in the skill set required of marketers. For MBA, AI provides important ways to improve both the outcomes of the market basket analysis and the performance of the analysis process. In this study we demonstrate the effects of AI on MBA through our proposed new MBA model, in which results of computational intelligence are used in data preprocessing, in market segmentation, and in finding market trends. We show with point-of-sale (POS) data of a small, local retailer that our proposed “Åbo algorithm” MBA model increases mining performance/intelligence and extracts important marketing insights to assess both demand dynamics and product popularity trends. Additionally, the results show that, in line with the 80/20 rule, 78% of revenue is derived from 16% of the product assortment.
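The association measures that market basket analysis usually relies on can be illustrated with a minimal sketch. This is not the "Åbo algorithm" described in the abstract; it only computes the standard support, confidence, and lift of a rule, and the baskets and item names are hypothetical toy data.

```python
# Hypothetical toy point-of-sale baskets (not data from the study).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), baskets)
    return joint / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Rule {bread} -> {milk} on the toy baskets.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift below 1, as for this toy rule, suggests the two items co-occur less often than they would if they were bought independently.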


2021 ◽  
Author(s):  
Andrew Lensen

When faced with a new dataset, most practitioners begin by performing exploratory data analysis to discover interesting patterns and characteristics within data. Techniques such as association rule mining are commonly applied to uncover relationships between features (attributes) of the data. However, association rules are primarily designed for use on binary or categorical data, due to their use of rule-based machine learning. A large proportion of real-world data is continuous in nature, and discretisation of such data leads to inaccurate and less informative association rules. In this paper, we propose an alternative approach called feature relationship mining (FRM), which uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data. To the best of our knowledge, our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features. Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships which can be easily interpreted and which provide clear and non-trivial insight into data.
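The information loss that discretisation can introduce is easy to see in a small sketch. The example below is not part of FRM; it assumes simple equal-width binning, and the feature name, value range, and bin count are hypothetical.

```python
def equal_width_bin(value, low, high, n_bins):
    """Map a continuous value in [low, high] to a bin index in 0..n_bins-1."""
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / n_bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - low) / width), n_bins - 1)

# Hypothetical continuous feature values.
ages = [21.0, 29.5, 30.5, 64.0]

# Turn each value into a categorical "item" usable by association rule mining.
items = [f"age_bin={equal_width_bin(a, 20.0, 70.0, 5)}" for a in ages]
```

Here 21.0 and 29.5 become the same item, while 29.5 and 30.5 are split into different items even though they are nearly equal, which is the kind of distortion the abstract refers to.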


Author(s):  
Zailani Abdullah ◽  
Aggy Gusman ◽  
Tutut Herawan ◽  
Mustafa Mat Deris

One interesting and meaningful type of information hidden in transactional databases is the indirect association rule. It corresponds to a high dependency between two items that rarely occur together but are indirectly linked through other items. Since an indirect association rule is nontrivial information, it can offer a new perspective on relationships that cannot be directly observed from common rules. Therefore, we propose an algorithm for Mining Indirect Least Association Rule (MILAR) from real and benchmark datasets. MILAR embeds our scalable least measure, namely Critical Relative Support (CRS). The experimental results show that MILAR can generate the desired rules in terms of least and indirect least association rules. In addition, the results can be used by domain experts for further analysis that may reveal more interesting findings.
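The general idea of an indirect association can be sketched as follows. This is not MILAR and does not use the CRS measure; it is a simplified illustration that assumes plain support thresholds and a single mediator item, with hypothetical baskets and threshold values.

```python
from itertools import combinations

# Hypothetical toy transactions.
baskets = [
    {"coffee", "sugar"},
    {"coffee", "sugar"},
    {"tea", "sugar"},
    {"tea", "sugar"},
    {"coffee", "tea", "sugar"},
    {"bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def indirect_pairs(baskets, max_pair_support, min_mediator_support):
    """Return (a, b, mediator) triples where a and b rarely co-occur
    but each co-occurs frequently with the mediator item."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        if support({a, b}, baskets) > max_pair_support:
            continue  # a and b co-occur too often to count as "indirect"
        for m in items:
            if m in (a, b):
                continue
            if (support({a, m}, baskets) >= min_mediator_support
                    and support({b, m}, baskets) >= min_mediator_support):
                found.append((a, b, m))
    return found

# Assumed thresholds, chosen only for this toy data.
result = indirect_pairs(baskets, max_pair_support=0.2, min_mediator_support=0.4)
```

On this toy data, coffee and tea are seldom bought together but both are frequently bought with sugar, so the pair is reported with sugar as the mediator.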


2011 ◽  
Vol 11 (4-5) ◽  
pp. 783-799 ◽  
Author(s):  
DOMENICO CORAPI ◽  
ALESSANDRA RUSSO ◽  
MARINA DE VOS ◽  
JULIAN PADGET ◽  
KEN SATOH

In this paper we propose a use-case-driven iterative design methodology for normative frameworks, also called virtual institutions, which are used to govern open systems. Our computational model represents the normative framework as a logic program under answer set semantics (ASP). By means of an inductive logic programming approach, implemented using ASP, it is possible to synthesise new rules and revise the existing ones. The learning mechanism is guided by the designer who describes the desired properties of the framework through use cases, comprising (i) event traces that capture possible scenarios, and (ii) a state that describes the desired outcome. The learning process then proposes additional rules, or changes to current rules, to satisfy the constraints expressed in the use cases. Thus, the contribution of this paper is a process for the elaboration and revision of a normative framework by means of a semi-automatic and iterative process driven from specifications of (un)desirable behaviour. The process integrates a novel and general methodology for theory revision based on ASP.

