BDD-Based Combinatorial Keyword Query Processing under a Taxonomy Model

Author(s):  
Shin-ichi Minato ◽  
Nicolas Spyratos

Digital libraries are key systems for an IT society, and supporting easy access to them is an important technical issue between humans and intelligent systems. Here the authors consider a publish/subscribe system for digital libraries which continuously evaluates queries over a large repository containing document descriptions. The subscriptions, the query expressions, and the document descriptions all rely on a taxonomy, that is, a hierarchically organized set of keywords, or terms. The digital library supports insertion, update, and removal of a document. Each of these operations is seen as an event that must be notified only to those users whose subscriptions match the document's description. In this chapter, the authors present a novel method of processing such keyword queries. Their method is based on the Binary Decision Diagram (BDD), an efficient data structure for manipulating large-scale Boolean functions. The authors compile the given keyword queries into a BDD under a taxonomy model. The number of possible keyword sets can be exponentially large, but the compiled BDD gives a compact representation, enabling a highly efficient matching process. In addition, their method can deal with any Boolean combination of keywords from the taxonomy, whereas previous work considered only conjunctive keyword sets. The authors describe the basic idea of their new method and then present preliminary experimental results obtained by applying it to a document set with a large-scale keyword domain under a real-life taxonomy structure.
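
Below is a minimal sketch of the matching step, assuming a Python implementation on top of the `dd` BDD package. The toy taxonomy, the term names, and the upward-closure helper are illustrative stand-ins, not the authors' actual encoding of the taxonomy model.

```python
# Sketch: compile a Boolean keyword query into a BDD and match document
# descriptions against it. Uses the `dd` package; the taxonomy expansion
# is a simplified stand-in for the method described in the chapter.
from dd.autoref import BDD

# Hypothetical taxonomy: each term implies its ancestors.
TAXONOMY = {"poodle": "dog", "dog": "animal"}
VARS = ["animal", "dog", "poodle", "outdoor"]

def expand(keywords):
    """Close a document's keyword set upward through the taxonomy."""
    closed = set(keywords)
    for k in list(closed):
        while k in TAXONOMY:
            k = TAXONOMY[k]
            closed.add(k)
    return closed

bdd = BDD()
bdd.declare(*VARS)

# Subscription: any Boolean combination of taxonomy terms.
query = bdd.add_expr("dog & ~outdoor")

def matches(doc_keywords):
    """Evaluate the compiled BDD on one document description."""
    desc = expand(doc_keywords)
    assignment = {v: (v in desc) for v in VARS}
    return bdd.let(assignment, query) == bdd.true

print(matches({"poodle"}))           # True: poodle implies dog
print(matches({"dog", "outdoor"}))   # False: outdoor is excluded
```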



2019 ◽  
Vol 8 (6) ◽  
pp. 287 ◽  
Author(s):  
Zhu ◽  
Wu ◽  
Chen ◽  
Jing

The tremendous advance in information technology has promoted the rapid development of location-based services (LBSs), which play an indispensable role in people's daily lives. Compared with a traditional LBS based on a Point-Of-Interest (POI), which is an isolated location point, an increasing number of demands have concentrated on Region-Of-Interest (ROI) exploration, i.e., geographic regions that contain many POIs and express rich environmental information. The intention behind an ROI query is to find geographical regions that are related to the user's requirements, contain spatial objects such as POIs, and have certain environmental characteristics. In order to achieve effective ROI exploration, we propose an ROI top-k keyword query method that considers the environmental information of the regions. Specifically, the Word2Vec model is introduced to learn distributed representations of POIs and capture their environmental semantics, which are then leveraged to describe the environmental characteristics of the candidate ROIs. Given a keyword query, different query patterns are designed to measure the similarities between the query keyword and the candidate ROIs, so as to find the k candidate ROIs that are most relevant to the query. In the verification step, an evaluation criterion is developed to test the effectiveness of the distributed representations of POIs. Finally, after generating high-quality POI vectors, we validated the performance of the proposed ROI top-k query on a large-scale real-life dataset; the experimental results demonstrate the effectiveness of our proposals.
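
The pipeline can be illustrated with a short sketch. It uses gensim's Word2Vec as the paper does, but the corpus construction (POI category tokens grouped per region), the mean-pooled region vector, and the single-keyword query pattern are simplified assumptions rather than the authors' exact design.

```python
# Sketch: learn distributed POI representations with Word2Vec, describe
# each candidate ROI by its member POIs, and rank ROIs against a query
# keyword by cosine similarity.
import numpy as np
from gensim.models import Word2Vec

# Hypothetical corpus: each "sentence" is a sequence of POI category
# tokens observed together in one region.
corpus = [
    ["cafe", "bookstore", "cinema"],
    ["cafe", "restaurant", "bar"],
    ["school", "library", "bookstore"],
]
model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, seed=1)

def region_vector(pois):
    """Environmental characteristic of an ROI: mean of its POI vectors."""
    return np.mean([model.wv[p] for p in pois], axis=0)

def top_k(query, candidate_rois, k=2):
    """Rank candidate ROIs by cosine similarity to the query keyword."""
    q = model.wv[query]
    def cosine(v):
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    scored = [(name, cosine(region_vector(pois)))
              for name, pois in candidate_rois.items()]
    return sorted(scored, key=lambda x: -x[1])[:k]

rois = {"downtown": ["cafe", "bar"], "campus": ["school", "library"]}
print(top_k("bookstore", rois))
```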


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms (such as a person's name or a product model number) not seen during training, and avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections (such as the document index of a commercial Web search engine) containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To retrieve efficiently from large collections, we develop a framework that incorporates query term independence [Mitra et al., 2019] into any deep model, enabling large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
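
The query term independence idea lends itself to a small sketch: if the query-document score decomposes as a sum of per-term scores, those scores can be precomputed offline and served from an inverted index. The scoring function below is a hypothetical stand-in for a deep model, not the thesis's architecture.

```python
# Sketch: under query term independence, score(q, d) = sum over query
# terms t of s(t, d), so s(t, d) can be precomputed and indexed.
from collections import defaultdict

def deep_term_score(term, doc):
    """Stand-in for an expensive learned model s(t, d)."""
    return float(doc.count(term))  # replace with a neural scorer

docs = {0: "neural ranking models", 1: "inverted index structures"}

# Offline: precompute s(t, d) for every (term, doc) pair and store the
# nonzero entries in an inverted index keyed by term.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in set(text.split()):
        index[term][doc_id] = deep_term_score(term, text)

def score(query):
    """Online: sum precomputed per-term scores; no deep model needed."""
    totals = defaultdict(float)
    for term in query.split():
        for doc_id, s in index.get(term, {}).items():
            totals[doc_id] += s
    return sorted(totals.items(), key=lambda x: -x[1])

print(score("neural index"))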


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, which in many situations improves the prediction accuracy and size of the resulting classifiers. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The search for the tree structure and the tests is performed on the CPU, while the fitness calculations are delegated to the GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that the data size boundaries for evolutionary DT mining are fading.
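
The division of labor can be sketched as follows, with NumPy arrays standing in for the CUDA kernels: the evolutionary loop stays on the CPU while fitness is computed from data-parallel partial results, one partition per GPU in the paper's setup. The single-split "tree" encoding and accuracy-based fitness are simplified assumptions, not the paper's representation.

```python
# Sketch: CPU-side evolutionary search with data-parallel fitness
# evaluation; each data partition (one per GPU in the paper) computes a
# partial error count, and the CPU reduces the partial sums.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000_000, 4))        # instances x attributes
y = (X[:, 0] + X[:, 2] > 0).astype(int)    # labels

def predict_stump(individual, X_chunk):
    """Toy 'tree': one split (attribute, threshold, left/right class)."""
    attr, thr, left, right = individual
    return np.where(X_chunk[:, attr] <= thr, left, right)

def fitness(individual, n_parts=4):
    """Data-parallel decomposition: partial error counts per partition."""
    errors = 0
    for Xc, yc in zip(np.array_split(X, n_parts), np.array_split(y, n_parts)):
        errors += int(np.sum(predict_stump(individual, Xc) != yc))
    return 1.0 - errors / len(y)           # accuracy as fitness

# A toy "population" of candidate splits; selection/crossover omitted.
population = [(rng.integers(4), rng.normal(), 0, 1) for _ in range(20)]
best = max(population, key=fitness)
print(best, fitness(best))
```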


Author(s):  
Gianluca Bardaro ◽  
Alessio Antonini ◽  
Enrico Motta

Over the last two decades, several deployments of robots for in-house assistance of older adults have been trialled. However, these solutions are mostly prototypes and remain unused in real-life scenarios. In this work, we review the historical and current landscape of the field to try and understand why robots have yet to succeed as personal assistants in daily life. Our analysis focuses on two complementary aspects: the capabilities of the physical platform and the logic of the deployment. The former analysis shows regularities in hardware configurations and functionalities, leading to the definition of a set of six application-level capabilities (exploration, identification, remote control, communication, manipulation, and digital situatedness). The latter focuses on the impact of robots on the daily life of users and categorises the deployment of robots for healthcare interventions using three types of services: support, mitigation, and response. Our investigation reveals that the value of healthcare interventions is limited by a stagnation of functionalities and a disconnection between the robotic platform and the design of the intervention. To address this issue, we propose a novel co-design toolkit, which uses an ecological framework for robot interventions in the healthcare domain. Our approach connects robot capabilities with known geriatric factors to create a holistic view encompassing both the physical platform and the logic of the deployment. As a case-study-based validation, we discuss the use of the toolkit in the pre-design of the robotic platform for a pilot intervention, part of the large-scale pilot of the EU H2020 GATEKEEPER project.


2021 ◽  
Vol 5 (1) ◽  
pp. 14
Author(s):  
Christos Makris ◽  
Georgios Pispirigos

Nowadays, due to the extensive use of information networks in a broad range of fields, e.g., bio-informatics, sociology, digital marketing, computer science, etc., graph theory applications have attracted significant scientific interest. Due to its intuitive abstraction, community detection has become one of the most thoroughly studied graph partitioning problems. However, the existing algorithms principally propose iterative solutions of high polynomial order that repetitively require exhaustive analysis. These methods are resource-demanding, unscalable, and inapplicable to big data graphs, such as today's social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model, which is built on plain network topology characteristics of bootstrap-sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted regardless of its size and density. The effectiveness of the proposed methodology has been thoroughly examined on numerous real-life social networks and proven superior to various similar approaches in terms of performance, stability, and accuracy.
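
A rough sketch of the methodology, under stated assumptions: plain topology features are computed for node pairs, base learners are trained on bootstrap samples, and a meta-learner stacks their predictions of whether two nodes share a community. The feature set, the sampling scheme, and the use of the karate club network as ground truth are illustrative stand-ins for the article's distributed design.

```python
# Sketch: stacking-based community prediction from plain network
# topology characteristics of bootstrap samples.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()                 # small stand-in network
club = {n: G.nodes[n]["club"] for n in G}  # known communities

def pair_features(g, u, v):
    """Plain network-topology characteristics of a node pair."""
    common = len(list(nx.common_neighbors(g, u, v)))
    jaccard = next(nx.jaccard_coefficient(g, [(u, v)]))[2]
    return [common, jaccard, g.degree[u] * g.degree[v]]

pairs = [(u, v) for u in G for v in G if u < v]
X = np.array([pair_features(G, u, v) for u, v in pairs])
y = np.array([club[u] == club[v] for u, v in pairs], dtype=int)

rng = np.random.default_rng(7)
base_preds = []
for i in range(5):                         # bootstrap samples
    idx = rng.integers(0, len(pairs), size=len(pairs))
    clf = RandomForestClassifier(n_estimators=30, random_state=i)
    clf.fit(X[idx], y[idx])
    base_preds.append(clf.predict_proba(X)[:, 1])

# Stacking: a meta-learner combines the base learners' predictions.
meta = LogisticRegression().fit(np.column_stack(base_preds), y)
print("training accuracy:", meta.score(np.column_stack(base_preds), y))
```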


Energies ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 2776
Author(s):  
Xin Ye ◽  
Jun Lu ◽  
Tao Zhang ◽  
Yupeng Wang ◽  
Hiroatsu Fukuda

Space cooling is currently the fastest-growing energy end use in buildings. The global warming trend, combined with population growth and economic development, will lead to accelerated growth in space cooling in the future, especially in China. The hot summer and cold winter (HSCW) zone is the most densely populated and economically developed region in China, but it has the worst indoor thermal environment. Relatively few studies have been based on actual measurements when optimizing insulation design under the typical intermittent cooling modes of this region. This case study was conducted in Chengdu: the two residences selected were identical in design, but the south bedroom of the case-study residence had interior insulation (insulation on all opaque interior surfaces of a space) retrofitted in the bedroom area in 2017. In August 2019, a comparative on-site measurement was conducted to investigate the effect of the retrofit work under three typical intermittent cooling patterns in a real-life scenario. The experimental results show that interior insulation provides a significant improvement in energy saving and the indoor thermal environment. The average saving in daily cooling energy consumption of the south bedroom is 42.09%, with a maximum of 48.91%. In the bedroom with the interior insulation retrofit, the indoor temperature is closer to the set temperature and the vertical temperature difference is smaller during the cooling period; when the air conditioner is off, the room remains at a comfortable temperature for a slightly longer time.


2021 ◽  
Vol 13 (7) ◽  
pp. 1367
Author(s):  
Yuanzhi Cai ◽  
Hong Huang ◽  
Kaiyang Wang ◽  
Cheng Zhang ◽  
Lei Fan ◽  
...  

Over the last decade, 3D reconstruction techniques have been developed to present the latest as-is information for various objects and to build city information models. Meanwhile, deep-learning-based approaches are employed to add semantic information to the models. Studies have shown that model accuracy can be improved by combining multiple data channels (e.g., XYZ, Intensity, D, and RGB). Nevertheless, redundant data channels in large-scale datasets may cause high computation cost and time during data processing. Few researchers have addressed the question of which combination of channels is optimal in terms of overall accuracy (OA) and mean intersection over union (mIoU). Therefore, a framework is proposed to explore an efficient data fusion approach for semantic segmentation by selecting an optimal combination of data channels. In the framework, a total of 13 channel combinations are investigated for data pre-processing, and the encoder-to-decoder structure is utilized for network permutations. A case study is carried out to investigate the efficiency of the proposed approach by adopting a city-level benchmark dataset and applying nine networks. It is found that the IRGB channel combination provides the best OA performance, while the IRGBD combination provides the best mIoU performance.
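
The framework's outer loop can be sketched as an enumeration over channel combinations, assembling the input from the selected channels and recording OA and mIoU for each. The `train_and_eval` placeholder below is a hypothetical stand-in for training the encoder-to-decoder networks in the study, and the study restricts attention to 13 combinations, whereas this sketch enumerates all size-3 and size-4 subsets for illustration.

```python
# Sketch: evaluate channel combinations for semantic segmentation by
# selecting columns of the input data and recording OA and mIoU.
from itertools import combinations
import numpy as np

channels = {"X": 0, "Y": 1, "Z": 2, "I": 3, "R": 4, "G": 5, "B": 6, "D": 7}
data = np.random.rand(10_000, 8)               # points x all channels
labels = np.random.randint(0, 5, size=10_000)  # semantic classes

def train_and_eval(inputs, labels):
    """Placeholder: train one encoder-decoder net, return (OA, mIoU)."""
    return np.random.rand(), np.random.rand()

results = {}
for k in (3, 4):                               # e.g., IRGB, IRGBD, ...
    for combo in combinations(channels, k):
        cols = [channels[c] for c in combo]
        oa, miou = train_and_eval(data[:, cols], labels)
        results["".join(combo)] = (oa, miou)

best_oa = max(results, key=lambda c: results[c][0])
best_miou = max(results, key=lambda c: results[c][1])
print("best OA:", best_oa, "best mIoU:", best_miou)
```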


2008 ◽  
Vol 2008 ◽  
pp. 1-9 ◽  
Author(s):  
Peter Quax ◽  
Jeroen Dierckx ◽  
Bart Cornelissen ◽  
Wim Lamotte

The explosive growth in the number of applications based on networked virtual environment technology, both games and virtual communities, shows that these types of applications have become commonplace in a short period of time. From a research point of view, however, the inherent weaknesses in their architectures are quickly exposed. The Architecture for Large-Scale Virtual Interactive Communities (ALVIC) was originally developed to serve as a generic framework to deploy networked virtual environment applications on the Internet. While it has been shown to scale effectively to the numbers originally put forward, our findings show that, on a real-life network such as the Internet, several of its drawbacks will not be overcome in the near future. We have therefore recently started the development of ALVIC-NG, which, while incorporating the findings of our previous research, makes several improvements on the original version, making it suitable for deployment on the Internet as it exists today.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
L. Orr ◽  
S. C. Chapman ◽  
J. W. Gjerloev ◽  
W. Guo

Geomagnetic substorms are a global magnetospheric reconfiguration, during which energy is abruptly transported to the ionosphere. Central to this are the auroral electrojets, large-scale ionospheric currents that are part of a larger three-dimensional system, the substorm current wedge. Many, often conflicting, magnetospheric reconfiguration scenarios have been proposed to describe the evolution and structure of the substorm current wedge. SuperMAG is a worldwide collaboration providing easy access to ground-based magnetometer data. Here we show the application of techniques from network science to data from 137 SuperMAG ground-based magnetometers. We calculate a time-varying directed network and perform community detection on it, identifying locally dense groups of connections. Analysis of 41 substorms exhibits a robust structural change from many small, uncorrelated current systems before substorm onset to a large, spatially extended, coherent system approximately 10 minutes after onset. We interpret this as a strong indication that the auroral electrojet system during substorm expansions is inherently a large-scale phenomenon and is not solely due to many meso-scale wedgelets.
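
A minimal sketch of this style of analysis, with synthetic signals standing in for SuperMAG data: stations whose signals correlate strongly in a sliding window are connected, and community detection finds locally dense groups. An undirected correlation network and networkx's greedy modularity method replace the paper's time-varying directed construction and its community detection details.

```python
# Sketch: build a station network per time window from signal
# correlations, then detect communities of coherent stations.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(3)
n_stations, n_samples = 20, 600
signals = rng.normal(size=(n_stations, n_samples))
signals[:8] += np.sin(np.linspace(0, 30, n_samples))  # one coherent group

window, threshold = 120, 0.5
for start in range(0, n_samples - window + 1, window):
    seg = signals[:, start:start + window]
    corr = np.corrcoef(seg)                  # station-station correlation
    G = nx.Graph()
    G.add_nodes_from(range(n_stations))
    for i in range(n_stations):
        for j in range(i + 1, n_stations):
            if abs(corr[i, j]) > threshold:
                G.add_edge(i, j)
    comms = greedy_modularity_communities(G)
    print(f"t={start:4d}: {len(comms)} communities,",
          [len(c) for c in comms])
```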

