A Model for Processing Skyline Queries in Crowd-sourced Databases

In many modern database applications, critical queries and tasks cannot be fully handled by machines. Crowd-sourced databases have become a new paradigm for harnessing human cognitive abilities to process these computer-hard tasks. In particular, problems that are difficult for machines but easier for humans, such as entity resolution, fuzzy matching for predicates and joins, and image recognition, can be solved better than before. Additionally, crowd-sourced databases allow database operators to be applied to incomplete data, since human workers can provide estimated values at run time. Skyline queries have received considerable attention from the database community over the last decade and have been exploited in a variety of applications such as multi-criteria decision making and decision support systems. Various works have addressed skyline queries in crowd-sourced databases, covering databases with fully and partially complete data. However, we argue that processing skyline queries over partially incomplete data in crowd-sourced databases has not received appropriate attention, and that an efficient approach is therefore needed. This paper presents an efficient model for processing skyline queries in incomplete crowd-sourced databases. The main idea of the proposed model is to exploit the data available in the database to estimate the missing values. In addition, the model consults the crowd to obtain more accurate results when the local database fails to provide precise values. To ensure a high-quality result, factors such as worker quality and monetary cost should be considered when selecting workers to carry out the task. Other critical factors, such as the latency of generating the results, should also be considered.
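
The idea of estimating missing values from local data before evaluating the skyline can be illustrated with a minimal Python sketch. It is not the paper's model: the abstract does not say how values are estimated, so column-mean imputation is an assumption made here, as are the function names and the hotel data. Smaller values are treated as better in every dimension, and a column with no observed values raises an error, which marks the point where a crowd task would be needed instead.

```python
from statistics import mean


def impute_with_column_means(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumption: mean imputation stands in for the paper's (unspecified) estimator.
    """
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        if not observed:
            # No local evidence at all: this is where the crowd would be asked.
            raise ValueError(f"column {c} has no observed values")
        col_means.append(mean(observed))
    return [
        [col_means[c] if r[c] is None else r[c] for c in range(n_cols)]
        for r in rows
    ]


def dominates(a, b):
    """a dominates b if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(rows):
    """Return the rows that are not dominated by any other row."""
    return [p for p in rows if not any(dominates(q, p) for q in rows)]


# Hypothetical data: (price, distance to the beach); None marks a missing value.
hotels = [[50, 8], [80, None], [60, 3], [90, 2], [70, 9]]
print(skyline(impute_with_column_means(hotels)))
```

The pairwise scan is quadratic in the number of rows, which is acceptable for a sketch but not for large tables.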

Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing

Conceptual Modeling - Lecture Notes in Computer Science ◽

10.1007/978-3-642-41924-9_25 ◽

2013 ◽

pp. 298-312 ◽

Cited By ~ 11

Author(s):

Christoph Lofi ◽

Kinda El Maarry ◽

Wolf-Tilo Balke

Keyword(s):

Incomplete Data ◽

Crowd Sourcing ◽

Skyline Queries ◽

Error Models ◽

Data Error

A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.

Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽

10.1142/s0129065797000318 ◽

1997 ◽

Vol 08 (03) ◽

pp. 301-315 ◽

Cited By ~ 8

Author(s):

Marcel J. Nijman ◽

Hilbert J. Kappen

Keyword(s):

Symmetry Breaking ◽

Incomplete Data ◽

Missing Values ◽

Nearest Neighbor ◽

Boltzmann Machine ◽

K Nearest Neighbor ◽

Data Set ◽

Input Space ◽

Learning Rules ◽

Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We also show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.

SCSA: Evaluating skyline queries in incomplete data

Applied Intelligence ◽

10.1007/s10489-018-1356-2 ◽

2018 ◽

Vol 49 (5) ◽

pp. 1636-1657 ◽

Cited By ~ 2

Author(s):

Yonis Gulzar ◽

Ali A. Alwan ◽

Radhwan Mohamed Abdullah ◽

Qin Xin ◽

Marwa B. Swidan

Keyword(s):

Incomplete Data ◽

Skyline Queries

Prediction of Sudden Health Crises Owing to Congestive Heart Failure with Deep Learning Models

Revue d'Intelligence Artificielle ◽

10.18280/ria.350108 ◽

2021 ◽

Vol 35 (1) ◽

pp. 71-76

Author(s):

Shaik Shabbeer ◽

Edara Srinivasa Reddy

Keyword(s):

Heart Failure ◽

Congestive Heart Failure ◽

Missing Values ◽

Medical Knowledge ◽

Data Generation ◽

Clinical Disorder ◽

Proposed Model ◽

Service Efficiency ◽

Mlp Model ◽

Mimic Iii

Artificial Intelligence (AI) now has roots in almost every field, and healthcare is one of the areas in which AI has grown most in recent years. The tremendous increase in health data generation and the substantial evolution of robust data analysis tools have contributed to AI improvement in health care and research, leading to increased service efficiency. Health information is stored as Electronic Health Records (EHR), which provide information on patients over time. EHR data have several issues, such as heterogeneity, missing values, distortion, noise, and time. This study addresses visit irregularity, which refers to the irregular timing of patient visits. Congestive heart failure (CHF) is a grave clinical disorder caused by an insufficient blood supply owing to heart muscle dysfunction. Many people suffer from CHF, which can result in death or require immediate recognition. A multi-layer perceptron (MLP) model was used to handle irregularities at the visit level. Experiments on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset indicate that missing visit-level information affects the estimation of the clinical outcome. The proposed model is shown to benefit the prediction of readmission and mortality. Compared with baseline models, the proposed model performs well.

Streamflow estimation at partially gaged sites using multiple dependence conditions via vine copulas

10.5194/hess-2020-541 ◽

2020 ◽

Author(s):

Kuk-Hyun Ahn

Keyword(s):

Missing Values ◽

The Other ◽

Dependence Structure ◽

Proposed Model ◽

Pee Dee ◽

Streamflow Estimation ◽

Vine Copula ◽

Bivariate Copula ◽

Pee Dee River ◽

Vine Copulas

Reliable estimates of missing streamflow values are relevant for water resources planning and management. This study proposes a multiple dependence condition model via vine copulas for estimating streamflow at partially gaged sites. Given a complex streamflow gage network, the proposed model captures the high-dimensional joint distribution by building a hierarchy of conditional bivariate copulas. The usefulness of the proposed model is first highlighted using a synthetic streamflow scenario. In this analysis, the bivariate copula model and a variant of the vine copulas are also employed to show the ability of the multiple dependence structure adopted in the proposed model. Furthermore, the evaluations are extended to a case study of 54 gages located within the Yadkin-Pee Dee River Basin in the eastern U.S. Both sets of results indicate that the proposed model is better suited for infilling missing values. The performance of the vine copula is then compared with six other infilling approaches to confirm its applicability. Results demonstrate that the proposed model produces more reliable streamflow estimates than the other approaches. In particular, when applied to partially gaged sites with sufficient available data, the proposed model clearly outperforms the other models. Even though the model is illustrated by a specific case, it can be extended to other regions with diverse hydro-climatological variables for the purpose of infilling.

Evolutionary Machine Learning for Classification with Incomplete Data

10.26686/wgtn.17072123 ◽

2021 ◽

Author(s):

Cao Truong Tran

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Genetic Programming ◽

Incomplete Data ◽

Missing Values ◽

Machine Learning Techniques ◽

Feature Construction ◽

Classification Algorithms ◽

Learning Techniques ◽

Effectiveness And Efficiency

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be carefully handled because inadequate treatment of missing values will cause large classification errors. Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying the classifiers to classify unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data which can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially for the application process of classification. Another approach to classification with incomplete data is to build a classifier that can directly work with missing values. This approach does not require time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A recent approach to classification with incomplete data which also avoids estimating missing values is to build a set of classifiers from which applicable classifiers are selected for classifying unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values. The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and constructing classifiers.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data. The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve the classification accuracy, but also reduce the complexity of classifiers able to work directly with incomplete data. The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves the classification accuracy and reduces the complexity of classifiers. The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data. The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms able to work directly with incomplete data. In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
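
The abstract does not define its interval functions, so the following Python sketch only illustrates one plausible reading and should not be taken as the thesis's method. It assumes that a missing feature value is represented by the interval spanning the observed minimum and maximum of that feature, that an observed value becomes a zero-width interval, and that expressions are evaluated with ordinary interval addition and multiplication. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Sum of two intervals: add the lower bounds and the upper bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product of two intervals: the extremes lie among the four corner products.
        corners = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(corners), max(corners))


def to_intervals(rows):
    """Turn a table with None for missing values into a table of intervals.

    Assumption: a missing value becomes [min, max] of the observed values
    in its column; an observed value v becomes [v, v].
    """
    n_cols = len(rows[0])
    bounds = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        bounds.append(Interval(min(observed), max(observed)))
    return [
        [bounds[c] if r[c] is None else Interval(r[c], r[c]) for c in range(n_cols)]
        for r in rows
    ]


# Hypothetical two-feature dataset with one missing value in each column.
data = [[1.0, 4.0], [None, 2.0], [3.0, None]]
x0, x1 = to_intervals(data)[1]
print(x0 + x1, x0 * x1)
```

An expression evaluated this way yields a range of possible outputs rather than a single number, so no imputation step is needed before evaluation.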

Integration Development of Urban Agglomeration in Central Liaoning, China, by Trajectory Gravity Model

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100698 ◽

2021 ◽

Vol 10 (10) ◽

pp. 698

Author(s):

Ruren Li ◽

Shoujia Li ◽

Zhiwei Xie

Keyword(s):

Gravity Model ◽

Gravitational Force ◽

Main Idea ◽

Correlation Coefficients ◽

Urban Agglomeration ◽

Economic Research ◽

The Core ◽

Proposed Model ◽

Two Indices ◽

Spatial Trajectory

The integrated development of urban agglomerations is important for regional economic research and management. This paper proposes a method for studying the integrated development of an urban agglomeration using a trajectory gravity model. The model analyzes the gravitational strength of the core city with respect to other cities and quantitatively characterizes the spatial trajectory of its gravitational direction, expansion, and so on. The main idea is to perform a fitting analysis between the urban axes and the gravitational lines. The correlation coefficients obtained from the fitting analysis reflect the correlation between the two indices. For different cities in the same year, a higher value means a stronger relationship, and there is a clear gravitational force between cities when the value is above 0.75. For most cities, the gravitational force with the core city increases from year to year. At the same time, the urban axes tend to grow in the direction of the gravitational force between cities, and there is a clear tendency for the trajectories of the cities to move closer together. The proposed model was applied to the integrated development of the central Liaoning urban agglomeration in China from 2008 to 2016. The results show that the cities are constantly attracted to each other through urban gravity.
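
The thresholding step described in the abstract can be sketched in Python as follows. The abstract does not state which correlation coefficient is used or how the urban axes and gravitational lines are encoded, so this example assumes a Pearson coefficient over two equally long numeric series. The function names and the sample numbers are hypothetical; only the 0.75 threshold comes from the abstract.

```python
from math import sqrt

# Threshold above which the abstract reports a clear gravitational force.
CLEAR_GRAVITY_THRESHOLD = 0.75


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def has_clear_gravity(urban_axis, gravity_line):
    """True when the two indices correlate above the 0.75 threshold."""
    return pearson(urban_axis, gravity_line) > CLEAR_GRAVITY_THRESHOLD


# Hypothetical index values for one city pair over five years.
urban_axis = [1, 2, 3, 4, 5]
gravity_line = [2, 4, 6, 8, 10]
print(has_clear_gravity(urban_axis, gravity_line))
```

The function divides by zero if either series is constant, so a real analysis would need to handle that case explicitly.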
