A Framework on Data Mining on Uncertain Data with Related Research Issues in Service Industry

Data Mining ◽  
2013 ◽  
pp. 515-529
Author(s):  
Edward Hung

A large body of research addresses mining relational databases that store exact values. However, in many real-life applications, such as those common in the service industry, raw data are uncertain when they are collected or produced. Sources of uncertain data include sensor readings (such as RFID tags on products in retail stores), classification results (e.g., identities of products or customers) from image processing with statistical classifiers, and the output of predictive programs used for stock markets, targeted marketing, and churn prediction in customer relationship management. Because traditional databases store only exact values, uncertain data are usually transformed into exact data by, for example, taking the mean value (for quantitative attributes) or the value with the highest frequency or possibility. The shortcomings are obvious: (1) because the uncertain source values are approximated, the mining results are also approximate and may be wrong; (2) useful probabilistic information may be omitted from the results. Research on probabilistic databases began in the 1980s. While a great deal of work has gone into supporting uncertainty in databases, work on mining such uncertain data is only now increasing. By classifying uncertain data into different categories, a framework is proposed for developing probabilistic data mining techniques that can be applied directly to uncertain data and produce results that preserve accuracy. In this chapter, we introduce the framework together with a scheme for categorizing uncertain data by their properties. We also propose a variety of definitions and approaches for different mining tasks on uncertain data with different properties. Advances in data mining applications of this kind are expected to improve the quality of services provided in various service industries.
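The first shortcoming named in the abstract can be made concrete with a small sketch. It is not the chapter's framework; the records, labels, and function names below are hypothetical. Each record is a probability distribution over product identities, as a statistical classifier might produce, and the sketch compares counts obtained after collapsing each record to its most likely value with expected counts computed directly from the probabilities.

```python
from typing import Dict, List

# Hypothetical uncertain records: each maps a candidate product
# identity to the probability assigned by a classifier.
records: List[Dict[str, float]] = [
    {"A": 0.6, "B": 0.4},
    {"A": 0.6, "B": 0.4},
    {"A": 0.6, "B": 0.4},
    {"A": 0.1, "B": 0.9},
]


def collapsed_counts(recs: List[Dict[str, float]]) -> Dict[str, int]:
    """Count labels after replacing each record by its most probable value."""
    counts: Dict[str, int] = {}
    for dist in recs:
        label = max(dist, key=dist.get)
        counts[label] = counts.get(label, 0) + 1
    return counts


def expected_counts(recs: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum the probabilities per label, keeping the uncertainty."""
    counts: Dict[str, float] = {}
    for dist in recs:
        for label, p in dist.items():
            counts[label] = counts.get(label, 0.0) + p
    return counts


if __name__ == "__main__":
    print(collapsed_counts(records))
    print(expected_counts(records))
```

With this toy data, collapsing reports A three times and B once, while the expected counts are about 1.9 for A and 2.1 for B, so the two views disagree on which product is more frequent.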

2012 ◽  
pp. 316-330
Author(s):  
Edward Hung

A large body of research addresses mining relational databases that store exact values. However, in many real-life applications, such as those common in the service industry, raw data are uncertain when they are collected or produced. Sources of uncertain data include sensor readings (such as RFID tags on products in retail stores), classification results (e.g., identities of products or customers) from image processing with statistical classifiers, and the output of predictive programs used for stock markets, targeted marketing, and churn prediction in customer relationship management. Because traditional databases store only exact values, uncertain data are usually transformed into exact data by, for example, taking the mean value (for quantitative attributes) or the value with the highest frequency or possibility. The shortcomings are obvious: (1) because the uncertain source values are approximated, the mining results are also approximate and may be wrong; (2) useful probabilistic information may be omitted from the results. Research on probabilistic databases began in the 1980s. While a great deal of work has gone into supporting uncertainty in databases, work on mining such uncertain data is only now increasing. By classifying uncertain data into different categories, a framework is proposed for developing probabilistic data mining techniques that can be applied directly to uncertain data and produce results that preserve accuracy. In this chapter, we introduce the framework together with a scheme for categorizing uncertain data by their properties. We also propose a variety of definitions and approaches for different mining tasks on uncertain data with different properties. Advances in data mining applications of this kind are expected to improve the quality of services provided in various service industries.


Author(s):  
Richard Amoasi ◽  
Seth Tuffour Osei-Tutu ◽  
Margaret Amoasi

Marketing was previously considered synonymous with the selling of tangible products, so much more emphasis was placed on physical products than on the service industry. Many people did not find it necessary to practice marketing in the service industry, not knowing that this is the sector that needs marketing. Kotler (1999) defined a service as any act or performance that one party can offer to another that is essentially intangible and does not result in the ownership of anything. This research examined taxi drivers’ level of awareness of the role of quality customer service in the provision of transport services to passengers in New Juaben North. It set out to evaluate customers’ perceptions of the services provided by taxi drivers, to evaluate taxi drivers’ understanding of quality customer service, and to establish the relationship between taxi drivers and customers. The sample size for the study was one hundred and fifty (150), comprising one hundred (100) customers and fifty (50) taxi drivers. The researchers used non-probability sampling, namely convenience sampling and purposive sampling. The researchers chose this topic because empirical evidence indicates that taxi drivers in New Juaben South do not value passengers; they feel they are doing passengers a favour because taxis are faster than the “Trotro”. To carry out the research, we developed two questionnaires, one for taxi drivers and one for passengers. The study revealed that the Ghana Private Road Transport Union (GPRTU) and other transport associations organize training for their members, not only on avoiding road accidents but also on how to handle passengers, so that passengers will in turn develop positive relationships with them, improving customer relations and preference for taxi services over the Trotro, which is in serious competition with the taxi business.
The researchers concluded that providing quality service to passengers is the main marketing problem that taxi drivers face and that it needs special attention. The study recommends that the various transport associations, in conjunction with customer groups in New Juaben South Municipality, educate taxi drivers on the importance of the customer and why it is necessary to satisfy customers.
Keywords: taxi, trotro, customer relationship, quality service, customer value.


2006 ◽  
Vol 05 (04) ◽  
pp. 611-621 ◽  
Author(s):  
ALEXANDER V. LOTOV

The paper is devoted to a visualization-based method for exploring relational databases that contain large volumes of uncertain data. The visualization is aimed at exploring properties of the data and selecting a small number of interesting items from the database. The method introduced here is a new development of the Reasonable Goals method, which has already been implemented on the Internet in the form of a Web application server. Thus, the new method can be applied on the Internet, too. It can be used for selection-aimed data mining in various fields, including environmental planning, machinery design, financial planning (including credit operations), biology, and medicine.


2009 ◽  
pp. 2543-2563 ◽  
Author(s):  
Narasimhaiah Gorla ◽  
Pang Wing Yan Betty

A new approach to vertical fragmentation in relational databases is proposed using association rules, a data-mining technique. Vertical fragmentation can enhance the performance of database systems by reducing the number of disk accesses needed by transactions. By adapting the Apriori algorithm, a design methodology for vertical partitioning is proposed. The heuristic methodology is tested on two real-life databases for various minimum support levels and minimum confidence levels. In the smaller database, the partitioning solution obtained matched the optimal solution found by exhaustive enumeration. Applying our method to the larger database yielded a partitioning solution that improves on the unpartitioned solution by 41.05% and took less than a second to produce. We provide future research directions on extending the procedure to distributed and object-oriented database designs.
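The support and confidence measures mentioned in the abstract can be sketched for attribute co-access. This is only an illustration of the standard association-rule definitions, not the authors' partitioning methodology; the workload, attribute names, and function names are assumptions made for the example. Each transaction is represented by the set of attributes it reads, and attribute pairs that are frequently read together are candidates for the same vertical fragment.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical workload: the attributes each transaction accesses.
workload: List[Set[str]] = [
    {"id", "name", "email"},
    {"id", "name"},
    {"id", "salary"},
    {"id", "name", "email"},
    {"salary", "dept"},
]


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that access every attribute in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def frequent_pairs(transactions: List[Set[str]],
                   min_support: float) -> Dict[Tuple[str, str], float]:
    """Attribute pairs whose co-access support reaches min_support."""
    attrs = sorted(set().union(*transactions))
    return {
        pair: s
        for pair in combinations(attrs, 2)
        if (s := support(pair, transactions)) >= min_support
    }


if __name__ == "__main__":
    print(frequent_pairs(workload, 0.4))
    print(confidence({"email"}, {"name"}, workload))
```

In this toy workload every transaction that reads `email` also reads `name`, so the rule email → name has confidence 1.0, which hints that the two attributes belong in the same fragment. A full Apriori pass would extend frequent pairs to larger attribute sets level by level.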


2010 ◽  
pp. 2248-2268
Author(s):  
Narasimhaiah Gorla ◽  
Pang W.Y. Betty

A new approach to vertical fragmentation in relational databases is proposed using association rules, a data-mining technique. Vertical fragmentation can enhance the performance of database systems by reducing the number of disk accesses needed by transactions. By adapting the Apriori algorithm, a design methodology for vertical partitioning is proposed. The heuristic methodology is tested on two real-life databases for various minimum support levels and minimum confidence levels. In the smaller database, the partitioning solution obtained matched the optimal solution found by exhaustive enumeration. Applying our method to the larger database yielded a partitioning solution that improves on the unpartitioned solution by 41.05% and took less than a second to produce. We provide future research directions on extending the procedure to distributed and object-oriented database designs.


2020 ◽  
Vol 13 (5) ◽  
pp. 818-826
Author(s):  
Ranjan Kumar Panda ◽  
A. Sai Sabitha ◽  
Vikas Deep

Sustainability is defined as the practice of protecting natural resources for future use without harming nature. Sustainable development includes the environmental, social, political, and economic issues that human beings face in order to exist. Water is the most vital resource for living beings on this earth. Natural resources are being exploited as the world population grows, and a shortfall of these resources may threaten humanity in the future. Water sustainability is a part of environmental sustainability. The water crisis is gradually worsening in many places in the world due to agricultural and industrial usage and rapid urbanization. Data mining tools and techniques provide a powerful methodology for understanding water sustainability issues using rich environmental data, and they also help in building models for possible optimization and reengineering. In this research work, a review is presented of the use of supervised and unsupervised learning algorithms in water sustainability issues such as water quality assessment, waste water collection systems, and water consumption. Advanced technologies have also helped to resolve major water sustainability issues. Some major data mining optimization algorithms used in piped water distribution networks are compared.


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously and thus, in many situations, improves the prediction and size of the resulting classifiers. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
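The data-parallel decomposition of the fitness calculation can be sketched in plain Python. This is a CPU-only stand-in, not the paper's CUDA implementation: the chunking scheme, the accuracy-based fitness, and all names are assumptions for illustration. The dataset is split into chunks, each chunk yields a partial count of correctly classified samples (the part a GPU would compute), and the partial counts are reduced into one fitness value.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]


def split_chunks(data: List[Sample], n_chunks: int) -> List[List[Sample]]:
    """Split data into at most n_chunks contiguous, near-equal chunks."""
    if not data or n_chunks < 1:
        raise ValueError("need non-empty data and n_chunks >= 1")
    size = (len(data) + n_chunks - 1) // n_chunks  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_correct(predict: Predictor, chunk: List[Sample]) -> int:
    """Partial result for one chunk: number of correct predictions."""
    return sum(1 for features, label in chunk if predict(features) == label)


def accuracy_fitness(predict: Predictor, data: List[Sample],
                     n_chunks: int = 4) -> float:
    """Reduce per-chunk counts into a single accuracy-based fitness."""
    partials = [chunk_correct(predict, c) for c in split_chunks(data, n_chunks)]
    return sum(partials) / len(data)


if __name__ == "__main__":
    toy = [([0.2], 0), ([0.8], 1), ([0.4], 0), ([0.9], 1), ([0.6], 0)]
    stump = lambda x: int(x[0] > 0.5)  # a one-test "tree" as the individual
    print(accuracy_fitness(stump, toy, n_chunks=2))
```

Because each chunk's count is independent of the others, the per-chunk step is the natural unit to hand to separate devices, with only the small partial counts sent back for the final sum.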


2021 ◽  
Vol 16 (1) ◽  
pp. 1-23
Author(s):  
Bo Liu ◽  
Haowen Zhong ◽  
Yanshan Xiao

Multi-view classification aims at designing a multi-view learning strategy to train a classifier from multi-view data, which are easily collected in practice. Most existing works on multi-view classification assume that the multi-view data are collected with precise information. However, in real-life applications the collected multi-view data are often uncertain because the collection process is corrupted by noise. This article therefore proposes a novel approach, called uncertain multi-view learning with support vector machine (UMV-SVM), to cope with the problem of multi-view learning with uncertain data. The method first enforces agreement among all the views to seek complementary information in the multi-view data, and takes the uncertainty of the data into consideration by modeling the reachability area of the noise. It then uses an iterative framework to solve the UMV-SVM model and obtain the multi-view classifier for prediction. Extensive experiments on real-life datasets show that the proposed UMV-SVM achieves better performance for uncertain multi-view classification than state-of-the-art multi-view classification methods.

