The Use of Data Mining for Assessing Performance of Administrative Services

Advances in Data Mining and Database Management - Data Mining in Public and Private Sectors · 10.4018/978-1-60566-906-9.ch004 · 2010 · pp. 67-82

Author(s): Zdravko Pecar, Ivan Bratko

Keyword(s): Data Mining, State Government, Local Level, Regression Tree, Main Idea, Basic Unit, Learning Tools, Administrative District, Administrative Services, Use Of Data

The aim of this research was to study the performance of 58 Slovenian administrative districts (state government offices at the local level), to identify the factors that affect their performance, and to determine how these effects interact. The main idea was to analyze the available statistical data relevant to the performance of the administrative districts with machine learning tools for data mining, and to extract clear relations between various parameters of the administrative districts and their performance. The authors introduced the concept of a basic unit of administrative service, which enables the measurement of an administrative district’s performance. The main data mining tool used in this study was regression tree induction. This method can handle both numeric and discrete data, and has the benefit of providing clear insight into the relations between the parameters in the system, which facilitates the interpretation of the data mining results. The authors investigated various relations between the parameters in their domain, for example, how the performance of an administrative district depends on trends in the number of applications, on employees’ level of professional qualification, and so on. In the chapter, they report on a variety of (occasionally surprising) findings extracted from the data, and discuss how these findings can be used to improve decisions in managing administrative districts.
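To make regression tree induction more concrete, here is a minimal Python sketch of its core step: for one numeric attribute, it tries every threshold between adjacent sorted values and keeps the split with the lowest total sum of squared errors. It is not the authors' tool; the function names and the example attribute (a qualification score against a performance score) are hypothetical.

```python
from typing import List, Optional, Tuple

def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs: List[float], ys: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total_sse) of the best binary split 'x <= threshold' on one attribute."""
    pairs = sorted(zip(xs, ys))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)  # impurity of the two child nodes
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical districts: qualification score vs. performance score
qualification = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
performance = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
print(best_split(qualification, performance))
```

On this toy data the split falls at 6.5, separating the two groups. A full tree applies the same step recursively to each branch and across all attributes, usually with a minimum leaf size so that the tree stays interpretable.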

Machine Learning and data mining tools applied for databases of low number of records

Advanced Engineering Research · 10.23947/2687-1653-2021-21-4-346-363 · 2022 · Vol 21 (4) · pp. 346-363

Author(s): Hubert Anysz

Keyword(s): Machine Learning, Data Mining, Computational Methods, Large Datasets, Learning Tools, Data Preparation, Preparation Methods, Use Of Data, Small Set, Mining Tools

The use of data mining and machine learning tools is becoming increasingly common. Their usefulness is most noticeable with large datasets, where the information sought or new relationships must be extracted from information noise. As these tools develop, they are increasingly applied to datasets with far fewer records, usually associated with specific phenomena. This specificity often makes it impossible to increase the number of cases, which would otherwise facilitate the search for dependences in the phenomena under study. The paper discusses the features of applying selected tools to a small set of data. It presents methods of data preparation and methods for calculating the performance of tools that take into account the specifics of databases with a small number of records. The author proposes selected techniques that helped to break the deadlock in the calculations, i.e., obtaining results much worse than expected. The need for methods that improve the accuracy of forecasts and of classification arose from the small amount of analysed data. This paper is not a review of popular machine learning and data mining methods; nevertheless, the collected and presented material will help the reader shorten the path to satisfactory results when using the described computational methods.
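Leave-one-out cross-validation is one common way to estimate model performance when records are scarce, because every record is used for testing exactly once. The abstract does not say whether it is among the author's techniques, so the sketch below is only an illustration; the two toy models (a mean predictor and a 1-nearest-neighbour regressor) and all names are hypothetical.

```python
from typing import Callable, List

# A "model" here is any function trained on (train_x, train_y) that predicts y for a new x.
Model = Callable[[List[float], List[float], float], float]

def mean_predictor(train_x: List[float], train_y: List[float], x: float) -> float:
    """Baseline: ignore x and predict the mean of the training targets."""
    return sum(train_y) / len(train_y)

def nearest_neighbour(train_x: List[float], train_y: List[float], x: float) -> float:
    """1-NN regressor: predict the target of the closest training point (first one on ties)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def loocv_mae(xs: List[float], ys: List[float], model: Model) -> float:
    """Mean absolute error under leave-one-out cross-validation."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # every record except record i
        train_y = ys[:i] + ys[i + 1:]
        errors.append(abs(model(train_x, train_y, xs[i]) - ys[i]))
    return sum(errors) / len(errors)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(loocv_mae(xs, ys, nearest_neighbour))
print(loocv_mae(xs, ys, mean_predictor))
```

On this toy data the 1-NN model reaches a leave-one-out MAE of 2.0 against about 2.67 for the mean baseline. Comparing against such a trivial baseline is a cheap check that a model learned something beyond the average of a small sample.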

A Resampling Approach for Customer Gender Prediction Based on E-Commerce Data

Journal of Science and Technology Issue on Information and Communications Technology · 10.31130/jst.2017.40 · 2017 · Vol 3 (1) · pp. 76 · Cited By ~ 1

Author(s): Duong Tran Duc, Pham Bao Son, Tan Hanh, Le Truong Thien

Keyword(s): Data Mining, Bayesian Network, Web Applications, Service Providers, Main Idea, Classification Methods, Cost Sensitive Learning, Network Method, Demographic Attributes, Privacy Issues

Demographic attributes of customers, such as gender and age, provide important information to e-commerce service providers for marketing and for the personalization of web applications. However, online customers often do not provide this kind of information because of privacy concerns and other reasons. In this paper, we propose a method for predicting the gender of customers based on their catalog viewing data on e-commerce systems, such as the date and time of access and the products viewed. The main idea is to extract features from the catalog viewing information and employ classification methods to predict the gender of the viewers. The experiments were conducted on the datasets provided by the PAKDD’15 Data Mining Competition and obtained promising results with a simple feature design, especially with the Bayesian Network method combined with supporting techniques such as resampling, cost-sensitive learning, and boosting.
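Resampling, one of the supporting techniques mentioned above, is often used when one class is much rarer than the other. The sketch below shows plain random oversampling (duplicating randomly chosen records of the smaller classes until the classes are balanced); the abstract does not describe the paper's exact resampling scheme, and the function name and toy session features are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def random_oversample(samples: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate random records of each smaller class until every class matches the largest one."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y

# Hypothetical sessions: (hour of access, number of products viewed)
sessions = [(9, 3), (21, 5), (22, 2), (14, 7)]
genders = ["female", "female", "female", "male"]
x_bal, y_bal = random_oversample(sessions, genders)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only, so that duplicated records do not leak into the data used for evaluation.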

Data Mining Usage in Corporate Information Security: Intrusion Detection Applications

Business Systems Research Journal · 10.1515/bsrj-2017-0005 · 2017 · Vol 8 (1) · pp. 51-59 · Cited By ~ 2

Author(s): Masoud Al Quhtani

Keyword(s): Data Mining, Information Security, Intrusion Detection, Mining System, Systematic Analysis, New Methods, Use Of Data, Efficient Data, Abuse Risk, Corporate Information

Background: The globalization era has brought with it the development of high technology, and therefore new methods of preserving and storing data. New data storage techniques ensure that data are stored for longer periods of time, more efficiently and with higher quality, but also with a higher risk of data abuse. Objective: The goal of the paper is to provide a review of data mining applications for the purpose of corporate information security, and intrusion detection in particular. Methods/approach: The review was conducted using a systematic analysis of previously published papers on the use of data mining in the field of corporate information security. Results: This paper demonstrates that data mining applications are extremely useful and of great importance for establishing corporate information security. Data mining applications are directly related to issues of intrusion detection and privacy protection. Conclusions: The most important conclusion that can be drawn from this study is that corporations can establish a sustainable and efficient data mining system that ensures privacy and successful protection against unwanted intrusions.

Fault Diagnosis of Automobile ECUs with Data Mining Technologies

Applied Mechanics and Materials · 10.4028/www.scientific.net/amm.40-41.156 · 2010 · Vol 40-41 · pp. 156-161 · Cited By ~ 1

Author(s): Yang Li, Yan Qiang Li, Zhi Xue Wang

Keyword(s): Data Mining, Fault Diagnosis, Decision Tree, Data Stream, Electronic Control Unit, Rapid Development, Reliability And Validity, Control Unit, Use Of Data, The Cost

With the rapid development of automotive ECUs (Electronic Control Units), fault diagnosis is becoming increasingly complicated, and the link between fault and symptom is becoming less obvious. To improve maintenance quality and efficiency, the paper proposes a fault diagnosis approach based on data mining technologies. Making full use of the data stream, we first extract fault symptom vectors by processing the data stream, then build a diagnosis decision tree with the ID3 decision tree algorithm, and finally store the rules linking faults to their related symptoms in a historical fault database as a foundation for fault diagnosis. The database also provides a basis for judging the trend of future faults. To verify this approach, an example of diagnosing faults of an entertainment ECU is shown. The test results confirm the reliability and validity of this diagnostic method, which also reduces the cost of ECU diagnosis.
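ID3 grows the tree by repeatedly choosing the attribute with the highest information gain, i.e., the largest reduction in class entropy. A minimal Python sketch of that selection step follows; the symptom attributes and fault labels are invented for illustration and are not taken from the paper's ECU data.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]

def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows: List[Row], attribute: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on the attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows: List[Row], attributes: List[str], target: str) -> str:
    """ID3's choice for the next node: the attribute with maximal information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical symptom vectors for an entertainment ECU
rows = [
    {"no_audio": "yes", "display_flicker": "no", "fault": "amplifier"},
    {"no_audio": "yes", "display_flicker": "no", "fault": "amplifier"},
    {"no_audio": "no", "display_flicker": "yes", "fault": "display"},
    {"no_audio": "no", "display_flicker": "no", "fault": "none"},
]
print(best_attribute(rows, ["no_audio", "display_flicker"], "fault"))  # no_audio
```

Splitting on no_audio yields a gain of 1.0 bit versus about 0.81 bit for display_flicker; ID3 then recurses on each branch until the leaves are pure or no attributes remain, and each root-to-leaf path can be stored as a fault-symptom rule.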

Predicting effects of selected impregnation processes on the observed bending strength of wood, with use of data mining models

BioResources · 10.15376/biores.16.3.4891-4904 · 2021 · Vol 16 (3) · pp. 4891-4904

Author(s): Selahattin Bardak, Timucin Bardak, Hüseyin Peker, Eser Sözen, Yildiz Çabuk

Keyword(s): Data Mining, Goodness Of Fit, Bending Strength, Wood Preservation, Ambient Conditions, Data Mining Algorithms, Use Of Data, The Cost, Mining Algorithms, The Relationship

Wood materials have been used for centuries in many products such as furniture, stairs, windows, and doors. Various methods are used to adapt wood to ambient conditions, and impregnation is a widely used method of wood preservation. For efficiency, it is critical to optimize the impregnation parameters. Accurate prediction with data mining techniques can remove much of the cost and many of the operational challenges in the wood industry. In this study, three data mining algorithms were applied to predict bending strength in impregnated wood materials (Pinus sylvestris L. and Millettia laurentii). Models based on decision tree (DT), random forest (RF), and Gaussian process (GP) algorithms were created from real experimental data to examine the relationship between bending strength, diffusion time, vacuum duration, and wood type. The highest bending strength was achieved with wenge (Millettia laurentii) wood under a 10 bar vacuum and diffusion conditions lasting 25 min. The results showed that all three algorithms are suitable for predicting bending strength. The goodness of fit for the testing phase was 0.994, 0.986, and 0.989 for the DT, RF, and GP algorithms, respectively. In addition, the importance of each attribute was determined with the algorithms.
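The abstract does not name its goodness-of-fit statistic. Assuming it is the coefficient of determination R² (a common choice, although the paper may use a correlation coefficient instead), it can be computed from test-set predictions as in this minimal sketch; the bending strength values are hypothetical.

```python
from typing import Sequence

def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical measured vs. predicted bending strengths (MPa) on a test set
measured = [80.0, 90.0, 100.0, 110.0]
predicted = [82.0, 90.0, 100.0, 108.0]
print(r_squared(measured, predicted))
```

Here the residual sum of squares is 8 and the total sum of squares is 500, so R² is 1 - 8/500 = 0.984. Reporting the value on the testing phase, as the study does, guards against the optimistic fit a model shows on its own training data.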

Use of Data Mining in the prediction of risk factors of Type 2 diabetes mellitus in Gulf countries

Mathematical Modeling and Computing · 10.23939/mmc2021.04.638 · 2021 · Vol 8 (4) · pp. 638-645

Author(s): W. Boutayeb, M. Badaoui, H. Al Ali, A. Boutayeb, ...

Keyword(s): Diabetes Mellitus, Risk Factors, Data Mining, Type 2 Diabetes, Type 2 Diabetes Mellitus, Gulf Coast, Use Of Data, Gulf Countries, Unhealthy Diet

The prevalence of diabetes in Gulf countries is increasing significantly because of various risk factors, such as obesity, unhealthy diet, physical inactivity, and smoking. The aim of the proposed study is to use data mining and data analysis tools to determine the different risk factors for the development of Type 2 diabetes mellitus (T2DM) in Gulf countries, using the Gulf COAST dataset.

Spatial Modelling of Gully Erosion Using GIS and R Programing: A Comparison among Three Data Mining Algorithms

Applied Sciences · 10.3390/app8081369 · 2018 · Vol 8 (8) · pp. 1369 · Cited By ~ 52

Author(s): Alireza Arabameri, Biswajeet Pradhan, Hamid Reza Pourghasemi, Khalil Rezaei, Norman Kerle

Keyword(s): Data Mining, Spatial Relationship, Area Under The Curve, Regression Tree, Drainage Density, Gully Erosion, Slope Aspect, Topographic Wetness Index, Boosted Regression Tree, Area Index

Gully erosion triggers land degradation and restricts land use. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, namely Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs were selected to model GE: elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI. The variable importance results of the RF and BRT models indicated that distance from roads, elevation, and lithology had the greatest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The AUC values of the three models ranged from 0.911 to 0.927; the RF model had the highest prediction accuracy (0.927) and also performed best compared with the other models according to the SCAI values. The findings can help in planning and developing the studied region.
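The AUC used to validate the susceptibility maps equals the probability that a randomly chosen gully location receives a higher susceptibility score than a randomly chosen non-gully location, with ties counting one half. The following Python sketch computes it directly from that pairwise definition, which is quadratic in the number of points and adequate for a few hundred locations; the scores are hypothetical and not taken from the study's maps.

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUC: labels are 1 for gully locations, 0 for non-gully locations."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # gully location ranked above non-gully location
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores at four sampled locations
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

Three of the four gully/non-gully pairs are ranked correctly here, giving an AUC of 0.75; a value of 0.5 would mean the map ranks locations no better than chance.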

Optimising horizontal and vertical partnership connections: bringing partnerships together to create a network advantage

Australian Journal of Primary Health · 10.1071/py09002 · 2009 · Vol 15 (3) · pp. 196 · Cited By ~ 1

Author(s): Carolyn Wallace

Keyword(s): State Government, Local Development, Local Level, National Policy, State Level, Community Partnership, Transfer Of Knowledge, Partnership Network, Local Partnerships, Network Advantage

Partnerships bring actors together to make horizontal connections between organisations. This has proven to be an effective model at the local level in Ireland. This paper explores possibilities for local partnerships to come together through a network to make vertical connections to national policy processes. It is written as a practice and innovation paper by the national coordinator of the Community Partnership Network in Ireland. A review of current practice and expectations of the Community Partnership Network indicates there has been greater success in providing support to members compared with impacts from strong vertical connections. This experience is common for a range of local actors in the community development sector in Ireland who find that there are insufficient meaningful connections between local and national institutions. This limits the opportunity for transfer of knowledge from the local level to inform national policy. Going forward, the notion of network advantage is explored as a means to make the necessary vertical connections. It is proposed that the outcomes from a network should cover the dimensions of: joint value creation, mutual capacity development and collective engagement with decision makers. There is real opportunity to apply this in Ireland as the three networks representing local development are about to merge into a single representative body for what are now local integrated development partnerships. Thinking about network advantage also provides possible application for creating stronger vertical linkages between local partnerships in Victoria, Australia and bodies at the state level, including the Victorian State Government.

The use of data mining to classify Carménère and Merlot wines from Chile

Expert Systems · 10.1111/exsy.12361 · 2018 · Vol 36 (2) · pp. e12361

Author(s): Nattane Luíza Costa, Laura Andrea García Llobodanin, Inar Alves Castro, Rommel Barbosa

Keyword(s): Data Mining, Use Of Data

Use of data mining and spectral profiles to differentiate condition after harvest of coffee plants

Engenharia Agrícola · 10.1590/s0100-69162012000100019 · 2012 · Vol 32 (1) · pp. 184-196 · Cited By ~ 2

Author(s): Rubens A. C. Lamparelli, Jerry A. Johann, Éder R. dos Santos, Julio C. D. M. Esquerdo, Jansle V. Rocha

Keyword(s): Data Mining, Expectation Maximization, Spectral Behavior, Use Of Data, Spectral Profiles, Coffee Plants, Using Data, Hyperion Image, Noise Fraction, Behavior Profiles

This study aimed to identify different conditions of coffee plants after the harvest period using data mining and spectral behavior profiles from the Hyperion/EO1 sensor. The Hyperion image, with a spatial resolution of 30 m, was acquired on August 28, 2008, at the end of the coffee harvest season in the study area. For image pre-processing, atmospheric and signal/noise corrections were carried out using the FLAASH and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Thirty-eight spectral behavior profiles of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed with the Expectation-Maximization (EM) algorithm using 2, 3, 4, and 5 clusters. A t-test at the 5% significance level was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, corresponding to different coffee crop conditions, which makes it possible to improve future interventions.
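The EM clustering step can be illustrated with a one-dimensional Gaussian mixture, a deliberate simplification of the study's setting, in which each profile spans many Hyperion bands. The sketch alternates an E-step (each point's membership probabilities) and an M-step (re-estimating weights, means, and variances from those probabilities). The initial means, iteration count, and reflectance values are hypothetical, and a real analysis would use multivariate Gaussians and a convergence test.

```python
import math
from typing import List, Sequence, Tuple

def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs: Sequence[float], init_means: Sequence[float], iterations: int = 50,
              min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n, k = len(xs), len(init_means)
    means = list(init_means)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [overall_var] * k  # start every component with the data variance
    weights = [1.0 / k] * k        # and equal mixing weights
    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against division by zero after underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances

# Hypothetical reflectance values at one band for two groups of plants
values = [0.10, 0.11, 0.09, 0.105, 0.095, 0.50, 0.51, 0.49, 0.505, 0.495]
weights, means, variances = em_gmm_1d(values, init_means=[0.0, 0.6])
print([round(m, 3) for m in sorted(means)])
```

With two components the fit recovers means near 0.10 and 0.50. Comparing 2, 3, 4, and 5 clusters, in the spirit of the study, would mean rerunning the procedure with that many initial means and comparing the resulting groupings.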
