A methodology for classification and validation of customer datasets

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Dongyun Nie ◽  
Paolo Cappellari ◽  
Mark Roantree

Purpose The purpose of this paper is to develop a method to classify customers according to their value to an organization. This process is complicated by the disconnected nature of a customer record in an industry such as insurance. With large numbers of customers, a broad classification of all customers is of significant benefit to managers and company analysts. Design/methodology/approach The initial step is to construct a full customer history and extract a feature set suited to customer lifetime value calculations. This feature set must then be validated to determine its ability to classify customers in broad terms. Findings The method successfully classifies customer data sets with an accuracy of 90%. This study also discovered that, by examining the average value of key variables in each customer segment, an algorithm can label the groups of clusters with an accuracy of 99.3%. Research limitations/implications When working with a real-world data set, it is always the case that some features are unavailable because they were never recorded. This can impair the algorithm's ability to make good classifications in all cases. Originality/value This research makes a novel contribution in that it automates the classification of customers and, in addition, provides both a high-level classification result (recall and precision identify the best cluster configuration) and detailed insight into how each customer is classified by two validation metrics. This supports managers in terms of marketing spend on new and existing customers.
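
The label-by-average step is worth illustrating. Below is a minimal sketch (not the authors' code) of the idea: cluster customers, then rank the clusters by the mean of a key value variable and attach broad labels. The feature names and data are hypothetical.

```python
# A minimal sketch of labeling customer clusters by the average value of a
# key variable. Features and thresholds are hypothetical, not the paper's.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical features per customer: [tenure_years, policies_held, total_premium]
X = rng.gamma(shape=2.0, scale=[3.0, 1.5, 800.0], size=(500, 3))

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rank clusters by the mean of a key value variable (total premium here)
# and attach broad labels, mirroring the label-by-average idea in the paper.
means = [X[clusters == c, 2].mean() for c in range(3)]
order = np.argsort(means)                      # low -> high average value
labels = dict(zip(order, ["low value", "medium value", "high value"]))
for c in range(3):
    print(f"cluster {c}: mean premium {means[c]:.0f} -> {labels[c]}")
```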

2017 ◽  
Vol 10 (3) ◽  
pp. 387-406 ◽  
Author(s):  
Padmavati Shrivastava ◽  
K.K. Bhoyar ◽  
A.S. Zadgaonkar

Purpose The purpose of this paper is to build a classification system that mimics the perceptual ability of human vision to gather knowledge about the structure, content and surrounding environment of a real-world natural scene accurately and at a quick glance. This paper proposes a set of novel features to determine the gist of a given scene based on dominant color, dominant direction, openness and roughness. Design/methodology/approach The classification system is designed at two levels. At the first level, a set of low-level features is extracted for each semantic feature. At the second level, the extracted features are subjected to a process of feature evaluation based on inter-class and intra-class distances. The most discriminating features are retained and used to train the support vector machine (SVM) classifier on two different data sets. Findings The accuracy of the proposed system has been evaluated on two data sets: the well-known Oliva-Torralba data set and a customized data set comprising high-resolution images of natural landscapes. Experimentation on these two data sets with the proposed novel feature set and the SVM classifier yielded 92.68 percent average classification accuracy using a ten-fold cross-validation approach. The proposed features efficiently represent visual information and are therefore capable of narrowing the semantic gap between low-level image representation and high-level human perception. Originality/value The method presented in this paper is a new approach for extracting low-level features of reduced dimensionality that models human perception for the task of scene classification. The methods of mapping primitive features to high-level features are intuitive to the user and capable of reducing the semantic gap. The proposed feature evaluation technique is general and can be applied across any domain.
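
As a rough illustration of the second-level feature evaluation, the sketch below scores each feature by the ratio of between-class to within-class spread and keeps the most discriminating ones before training an SVM. It uses a stock data set and an arbitrary cutoff, so it only approximates the paper's procedure.

```python
# A hedged sketch of inter-/intra-class distance feature evaluation:
# keep features whose class means are far apart relative to within-class
# spread, then train an SVM on the retained features. Illustrative only.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

scores = []
for j in range(X.shape[1]):
    class_means = np.array([X[y == c, j].mean() for c in classes])
    inter = class_means.var()                              # spread between class means
    intra = np.mean([X[y == c, j].var() for c in classes]) # avg within-class spread
    scores.append(inter / (intra + 1e-12))

keep = np.argsort(scores)[-2:]                  # retain most discriminating features
clf = SVC(kernel="rbf").fit(X[:, keep], y)
print("kept features:", keep, "train accuracy:", clf.score(X[:, keep], y))
```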


2021 ◽  
Vol 13 (18) ◽  
pp. 3713 ◽
Author(s):  
Jie Liu ◽  
Xin Cao ◽  
Pingchuan Zhang ◽  
Xueli Xu ◽  
Yangyang Liu ◽  
...  

As an essential step in the restoration of the Terracotta Warriors, the results of fragment classification directly affect the performance of fragment matching and splicing. However, most existing methods are based on traditional techniques and achieve low classification accuracy, so a practical and effective classification method for fragments is urgently needed. To this end, an attention-based multi-scale neural network named AMS-Net is proposed to extract significant geometric and semantic features. AMS-Net is a hierarchical structure consisting of a multi-scale set abstraction block (MS-BLOCK) and a fully connected (FC) layer. MS-BLOCK consists of a local-global layer (LGLayer) and an improved multi-layer perceptron (IMLP). With a multi-scale strategy, LGLayer can extract local and global features from different scales in parallel, and IMLP concatenates high-level and low-level features for the classification task. Extensive experiments are conducted on the public ModelNet40/10 data sets and the real-world Terracotta Warrior fragment data set. With normals, the accuracy reaches 93.52% and 96.22%, respectively; on the real-world data set, this accuracy is the best among existing methods. The robustness and effectiveness of the network on the task of 3D point cloud classification are also investigated, demonstrating that the proposed end-to-end learning network is more effective and better suited to the classification of Terracotta Warrior fragments.
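
The released AMS-Net is considerably more involved, but the core pattern (parallel feature paths pooled at different granularities, with low- and high-level features concatenated before the classifier) can be sketched in a few lines of PyTorch. Everything below is an illustrative stand-in, not the paper's architecture.

```python
# A toy PyTorch sketch of the multi-scale, concat-features idea: per-point
# features pooled two ways, concatenated, then classified. Not AMS-Net itself.
import torch
import torch.nn as nn

class TinyMultiScaleNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.local_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.global_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.head = nn.Linear(64 + 64, num_classes)  # concat low- and high-level

    def forward(self, pts):                          # pts: (batch, n_points, 3)
        local_feat = self.local_mlp(pts).max(dim=1).values  # sharp per-point detail
        global_feat = self.global_mlp(pts).mean(dim=1)      # coarser global summary
        return self.head(torch.cat([local_feat, global_feat], dim=-1))

logits = TinyMultiScaleNet()(torch.randn(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 10])
```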


2021 ◽  
pp. 1-13 ◽
Author(s):  
Hailin Liu ◽  
Fangqing Gu ◽  
Zixian Lin

Transfer learning methods exploit similarities between different data sets to improve the performance of a target task by transferring knowledge from source tasks to the target task. "What to transfer" is a main research issue in transfer learning. Existing transfer learning methods generally need to acquire the shared parameters by integrating human knowledge; however, in many real applications, which parameters can be shared is unknown beforehand. A transfer learning model is essentially a special multi-objective optimization problem. Consequently, this paper proposes a novel auto-sharing parameter technique for transfer learning based on multi-objective optimization, and solves the optimization problem using a multi-swarm particle swarm optimizer. Each task objective is simultaneously optimized by a sub-swarm. The current best particle of the target task's sub-swarm is used to guide the search of the source tasks' particles, and vice versa. The target task and source tasks are thus solved jointly by sharing the information of the best particles, which works as an inductive bias. Experiments evaluating the proposed algorithm on several synthetic data sets and two real-world data sets (a school data set and a landmine data set) show that it is effective.
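
A toy sketch of the multi-swarm exchange may help: two sub-swarms each optimize their own task objective, and each velocity update is pulled toward the other swarm's best particle. The objectives, coefficients and swarm sizes below are illustrative assumptions, not the paper's setup.

```python
# Two sub-swarms, one per task; each swarm's velocity update is nudged by
# the other swarm's best particle as a shared inductive bias. Toy objectives.
import numpy as np

rng = np.random.default_rng(1)
f_target = lambda w: np.sum((w - 1.0) ** 2)    # target-task objective (toy)
f_source = lambda w: np.sum((w - 1.2) ** 2)    # related source-task objective (toy)

dim, n = 5, 20
pos = [rng.normal(size=(n, dim)) for _ in range(2)]
vel = [np.zeros((n, dim)) for _ in range(2)]
best = [p[0].copy() for p in pos]
funcs = [f_target, f_source]

for _ in range(200):
    for s in range(2):
        other = best[1 - s]                    # cross-swarm guidance
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        vel[s] = 0.7 * vel[s] + 1.4 * r1 * (best[s] - pos[s]) + 0.8 * r2 * (other - pos[s])
        pos[s] += vel[s]
        vals = np.apply_along_axis(funcs[s], 1, pos[s])
        i = vals.argmin()
        if vals[i] < funcs[s](best[s]):
            best[s] = pos[s][i].copy()

print("target best (should approach 1.0):", best[0].round(2))
```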


2016 ◽  
Vol 12 (2) ◽  
pp. 126-149 ◽  
Author(s):  
Masoud Mansoury ◽  
Mehdi Shajari

Purpose This paper aims to improve recommendation performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users: it uses the opinions of similar users to generate a recommendation for an active user. Because the similarity model, or neighbor selection function, is the key element in the effectiveness of CF, many variations of CF have been proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendation performance for cold-start users and controversial items. To show the validity of the similarity model, the authors conducted experiments demonstrating its effectiveness in calculating similarity values between users even when only a few ratings are available. In addition, the authors applied the user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets were carried out and compared with other CF techniques. The results show that the authors' approach outperforms previous CF techniques on the coverage metric while preserving accuracy for cold-start users and controversial items. Originality/value The proposed approach addresses the conditions in which CF is unable to generate accurate recommendations. These conditions affect CF performance adversely, especially for cold-start users. The authors show that their similarity model overcomes these weaknesses effectively and improves CF performance even in the cold-start condition.
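
The authors' exact similarity model is not reproduced here, but the failure mode it targets is easy to show: with few co-rated items, a plain Pearson similarity is unreliable. The sketch below applies significance weighting, a standard remedy used here purely for illustration, to shrink similarities computed from little evidence.

```python
# Pearson similarity over co-rated items, shrunk toward zero when the two
# users share few ratings (significance weighting). Illustrative, not the
# paper's model; min_common is an arbitrary cutoff.
import numpy as np

def similarity(u, v, min_common=5):
    mask = ~np.isnan(u) & ~np.isnan(v)          # items rated by both users
    n = mask.sum()
    if n < 2:
        return 0.0
    a, b = u[mask], v[mask]
    denom = np.linalg.norm(a - a.mean()) * np.linalg.norm(b - b.mean())
    pearson = 0.0 if denom == 0 else float((a - a.mean()) @ (b - b.mean()) / denom)
    return pearson * min(n, min_common) / min_common  # shrink if few co-ratings

u = np.array([5, 4, np.nan, 2, np.nan, 1])
v = np.array([4, 5, 1, np.nan, np.nan, 2])
print(similarity(u, v))   # damped because only 3 items are co-rated
```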


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tressy Thomas ◽  
Enayat Rajabi

Purpose The primary aim of this study is to review studies of the novel approaches proposed for data imputation, particularly in the machine learning (ML) area, along several dimensions, including the type of method, the experimentation setup and the evaluation metrics used. This ultimately provides an understanding of how well the proposed frameworks are evaluated and what types and ratios of missingness the proposals address. The review questions in this study are: (1) What ML-based imputation methods were studied and proposed during 2010-2020? (2) How were the experimentation setup, characteristics of the data sets and missingness employed in these studies? (3) What metrics were used to evaluate the imputation methods? Design/methodology/approach The review went through the standard identification, screening and selection process. The initial search of electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers, totaling 2,883. Most of the papers at this stage were not MVI techniques relevant to this study. The papers were first screened by title for relevance, and 306 were identified as appropriate. Upon reviewing the abstracts, 151 papers not eligible for this study were dropped, leaving 155 research papers suitable for full-text review. Of these, 117 papers were used to assess the review questions. Findings This study shows that clustering- and instance-based algorithms are the most frequently proposed MVI methods, and that percentage of correct prediction (PCP) and root mean square error (RMSE) are the most frequently used evaluation metrics in these studies. For experimentation, the majority of the studies sourced their data sets from publicly available repositories. A common approach is to set the complete data set as a baseline and evaluate the effectiveness of imputation on test data sets with artificially induced missingness. Data set size and missingness ratio varied across the experimentations, while missing datatype and mechanism pertain to the capability of the imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge. Originality/value It is understood from the review that there is no single universal solution to the missing data problem; variants of ML approaches work well with missingness depending on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithms. Imputations based on k-nearest neighbors (kNN) and clustering algorithms, which are simple and easy to implement, are popular across various domains.
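
The common evaluation protocol the review identifies (complete data set as baseline, artificially induced missingness, kNN imputation scored by RMSE) can be sketched briefly; the data set, missingness ratio and neighbor count below are arbitrary choices for illustration.

```python
# Baseline-vs-induced-missingness evaluation of kNN imputation, as described
# in the review. The 20% MCAR ratio and k=5 are illustrative choices.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import KNNImputer

rng = np.random.default_rng(42)
X, _ = load_iris(return_X_y=True)               # complete data set as baseline

X_missing = X.copy()
mask = rng.random(X.shape) < 0.2                # 20% missing completely at random
X_missing[mask] = np.nan

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)
rmse = np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))
print(f"RMSE on artificially masked entries: {rmse:.3f}")
```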


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose Current popular image-processing technologies based on convolutional neural networks involve heavy computation, high storage cost and low accuracy for tiny-defect detection, which conflicts with the high real-time performance, high accuracy and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve these problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model, and improves the model's feature extraction ability through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves detection performance for tiny defects. Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel-ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy while reducing the size and computation cost of the model. Originality/value This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection, which is conducive to application in industrial scenarios with limited storage and computing resources and meets requirements for high real-time performance and precision.
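
Knowledge distillation, the compression device mentioned above, follows a well-known recipe. The sketch below shows a generic distillation loss (softened teacher targets blended with hard labels); it is not the YOLOv4-Defect training code, and the temperature and weighting are illustrative.

```python
# Generic knowledge-distillation loss: the student matches the teacher's
# temperature-softened outputs while still fitting the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                      # match softened teacher outputs
    hard = F.cross_entropy(student_logits, targets)  # keep fitting hard labels
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 5), torch.randn(8, 5), torch.randint(0, 5, (8,)))
print(loss.item())
```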


2018 ◽  
Vol 35 (8) ◽  
pp. 1508-1518 ◽
Author(s):  
Rosembergue Pereira Souza ◽  
Luiz Fernando Rust da Costa Carmo ◽  
Luci Pirmez

Purpose The purpose of this paper is to present a procedure for finding unusual patterns in accredited tests using a rapid processing method for analyzing video records. The procedure uses the temporal differencing technique for object tracking and considers only frames not identified as statistically redundant. Design/methodology/approach An accreditation organization is responsible for accrediting facilities to undertake testing and calibration activities, and periodically evaluates the accredited testing facilities. These evaluations can use video records and photographs of the tests performed by a facility to judge its conformity to technical requirements. To validate the proposed procedure, a real-world data set with video records from accredited testing facilities in the field of vehicle safety in Brazil was used, and the processing time of the proposed procedure was compared with the time needed to process the video records in a traditional fashion. Findings With an appropriate threshold value, the proposed procedure successfully identified video records of fraudulent services, and processing was faster than with the traditional method. Originality/value Manually evaluating video records is time-consuming and tedious. This paper proposes a procedure to rapidly find unusual patterns in videos of accredited tests with a minimum of manual effort.
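
A minimal sketch of the redundancy filter may clarify the idea: temporal differencing keeps a frame for analysis only when it differs sufficiently from the last kept frame. Synthetic frames and the threshold below stand in for real video records.

```python
# Temporal-differencing redundancy filter: a frame is analyzed only if its
# mean absolute difference from the last kept frame exceeds a threshold.
# Synthetic frames and THRESHOLD are illustrative stand-ins for real video.
import numpy as np

rng = np.random.default_rng(7)
frames = [np.zeros((64, 64), dtype=np.float32) for _ in range(10)]
frames[6] += 50.0                               # simulated motion event

kept, last = [], None
THRESHOLD = 5.0                                 # mean absolute difference cutoff
for i, frame in enumerate(frames):
    noisy = frame + rng.normal(0, 1, frame.shape)
    if last is None or np.abs(noisy - last).mean() > THRESHOLD:
        kept.append(i)                          # only non-redundant frames analyzed
        last = noisy

print("frames kept for analysis:", kept)        # expect [0, 6, ...]
```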


Author(s):  
Agus Wibowo

Abstract: The implementation of guidance and counseling services should be based on students' needs and problems so that the services achieve maximum effectiveness. In reality, however, many guidance and counseling services in schools do not take this into account, so the problems students experience are always addressed with the same services. Against this background, this study examines the effectiveness of guidance and counseling whose implementation uses instrumentation application activities and data sets as the basis for service delivery. The method used is qualitative research, with the guidance and counseling (BK) teachers and students of SMA Negeri 1 Metro as subjects, and data collected through interviews, observation and documentation. The results show that by utilizing instrumentation application activities and data sets, counseling services achieve a high level of effectiveness: in delivering the services, BK teachers can identify the problems and needs students experience, so the assistance provided is more precise and students' problems can be resolved optimally. Keywords: Guidance and Counseling, Instrumentation Applications, Data Sets


2017 ◽  
Vol 24 (4) ◽  
pp. 1052-1064 ◽  
Author(s):  
Yong Joo Lee ◽  
Seong-Jong Joo ◽  
Hong Gyun Park

Purpose The purpose of this paper is to measure the comparative efficiency of 18 Korean commercial banks in the presence of negative observations and to examine performance differences among them by grouping them according to their market conditions. Design/methodology/approach The authors employ two data envelopment analysis (DEA) models that can handle negative data: the Banker, Charnes and Cooper (BCC) model and a modified slacks-based measure of efficiency (MSBM) model. The BCC model is proven to be translation invariant for inputs or outputs, depending on output or input orientation, while the MSBM model is unit invariant in addition to translation invariant. The authors compare results from both models and choose one for interpreting the results. Findings Most Korean banks recovered from their worst performance in 2011 and showed similar performance in recent years. Among the three groups (national banks, regional banks and special banks), most of the special banks demonstrated superior performance across models and years. In particular, the performance difference between the special banks and the regional banks was statistically significant. The authors conclude that the high performance of the special banks was due to their nationwide market access and ownership type. Practical implications This study demonstrates how to analyze and measure the efficiency of entities when variables contain negative observations, using a data set for Korean banks. The authors tried two major DEA models that are able to handle negative data and proposed a practical direction for future studies. Originality/value Although there are research papers measuring the performance of banks in Korea, all of the papers on this topic study efficiency or productivity using positive data sets. However, variables such as net income and growth rates frequently include negative observations in bank data sets. This is the first paper to investigate the efficiency of bank operations in the presence of negative data in Korea.
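
To make the translation-invariance point concrete, here is a compact sketch of an input-oriented BCC model solved as a linear program, with a negative output shifted into positive territory first (a translation the input-oriented BCC score is invariant to). The three-DMU data set is made up for illustration and is far simpler than the banks' input-output structure.

```python
# Input-oriented BCC (variable returns to scale) DEA as a linear program:
# min theta s.t. sum(lambda*x) <= theta*x_k, sum(lambda*y) >= y_k,
# sum(lambda) = 1, lambda >= 0. Toy data, not the paper's bank data set.
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0], [5.0], [6.0]])            # inputs, one row per DMU
Y = np.array([[-1.0], [0.0], [4.0]])           # outputs; note the negative value
Y = Y + (1.0 - Y.min())                        # translate so all outputs > 0

def bcc_efficiency(k):
    n, m, s = X.shape[0], X.shape[1], Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                # decision vars: [theta, lambdas]
    A_ub = np.zeros((m + s, 1 + n))
    b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -X[k]                        # sum(lambda*x) - theta*x_k <= 0
    A_ub[:m, 1:] = X.T
    A_ub[m:, 1:] = -Y.T                        # sum(lambda*y) >= y_k
    b_ub[m:] = -Y[k]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)   # convexity: sum(lambda) = 1
    res = linprog(c, A_ub, b_ub, A_eq, [1.0],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

for k in range(3):                             # DMU 1 should come out inefficient
    print(f"DMU {k}: efficiency {bcc_efficiency(k):.3f}")
```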


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hendri Murfi

Purpose The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection. Design/methodology/approach The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition, and fuzzy c-means is performed on the eigenspace to identify the centroids of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability using two approaches, i.e. single-pass and online processing; we call the resulting topic detection methods spEFCM and oEFCM, respectively. Findings Our simulations show that both oEFCM and spEFCM provide faster running times than EFCM for data sets that do not fit in memory, at the cost of a decrease in the average coherence score. For data sets that both fit and do not fit in memory, oEFCM provides a better tradeoff between running time and coherence score than spEFCM. Originality/value This research produces a scalable topic detection method that, beyond this scalability, also provides a faster running time than EFCM for data sets that fit in memory.
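
The batch EFCM pipeline (before the single-pass and online extensions) is compact enough to sketch: documents are embedded in a truncated-SVD eigenspace and soft-clustered with fuzzy c-means. The tiny corpus and the hand-rolled c-means loop below are illustrative assumptions, not the authors' implementation.

```python
# EFCM-style pipeline sketch: TF-IDF -> truncated SVD eigenspace -> fuzzy
# c-means soft clustering. Corpus, dimensions and cluster count are toys.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["apple banana fruit", "banana orange fruit", "cpu gpu chip", "gpu chip memory"]
X = TfidfVectorizer().fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # eigenspace

def fuzzy_cmeans(Z, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(Z))       # soft memberships
    for _ in range(iters):
        W = U ** m
        centroids = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centroids[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))               # closer centroid -> higher weight
        U /= U.sum(axis=1, keepdims=True)            # rows sum to one
    return centroids, U

centroids, U = fuzzy_cmeans(Z)
print(U.round(2))   # membership of each document in each topic cluster
```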

