Educational Data Clustering in a Weighted Feature Space Using Kernel K-Means and Transfer Learning Algorithms

Author(s):  
Vo Thi Ngoc Chau ◽  
Nguyen Hua Phung

Educational data clustering on student data collected within a program can reveal groups of students who share similar characteristics in their behavior and study performance. For some programs, however, it is not easy to gather enough data for the clustering task; such a data shortage can weaken the clustering process, so the true clusters cannot be discovered reliably. Other programs, by contrast, have been examined thoroughly and offer much larger data sets. This raises the question of whether the larger data sets from such source programs can be exploited to enhance educational data clustering on the smaller data sets of a target program. Drawing on transfer learning techniques, our paper defines a transfer-learning-based clustering method built on the kernel k-means and spectral feature alignment algorithms as a solution to the educational data clustering task in this context. Moreover, our method is optimized within a weighted feature space so that the contribution of the larger source data sets to the clustering process is determined automatically. This ability is the novelty of our proposed solution compared to the transfer-learning-based clustering methods in existing works. Experimental results on several real data sets show that our method consistently outperforms methods from many other approaches under both external and internal validation.
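
The paper's automatic weight learning is not reproduced here, but the underlying machinery can be sketched: kernel k-means run on a pooled source-plus-target matrix in which the source features are scaled by a weight. In the sketch below the weight w is a fixed hypothetical value, and X_source / X_target are synthetic stand-ins for the aligned data sets.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    # Lloyd-style kernel k-means: distances to cluster means are
    # computed entirely through the kernel matrix K.
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=K.shape[0])
    for _ in range(n_iter):
        dist = np.empty((K.shape[0], n_clusters))
        for k in range(n_clusters):
            mask = labels == k
            nk = max(mask.sum(), 1)
            dist[:, k] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / nk
                          + K[np.ix_(mask, mask)].sum() / nk ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

rng = np.random.default_rng(0)
X_source = rng.normal(size=(80, 5))   # stand-in for aligned source features
X_target = rng.normal(size=(20, 5))   # stand-in for target features
w = 0.5  # hypothetical fixed weight; the paper determines this automatically
X = np.vstack([w * X_source, X_target])
print(kernel_kmeans(rbf_kernel(X), n_clusters=3))
```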

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1850
Author(s):  
Rashad A. R. Bantan ◽  
Farrukh Jamal ◽  
Christophe Chesneau ◽  
Mohammed Elgarhy

Unit distributions are commonly used in probability and statistics to describe useful quantities with values between 0 and 1, such as proportions, probabilities, and percentages. Some unit distributions are defined in a natural analytical manner, while others are derived by transforming an existing distribution defined on a larger domain. In this article, we introduce the unit gamma/Gompertz distribution, founded on the inverse-exponential scheme and the gamma/Gompertz distribution. The gamma/Gompertz distribution is known to be a very flexible three-parameter lifetime distribution, and we aim to transpose this flexibility to the unit interval. First, we check this aspect through the analytical behavior of the primary functions. It is shown that the probability density function can be increasing, decreasing, “increasing-decreasing” and “decreasing-increasing”, with pliant asymmetric properties. The hazard rate function, on the other hand, has monotonically increasing, decreasing, or constant shapes. We complete the theoretical part with some propositions on stochastic ordering, moments, quantiles, and the reliability coefficient. Practically, the maximum likelihood method is used to estimate the model parameters from unit data. We present some simulation results to evaluate this method. Two applications using real data sets, one on trade shares and the other on flood levels, demonstrate the importance of the new model when compared to other unit models.
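
As an orientation to the construction (our reading of the inverse-exponential scheme, assuming the standard gamma/Gompertz survival function with parameters b, s, β > 0; the article defines the exact form):

```latex
% Assumed gamma/Gompertz survival function, x >= 0:
%   S(x) = \beta^s / (\beta - 1 + e^{bx})^s
% Setting Y = e^{-X} maps (0, \infty) onto (0, 1), so for y in (0, 1):
\[
F_Y(y) = \Pr(e^{-X} \le y) = \Pr(X \ge -\ln y) = S(-\ln y)
       = \frac{\beta^s}{\left(\beta - 1 + y^{-b}\right)^s}.
\]
% Sanity check: F_Y(y) -> 0 as y -> 0 and F_Y(y) -> 1 as y -> 1.
```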


2014 ◽  
Vol 39 (2) ◽  
pp. 107-127 ◽  
Author(s):  
Artur Matyja ◽  
Krzysztof Siminski

Abstract Missing values are not uncommon in real data sets. The algorithms and methods used for analyzing complete data sets cannot always be applied to data with missing values. In order to use the existing methods for complete data, data sets with missing values are preprocessed. The alternative solution is to create new algorithms dedicated to missing-value data sets. The objective of our research is to compare the preprocessing techniques with the specialised algorithms and to find the most advantageous usage of each.
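
As a concrete instance of the preprocessing route, the sketch below imputes column means with scikit-learn's SimpleImputer so that any complete-data method can then run unchanged; this is a generic illustration, not one of the specific techniques compared in the study.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data set with missing values (np.nan marks a missing entry).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Preprocessing route: replace each missing value with the column mean,
# after which any complete-data algorithm can be applied unchanged.
X_complete = SimpleImputer(strategy="mean").fit_transform(X)
print(X_complete)
```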


2021 ◽  
Vol 24 (1) ◽  
pp. 42-47
Author(s):  
N. P. Koryshev ◽  
I. A. Hodashinsky

The article describes an algorithm for generating fuzzy rules for a fuzzy classifier using data clustering, a metaheuristic, and a clustering quality index, together with the results of performance testing on real data sets.
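
The paper's exact generator is not reproduced here; a generic sketch of the common pattern (cluster first, then turn each cluster into a Gaussian fuzzy membership function centered on its centroid) is:

```python
import numpy as np
from sklearn.cluster import KMeans

def rules_from_clusters(X, n_rules=3, seed=0):
    # One fuzzy rule per cluster: Gaussian membership centered on the
    # centroid, width taken from the in-cluster standard deviation.
    km = KMeans(n_clusters=n_rules, n_init=10, random_state=seed).fit(X)
    rules = []
    for k in range(n_rules):
        pts = X[km.labels_ == k]
        rules.append((km.cluster_centers_[k], pts.std(axis=0) + 1e-9))
    return rules

def membership(x, rule):
    # Degree to which point x fires the rule's Gaussian antecedent.
    c, s = rule
    return np.exp(-0.5 * np.sum(((x - c) / s) ** 2))

X = np.random.default_rng(0).normal(size=(100, 2))
rules = rules_from_clusters(X)
print([round(membership(X[0], r), 3) for r in rules])
```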


Images generated from a variety of sources today can be difficult for a user to interpret or analyze further because of differing segmentation policies. This heterogeneity can introduce many errors, making previously used traditional methodologies such as supervised learning techniques less resourceful, since they require a huge quantity of labelled training data that mirrors the desired target data. This paper therefore puts forward an alternative technique, transfer learning, for use in image diagnosis so that efficiency and accuracy across images can be achieved. This mechanism handles variation between the desired and actual training data as well as outlier sensitivity, ultimately improving predictions in various areas and leaving the traditional methodologies behind. The following analysis discusses three types of transfer classifiers that can be applied using only a small volume of training data, and contrasts them with the traditional method, which requires huge quantities of training data whose attributes differ only slightly. The three separators were compared among themselves and against the traditional methodology on a very common application from daily life. Commonly occurring problems such as outlier sensitivity were also taken into consideration, and measures were taken to recognise and mitigate them. Further investigation showed that the performance of transfer learning exceeds that of conventional supervised learning approaches when only a small amount of characteristic training data is available, reducing stratification errors to a great extent.
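
In practice the transfer step usually looks like the following sketch: a backbone pretrained on a large source data set is frozen and only a small head is retrained on the scarce target data. This is a generic PyTorch illustration with an assumed 5-class target task, not one of the three classifiers compared in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet (the large source domain).
model = models.resnet18(weights="DEFAULT")

# Freeze the transferred feature extractor ...
for p in model.parameters():
    p.requires_grad = False

# ... and replace the final layer with a head for the small target task,
# assumed here to have 5 classes.
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head is optimized, so little labelled target data is needed.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```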


Author(s):  
Kyle Dillon Feuz ◽  
Diane J. Cook

Purpose – The purpose of this paper is to study heterogeneous transfer learning for activity recognition using heuristic search techniques. Many pervasive computing applications require information about the activities currently being performed, but activity recognition algorithms typically require substantial amounts of labeled training data for each setting. One solution to this problem is to leverage transfer learning techniques to reuse available labeled data in new situations. Design/methodology/approach – This paper introduces three novel heterogeneous transfer learning techniques that reverse the typical transfer model, mapping the target feature space to the source feature space, and applies them to activity recognition in a smart apartment. The techniques are evaluated on data from 18 different smart apartments located in an assisted-care facility and the results are compared against several baselines. Findings – The three transfer learning techniques are all able to outperform the baseline comparisons in several situations. Furthermore, the techniques are successfully used in an ensemble approach to achieve even higher levels of accuracy. Originality/value – The techniques in this paper represent a considerable step forward in heterogeneous transfer learning by removing the need to rely on instance-to-instance or feature-to-feature co-occurrence data.
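
The core idea of mapping the target feature space onto the source feature space can be illustrated with a simple linear least-squares projection; this is our illustration with synthetic paired data, whereas the paper learns its mappings by heuristic search.

```python
import numpy as np

# Paired observations: X_tgt in the target feature space (4 dims),
# X_src in the source feature space (6 dims). In heterogeneous transfer
# learning such pairs would come from aligned or co-occurring instances.
rng = np.random.default_rng(0)
X_tgt = rng.normal(size=(50, 4))
M_true = rng.normal(size=(4, 6))
X_src = X_tgt @ M_true + 0.01 * rng.normal(size=(50, 6))

# Least-squares mapping target -> source, so that a classifier trained
# on source features can be applied to mapped target data.
M, *_ = np.linalg.lstsq(X_tgt, X_src, rcond=None)
X_mapped = X_tgt @ M
print("mean abs reconstruction error:", np.abs(X_mapped - X_src).mean())
```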


2017 ◽  
Vol 7 (1) ◽  
pp. 43 ◽  
Author(s):  
Rezzy Eko Caraka ◽  
Hasbi Yasin ◽  
Adi Waridi Basyiruddin

Recently, instead of selecting a single kernel, approaches have been proposed for SVR in which the weight of each kernel is optimized during training. Along this line of research, many pioneering kernel learning algorithms have been proposed. The use of kernels provides a powerful and principled approach to modeling nonlinear patterns through linear patterns in a feature space. Another benefit is that the design of kernels and linear methods can be decoupled, which greatly facilitates the modularity of machine learning methods. We perform experiments on real data sets of crude palm oil prices, for application and better illustration, using a radial basis kernel. The evaluation shows a good fit between predicted and actual values, confirming the validity and accuracy of the realized model based on MAPE and R2. Keywords: Crude Palm Oil; Forecasting; SVR; Radial Basis; Kernel
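
A minimal version of such a pipeline with scikit-learn might look as follows; the series is synthetic and the hyperparameters (C, the 5-lag window) are placeholders rather than the paper's settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_percentage_error

# Synthetic stand-in for a price series; the paper uses crude palm oil prices.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 200)) + 100

# One-step-ahead forecasting from a window of 5 lagged prices.
lag = 5
X = np.array([prices[i:i + lag] for i in range(len(prices) - lag)])
y = prices[lag:]
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# RBF-kernel support vector regression, as in the paper's kernel choice.
model = SVR(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
print("MAPE:", mean_absolute_percentage_error(y_test, model.predict(X_test)))
```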


2021 ◽  
Vol 343 ◽  
pp. 05010
Author(s):  
Adina Sârb ◽  
Cristina Burja Udrea ◽  
Daniela Nagy-Oniţa ◽  
Liliana Itul ◽  
Maria Popa

According to ISO 9000, a quality management system is part of a set of related or interacting elements of an organization that sets policies and objectives, as well as the processes necessary to achieve the quality objectives. Quality is the extent to which a set of intrinsic characteristics of an object meets requirements. Based on these definitions, the factory considered in this paper, S.C. APULUM S.A., decided to implement a quality management system in 1998. Since then, the organization's attention has focused on the continuous improvement of the implemented quality management system. The purpose of this paper is to forecast the percentages of specified defects in ceramic products so as to improve the quality management system. To this end, machine learning techniques were applied to defect forecasting for different types of products: mugs, pressed plates, and jiggered plates. The experimental evaluation was performed on real data sets containing percentages of different types of defects collected in 2018-2019. The experimental results show that for each type of product there exists an algorithm that forecasts future defects.
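
A forecasting setup of this kind can be sketched as below: one regressor per product type, trained on lagged monthly defect percentages. The numbers are synthetic and the random-forest choice is ours, not necessarily the algorithm the study selects for each product.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic monthly defect percentages for one product type (e.g. mugs).
defects = 5 + np.sin(np.arange(24) / 3) + rng.normal(0, 0.2, 24)

# Predict next month's defect rate from the previous 3 months.
lag = 3
X = np.array([defects[i:i + lag] for i in range(len(defects) - lag)])
y = defects[lag:]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("forecast for next month:",
      model.predict(defects[-lag:].reshape(1, -1))[0])
```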


2022 ◽  
Author(s):  
Urja Banati ◽  
Vamika Prakash ◽  
Rashi Verma ◽  
Smriti Srivast

Abstract Soft biometrics is a growing field that has been known to improve recognition systems, as witnessed in the past decade. When combined with hard biometrics such as iris, gait, or fingerprint recognition, the efficiency of the system increases many fold. With the pandemic came the need to recognise faces covered by masks efficiently; soft biometrics proved to be an aid in this. While recent advances in computer vision have helped in the estimation of age and gender, the system can be improved by extending its scope to detect other soft-biometric attributes that help identify a person, including but not limited to eyeglasses, hair type and color, mustache, and eyebrows. In this paper we propose an identification system that uses the ocular and forehead parts of the face as modalities, training models with transfer learning techniques to detect 12 soft-biometric attributes (FFHQ dataset) and 25 soft-biometric attributes (CelebA dataset) for masked faces. We compare the results with unmasked faces in order to see how efficiency varies across these data sets. Throughout the paper we implement four enhanced models: enhanced AlexNet, enhanced ResNet50, enhanced MobileNetV2, and enhanced SqueezeNet. The enhanced models apply transfer learning to the base models, which improves accuracy. Finally, we compare the results and examine how accuracy varies with the model used and with whether the images are masked or unmasked. We conclude that for images containing facial masks, the enhanced MobileNetV2 gives a splendid accuracy of 92.5% (FFHQ dataset) and 87% (CelebA dataset).
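
The paper describes the enhanced models as transfer-learning variants of the standard architectures; a plausible skeleton for the MobileNetV2 case (our assumption of the setup, with the 12 FFHQ attributes treated as independent sigmoid outputs) is:

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained MobileNetV2 as the transferred backbone.
model = models.mobilenet_v2(weights="DEFAULT")

# Multi-label head: one logit per soft-biometric attribute
# (12 for the FFHQ setup described in the paper).
n_attrs = 12
model.classifier[1] = nn.Linear(model.last_channel, n_attrs)

# Each attribute is an independent binary decision.
criterion = nn.BCEWithLogitsLoss()
x = torch.randn(2, 3, 224, 224)                    # dummy batch of face crops
target = torch.randint(0, 2, (2, n_attrs)).float() # dummy attribute labels
loss = criterion(model(x), target)
print(loss.item())
```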


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that are exceptional from other data, using some labeled examples. In many applications, such as fraud detection and intrusion detection, this issue becomes especially important. Most existing techniques are unsupervised. Semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
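
The two-stage idea can be sketched roughly as follows. This is a simplification: reliable negatives are taken here as the unlabeled points farthest from the known positives (the paper uses a kNN-based extraction), and a minimal fuzzy c-means is implemented inline.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    # Minimal fuzzy c-means; returns centers and the membership matrix U.
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
pos = X[:5]  # the few labeled positive (outlier) examples

# Stage 1: reliable negatives = unlabeled points farthest from positives.
d_pos = np.linalg.norm(X[:, None] - pos[None], axis=2).min(axis=1)
negatives = X[np.argsort(d_pos)[-30:]]

# Stage 2: fuzzy clustering; points with weak maximum membership to any
# cluster are flagged as outlier candidates.
_, U = fuzzy_cmeans(np.vstack([negatives, pos]))
print("flagged:", (U.max(axis=1) < 0.6).sum())
```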

