Efficient and Scalable Multi-Task Regression on Massive Number of Tasks

Author(s):  
Xiao He ◽  
Francesco Alesiani ◽  
Ammar Shaker

Many real-world large-scale regression problems, as in the retail and transportation domains, can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks. However, existing MTL methods fail to offer both the generalization performance and the scalability such problems require; scaling MTL methods to a tremendous number of tasks remains a major challenge. Here, we propose a novel algorithm, named Convex Clustering Multi-Task regression Learning (CCMTL), which integrates convex clustering on the k-nearest neighbor graph of the prediction models. CCMTL efficiently solves the underlying convex problem with a newly proposed optimization method. It is accurate, efficient to train, and empirically scales linearly in the number of tasks. On both synthetic and real-world datasets, CCMTL outperforms seven state-of-the-art (SoA) multi-task learning methods in both prediction accuracy and computational efficiency. On a real-world retail dataset with 23,812 tasks, CCMTL requires only around 30 seconds to train on a single thread, while the SoA methods need hours or even days.
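The coupling behind CCMTL can be illustrated with a toy sketch: each task keeps its own regression weights, while a convex-clustering penalty pulls together the weights of tasks that are neighbors on a k-NN graph of the models. The snippet below is a minimal illustrative solver with made-up data, a plain gradient loop, and a smoothed squared form of the fusion penalty; it is not the paper's specialized optimizer.

```python
# Sketch of a fused multi-task regression objective in the spirit of CCMTL:
# per-task squared loss plus a penalty tying each task's weight to its
# k-NN-graph neighbours. Illustrative only; data, lambda and the graph
# are invented for the example.

def ccmtl_sketch(tasks, edges, lam=0.5, lr=0.01, iters=500):
    """tasks: list of (xs, ys) with scalar features; edges: k-NN pairs (i, j)."""
    w = [0.0] * len(tasks)                     # one scalar weight per task
    for _ in range(iters):
        grad = [0.0] * len(tasks)
        for t, (xs, ys) in enumerate(tasks):   # squared-loss gradient
            for x, y in zip(xs, ys):
                grad[t] += 2.0 * (w[t] * x - y) * x
        for i, j in edges:                     # fusion penalty gradient
            diff = w[i] - w[j]                 # (squared surrogate of the
            grad[i] += lam * diff              #  l2 fusion term)
            grad[j] -= lam * diff
        w = [wt - lr * g for wt, g in zip(w, grad)]
    return w

# Two similar tasks (true slope near 2) and one distinct task (slope 5);
# the graph links only the similar pair, so only they are pulled together.
tasks = [([1, 2], [2, 4]), ([1, 3], [2.2, 6.2]), ([1, 2], [5, 10])]
weights = ccmtl_sketch(tasks, edges=[(0, 1)])
```

The linked tasks end up with nearly equal weights, while the unlinked task keeps its own solution, which is the clustering effect the penalty is designed to produce.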

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5416 ◽  
Author(s):  
Zhengwu Yuan ◽  
Xupeng Zha ◽  
Xiaojian Zhang

The complexity of indoor environments makes positioning based on a single type of received fingerprint unreliable. This paper proposes an adaptive multi-type fingerprint indoor positioning and localization method based on multi-task learning (MTL) and Weight Coefficients K-Nearest Neighbor (WCKNN), which integrates magnetic field, Wi-Fi, and Bluetooth fingerprints. The MTL fuses the features of the different fingerprint types to uncover the potential relationships between them, and exploits the synergy between the tasks, which boosts positioning and localization performance. The WCKNN then predicts a further position from the fingerprints in the class determined by the obtained location. The final position is obtained by fusing the predicted positions with a weighted average whose weights are given by the positioning errors provided by positioning-error prediction models. Experimental results indicate that the proposed method achieves 98.58% accuracy in classifying locations, with a mean positioning error of 1.95 m.
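The final fusion step can be sketched as a weighted average of per-fingerprint position estimates. A minimal sketch follows, under the assumption that each source's weight is the inverse of its predicted positioning error (the exact weighting rule in the paper may differ); the estimates and error values are invented for illustration.

```python
def fuse_positions(estimates, errors):
    """Weighted average of (x, y) position estimates, weighting each
    source by the inverse of its predicted positioning error (assumed
    scheme: smaller predicted error => larger weight)."""
    weights = [1.0 / e for e in errors]
    total = sum(weights)
    x = sum(w, p[0]) if False else sum(w * p[0] for w, p in zip(weights, estimates)) / total
    y = sum(w * p[1] for w, p in zip(weights, estimates)) / total
    return x, y

# Hypothetical estimates from magnetic, Wi-Fi and Bluetooth fingerprints,
# with assumed predicted errors in metres.
pos = fuse_positions([(1.0, 2.0), (1.4, 2.2), (0.8, 1.8)], [1.0, 2.0, 4.0])
```

The fused point is drawn toward the source with the smallest predicted error, here the first estimate.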


Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar

Credit scoring has become a very important issue due to the recent growth of the credit industry. As its first objective, this chapter provides an academic database of the literature and proposes a classification scheme for the articles. Its second objective is to suggest employing the Optimally Weighted Fuzzy K-Nearest Neighbor (OWFKNN) algorithm for credit scoring. To demonstrate the performance of this method, two real-world datasets from the UCI repository are used. In the classification task, the empirical results demonstrate that OWFKNN outperforms conventional KNN, fuzzy KNN, and the other methods considered. In the predictive accuracy of the probability of default, OWFKNN also shows the best performance among the compared methods. The results in this chapter suggest that the OWFKNN approach is effective in estimating default probabilities and is a promising method for classification.
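The fuzzy KNN baseline that OWFKNN builds on can be sketched briefly: instead of a hard majority vote, each of the k neighbors contributes a distance-weighted membership to its class. The sketch below is the standard fuzzy KNN rule, not the optimally weighted variant of the chapter (OWFKNN additionally optimizes the weights); the toy "good"/"bad" credit data are invented.

```python
def fuzzy_knn(train, query, k=3, m=2.0):
    """Standard fuzzy k-NN: class memberships weighted by inverse
    distance**(2/(m-1)). train: list of ((features...), label);
    returns normalised per-class membership scores."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
        for x, label in train
    )[:k]
    memberships, total = {}, 0.0
    for d, label in nearest:
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-9)   # avoid divide-by-zero
        memberships[label] = memberships.get(label, 0.0) + w
        total += w
    return {c: v / total for c, v in memberships.items()}

# Toy applicants: two "good" profiles near the query, two "bad" far away.
train = [((0.0, 0.0), "good"), ((0.1, 0.0), "good"),
         ((1.0, 1.0), "bad"), ((0.9, 1.1), "bad")]
scores = fuzzy_knn(train, (0.05, 0.05), k=3)
```

The memberships sum to one, so they can be read as (rough) class probabilities, which is what makes the fuzzy variant attractive for estimating default probabilities.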


2018 ◽  
pp. 1838-1874
Author(s):  
Abbas Keramati ◽  
Niloofar Yousefi ◽  
Amin Omidvar



Author(s):  
Piotr Szczuko ◽  
Adam Kurowski ◽  
Piotr Odya ◽  
Andrzej Czyżewski ◽  
Bożena Kostek ◽  
...  

Abstract: The described application of granular computing is motivated by the fact that cardiovascular disease (CVD) remains a major killer globally. There is increasing evidence that abnormal respiratory patterns may contribute to the development and progression of CVD, so a method that supports a physician in respiratory pattern evaluation should be developed. Group decision-making, three-way reasoning, and rough set-based analysis were applied within a granular computing framework. Signal attributes and anthropomorphic parameters were explored to develop prediction models that determine the percentage contribution of periodic-like, intermediate, and normal breathing patterns in the analyzed signals. The proposed methodology was validated employing k-nearest neighbor (k-NN) classification and UMAP (uniform manifold approximation and projection). Applied to respiratory pattern evaluation, the approach achieved median accuracies exceeding 0.75 in a considerable number of cases. Overall, parameters related to signal analysis emerge as more important than anthropomorphic features. Among the essential findings of this study, obesity characterized by a high waist-to-hip ratio (WHR) and male sex were identified as predisposing factors for the occurrence of periodic-like or intermediate respiration patterns. Based on the classification measures, a physician may use this methodology as an aid to respiratory pattern evaluation.


Polymers ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3811
Author(s):  
Iosif Sorin Fazakas-Anca ◽  
Arina Modrea ◽  
Sorin Vlase

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method combines a numerical integration algorithm with an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, of isoprene with glycidyl methacrylate for medium conversion, and of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. The possibility of estimating experimental errors from a single experimental data set of n data points is also shown.
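The k-nearest-neighbour non-parametric regression at the heart of the optimization step is simple to state: the estimate at a query point is the average response of its k closest training points. A minimal one-dimensional sketch with invented data:

```python
def knn_regress(xs, ys, x_query, k=3):
    """k-nearest-neighbour regression: average the responses of the k
    training points closest to the query point."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    return sum(ys[i] for i in order[:k]) / k

# Toy monotone data; the query at 0.25 has neighbours at 0.2 and 0.3.
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
est = knn_regress(xs, ys, 0.25, k=2)
```

Because it makes no parametric assumption about the response surface, this estimator can follow the irregular objective landscapes that arise when fitting reactivity ratios.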


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2868
Author(s):  
Wenxuan Zhao ◽  
Yaqin Zhao ◽  
Liqi Feng ◽  
Jiaxi Tang

The purpose of image dehazing is to reduce the image degradation caused by suspended particles and thereby support high-level visual tasks. Besides the atmospheric scattering model, convolutional neural networks (CNNs) have been used for image dehazing. However, existing image dehazing algorithms struggle with unevenly distributed haze and dense haze in real-world scenes. In this paper, we propose a novel end-to-end convolutional neural network called the attention-enhanced serial Unet++ dehazing network (AESUnet) for single image dehazing. We build a serial Unet++ structure that chains two pruned Unet++ blocks through residual connections. Compared with a simple encoder-decoder structure, the serial Unet++ module makes better use of the features extracted by the encoders and promotes contextual information fusion across resolutions. In addition, we improve the Unet++ module through pruning, a convolutional module with a ResNet structure, and a residual learning strategy, so the serial Unet++ module generates more realistic images with less color distortion. Following the serial Unet++ blocks, an attention mechanism learns weights in the spatial and channel domains to pay different attention to haze regions of different concentrations. Experiments are conducted on two representative dataset families: the large-scale synthetic dataset RESIDE and the small-scale real-world datasets I-HAZY and O-HAZY. The experimental results show that the proposed dehazing network is not only comparable to state-of-the-art methods on the RESIDE synthetic dataset but also surpasses them by a very large margin on the I-HAZY and O-HAZY real-world datasets.
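The channel-attention idea can be illustrated without a deep learning framework: pool each channel to one number, normalize the pooled values into weights, and rescale every channel by its weight so stronger channels are emphasised. This is a parameter-free toy version; the attention module in AESUnet is learned with convolutional layers, and the pooling/softmax choices here are assumptions for illustration only.

```python
import math

def channel_attention(feature_maps):
    """Toy channel attention: global-average-pool each (flattened) channel,
    softmax the pooled responses into weights, and rescale each channel,
    emphasising channels with stronger average activation."""
    pooled = [sum(ch) / len(ch) for ch in feature_maps]
    mx = max(pooled)                              # numerically stable softmax
    exps = [math.exp(p - mx) for p in pooled]
    total = sum(exps)
    weights = [e / total for e in exps]
    scaled = [[w * v for v in ch] for w, ch in zip(weights, feature_maps)]
    return weights, scaled

# Two flattened toy "channels": one strongly activated, one weak.
weights, scaled = channel_attention([[2.0, 2.0, 2.0], [0.5, 0.5, 0.5]])
```

In a real dehazing network the analogous weights let the model concentrate capacity on feature channels that respond to dense-haze regions.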


2017 ◽  
Vol 8 (4) ◽  
Author(s):  
Febri Liantoni ◽  
Luky Agus Hermanto

Abstract. The leaf is an important part of a plant, commonly used to classify plant species. Gadung and manalagi mango leaves are recognized based on edge detection of the leaf midrib and vein structure in the image. In this research, the conventional edge detection process was replaced by an ant colony optimization method, with the aim of optimizing the edge detection of the mango leaf midrib and vein image. The ant colony optimization method successfully optimized the edge detection of the mango leaf midrib and vein structure: the detected edges of the leaf structure are thicker and more detailed than those obtained with conventional edge detection. Classification testing using the k-nearest neighbor method obtained 66.67% accuracy. Keywords: edge detection, ant colony optimization, classification, k-nearest neighbor.


2013 ◽  
Vol 45 (4-5) ◽  
pp. 589-602 ◽  
Author(s):  
Mahmood Akbari ◽  
Abbas Afshar

Despite extensive research on hydrologic forecasting models, updating the outputs of these models has remained a central challenge. Most existing output-updating methods rely on the presence of persistence in the errors. This paper presents an alternative approach to updating the outputs of forecasting models in order to produce more accurate forecasts, using the concept of similarity in errors for error prediction. The K-nearest neighbor (KNN) algorithm is employed as a similarity-based error prediction model that is improved as new data become available, and two further forms of the KNN model are developed in this study. The KNN models are applied to predict the errors of flow forecasting models in two catchments, and the updated flows are compared with those of persistence-based methods such as autoregressive (AR) and artificial neural network (ANN) models. The results show that similarity-based error prediction models are an efficient alternative for real-time inflow forecasting, especially where the persistence in the error series of the flow forecasting model is relatively low.
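The similarity-in-errors idea can be sketched compactly: look up the k past situations most similar to the current one, average the forecast errors recorded in those situations, and add that estimate to the raw forecast. The sketch below uses a scalar "state" as the similarity feature and invented history values; the paper's models use richer features and two further KNN variants.

```python
def knn_error_update(history, current_state, raw_forecast, k=3):
    """Similarity-based error correction: average the recorded errors of
    the k past states closest to the current state, then add that error
    estimate to the raw model forecast."""
    nearest = sorted(history, key=lambda se: abs(se[0] - current_state))[:k]
    err_hat = sum(e for _, e in nearest) / k
    return raw_forecast + err_hat

# Hypothetical (state, observed forecast error) pairs from past forecasts:
# low-flow states tended to under-forecast, high-flow states to over-forecast.
history = [(1.0, 0.2), (1.1, 0.3), (5.0, -1.0), (5.2, -0.8), (1.05, 0.25)]
updated = knn_error_update(history, 1.02, raw_forecast=10.0, k=3)
```

Unlike AR-style updating, this correction needs no persistence in the error sequence, only recurrence of similar states, which matches the paper's finding about low-persistence error series.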


Author(s):  
Bao Bing-Kun ◽  
Yan Shuicheng

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors introduce how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level can greatly improve image annotation performance compared to analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are assembled into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps naturally yield the capability to handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.
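The role of LSH in building the k-NN graph is to avoid comparing every region against every other: points are hashed into buckets so that neighbor search happens only within a bucket. A minimal random-hyperplane sketch follows; the bucketing scheme and parameters are illustrative assumptions, not the specific LSH family used in the chapter.

```python
import random

def lsh_buckets(points, n_planes=4, seed=0):
    """Random-hyperplane LSH sketch: hash each point to a bit-signature
    bucket (sign of the dot product with each random plane), so candidate
    neighbours need only be searched within the same bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for idx, p in enumerate(points):
        sig = tuple(sum(a * b for a, b in zip(plane, p)) >= 0
                    for plane in planes)
        buckets.setdefault(sig, []).append(idx)
    return buckets

# Two tight clusters of toy region descriptors; nearby points are likely
# (though not guaranteed) to land in the same bucket.
points = [(0.0, 1.0), (0.01, 1.02), (1.0, 0.0), (1.02, 0.01)]
buckets = lsh_buckets(points)
```

Exact k-NN search is then run only inside each bucket, which is what makes the region-level graph construction tractable at the scale of hundreds of thousands of images.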


2020 ◽  
Vol 34 (04) ◽  
pp. 6853-6860
Author(s):  
Xuchao Zhang ◽  
Xian Wu ◽  
Fanglan Chen ◽  
Liang Zhao ◽  
Chang-Tien Lu

The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled samples that substantially hinder the performance of machine learning models; meanwhile, well-labeled data are usually expensive to obtain and only available in limited amounts. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model on data instances from more reliable (clean) to less reliable (noisy) under the supervision of the well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses of the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves considerable improvements in effectiveness and robustness over existing methods.
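The self-paced selection mechanism can be shown on a toy problem: estimate a quantity while admitting only samples whose loss falls below a threshold lambda, then grow lambda so harder samples are gradually considered. This is a generic self-paced sketch on a robust-mean toy task, not the SPRL algorithm itself (which trains a full model under clean-label supervision); data and schedule are invented.

```python
def self_paced_mean(values, lam0=1.0, growth=2.0, rounds=4):
    """Self-paced sketch: estimate a mean robustly by starting from the
    samples closest to the current estimate (squared loss < lambda) and
    gradually raising lambda to admit harder samples."""
    est = sorted(values)[len(values) // 2]        # warm start at the median
    lam = lam0
    for _ in range(rounds):
        selected = [v for v in values if (v - est) ** 2 < lam]
        if selected:                               # refit on "easy" samples
            est = sum(selected) / len(selected)
        lam *= growth                              # pace: admit harder data
    return est

# Clean observations near 3 plus two corrupted ("noisy-label") values.
data = [2.9, 3.0, 3.1, 3.2, 10.0, -8.0]
est = self_paced_mean(data)
```

Because the gross outliers never pass the loss threshold within the schedule, the estimate stays near the clean cluster, illustrating how the easy-to-hard ordering hedges against corrupted data.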

