Optimization of Hierarchical Regression Model with Application to Optimizing Multi-Response Regression K-ary Trees

A fast, convenient, and well-known approach to regression is to induce and prune a binary tree. However, there have been few attempts to improve the performance of an already induced regression tree. This paper presents a meta-algorithm capable of minimizing the regression loss function, thereby improving the accuracy of any given hierarchical model, such as k-ary regression trees. Our proposed method minimizes the loss function of each node one by one. At split nodes, this leads to solving an instance-based cost-sensitive classification problem over the node’s data points. At the leaf nodes, the method leads to a simple regression problem. In the case of binary univariate and multivariate regression trees, the computational complexity of training is linear in the number of samples. Hence, our method is scalable to large trees and datasets. We also briefly explore possibilities of applying the proposed method to classification tasks. We show that our algorithm has significantly better test error compared to other state-of-the-art tree algorithms. Finally, the accuracy, memory usage, and query time of our method are compared to recently introduced forest models. We show that, most of the time, our proposed method achieves better or similar accuracy while having tangibly faster query time and a smaller number of nonzero weights.
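The node-by-node idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single split node over a one-dimensional feature, two constant-valued leaves, and squared loss, and the names `fit_leaf`, `fit_split`, `optimize_stump`, and `sse` are hypothetical. With the leaves held fixed, each point has one cost for being routed left and another for being routed right, so choosing the threshold is an instance-based cost-sensitive classification problem. With the routing held fixed, each leaf is a simple regression problem whose solution is the mean.

```python
def fit_leaf(ys):
    """Leaf step: under squared loss, the best constant prediction is the mean."""
    return sum(ys) / len(ys)


def fit_split(xs, ys, c_left, c_right):
    """Split step: choose the threshold that minimizes the summed per-instance
    routing costs, with the two leaf values held fixed."""
    cost_left = [(y - c_left) ** 2 for y in ys]    # cost if the point goes left
    cost_right = [(y - c_right) ** 2 for y in ys]  # cost if the point goes right
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    total = sum(cost_right)  # threshold below every x: all points go right
    best_cost, best_thr = total, xs[order[0]] - 1.0
    for rank, i in enumerate(order):
        total += cost_left[i] - cost_right[i]  # move point i to the left child
        has_next = rank + 1 < len(order)
        if has_next and xs[order[rank + 1]] == xs[i]:
            continue  # no threshold can separate points with equal x
        if total < best_cost:
            thr = (xs[i] + xs[order[rank + 1]]) / 2.0 if has_next else xs[i] + 1.0
            best_cost, best_thr = total, thr
    return best_thr, best_cost


def optimize_stump(xs, ys, c_left, c_right, rounds=10):
    """Alternate the split step and the leaf step (points with x <= thr go left)."""
    thr = None
    for _ in range(rounds):
        thr, _ = fit_split(xs, ys, c_left, c_right)
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        # A child that receives no points keeps its previous value.
        c_left = fit_leaf(left) if left else c_left
        c_right = fit_leaf(right) if right else c_right
    return thr, c_left, c_right


def sse(xs, ys, thr, c_left, c_right):
    """Sum of squared errors of the stump on the given data."""
    return sum((y - (c_left if x <= thr else c_right)) ** 2 for x, y in zip(xs, ys))
```

In this sketch neither step can increase the training loss, because the split step searches over every achievable partition of the sorted points and the leaf step uses the exact squared-loss minimizer. The sorting makes the toy version O(n log n) per round; it does not attempt to reproduce the linear complexity claimed in the abstract.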


Boosted Fuzzy Granular Regression Trees

Mathematical Problems in Engineering ◽ 10.1155/2021/9958427 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-16

Author(s): Wei Li ◽ Youmeng Luo ◽ Chao Tang ◽ Kaiqiang Zhang ◽ Xiaoyu Ma

Keyword(s): Ensemble Learning ◽ Granular Computing ◽ Clustering Algorithm ◽ Regression Tree ◽ Regression Trees ◽ Rule Base ◽ Regression Problem ◽ Test Instance ◽ Intelligent Information ◽ Theory And Experiments

The regression problem is an important problem in machine learning, and it has been widely applied in fields such as meteorology, transportation, and materials. Granular computing (GrC) is a useful approach to modeling human intelligent information processing and is well suited to knowledge discovery. Ensemble learning is easy to execute in parallel. Based on granular computing and ensemble learning, we convert the regression problem into an equivalent problem in granular space and propose boosted fuzzy granular regression trees (BFGRT) to predict a test instance. The idea of BFGRT is as follows. First, a clustering algorithm with automatic optimization of cluster centers is presented. Next, based on the clustering algorithm, we employ MapReduce to implement fuzzy granulation of the data in parallel. Then, we design new operators and metrics on fuzzy granules to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. Building on these, BFGRT is designed by combining multiple FGRTs in parallel via random attribute sampling and MapReduce. Theory and experiments show that BFGRT is accurate, efficient, and robust.


Improved YOLO Network for Free-Angle Remote Sensing Target Detection

Remote Sensing ◽ 10.3390/rs13112171 ◽ 2021 ◽ Vol 13 (11) ◽ pp. 2171

Author(s): Yuhao Qing ◽ Wenyi Liu ◽ Liuyan Feng ◽ Wanjia Gao

Keyword(s): Remote Sensing ◽ Feature Extraction ◽ Target Detection ◽ Multiple Scales ◽ Classification Problem ◽ Input Image ◽ Detection Accuracy ◽ Feature Maps ◽ Regression Problem ◽ Public Datasets

Despite significant progress in object detection tasks, remote sensing image target detection is still challenging owing to complex backgrounds, large differences in target sizes, and the uneven distribution of rotated objects. In this study, we consider model accuracy, inference speed, and detection of objects at any angle. We propose a RepVGG-YOLO network that uses an improved RepVGG model as the backbone feature extraction network, which performs the initial feature extraction from the input image and balances training accuracy against inference speed. We use an improved feature pyramid network (FPN) and path aggregation network (PANet) to reprocess the features output by the backbone network. The FPN and PANet modules integrate feature maps of different layers, combine context information at multiple scales, accumulate multiple features, and strengthen feature information extraction. To maximize the detection accuracy for objects of all sizes, we use four target detection scales at the network output to enhance feature extraction from small remote sensing targets. To handle objects at arbitrary angles, we improve the classification loss function using the circular smooth label technique, which turns the angle regression problem into a classification problem and increases the detection accuracy for objects at any angle. We conducted experiments on two public datasets, DOTA and HRSC2016. Our results show that the proposed method performs better than previous methods.
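The step of turning angle regression into classification with circular smooth labels can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 180 one-degree bins, the Gaussian window, the `sigma` value, and the function names `circular_smooth_label` and `decode_angle` are all assumptions made for the example.

```python
import math


def circular_smooth_label(angle_deg, num_bins=180, sigma=6.0):
    """Encode an angle as a soft classification target over angle bins.

    The target peaks at the true bin and decays with *circular* distance, so
    bins on either side of the wrap-around point are treated as neighbors.
    """
    center = int(round(angle_deg)) % num_bins
    label = []
    for k in range(num_bins):
        d = abs(k - center)
        d = min(d, num_bins - d)  # circular distance between bins
        label.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    return label


def decode_angle(scores):
    """Decode per-bin scores back to an angle bin by taking the argmax."""
    return max(range(len(scores)), key=lambda k: scores[k])
```

The point of the circular distance is that an object at 179 degrees and one at 0 degrees look almost identical, so a prediction one bin across the boundary should be penalized about as lightly as a prediction one bin away on the same side.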


Cost-sensitive design of error correcting output codes

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science ◽ 10.1177/0954406217709303 ◽ 2017 ◽ Vol 232 (10) ◽ pp. 1871-1881

Author(s): Aijun Xue ◽ Xiaodan Wang

Keyword(s): Classification Problem ◽ Learning Problems ◽ Classification Problems ◽ Cost Sensitive Learning ◽ Misclassification Costs ◽ Real World Applications ◽ Cost Sensitive Classification ◽ Comparable Performance ◽ The Given ◽ Error Correcting Output Codes

Many real-world applications involve multiclass cost-sensitive learning problems. However, some well-established binary cost-sensitive learning algorithms cannot be extended directly to multiclass cost-sensitive learning. It is therefore useful to decompose the complex multiclass cost-sensitive classification problem into a series of binary cost-sensitive classification problems. In this paper, we propose an alternative and efficient decomposition framework that uses the original error correcting output codes. The main problem in our framework is how to evaluate the binary costs for each binary cost-sensitive base classifier. To solve this problem, we propose computing the expected misclassification costs starting from the given multiclass cost matrix. Furthermore, general formulations for computing the binary costs are given. Experimental results on several synthetic and UCI datasets show that our method achieves performance comparable to state-of-the-art methods.
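The decomposition of a multiclass problem into binary problems with error correcting output codes can be sketched as below. This covers only the standard ECOC encoding and Hamming decoding; it does not reproduce the paper's formulas for deriving binary costs from the multiclass cost matrix. The 4-class, 7-column code matrix and the names `CODE_MATRIX`, `binary_targets`, and `decode` are assumptions chosen for illustration.

```python
# Hypothetical code matrix: one row (codeword) per class, one column per
# binary problem. Any two rows differ in 4 positions, so a single wrong
# binary prediction can still be corrected at decoding time.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],  # class 0
    [0, 0, 0, 0, 1, 1, 1],  # class 1
    [0, 0, 1, 1, 0, 0, 1],  # class 2
    [0, 1, 0, 1, 0, 1, 0],  # class 3
]


def binary_targets(labels, column):
    """Relabel multiclass labels as 0/1 targets for one binary subproblem."""
    return [CODE_MATRIX[y][column] for y in labels]


def hamming(a, b):
    """Number of positions in which two bit sequences differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(range(len(CODE_MATRIX)), key=lambda c: hamming(CODE_MATRIX[c], bits))
```

In a cost-sensitive variant, each of the seven binary learners would additionally receive per-class or per-instance costs derived from the multiclass cost matrix; how those costs are computed is the contribution described in the abstract and is left out here.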


Learning to Rank for Review Rating Prediction

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.556-562.6286 ◽ 2014 ◽ Vol 556-562 ◽ pp. 6286-6289

Author(s): Nian Li ◽ Li Yin ◽ Qing Xi Peng

Keyword(s): Learning To Rank ◽ Empirical Studies ◽ Real Life ◽ Classification Problem ◽ Machine Learning Method ◽ Regression Problem ◽ The Public ◽ Rating Prediction ◽ Multi Classification ◽ New Machine

The Internet has experienced profound changes. Large amounts of user-generated content provide valuable information to the public. Customers often express their opinions when shopping online. After writing a review, they give an overall rating to the product or service. In this paper, we focus on the review rating prediction problem. Previous studies usually treat it as a regression problem. We instead apply a learning-to-rank method to the prediction task. After feature selection, a maximum entropy classifier is employed to solve the resulting multi-classification problem. A real-life dataset was crawled to verify the proposed method. Empirical studies demonstrate that the proposed method outperforms the baseline methods.


Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

Computational Intelligence and Neuroscience ◽ 10.1155/2015/149702 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-9 ◽ Cited By ~ 12

Author(s): Yoonseok Shin

Keyword(s): Cost Estimation ◽ Construction Projects ◽ High Performance ◽ Learning Algorithm ◽ Early Stage ◽ Construction Project ◽ Regression Tree ◽ Building Construction ◽ Regression Problem ◽ Additional Information

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although it has been actively utilized in other domains, the boosting approach has yet to be used for regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to perform well in cost estimation domains. The BRT model showed results similar to those of the NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can support estimators in comprehending the decision-making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation for building construction projects.
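The boosting approach applied to regression trees can be sketched in a few lines. This is a generic gradient-boosting illustration under squared loss, not the BRT configuration used in the study: it assumes one-dimensional inputs, depth-1 trees (stumps), and hypothetical names and defaults (`fit_stump`, `boost`, `predict`, `n_rounds=50`, `learning_rate=0.5`).

```python
def fit_stump(xs, residuals):
    """Fit the best single-threshold regression stump to the residuals.

    Requires at least two distinct x values. Points with x <= thr go left.
    """
    candidates = sorted(set(xs))
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        err = (sum((r - mean_l) ** 2 for r in left)
               + sum((r - mean_r) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, mean_l, mean_r)
    return best[1:]


def boost(xs, ys, n_rounds=50, learning_rate=0.5):
    """Additively fit stumps to the current residuals (squared-loss boosting)."""
    base = sum(ys) / len(ys)  # initial constant model
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, mean_l, mean_r = fit_stump(xs, residuals)
        stumps.append((thr, mean_l, mean_r))
        preds = [p + learning_rate * (mean_l if x <= thr else mean_r)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (ml if x <= thr else mr)
                      for thr, ml, mr in stumps)
```

Each round fits a small tree to what the current ensemble still gets wrong, and the learning rate shrinks each tree's contribution. Counting how often each input feature is chosen for a split across the ensemble is one common basis for the kind of importance plot mentioned in the abstract, although this one-feature sketch does not compute it.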


Realtime Localization and Estimation of Loads on Aircraft Wings from Depth Images

Sensors ◽ 10.3390/s20123405 ◽ 2020 ◽ Vol 20 (12) ◽ pp. 3405

Author(s): Diyar Khalis Bilal ◽ Mustafa Unel ◽ Mehmet Yildiz ◽ Bahattin Koc

Keyword(s): Neural Networks ◽ Classification Problem ◽ Regression Problem ◽ Health Monitoring System ◽ Aircraft Wings ◽ Depth Images ◽ Aerial Vehicle ◽ The Neural Networks ◽ Critical Components ◽ Castigliano’s Theorem

This paper deals with the development of a real-time structural health monitoring system for airframe structures that localizes and estimates the magnitude of the loads causing deflections in critical components, such as wings. To this end, a framework based on artificial neural networks is developed that uses features extracted from a depth camera. The localization of the load is treated as a multinomial logistic classification problem and the load magnitude estimation as a logistic regression problem. The neural networks trained for classification and regression are preceded by an autoencoder, through which maximally informative data at a much smaller scale are extracted from the depth features. The effectiveness of the proposed method is validated by an experimental study performed on a composite unmanned aerial vehicle (UAV) wing subject to concentrated and distributed loads, and the results obtained by the proposed method are superior to those of a method based on Castigliano’s theorem.


A new re-encoding ECOC using reject option

Applied Intelligence ◽ 10.1007/s10489-020-01642-2 ◽ 2020 ◽ Vol 50 (10) ◽ pp. 3090-3100 ◽ Cited By ~ 3

Author(s): Lei Lei ◽ Yafei Song ◽ Xi Luo

Keyword(s): Loss Function ◽ Operating Characteristic ◽ Classification Model ◽ Threshold Values ◽ Reject Option ◽ ROC Convex Hull ◽ Low Degree ◽ New Variant ◽ Cost Sensitive Classification ◽ The Cost

When training base classifiers with ternary Error Correcting Output Codes (ECOC), it is well known that some classes are ignored. As a result, a classifier is non-competent when it classifies an instance whose real label does not belong to its meta-subclasses. Meanwhile, the classic ECOC dichotomizers can only produce binary outputs and cannot reject an instance. To overcome the non-competence problem and better model the multi-class problem so as to reduce the classification cost, we embed a reject option into ECOC and present a new variant of the ECOC algorithm called Reject-Option-based Re-encoding ECOC (ROECOC). A cost-sensitive classification model and a cost-loss function based on the Receiver Operating Characteristic (ROC) curve are built. The optimal reject threshold values are obtained by combining the condition for minimizing the loss function with the ROC convex hull. In so doing, the reject option (t1, t2) provides a three-symbol output that makes the dichotomizers more competent and ROECOC more universal and practical for cost-sensitive classification. Experimental results on two kinds of datasets show that our scheme, with low-degree freedom of the initialized ECOC, can effectively enhance accuracy and reduce cost.
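The three-symbol output produced by a reject option (t1, t2) can be sketched as follows. This shows only the thresholding and a simple empirical cost. It does not implement the paper's threshold selection via the ROC convex hull, and the names `dichotomize` and `empirical_cost` and the cost values `c_error` and `c_reject` are assumptions for illustration.

```python
def dichotomize(score, t1, t2):
    """Map a real-valued classifier score to -1, 0 (reject), or +1."""
    if t1 > t2:
        raise ValueError("expected t1 <= t2")
    if score < t1:
        return -1
    if score > t2:
        return 1
    return 0  # score falls in the reject region [t1, t2]


def empirical_cost(scores, labels, t1, t2, c_error=1.0, c_reject=0.2):
    """Average cost over a sample: rejections and errors are charged
    different (assumed) costs, correct decisions cost nothing."""
    total = 0.0
    for score, label in zip(scores, labels):
        out = dichotomize(score, t1, t2)
        if out == 0:
            total += c_reject
        elif out != label:
            total += c_error
    return total / len(scores)
```

Widening the interval between t1 and t2 trades misclassification cost for rejection cost, which is why the thresholds are worth optimizing rather than fixing by hand.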


A Multi-Classification Method Based on Gaussian Processes

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.198-199.1333 ◽ 2012 ◽ Vol 198-199 ◽ pp. 1333-1337 ◽ Cited By ~ 2

Author(s): San Xi Wei ◽ Zong Hai Sun

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Gaussian Process ◽ Gaussian Processes ◽ Good Accuracy ◽ Classification Problem ◽ Decision Time ◽ Support Vector ◽ Regression Problem ◽ Multi Classification

Gaussian processes (GPs) are a very promising technique that has been applied to both regression and classification problems. In recent years, models based on Gaussian process priors have attracted much attention in machine learning. Binary (two-class, C=2) classification using Gaussian processes is a well-developed method. In this paper, a multi-classification (C>2) method based on binary GP classification is presented. Good accuracy can be obtained with this method. In addition, the decision time and accuracy of this method are compared experimentally with those of the Support Vector Machine (SVM).


Identification of profiles in the satisfaction of users of digital repositories through a regression tree

RIDE Revista Iberoamericana para la Investigación y el Desarrollo Educativo ◽ 10.23913/ride.v9i17.367 ◽ 2018 ◽ Vol 9 (17) ◽ pp. 1-19

Author(s): Ramón Ventura Roque Hernández ◽ José Melchor Medina Quintero ◽ Adán López Mendoza ◽ Demián Ábrego Almazán

Keyword(s): Regression Tree ◽ Regression Trees ◽ Classification And Regression Trees ◽ Digital Repositories ◽ Classification And Regression ◽ University Students

In recent years, universities have promoted access to digital repositories for locating information sources that facilitate the scientific research process. However, few studies have evaluated user satisfaction with the use of these technological resources. Accordingly, this work aimed to identify profiles in the satisfaction of university students with the use of these tools. To this end, a questionnaire with 26 questions grouped into 7 dimensions was administered, collecting responses from 219 participants at a university with a presence in Nuevo Laredo and Ciudad Victoria (Tamaulipas, Mexico). Two variables were analyzed as possible predictors in the construction of usage satisfaction profiles: the first related to the repository interface (interactivity, trust, timeliness of access, ease of use, visual appeal, and innovation), while the second related to the student (sex, highest level of education, and place of origin). For this task, the SPSS statistical package was used and the data mining technique known as the regression tree was applied, with the growing method called CRT (classification and regression trees). From the data collected, a tree was obtained that describes three profiles with low, medium, and high levels of satisfaction. People with a low level of satisfaction were those who perceived that the repositories were not easy to use. The medium level of satisfaction was observed in people who considered the repositories easy to use but who neither trusted the security they offered nor perceived a high level of innovation in them. Finally, the highest levels of satisfaction were found among students who felt that the repositories were easy to use and had a reliable level of security.
The results make it possible to understand user satisfaction in terms of the variables studied, with the aim of prioritizing them in the design and implementation of new institutional repositories that provide better user experiences oriented toward the optimal use of these resources.


Regionalization of meso-scale physically based nitrogen modeling outputs to the macro-scale by the use of regression trees

Advances in Geosciences ◽ 10.5194/adgeo-31-15-2012 ◽ 2012 ◽ Vol 31 ◽ pp. 15-21 ◽ Cited By ~ 1

Author(s): A. Künne ◽ M. Fink ◽ H. Kipka ◽ P. Krause ◽ W.-A. Flügel

Keyword(s): Computing Time ◽ Regression Tree ◽ Regression Trees ◽ Detailed Knowledge ◽ Detailed Model ◽ Excess Nitrogen ◽ Landscape Type ◽ Macro Scale ◽ Physically Based ◽ Meso Scale

In this paper, a method is presented to estimate excess nitrogen on large scales while considering single-field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results, and a detailed system understanding were used to generate regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia, a regression tree was calibrated and validated using the model data and the excess nitrogen results from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used by the regression tree model to predict excess nitrogen. Hence, they also had to be calculated and regionalized for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees, the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows the potential nitrogen input into the streams of the drainage area to be calculated. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time and without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying within the regionalization area. The regionalized and modeled excess nitrogen show 94% agreement. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures on load reduction in the water bodies of Thuringia in order to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
