Optimization of Hierarchical Regression Model with Application to Optimizing Multi-Response Regression K-ary Trees

Author(s):  
Pooya Tavallali ◽  
Peyman Tavallali ◽  
Mukesh Singhal

A fast, convenient, and well-known approach to regression is to induce and prune a binary tree. However, there have been few attempts to improve the performance of an induced regression tree. This paper presents a meta-algorithm capable of minimizing the regression loss function, thus improving the accuracy of any given hierarchical model, such as k-ary regression trees. Our proposed method minimizes the loss function of each node one by one. At split nodes, this leads to solving an instance-based cost-sensitive classification problem over the node’s data points. At the leaf nodes, the method leads to a simple regression problem. In the case of binary univariate and multivariate regression trees, the computational complexity of training is linear in the number of samples. Hence, our method is scalable to large trees and datasets. We also briefly explore the possibility of applying the proposed method to classification tasks. We show that our algorithm has significantly better test error than other state-of-the-art tree algorithms. Finally, the accuracy, memory usage, and query time of our method are compared to those of recently introduced forest models. We show that, most of the time, our proposed method achieves better or similar accuracy while having tangibly faster query time and a smaller number of nonzero weights.
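
The node-by-node idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes squared loss, a single univariate threshold split, distinct feature values, and per-point routing costs that are already known. The names `leaf_update`, `split_update`, `cost_left`, and `cost_right` are hypothetical.

```python
import numpy as np

def leaf_update(y):
    """Assumed squared loss: the best constant for a leaf is the mean response."""
    return y.mean(axis=0)

def split_update(x, cost_left, cost_right):
    """Choose a threshold on one feature that minimizes the total routing cost.

    x          : (n,) feature values (assumed distinct in this sketch)
    cost_left  : (n,) loss incurred if point i is sent to the left child
    cost_right : (n,) loss incurred if point i is sent to the right child
    Routing rule: go left when x <= threshold.
    """
    order = np.argsort(x)
    xs, cl, cr = x[order], cost_left[order], cost_right[order]
    n = len(xs)
    # totals[k] = cost when the k smallest points go left and the rest go right
    left_prefix = np.concatenate(([0.0], np.cumsum(cl)))
    right_suffix = np.concatenate((np.cumsum(cr[::-1])[::-1], [0.0]))
    totals = left_prefix + right_suffix
    k = int(np.argmin(totals))
    if k == 0:
        threshold = -np.inf          # every point goes right
    elif k == n:
        threshold = np.inf           # every point goes left
    else:
        threshold = float((xs[k - 1] + xs[k]) / 2.0)
    return threshold, float(totals[k])
```

In this sketch the split node only sees per-instance costs, which is what makes the problem an instance-based cost-sensitive classification; how those costs are computed from the subtrees is left outside the example.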

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Wei Li ◽  
Youmeng Luo ◽  
Chao Tang ◽  
Kaiqiang Zhang ◽  
Xiaoyu Ma

Regression is an important problem in machine learning and has been widely applied in fields such as meteorology, transportation, and materials. Granular computing (GrC) is a useful approach for exploring human intelligent information processing and has advantages in knowledge discovery. Ensemble learning is easy to execute in parallel. Based on granular computing and ensemble learning, we convert the regression problem into an equivalent problem in granular space and propose boosted fuzzy granular regression trees (BFGRT) to predict a test instance. The idea of BFGRT is as follows. First, a clustering algorithm with automatic optimization of clustering centers is presented. Next, based on this clustering algorithm, we employ MapReduce to implement fuzzy granulation of the data in parallel. Then, we design new operators and metrics on fuzzy granules to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. Building on these, BFGRT is constructed by combining multiple FGRTs in parallel via random attribute sampling and MapReduce. Theory and experiments show that BFGRT is accurate, efficient, and robust.


2021 ◽  
Vol 13 (11) ◽  
pp. 2171
Author(s):  
Yuhao Qing ◽  
Wenyi Liu ◽  
Liuyan Feng ◽  
Wanjia Gao

Despite significant progress in object detection tasks, remote sensing image target detection is still challenging owing to complex backgrounds, large differences in target sizes, and the uneven distribution of rotated objects. In this study, we consider model accuracy, inference speed, and detection of objects at any angle. We propose a RepVGG-YOLO network that uses an improved RepVGG model as the backbone feature extraction network, which performs the initial feature extraction from the input image and balances training accuracy against inference speed. We use an improved feature pyramid network (FPN) and path aggregation network (PANet) to reprocess the features output by the backbone network. The FPN and PANet modules integrate feature maps of different layers, combine context information on multiple scales, accumulate multiple features, and strengthen feature information extraction. Finally, to maximize the detection accuracy of objects of all sizes, we use four target detection scales at the network output to enhance feature extraction from small remote sensing target pixels. To handle objects at arbitrary angles, we improve the classification loss function using circular smooth label technology, turning the angle regression problem into a classification problem and increasing the detection accuracy of objects at any angle. We conducted experiments on two public datasets, DOTA and HRSC2016. Our results show that the proposed method performs better than previous methods.
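
To make the step from angle regression to classification concrete, here is a small sketch of a circular smooth label. The abstract gives no parameters, so the 180 one-degree bins, the Gaussian window, and the values of `radius` and `sigma` are assumptions for illustration only, and `circular_smooth_label` is a hypothetical name.

```python
import numpy as np

def circular_smooth_label(angle_deg, num_bins=180, radius=6, sigma=2.0):
    """Encode an angle in [0, 180) as a smoothed, periodic class label.

    The true bin gets weight 1; neighbouring bins get Gaussian weights,
    with distance measured around the circle so that the first and last
    bins are treated as adjacent. Bins farther than `radius` get 0.
    """
    bin_width = 180.0 / num_bins
    center = int(angle_deg // bin_width) % num_bins
    idx = np.arange(num_bins)
    dist = np.abs(idx - center)
    dist = np.minimum(dist, num_bins - dist)      # circular distance in bins
    label = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    label[dist > radius] = 0.0                    # truncate the window
    return label
```

Because the distance wraps around, an angle of 179 degrees assigns the same weight to bin 0 as to bin 178, which avoids the boundary discontinuity a plain one-hot or non-periodic label would have.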


Author(s):  
Aijun Xue ◽  
Xiaodan Wang

Many real-world applications involve multiclass cost-sensitive learning problems. However, some well-established binary cost-sensitive learning algorithms cannot be extended directly to multiclass cost-sensitive learning. It is therefore useful to decompose the complex multiclass cost-sensitive classification problem into a series of binary cost-sensitive classification problems. In this paper we propose an alternative and efficient decomposition framework that uses the original error correcting output codes. The main problem in our framework is how to determine the binary costs for each binary cost-sensitive base classifier. To solve this problem, we propose computing the expected misclassification costs starting from the given multiclass cost matrix. Furthermore, general formulations for computing the binary costs are given. Experimental results on several synthetic and UCI datasets show that our method achieves performance comparable to state-of-the-art methods.


2014 ◽  
Vol 556-562 ◽  
pp. 6286-6289
Author(s):  
Nian Li ◽  
Li Yin ◽  
Qing Xi Peng

The Internet has experienced profound changes. Large amounts of user-generated content provide valuable information to the public. Customers usually express their opinions when shopping online. After writing a review, they give an overall rating to the product or service. In this paper, we focus on the review rating prediction problem. Previous studies usually treat this as a regression problem. We apply a different machine learning approach to solve it: a learning-to-rank method is used to tackle the prediction. After feature selection, a maximum entropy classifier is employed to solve the multi-class classification problem. A real-life dataset was crawled to verify the proposed method. Empirical studies demonstrate that the proposed method outperforms the baseline methods.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Yoonseok Shin

Among recent data mining techniques, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although boosting has been actively used in other domains, it has yet to be applied to regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to perform well in cost estimation domains. Using 234 actual cost datasets of a building construction project, the BRT model showed results similar to those of the NN model. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation for building construction projects.
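
For readers unfamiliar with boosted regression trees, the following is a generic sketch of the idea: depth-one trees (stumps) are fitted one after another to the current residuals. It is not the BRT implementation used in the study. It assumes squared loss, a single input feature, and hypothetical names and defaults (`fit_stump`, `boost`, `predict`, `n_rounds`, `lr`).

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold split of feature x for residuals r (squared error)."""
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:               # skip the max so the right side is non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def predict_stump(stump, x):
    t, lv, rv = stump
    if t is None:                             # no valid split: constant prediction
        return np.full(x.shape, lv)
    return np.where(x <= t, lv, rv)

def boost(x, y, n_rounds=50, lr=0.1):
    """Fit stumps sequentially to the residuals of the running prediction."""
    base = float(y.mean())
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + lr * predict_stump(stump, x)
        stumps.append(stump)
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    out = np.full(x.shape, base)
    for stump in stumps:
        out = out + lr * predict_stump(stump, x)
    return out
```

A call such as `model = boost(x, y)` followed by `predict(model, x_new)` gives the ensemble prediction; counting how often each feature is chosen for a split is one simple way such models yield the kind of importance information the abstract mentions.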


Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3405
Author(s):  
Diyar Khalis Bilal ◽  
Mustafa Unel ◽  
Mehmet Yildiz ◽  
Bahattin Koc

This paper deals with the development of a real-time structural health monitoring system for airframe structures that localizes and estimates the magnitude of loads causing deflections in critical components, such as wings. To this end, a framework based on artificial neural networks is developed that uses features extracted from a depth camera. The localization of the load is treated as a multinomial logistic classification problem and the load magnitude estimation as a logistic regression problem. The neural networks trained for classification and regression are preceded by an autoencoder, through which maximally informative data at a much smaller scale are extracted from the depth features. The effectiveness of the proposed method is validated by an experimental study performed on a composite unmanned aerial vehicle (UAV) wing subject to concentrated and distributed loads, and the results obtained by the proposed method are superior to those of a method based on Castigliano’s theorem.


2020 ◽  
Vol 50 (10) ◽  
pp. 3090-3100 ◽  
Author(s):  
Lei Lei ◽  
Yafei Song ◽  
Xi Luo

When training base classifiers with ternary Error Correcting Output Codes (ECOC), it is well known that some classes are ignored. As a result, a classifier becomes non-competent when it classifies an instance whose true label does not belong to its meta-subclasses. Meanwhile, classic ECOC dichotomizers can only produce binary outputs and cannot reject a classification. To overcome the non-competence problem and better model the multi-class problem so as to reduce classification cost, we embed a reject option into ECOC and present a new ECOC variant called Reject-Option-based Re-encoding ECOC (ROECOC). A cost-sensitive classification model and a cost-loss function based on the Receiver Operating Characteristic (ROC) curve are built. The optimal reject thresholds are obtained by combining the condition for minimizing the loss function with the ROC convex hull. In this way, the reject option (t1, t2) provides a three-symbol output that makes dichotomizers more competent and ROECOC more universal and practical for cost-sensitive classification. Experimental results on two kinds of datasets show that our scheme, with a low-degree-of-freedom initialized ECOC, can effectively enhance accuracy and reduce cost.
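
The three-symbol output produced by a reject option (t1, t2) can be sketched as below. This is only an illustration under stated assumptions: the dichotomizer is assumed to emit a real-valued score, the thresholds are taken as given rather than derived from the ROC convex hull, and `reject_decision` is a hypothetical name.

```python
def reject_decision(score, t1, t2):
    """Map a dichotomizer score to one of three symbols.

    Returns -1 (negative meta-class) when score < t1,
            +1 (positive meta-class) when score > t2,
             0 (reject / abstain)    otherwise.
    Assumes t1 <= t2; how the thresholds are chosen is outside this sketch.
    """
    if t1 > t2:
        raise ValueError("expected t1 <= t2")
    if score < t1:
        return -1
    if score > t2:
        return 1
    return 0
```

For example, with t1 = 0.3 and t2 = 0.7, scores of 0.1, 0.5, and 0.9 map to -1, 0, and +1 respectively, so the abstaining output plays the same role as the zero symbol in a ternary code.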


2012 ◽  
Vol 198-199 ◽  
pp. 1333-1337 ◽  
Author(s):  
San Xi Wei ◽  
Zong Hai Sun

Gaussian processes (GPs) are a very promising technique that has been applied to both regression and classification problems. In recent years, models based on Gaussian process priors have attracted much attention in machine learning. Binary (two-class, C=2) classification using Gaussian processes is a well-developed method. In this paper, a multi-class (C>2) classification method based on binary GP classification is presented. Good accuracy can be obtained with this method. In addition, the decision time and accuracy of this method are compared experimentally with those of the Support Vector Machine (SVM).


Author(s):  
Ramón Ventura Roque Hernández ◽  
José Melchor Medina Quintero ◽  
Adán López Mendoza ◽  
Demián Ábrego Almazán

In recent years, universities have promoted access to digital repositories for locating information sources that facilitate the scientific research process. However, few studies have evaluated user satisfaction with these technological resources. This work therefore aimed to identify profiles in university students' satisfaction with the use of these tools. To this end, a questionnaire with 26 questions grouped into 7 dimensions was administered, collecting responses from 219 participants at a university with a presence in Nuevo Laredo and Ciudad Victoria (Tamaulipas, Mexico). Two variables were analyzed as possible predictors in building usage-satisfaction profiles: the first related to the repository interface (interactivity, trust, timeliness of access, ease of use, visual appeal, and innovation), while the second related to the student (sex, highest level of education, and place of origin). The SPSS statistical package was used, and the data mining technique known as a regression tree was applied, with the growing method called CRT (classification and regression trees). From the data collected, a tree was obtained that describes three profiles with low, medium, and high levels of satisfaction. People with a low level of satisfaction were those who perceived that the repositories were not easy to use. The medium level of satisfaction was observed in people who considered the repositories easy to use but did not trust the security they offered or perceive a high level of innovation in them. Finally, the highest levels of satisfaction were found among students who felt that the repositories were easy to use and had a reliable level of security.
The results make it possible to understand user satisfaction in terms of the variables studied, with the aim of prioritizing them in the design and implementation of new institutional repositories to provide better user experiences oriented toward the optimal use of these resources.


2012 ◽  
Vol 31 ◽  
pp. 15-21 ◽  
Author(s):  
A. Künne ◽  
M. Fink ◽  
H. Kipka ◽  
P. Krause ◽  
W.-A. Flügel

In this paper, a method is presented for estimating excess nitrogen on large scales while considering single-field processes. The approach was implemented using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results, and a detailed system understanding were used to generate regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia, a regression tree was calibrated and validated using the model data and the excess nitrogen results from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used by the regression tree model to predict excess nitrogen. Hence, they also had to be calculated and regionalized for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees, the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows the potential nitrogen input into the streams of the drainage area to be calculated. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time and without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen show a 94% correspondence. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures on load reduction in the water bodies of Thuringia in order to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).

