Combining Multiple Metadata Types in Movies Recommendation Using Ensemble Algorithms

Author(s):  
Bruno Souza Cabral ◽  
Renato Dompieri Beltrao ◽  
Marcelo Garcia Manzato ◽  
Frederico Araújo Durão
2021 ◽  
pp. 1-10
Author(s):  
Ahmet Tezcan Tekin ◽  
Tolga Kaya ◽  
Ferhan Cebi

The use of fuzzy logic in machine learning is becoming widespread. In machine learning problems, data with different characteristics are often trained on and predicted together, and training a model on such heterogeneous data can increase prediction error. In this study, we propose a new approach to ensemble prediction based on fuzzy clustering. The approach clusters the data according to their fuzzy membership values and builds models on data with similar characteristics, which allows objects that exhibit the characteristics of more than one cluster to be handled efficiently. It also makes it possible to combine boosting-type ensemble algorithms, which are widely used in machine learning because of their strong reported performance in the literature. To test the approach, we used marketing and gameplay data from a mobile game’s customers to predict their customer lifetime value. Customer lifetime value prediction is crucial for determining a company’s marketing cost cap. The findings reveal that using a fuzzy method to ensemble the algorithms outperforms applying the algorithms individually.
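The abstract does not spell out how the fuzzy memberships and the per-cluster models are combined, so the following is only a minimal sketch under stated assumptions: memberships follow the standard fuzzy c-means formula, and the final prediction is the membership-weighted average of one model per cluster. The names `fuzzy_memberships`, `ensemble_predict`, `centers`, and `models` are hypothetical and do not come from the paper.

```python
import math


def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means style membership of point x in each cluster.

    m > 1 is the fuzzifier; larger values give softer memberships.
    """
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(x, centers, models, m=2.0):
    """Assumed combination rule: membership-weighted average of cluster models."""
    weights = fuzzy_memberships(x, centers, m)
    return sum(w * model(x) for w, model in zip(weights, models))


# Hypothetical usage: two clusters, each with a constant "model".
centers = [(0.0, 0.0), (10.0, 0.0)]
models = [lambda x: 1.0, lambda x: 3.0]
prediction = ensemble_predict((5.0, 0.0), centers, models)
```

In practice each entry of `models` would be a trained regressor (for example a boosting model fitted on the samples most strongly associated with that cluster); constants are used here only to keep the sketch self-contained.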


2021 ◽  
Vol 11 (9) ◽  
pp. 4280
Author(s):  
Iurii Katser ◽  
Viacheslav Kozitsin ◽  
Victor Lobachev ◽  
Ivan Maksimov

Offline changepoint detection (CPD) algorithms are used for optimal signal segmentation. Generally, these algorithms assume that the statistical properties of the signal that change are known, so that appropriate models (metrics, cost functions) for changepoint detection can be used. Otherwise, selecting a proper model can become laborious and time-consuming, with uncertain results. Although the ensemble approach is well known for increasing the robustness of individual algorithms and for dealing with the challenges mentioned above, it is weakly formalized and has received much less attention for CPD problems than for outlier detection or classification problems. This paper proposes an unsupervised CPD ensemble (CPDE) procedure, with pseudocode for the proposed ensemble algorithms and a link to their Python implementation. The novelty of the approach lies in aggregating several cost functions before the changepoint search procedure is run during offline analysis. The numerical experiment showed that the proposed CPDE outperforms non-ensemble CPD procedures. Additionally, we analyzed common CPD algorithms, scaling functions, and aggregation functions, and compared them in the numerical experiment. The results were obtained on two anomaly benchmarks that contain industrial faults and failures: the Tennessee Eastman Process (TEP) and the Skoltech Anomaly Benchmark (SKAB). One possible application of our research is estimating the failure time in fault identification and isolation problems in technical diagnostics.
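The idea of aggregating cost functions before the search can be illustrated with a small sketch. This is not the authors' CPDE implementation: it assumes a single changepoint, two illustrative cost functions (squared deviation from the mean and absolute deviation from the median), min-max scaling, and summation as the aggregation function. All function names are hypothetical.

```python
import statistics


def cost_l2(segment):
    """Sum of squared deviations from the segment mean."""
    mean = statistics.fmean(segment)
    return sum((v - mean) ** 2 for v in segment)


def cost_l1(segment):
    """Sum of absolute deviations from the segment median."""
    median = statistics.median(segment)
    return sum(abs(v - median) for v in segment)


def min_max_scale(values):
    """Scale a cost curve to [0, 1] so different costs are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ensemble_changepoint(signal, costs=(cost_l2, cost_l1), min_size=2):
    """Return the split index that minimizes the aggregated, scaled cost."""
    candidates = list(range(min_size, len(signal) - min_size + 1))
    if not candidates:
        raise ValueError("signal too short for the requested min_size")
    aggregated = [0.0] * len(candidates)
    for cost in costs:
        # Cost of splitting at t = cost of the left part + cost of the right part.
        curve = [cost(signal[:t]) + cost(signal[t:]) for t in candidates]
        for i, scaled in enumerate(min_max_scale(curve)):
            aggregated[i] += scaled
    best = min(range(len(candidates)), key=aggregated.__getitem__)
    return candidates[best]


# Hypothetical signal with a mean shift at index 5.
signal = [0.0, 0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0, 5.1]
changepoint = ensemble_changepoint(signal)
```

The search runs once over the aggregated curve, rather than running one search per cost function and merging the detected changepoints afterwards, which is the distinction the abstract draws.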


2021 ◽  
Vol 13 (8) ◽  
pp. 1595
Author(s):  
Chunhua Li ◽  
Lizhi Zhou ◽  
Wenbin Xu

Wetland vegetation aboveground biomass (AGB) directly indicates wetland ecosystem health and is critical for water purification, carbon cycle, and biodiversity conservation. Accurate AGB estimation is essential for the monitoring and supervision of ecosystems, especially in seasonal floodplain wetlands. This paper explored the capability of spectral and texture features from the Sentinel-2 Multispectral Instrument (MSI) for modeling grassland AGB using random forest (RF) and extreme gradient boosting (XGBoost) algorithms in Shengjin Lake wetland (a Ramsar site). We used five-fold cross-validation to verify the effectiveness of the models. The results indicated that both the RF and XGBoost models performed robustly and efficiently (root mean square error (RMSE) of 126.571 g·m−2 and R2 of 0.844 for RF; RMSE of 112.425 g·m−2 and R2 of 0.869 for XGBoost), with the XGBoost model performing better. Both traditional and red-edge vegetation indices (VIs) obtained satisfactory results of AGB estimation (RMSE = 127.936 g·m−2, RMSE = 125.879 g·m−2 in XGBoost models, respectively), with the red-edge VIs contributing more to the AGB models. Moreover, we selected eight gray-level co-occurrence matrix (GLCM) textures calculated with four processing window sizes using the mean value of four offsets, and further analyzed the results of three analysis sets. Textures derived from traditional and red-edge bands using a 7 × 7 window size performed better in biomass estimation. This finding suggested that textures derived from the traditional bands were as important as those derived from the red-edge bands. The introduction of textures moderately improved the accuracy of modeling AGB, whereas the use of textures alone was not satisfactory. This research demonstrated that using the Sentinel-2 MSI and the two ensemble algorithms is an effective method for long-term dynamic monitoring and assessment of grass AGB in seasonal floodplain wetlands, which can support sustainable management and carbon accounting of wetland ecosystems.
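The evaluation protocol mentioned above (five-fold cross-validation scored by RMSE and R2) can be sketched as follows. To stay self-contained, this sketch swaps in a one-variable least-squares line for the RF and XGBoost models, uses contiguous unshuffled folds, and pools the out-of-fold predictions before scoring. Those choices and all function names are assumptions, not details from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=5):
    """Score pooled out-of-fold predictions with RMSE and R2."""
    preds = [0.0] * len(xs)
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    return rmse(ys, preds), r2(ys, preds)


# Hypothetical data: a noiseless linear relation between one index and AGB.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
cv_rmse, cv_r2 = cross_validate(xs, ys)
```

With real field samples the folds would normally be shuffled first, and the stand-in `fit_line` would be replaced by the actual regressor being evaluated.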


2021 ◽  
Author(s):  
Noor Azmiya Bt Sirajun Noor ◽  
Irraivan Elamvazuthi ◽  
Norashikin Yahya

2019 ◽  
Vol 11 (17) ◽  
pp. 2057 ◽  
Author(s):  
Majid Shadman Roodposhti ◽  
Arko Lucieer ◽  
Asim Anees ◽  
Brett Bryan

This paper assesses the performance of DoTRules—a dictionary of trusted rules—as a supervised rule-based ensemble framework based on the mean-shift segmentation for hyperspectral image classification. The proposed ensemble framework consists of multiple rule sets with rules constructed based on different class frequencies and sequences of occurrences. Shannon entropy was derived for assessing the uncertainty of every rule and the subsequent filtering of unreliable rules. DoTRules is not only a transparent approach for image classification but also a tool to map rule uncertainty, where rule uncertainty assessment can be applied as an estimate of classification accuracy prior to image classification. In this research, the proposed image classification framework is implemented using three world reference hyperspectral image datasets. We found that the overall accuracy of classification using the proposed ensemble framework was superior to state-of-the-art ensemble algorithms, as well as two non-ensemble algorithms, at multiple training sample sizes. We believe DoTRules can be applied more generally to the classification of discrete data such as hyperspectral satellite imagery products.
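The entropy-based filtering step can be illustrated with a short sketch. It assumes each rule is summarized by the class frequencies observed among the training samples it matches, and that a rule is kept when its Shannon entropy falls at or below a threshold. The rule keys, class labels, and the `max_entropy` threshold are hypothetical and not taken from DoTRules itself.

```python
import math


def shannon_entropy(class_counts):
    """Shannon entropy (in bits) of a rule's class-frequency distribution."""
    total = sum(class_counts.values())
    entropy = 0.0
    for count in class_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def filter_rules(rules, max_entropy):
    """Keep only rules whose uncertainty is at or below the threshold."""
    return {
        rule: counts
        for rule, counts in rules.items()
        if shannon_entropy(counts) <= max_entropy
    }


# Hypothetical rules mapped to the class frequencies they were observed with.
rules = {
    "rule_a": {"water": 10},              # always the same class: entropy 0
    "rule_b": {"water": 5, "forest": 5},  # evenly split: entropy 1 bit
}
trusted = filter_rules(rules, max_entropy=0.5)
```

A rule that always co-occurs with one class has zero entropy and is fully trusted, while an evenly split rule carries maximal uncertainty for two classes and is filtered out here.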


2020 ◽  
Vol 12 (3) ◽  
pp. 1016 ◽  
Author(s):  
Jiwu Wang ◽  
Nina Liu ◽  
Yichen Ruan

Innovation is a necessary path for cities to achieve sustainable development. The occurrence of innovation activities is a complex systemic behavior. Its spatial distribution follows certain location selection laws, which result from interaction and feedback among various spatial influence factors. We explain the impact mechanism at the microscale, using the street unit within a city as the unit of analysis. Hangzhou was selected as a case study. First, we systematically selected factors influencing the spatial distribution of innovation activities as the independent variables, based on the demands of innovation subjects. Patents were used as the dependent variable to represent the spatial distribution of innovation activities. Second, ensemble algorithms (Boosting) were used to analyze the contribution of the independent variables to the dependent variable. Then, based on the two aspects of innovation driving force, namely innovation resources and innovation environments, the relevant factors were divided into the following seven categories: innovation industry concentration, knowledge intensity, innovative talent resources, service facilities, external transportation convenience, public transportation convenience, and ecological environment. We interpreted the impact mechanism and made corresponding suggestions for urban innovation space planning.

