Iterative Compilation Optimization Based on Metric Learning and Collaborative Filtering

2022 · Vol 19 (1) · pp. 1-25
Author(s): Hongzhi Liu, Jie Luo, Ying Li, Zhonghai Wu

Pass selection and phase ordering are two critical compiler auto-tuning problems. Traditional heuristic methods cannot effectively address these NP-hard problems, especially given the increasing number of compiler passes and the diversity of hardware architectures. Recent research efforts have attempted to address these problems through machine learning. However, the large search space of candidate pass sequences, the large number of redundant and irrelevant features, and the lack of training program instances make it difficult to learn models well. Several methods have tried to use expert knowledge to simplify the problems, such as using only the compiler passes or subsequences in the standard optimization levels (e.g., -O1, -O2, and -O3) provided by compiler designers. However, these methods ignore other useful compiler passes that are not contained in the standard levels. Principal component analysis (PCA) and exploratory factor analysis (EFA) have been utilized to reduce the redundancy of feature data. However, these unsupervised methods retain all the information irrelevant to the performance of compilation optimization, which may mislead the subsequent model learning. To solve these problems, we propose a compiler pass selection and phase ordering approach called Iterative Compilation based on Metric learning and Collaborative filtering (ICMC). First, we propose a data-driven method to construct pass subsequences according to the observed collaborative interactions and dependencies among passes on a given program set, which lets us make use of all available compiler passes while pruning the search space. Then, a supervised metric learning method is utilized to retain the feature information useful for compilation optimization while removing both irrelevant and redundant information. Based on the learned similarity metric, a neighborhood-based collaborative filtering method is employed to iteratively recommend a few superior compiler passes for each target program. Finally, an iterative data enhancement method is designed to alleviate the lack of training program instances and to improve the performance of iterative pass recommendation. Experimental results using the LLVM compiler on all 32 cBench programs show the following: (1) ICMC significantly outperforms several state-of-the-art compiler phase ordering methods; (2) it performs the same as or better than the standard level -O3 on all the test programs; and (3) it achieves an average speedup of 1.20 (up to 1.46) over the standard level -O3.
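Purely as an illustration of the recommendation step, the sketch below implements a neighborhood-based collaborative filtering rule under a given Mahalanobis-style metric. The function name, the feature and speedup arrays, and the metric matrix `W` are hypothetical stand-ins; the paper's metric-learning, subsequence-construction, and data-enhancement stages are not reproduced.

```python
# A minimal sketch, assuming a learned PSD metric W is already available.
import numpy as np

def recommend_passes(target_feat, train_feats, speedups, W, k=3, n_rec=5):
    """Recommend pass-subsequence indices for one target program.

    target_feat : (d,) feature vector of the target program
    train_feats : (n, d) features of the training programs
    speedups    : (n, m) observed speedup of each pass subsequence per program
    W           : (d, d) PSD matrix of a learned (Mahalanobis-style) metric
    """
    diff = train_feats - target_feat                       # (n, d)
    dists = np.sqrt(np.einsum('nd,de,ne->n', diff, W, diff))
    neighbors = np.argsort(dists)[:k]                      # k most similar programs
    weights = 1.0 / (dists[neighbors] + 1e-8)              # closer neighbors count more
    scores = weights @ speedups[neighbors] / weights.sum() # neighbor-weighted speedups
    return np.argsort(scores)[::-1][:n_rec]                # top-scoring subsequences

rng = np.random.default_rng(0)
feats, spd = rng.random((20, 8)), rng.random((20, 40))     # toy training data
print(recommend_passes(rng.random(8), feats, spd, np.eye(8)))
```

The point this mirrors is that similarity is computed under the learned metric, so only performance-relevant features influence which neighbors vote.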

2021 · Vol 15 (6) · pp. 1-20
Author(s): Dongsheng Li, Haodong Liu, Chao Chen, Yingying Zhao, Stephen M. Chu, et al.

In collaborative filtering (CF) algorithms, the optimal models are usually learned by globally minimizing the empirical risk averaged over all the observed data. However, the global models are often obtained via a performance tradeoff among users/items, i.e., not all users/items are perfectly fitted by the global models, due to the hard non-convex optimization problems in CF algorithms. Ensemble learning can address this issue by learning multiple diverse models but usually suffers from efficiency issues on large datasets or complex algorithms. In this article, we keep the intermediate models obtained during global model learning as snapshot models and then adaptively combine the snapshot models for individual user-item pairs using a memory network-based method. Empirical studies on three real-world datasets show that the proposed method can extensively and significantly improve the accuracy (by up to 15.9% relatively) when applied to a variety of existing collaborative filtering methods.
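As a rough sketch of the per-pair combination idea, the snippet below averages matrix-factorization snapshot predictions with softmax weights. In the paper those weights come from a memory network; here they are replaced by given, hypothetical attention scores.

```python
# A minimal sketch, assuming snapshots of latent factors were saved during training.
import numpy as np

def snapshot_predict(user, item, snapshots, attention_scores):
    """snapshots: list of (P, Q) latent-factor pairs saved during training.
    attention_scores: per-snapshot relevance for this user-item pair
    (produced by a memory network in the paper; assumed given here)."""
    preds = np.array([P[user] @ Q[item] for P, Q in snapshots])
    w = np.exp(attention_scores - attention_scores.max())
    w /= w.sum()                      # softmax weights over snapshots
    return w @ preds                  # adaptive per-pair combination

P0, Q0 = np.random.rand(5, 8), np.random.rand(6, 8)
snapshots = [(P0 * s, Q0) for s in (0.8, 0.9, 1.0)]   # three toy snapshots
print(snapshot_predict(2, 3, snapshots, np.array([0.1, 0.5, 0.9])))
```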


2021 · Vol 11 (14) · pp. 6387
Author(s): Li Xu, Jianzhong Hu

Active infrared thermography (AIRT) is a significant defect detection and evaluation method in the field of non-destructive testing, because it promptly provides visual information and its results can be used for quantitative study of defects. At present, the quantitative evaluation of defects is an urgent problem to be solved in this field. In this work, a defect depth recognition method based on gated recurrent unit (GRU) networks is proposed to address insufficient accuracy in defect depth recognition. AIRT is applied to obtain raw thermal sequences of the surface temperature field distribution of the defect specimen. Before training the GRU model, principal component analysis (PCA) is used to reduce the dimensionality of the raw datasets and to eliminate their correlation. Then, the GRU model is employed to automatically recognize the depth of the defect. The defect depth recognition performance of the proposed method is evaluated through an experiment on polymethyl methacrylate (PMMA) with flat-bottom holes. The results indicate that the PCA-processed datasets outperform the raw temperature datasets for model learning when assessing defect depth characteristics. A comparison with a backpropagation (BP) network shows that the proposed method performs better in defect depth recognition.
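A minimal sketch of such a PCA-then-GRU pipeline is given below, using scikit-learn and PyTorch. The shapes, layer sizes, class count, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch: per-frame PCA reduction, then a GRU classifies defect depth.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# raw_seqs: (n_samples, n_frames, n_pixels) synthetic surface-temperature sequences
raw_seqs = np.random.rand(100, 50, 400).astype(np.float32)
depth_labels = np.random.randint(0, 4, size=100)           # 4 assumed depth classes

pca = PCA(n_components=16)
flat = raw_seqs.reshape(-1, raw_seqs.shape[-1])            # pool frames for fitting
reduced = pca.fit_transform(flat).reshape(100, 50, 16)

class DepthGRU(nn.Module):
    def __init__(self, n_feat=16, hidden=32, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)
    def forward(self, x):
        _, h = self.gru(x)             # h: (1, batch, hidden) last hidden state
        return self.head(h[-1])        # class logits per sequence

model = DepthGRU()
logits = model(torch.from_numpy(reduced.astype(np.float32)))
print(logits.shape)                    # (100, 4)
```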


2019 · Vol 3 (2) · pp. 11-18
Author(s): George Mweshi

Extracting useful and novel information from large amounts of collected data has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling these significantly large datasets is the curse of dimensionality: as the dimension of the data increases, the performance of the data mining algorithms employed to mine the data deteriorates. This deterioration is mainly caused by the large search space created by irrelevant, noisy, and redundant features in the data. Feature selection is one of several techniques that can remove these unnecessary features; it reduces the dimension of the data as well as the search space, which in turn increases the efficiency and accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary search strategy capable of automatically finding solutions in complex and large search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on five benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers, namely J48, Naive Bayes, PART, and Random Forests, using the GP-selected features, all the original features, and the features selected by other commonly used feature selection techniques, i.e., principal component analysis, information gain, ReliefF, and CFS. The experimental results show that not only does GP select a smaller set of features than the original feature set, but classifiers using the GP-selected features also achieve better classification performance than with all the original features. Furthermore, compared with the other well-known feature selection techniques, GP achieves very competitive results.
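For flavor, the sketch below runs an evolutionary search over binary feature masks. Note this is a genetic-algorithm-style simplification over fixed-length masks, not the paper's tree-based GP, and the dataset, fitness function, and rates are illustrative assumptions.

```python
# A simplified evolutionary feature-selection loop (GA-style sketch, not full GP).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_feat)) < 0.5                 # random binary masks
for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]     # keep the fittest half
    children = parents.copy()
    cut = rng.integers(1, n_feat, size=len(children))      # one-point crossover
    for i, c in enumerate(cut):
        children[i, c:] = parents[(i + 1) % len(parents), c:]
    children ^= rng.random(children.shape) < 0.02          # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected {best.sum()} of {n_feat} features")
```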


Processes · 2020 · Vol 8 (1) · pp. 123
Author(s): Yuhui Ying, Zhi Li, Minglei Yang, Wenli Du

In the traditional performance assessment method, different modes of data are classified mainly by expert knowledge, so human interference is highly probable. The traditional method is also incapable of distinguishing transition data from steady-state data, which reduces the accuracy of the monitoring model. To solve these problems, this paper proposes a method for multimode operating performance visualization and nonoptimal cause identification. First, multimode data identification is realized by the subtractive clustering algorithm (SCA), which reduces human influence and eliminates transition data. Then, multi-space principal component analysis (MsPCA) is used to characterize the independent characteristics of different datasets, which enhances the robustness of the model with respect to the performance of independent variables. Furthermore, a self-organizing map (SOM) is used to train these characteristics and map them onto a two-dimensional plane, realizing the visualization of process monitoring. For online assessment, the operating performance of the current process is evaluated according to the projection position of the data on the visual model, and the cause of nonoptimal performance is then identified. Finally, the Tennessee Eastman (TE) process is used to verify the effectiveness of the proposed method.
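The snippet below sketches only the SOM visualization step on synthetic features: samples are mapped onto a 2-D grid via their best-matching units, and a new sample can be judged by where it lands. The SCA and MsPCA stages are omitted, and all shapes and rates are illustrative assumptions.

```python
# A minimal hand-rolled SOM sketch for the 2-D visualization step.
import numpy as np

rng = np.random.default_rng(1)
grid, d = (8, 8), 5
weights = rng.random(grid + (d,))                 # (8, 8, d) SOM codebook

def bmu(x):
    dists = np.linalg.norm(weights - x, axis=2)   # distance to every grid unit
    return np.unravel_index(dists.argmin(), grid) # best-matching unit (i, j)

coords = np.stack(np.meshgrid(range(grid[0]), range(grid[1]), indexing='ij'), -1)
X = rng.random((300, d))                          # stand-in process features
for t, x in enumerate(X):
    lr = 0.5 * np.exp(-t / 200)                   # decaying learning rate
    sigma = 3.0 * np.exp(-t / 200)                # shrinking neighborhood radius
    g = np.exp(-np.sum((coords - bmu(x))**2, -1) / (2 * sigma**2))
    weights += lr * g[..., None] * (x - weights)  # pull neighborhood toward x

print("best-matching unit of a new sample:", bmu(X[0]))
```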


Energies · 2020 · Vol 13 (14) · pp. 3530
Author(s): Katarzyna Maciejowska, Bartosz Uniejewski, Tomasz Serafin

Recently, developments in combining point forecasts of electricity prices obtained with calibration windows of different lengths have provided an extremely efficient and simple tool for improving predictive accuracy. However, the proposed methods depend strongly on expert knowledge and may not transfer directly from one model or market to another. Hence, we consider a novel extension and propose to use principal component analysis (PCA) to automate the procedure of averaging over a rich pool of predictions. We apply PCA to a panel of over 650 point forecasts obtained for different calibration window lengths. The robustness of the approach is evaluated on three different forecasting tasks, i.e., forecasting day-ahead prices, forecasting intraday ID3 prices one day in advance, and very short-term forecasting of ID3 prices (i.e., six hours before delivery). The empirical results are compared using the Mean Absolute Error measure and the Giacomini and White test for conditional predictive ability (CPA). The results indicate that PCA averaging not only yields significantly more accurate forecasts than individual predictions but also outperforms other forecast averaging schemes.
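A minimal sketch of the averaging idea, on synthetic data: PCA compresses a panel of window-specific forecasts, and the leading components are regressed on realized prices to produce the combined forecast. The series, split point, and component count are assumptions, not the paper's setup.

```python
# A minimal sketch of PCA-based forecast averaging on a synthetic panel.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_days, n_windows = 400, 650
true_price = np.cumsum(rng.normal(size=n_days))            # stand-in price series
forecasts = true_price[:, None] + rng.normal(scale=2.0, size=(n_days, n_windows))

split = 300                                                # calibration / test split
pca = PCA(n_components=3).fit(forecasts[:split])
Z_cal, Z_test = pca.transform(forecasts[:split]), pca.transform(forecasts[split:])

reg = LinearRegression().fit(Z_cal, true_price[:split])    # components -> prices
combined = reg.predict(Z_test)
naive = forecasts[split:].mean(axis=1)                     # simple-average benchmark
print("MAE, PCA averaging :", np.mean(np.abs(combined - true_price[split:])))
print("MAE, simple average:", np.mean(np.abs(naive - true_price[split:])))
```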


2020 · Vol 47 (2) · pp. 113-122
Author(s): Fabio Carvalho, Kerry A Brown, Adam D Gordon, Gabriel U Yesuf, Marie Jeanne Raherilalao, et al.

Despite their legal protection status, protected areas (PAs) can benefit from priority ranks when ongoing threats to their biodiversity and habitats outpace the financial resources available for their conservation. It is essential to develop methods to prioritize PAs that are not computationally demanding, in order to suit stakeholders in developing countries where technical and financial resources are limited. We used expert knowledge-derived biodiversity measures to generate individual and aggregate priority ranks of 98 mostly terrestrial PAs in Madagascar. The five variables used were state of knowledge (SoK), forest loss, forest loss acceleration, PA size, and relative species diversity, estimated using standardized residuals from negative binomial models of SoK regressed onto species diversity. We compared our aggregate ranks, generated using unweighted averages and principal component analysis (PCA) applied to each individual variable, with those generated via Markov chain (MC) and PageRank algorithms. SoK significantly affected the measure of species diversity and highlighted areas where more research effort is needed. The unweighted- and PCA-derived ranks were strongly correlated, as were the MC and PageRank ranks; however, the former two were only weakly correlated with the latter two. We recommend using these methods simultaneously in order to provide decision-makers with the flexibility to prioritize those PAs in need of additional research and conservation efforts.
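The snippet below sketches the two simpler aggregation schemes on synthetic data: an unweighted average of per-variable ranks, and a composite rank along the first principal component. The random matrix is a stand-in for the five priority variables over the 98 PAs.

```python
# A minimal sketch of rank aggregation by unweighted averaging vs. PCA.
import numpy as np
from scipy.stats import rankdata
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.random((98, 5))                                    # 98 PAs x 5 variables

ranks = np.column_stack([rankdata(X[:, j]) for j in range(X.shape[1])])
unweighted_rank = rankdata(ranks.mean(axis=1))             # average-of-ranks scheme

Z = StandardScaler().fit_transform(X)
pc1 = PCA(n_components=1).fit_transform(Z).ravel()
pca_rank = rankdata(pc1)                                   # rank along first PC

corr = np.corrcoef(unweighted_rank, pca_rank)[0, 1]        # Spearman-style agreement
print(f"correlation of the two rankings: {corr:.2f}")
```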


Hacquetia · 2013 · Vol 12 (2) · pp. 23-37
Author(s): Richard Hrivnák, Jaroslav Košťál, Michal Slezák, Anna Petrášová, Melánia Feszterová

In some regions of Slovakia, black alder forest vegetation has not yet been adequately documented. This paper is the first vegetation study presenting phytosociological data and measured environmental parameters from the western part of central Slovakia. The data set was classified using a modified TWINSPAN algorithm, which allowed us to discern floristically and ecologically distinctive plant communities. They correspond to the associations Stellario nemorum-Alnetum glutinosae Lohmeyer 1957 (riparian alder vegetation on mesic to humid sites along small brooks) and Carici acutiformis-Alnetum glutinosae Scamoni 1935 (eutrophic black alder carr forests in the colline zone), the latter with variants of Ligustrum vulgare and Galium palustre. The community Carici elongatae-Alnetum glutinosae Schwickerath 1933 (mesotrophic to eutrophic alder carr vegetation growing on permanently waterlogged soils), documented by only two phytosociological relevés, was distinguished following expert knowledge. The floristic and ecological pattern of these associations is presented. The major compositional gradients were interpreted based on Ellenberg's indicator values and the environmental variables recorded during field sampling in the 2011 growing season. Principal component analysis revealed the importance of soil moisture, light availability, and the proportion of open water and soil surface for species composition variability at the association level, whereas the variants of Carici acutiformis-Alnetum glutinosae were sorted along an acidity gradient.


Author(s): Ramon Fraga Pereira, Mor Vered, Felipe Meneguzzi, Miquel Ramírez

This paper revisits probabilistic, model-based goal recognition to study the implications of using nominal models to estimate the posterior probability distribution over a finite set of hypothetical goals. Existing model-based approaches rely on expert knowledge to produce symbolic descriptions of the dynamic constraints that domain objects are subject to, and these descriptions are assumed to produce correct predictions. We abandon this assumption and consider the use of nominal models that are learnt from observations of transitions of systems with unknown dynamics. Leveraging existing work on the acquisition of domain models via learning for hybrid planning, we adapt and evaluate existing goal recognition approaches to analyze how prediction errors, inherent to system dynamics identification and model learning techniques, affect recognition error rates.
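As a hedged illustration of the posterior computation this family of approaches shares (in the style of Ramírez and Geffner's cost-difference formulation), the sketch below turns per-goal plan costs into a posterior over goals; with a learnt nominal model, these costs would carry the prediction errors the paper studies. All numbers are made up.

```python
# A minimal sketch: Boltzmann-style likelihood from per-goal cost differences.
import numpy as np

def goal_posterior(cost_with_obs, cost_without_obs, prior, beta=1.0):
    """cost_with_obs / cost_without_obs: per-goal optimal plan costs complying
    (or not) with the observations; prior: P(G); beta: rationality parameter."""
    delta = np.asarray(cost_with_obs) - np.asarray(cost_without_obs)
    likelihood = 1.0 / (1.0 + np.exp(beta * delta))   # sigmoid of cost difference
    post = likelihood * np.asarray(prior)
    return post / post.sum()                          # normalized P(G | O)

# Goal 2 requires a detour to comply with the observations, so it loses mass.
print(goal_posterior([10, 14, 11], [10, 9, 11], [1/3, 1/3, 1/3]))
```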


Author(s): Kazimierz Kiełkowicz, Damian Grela

The growing popularity of the Bat Algorithm has encouraged researchers to focus on its further improvement. Most work has been done on hybridizing the Bat Algorithm with other metaheuristics or local search methods. Unfortunately, most of these modifications not only improve the quality of the obtained solutions but also increase the number of control parameters that must be set to obtain solutions of the expected quality, which makes such solutions quite impractical. What is more, there is no clear indication of what these parameters do in terms of the search process. In this paper, the authors incorporate a Mamdani-type Fuzzy Logic Controller (FLC) to tackle some of these shortcomings, using the FLC to control the exploration phase of a bio-inspired metaheuristic. The FLC also allows us to incorporate expert knowledge about the problem at hand and to define the expected behavior of the system, here the process of searching a multidimensional search space by modeling how bats hunt for their prey.
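As a sketch of the idea, the snippet below hand-rolls a tiny Mamdani-style FLC with triangular memberships: two inputs (normalized search progress and population diversity) yield an exploration intensity that could scale, e.g., the bats' loudness. The membership functions and rule base are illustrative assumptions, not the paper's controller.

```python
# A minimal Mamdani-style FLC sketch: fuzzify, apply rules, defuzzify by centroid.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

def explore_intensity(progress, diversity):
    # fuzzify both inputs into "low" / "high" degrees
    p_low, p_high = tri(progress, -0.5, 0.0, 1.0), tri(progress, 0.0, 1.0, 1.5)
    d_low, d_high = tri(diversity, -0.5, 0.0, 1.0), tri(diversity, 0.0, 1.0, 1.5)
    # illustrative rules: explore early or when diversity collapses; else exploit
    strong = max(p_low, d_low)        # early search OR low diversity => explore
    weak = min(p_high, d_high)        # late search AND diverse pop   => exploit
    # aggregate output fuzzy sets and defuzzify by centroid over [0, 1]
    y = np.linspace(0, 1, 101)
    agg = np.maximum(np.minimum(strong, tri(y, 0.5, 1.0, 1.5)),
                     np.minimum(weak, tri(y, -0.5, 0.0, 0.5)))
    return float((agg * y).sum() / (agg.sum() + 1e-12))

print(explore_intensity(progress=0.2, diversity=0.3))   # early, low diversity
```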


2017 · Vol 2017 (45) · pp. 96-103
Author(s): V.V. Lytvyn, R.V. Vovnjanka, D.G. Dosyn, et al.

A solution to the applied task of constructing action-planning intelligent agents (IAs) is proposed. Mathematical support for the functioning of ontology-based action-planning agents is developed, which makes it possible to formalize the behavior of such agents in the state space. The use of ontologies narrows the search space for paths from the initial state to the target state by rejecting irrelevant alternatives, and a method for narrowing the search area for optimal IA activity is proposed. To assess the environment's reaction to the IA's behavior, a method based on reinforcement learning is developed. A two-criterion dynamic programming optimization problem is formulated, which is solved by one of two iterative methods, principal component analysis or the multiple-criterion method, depending on whether the target functions of the optimization problem can be estimated numerically. An architecture for a system that plans the actions of specialized intelligent agents is proposed. It consists of an ontology comprising a task ontology, whose solutions drive the functioning of a specialized IA, and a domain ontology, which sets out alternatives for solving individual subtasks. The efficiency of the proposed approach is investigated on the problem of corrosion protection of water supply and gas pipeline pipes. Software for the functioning of intelligent action-planning agents based on the constructed models, methods, and algorithms has been developed, making it possible to implement the individual components and functional modules of ontology-based intelligent action-planning agents.

