Iterative Compilation Optimization Based on Metric Learning and Collaborative Filtering

2022 · Vol 19 (1) · pp. 1-25
Author(s): Hongzhi Liu, Jie Luo, Ying Li, Zhonghai Wu

Pass selection and phase ordering are two critical compiler auto-tuning problems. Traditional heuristic methods cannot effectively address these NP-hard problems, especially given the increasing number of compiler passes and the diversity of hardware architectures. Recent research efforts have attempted to address these problems through machine learning. However, the large search space of candidate pass sequences, the large number of redundant and irrelevant features, and the lack of training program instances make it difficult to learn models well. Several methods have tried to use expert knowledge to simplify the problems, such as using only the compiler passes or subsequences in the standard optimization levels (e.g., -O1, -O2, and -O3) provided by compiler designers. However, these methods ignore other useful compiler passes that are not contained in the standard levels. Principal component analysis (PCA) and exploratory factor analysis (EFA) have been utilized to reduce the redundancy of feature data. However, these unsupervised methods retain all the information irrelevant to the performance of compilation optimization, which may mislead the subsequent model learning. To solve these problems, we propose a compiler pass selection and phase ordering approach called Iterative Compilation based on Metric learning and Collaborative filtering (ICMC). First, we propose a data-driven method to construct pass subsequences according to the observed collaborative interactions and dependencies among passes on a given program set, which lets us make use of all available compiler passes while pruning the search space. Then, a supervised metric learning method is utilized to retain the feature information useful for compilation optimization while removing both irrelevant and redundant information. Based on the learned similarity metric, a neighborhood-based collaborative filtering method is employed to iteratively recommend a few superior compiler passes for each target program. Finally, an iterative data enhancement method is designed to alleviate the lack of training program instances and to improve the performance of iterative pass recommendation. Experimental results using the LLVM compiler on all 32 cBench programs show the following: (1) ICMC significantly outperforms several state-of-the-art compiler phase ordering methods; (2) it performs the same as or better than the standard level -O3 on all the test programs; and (3) it achieves an average speedup of 1.20 (up to 1.46) over the standard level -O3.
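Purely as an illustration of the recommendation step, the sketch below implements a neighborhood-based collaborative filtering rule under a given Mahalanobis-style metric. The function name, the feature and speedup arrays, and the metric matrix `W` are hypothetical stand-ins; the paper's metric-learning, subsequence-construction, and data-enhancement stages are not reproduced.

```python
# A minimal sketch, assuming a learned PSD metric W is already available.
import numpy as np

def recommend_passes(target_feat, train_feats, speedups, W, k=3, n_rec=5):
    """Recommend pass-subsequence indices for one target program.

    target_feat : (d,) feature vector of the target program
    train_feats : (n, d) features of the training programs
    speedups    : (n, m) observed speedup of each pass subsequence per program
    W           : (d, d) PSD matrix of a learned (Mahalanobis-style) metric
    """
    diff = train_feats - target_feat                       # (n, d)
    dists = np.sqrt(np.einsum('nd,de,ne->n', diff, W, diff))
    neighbors = np.argsort(dists)[:k]                      # k most similar programs
    weights = 1.0 / (dists[neighbors] + 1e-8)              # closer neighbors count more
    scores = weights @ speedups[neighbors] / weights.sum() # neighbor-weighted speedups
    return np.argsort(scores)[::-1][:n_rec]                # top-scoring subsequences

rng = np.random.default_rng(0)
feats, spd = rng.random((20, 8)), rng.random((20, 40))     # toy training data
print(recommend_passes(rng.random(8), feats, spd, np.eye(8)))
```

The point this mirrors is that similarity is computed under the learned metric, so only performance-relevant features influence which neighbors vote.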

2021 · Vol 15 (6) · pp. 1-20
Author(s): Dongsheng Li, Haodong Liu, Chao Chen, Yingying Zhao, Stephen M. Chu, et al.

In collaborative filtering (CF) algorithms, the optimal models are usually learned by globally minimizing the empirical risk averaged over all the observed data. However, the global models are often obtained via a performance tradeoff among users/items, i.e., not all users/items are perfectly fitted by the global models, due to the hard non-convex optimization problems in CF algorithms. Ensemble learning can address this issue by learning multiple diverse models but usually suffers from efficiency issues on large datasets or complex algorithms. In this article, we keep the intermediate models obtained during global model learning as snapshot models and then adaptively combine the snapshot models for individual user-item pairs using a memory network-based method. Empirical studies on three real-world datasets show that the proposed method can extensively and significantly improve the accuracy (by up to 15.9% relatively) when applied to a variety of existing collaborative filtering methods.
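As a rough sketch of the per-pair combination idea, the snippet below averages matrix-factorization snapshot predictions with softmax weights. In the paper those weights come from a memory network; here they are replaced by given, hypothetical attention scores.

```python
# A minimal sketch, assuming snapshots of latent factors were saved during training.
import numpy as np

def snapshot_predict(user, item, snapshots, attention_scores):
    """snapshots: list of (P, Q) latent-factor pairs saved during training.
    attention_scores: per-snapshot relevance for this user-item pair
    (produced by a memory network in the paper; assumed given here)."""
    preds = np.array([P[user] @ Q[item] for P, Q in snapshots])
    w = np.exp(attention_scores - attention_scores.max())
    w /= w.sum()                      # softmax weights over snapshots
    return w @ preds                  # adaptive per-pair combination

P0, Q0 = np.random.rand(5, 8), np.random.rand(6, 8)
snapshots = [(P0 * s, Q0) for s in (0.8, 0.9, 1.0)]   # three toy snapshots
print(snapshot_predict(2, 3, snapshots, np.array([0.1, 0.5, 0.9])))
```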


2021 · Vol 11 (14) · pp. 6387
Author(s): Li Xu, Jianzhong Hu

Active infrared thermography (AIRT) is a significant defect detection and evaluation method in the field of non-destructive testing, because it promptly provides visual information and its results can be used for quantitative study of defects. At present, the quantitative evaluation of defects is an urgent problem to be solved in this field. In this work, a defect depth recognition method based on gated recurrent unit (GRU) networks is proposed to address insufficient accuracy in defect depth recognition. AIRT is applied to obtain raw thermal sequences of the surface temperature field distribution of the defect specimen. Before training the GRU model, principal component analysis (PCA) is used to reduce the dimensionality of the raw datasets and to eliminate their correlation. Then, the GRU model is employed to automatically recognize the depth of the defect. The defect depth recognition performance of the proposed method is evaluated through an experiment on polymethyl methacrylate (PMMA) with flat-bottom holes. The results indicate that the PCA-processed datasets outperform the raw temperature datasets for model learning when assessing defect depth characteristics. A comparison with a backpropagation (BP) network shows that the proposed method performs better in defect depth recognition.
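A minimal sketch of such a PCA-then-GRU pipeline is given below, using scikit-learn and PyTorch. The shapes, layer sizes, class count, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch: per-frame PCA reduction, then a GRU classifies defect depth.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# raw_seqs: (n_samples, n_frames, n_pixels) synthetic surface-temperature sequences
raw_seqs = np.random.rand(100, 50, 400).astype(np.float32)
depth_labels = np.random.randint(0, 4, size=100)           # 4 assumed depth classes

pca = PCA(n_components=16)
flat = raw_seqs.reshape(-1, raw_seqs.shape[-1])            # pool frames for fitting
reduced = pca.fit_transform(flat).reshape(100, 50, 16)

class DepthGRU(nn.Module):
    def __init__(self, n_feat=16, hidden=32, n_classes=4):
        super().__init__()
        self.gru = nn.GRU(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)
    def forward(self, x):
        _, h = self.gru(x)             # h: (1, batch, hidden) last hidden state
        return self.head(h[-1])        # class logits per sequence

model = DepthGRU()
logits = model(torch.from_numpy(reduced.astype(np.float32)))
print(logits.shape)                    # (100, 4)
```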


2019 · Vol 3 (2) · pp. 11-18
Author(s): George Mweshi

Extracting useful and novel information from large amounts of collected data has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling these significantly large datasets is the curse of dimensionality: as the dimension of the data increases, the performance of the data mining algorithms employed to mine the data deteriorates. This deterioration is mainly caused by the large search space created by irrelevant, noisy, and redundant features in the data. Feature selection is one of several techniques that can remove these unnecessary features; it reduces the dimension of the data as well as the search space, which in turn increases the efficiency and accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary search strategy capable of automatically finding solutions in complex and large search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on five benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers, namely J48, Naive Bayes, PART, and Random Forests, using the GP-selected features, all the original features, and the features selected by other commonly used feature selection techniques, i.e., principal component analysis, information gain, ReliefF, and CFS. The experimental results show that not only does GP select a smaller set of features than the original feature set, but classifiers using the GP-selected features also achieve better classification performance than with all the original features. Furthermore, compared with the other well-known feature selection techniques, GP achieves very competitive results.
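For flavor, the sketch below runs an evolutionary search over binary feature masks. Note this is a genetic-algorithm-style simplification over fixed-length masks, not the paper's tree-based GP, and the dataset, fitness function, and rates are illustrative assumptions.

```python
# A simplified evolutionary feature-selection loop (GA-style sketch, not full GP).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_feat)) < 0.5                 # random binary masks
for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]     # keep the fittest half
    children = parents.copy()
    cut = rng.integers(1, n_feat, size=len(children))      # one-point crossover
    for i, c in enumerate(cut):
        children[i, c:] = parents[(i + 1) % len(parents), c:]
    children ^= rng.random(children.shape) < 0.02          # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected {best.sum()} of {n_feat} features")
```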


Processes · 2020 · Vol 8 (1) · pp. 123
Author(s): Yuhui Ying, Zhi Li, Minglei Yang, Wenli Du

In the traditional performance assessment method, different modes of data are classified mainly by expert knowledge, so human interference is highly probable. The traditional method is also incapable of distinguishing transition data from steady-state data, which reduces the accuracy of the monitoring model. To solve these problems, this paper proposes a method for multimode operating performance visualization and nonoptimal cause identification. First, multimode data identification is realized by the subtractive clustering algorithm (SCA), which reduces human influence and eliminates transition data. Then, multi-space principal component analysis (MsPCA) is used to characterize the independent characteristics of different datasets, which enhances the robustness of the model with respect to the performance of independent variables. Furthermore, a self-organizing map (SOM) is used to train these characteristics and map them onto a two-dimensional plane, realizing the visualization of process monitoring. For online assessment, the operating performance of the current process is evaluated according to the projection position of the data on the visual model, and the cause of nonoptimal performance is then identified. Finally, the Tennessee Eastman (TE) process is used to verify the effectiveness of the proposed method.
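The snippet below sketches only the SOM visualization step on synthetic features: samples are mapped onto a 2-D grid via their best-matching units, and a new sample can be judged by where it lands. The SCA and MsPCA stages are omitted, and all shapes and rates are illustrative assumptions.

```python
# A minimal hand-rolled SOM sketch for the 2-D visualization step.
import numpy as np

rng = np.random.default_rng(1)
grid, d = (8, 8), 5
weights = rng.random(grid + (d,))                 # (8, 8, d) SOM codebook

def bmu(x):
    dists = np.linalg.norm(weights - x, axis=2)   # distance to every grid unit
    return np.unravel_index(dists.argmin(), grid) # best-matching unit (i, j)

coords = np.stack(np.meshgrid(range(grid[0]), range(grid[1]), indexing='ij'), -1)
X = rng.random((300, d))                          # stand-in process features
for t, x in enumerate(X):
    lr = 0.5 * np.exp(-t / 200)                   # decaying learning rate
    sigma = 3.0 * np.exp(-t / 200)                # shrinking neighborhood radius
    g = np.exp(-np.sum((coords - bmu(x))**2, -1) / (2 * sigma**2))
    weights += lr * g[..., None] * (x - weights)  # pull neighborhood toward x

print("best-matching unit of a new sample:", bmu(X[0]))
```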


Energies · 2020 · Vol 13 (14) · pp. 3530
Author(s): Katarzyna Maciejowska, Bartosz Uniejewski, Tomasz Serafin

Recently, developments in combining point forecasts of electricity prices obtained with calibration windows of different lengths have provided an extremely efficient and simple tool for improving predictive accuracy. However, the proposed methods depend strongly on expert knowledge and may not transfer directly from one model or market to another. Hence, we consider a novel extension and propose to use principal component analysis (PCA) to automate the procedure of averaging over a rich pool of predictions. We apply PCA to a panel of over 650 point forecasts obtained for different calibration window lengths. The robustness of the approach is evaluated on three different forecasting tasks, i.e., forecasting day-ahead prices, forecasting intraday ID3 prices one day in advance, and very short-term forecasting of ID3 prices (i.e., six hours before delivery). The empirical results are compared using the Mean Absolute Error measure and the Giacomini and White test for conditional predictive ability (CPA). The results indicate that PCA averaging not only yields significantly more accurate forecasts than individual predictions but also outperforms other forecast averaging schemes.
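A minimal sketch of the averaging idea, on synthetic data: PCA compresses a panel of window-specific forecasts, and the leading components are regressed on realized prices to produce the combined forecast. The series, split point, and component count are assumptions, not the paper's setup.

```python
# A minimal sketch of PCA-based forecast averaging on a synthetic panel.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_days, n_windows = 400, 650
true_price = np.cumsum(rng.normal(size=n_days))            # stand-in price series
forecasts = true_price[:, None] + rng.normal(scale=2.0, size=(n_days, n_windows))

split = 300                                                # calibration / test split
pca = PCA(n_components=3).fit(forecasts[:split])
Z_cal, Z_test = pca.transform(forecasts[:split]), pca.transform(forecasts[split:])

reg = LinearRegression().fit(Z_cal, true_price[:split])    # components -> prices
combined = reg.predict(Z_test)
naive = forecasts[split:].mean(axis=1)                     # simple-average benchmark
print("MAE, PCA averaging :", np.mean(np.abs(combined - true_price[split:])))
print("MAE, simple average:", np.mean(np.abs(naive - true_price[split:])))
```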


2020 · Vol 47 (2) · pp. 113-122
Author(s): Fabio Carvalho, Kerry A Brown, Adam D Gordon, Gabriel U Yesuf, Marie Jeanne Raherilalao, et al.

Despite their legal protection status, protected areas (PAs) can benefit from priority ranks when ongoing threats to their biodiversity and habitats outpace the financial resources available for their conservation. It is essential to develop methods to prioritize PAs that are not computationally demanding, in order to suit stakeholders in developing countries where technical and financial resources are limited. We used expert knowledge-derived biodiversity measures to generate individual and aggregate priority ranks of 98 mostly terrestrial PAs in Madagascar. The five variables used were state of knowledge (SoK), forest loss, forest loss acceleration, PA size, and relative species diversity, estimated using standardized residuals from negative binomial models of SoK regressed onto species diversity. We compared our aggregate ranks, generated using unweighted averages and principal component analysis (PCA) applied to each individual variable, with those generated via Markov chain (MC) and PageRank algorithms. SoK significantly affected the measure of species diversity and highlighted areas where more research effort is needed. The unweighted- and PCA-derived ranks were strongly correlated, as were the MC and PageRank ranks; however, the former two were only weakly correlated with the latter two. We recommend using these methods simultaneously in order to provide decision-makers with the flexibility to prioritize those PAs in need of additional research and conservation efforts.
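The snippet below sketches the two simpler aggregation schemes on synthetic data: an unweighted average of per-variable ranks, and a composite rank along the first principal component. The random matrix is a stand-in for the five priority variables over the 98 PAs.

```python
# A minimal sketch of rank aggregation by unweighted averaging vs. PCA.
import numpy as np
from scipy.stats import rankdata
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.random((98, 5))                                    # 98 PAs x 5 variables

ranks = np.column_stack([rankdata(X[:, j]) for j in range(X.shape[1])])
unweighted_rank = rankdata(ranks.mean(axis=1))             # average-of-ranks scheme

Z = StandardScaler().fit_transform(X)
pc1 = PCA(n_components=1).fit_transform(Z).ravel()
pca_rank = rankdata(pc1)                                   # rank along first PC

corr = np.corrcoef(unweighted_rank, pca_rank)[0, 1]        # Spearman-style agreement
print(f"correlation of the two rankings: {corr:.2f}")
```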


Hacquetia · 2013 · Vol 12 (2) · pp. 23-37
Author(s): Richard Hrivnák, Jaroslav Košťál, Michal Slezák, Anna Petrášová, Melánia Feszterová

In some regions of Slovakia, black alder forest vegetation has not yet been adequately documented. This paper is the first vegetation study presenting phytosociological data and measured environmental parameters from the western part of central Slovakia. The data set was classified using a modified TWINSPAN algorithm, which allowed us to discern floristically and ecologically distinctive plant communities. They correspond to the associations Stellario nemorum-Alnetum glutinosae Lohmeyer 1957 (riparian alder vegetation on mesic to humid sites along small brooks) and Carici acutiformis-Alnetum glutinosae Scamoni 1935 (eutrophic black alder carr forests in the colline zone), the latter with variants of Ligustrum vulgare and Galium palustre. The community Carici elongatae-Alnetum glutinosae Schwickerath 1933 (mesotrophic to eutrophic alder carr vegetation growing on permanently waterlogged soils), documented by only two phytosociological relevés, was distinguished following expert knowledge. The floristic and ecological pattern of these associations is presented. The major compositional gradients were interpreted based on Ellenberg's indicator values and the environmental variables recorded during field sampling in the 2011 growing season. Principal component analysis revealed the importance of soil moisture, light availability, and the proportion of open water and soil surface for species composition variability at the association level, whereas the variants of Carici acutiformis-Alnetum glutinosae were sorted along an acidity gradient.


Author(s): Ramon Fraga Pereira, Mor Vered, Felipe Meneguzzi, Miquel Ramírez

This paper revisits probabilistic, model-based goal recognition to study the implications of using nominal models to estimate the posterior probability distribution over a finite set of hypothetical goals. Existing model-based approaches rely on expert knowledge to produce symbolic descriptions of the dynamic constraints that domain objects are subject to, and these descriptions are assumed to produce correct predictions. We abandon this assumption and consider the use of nominal models that are learnt from observations of transitions of systems with unknown dynamics. Leveraging existing work on the acquisition of domain models via learning for hybrid planning, we adapt and evaluate existing goal recognition approaches to analyze how prediction errors, inherent to system dynamics identification and model learning techniques, affect recognition error rates.
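As a hedged illustration of the posterior computation this family of approaches shares (in the style of Ramírez and Geffner's cost-difference formulation), the sketch below turns per-goal plan costs into a posterior over goals; with a learnt nominal model, these costs would carry the prediction errors the paper studies. All numbers are made up.

```python
# A minimal sketch: Boltzmann-style likelihood from per-goal cost differences.
import numpy as np

def goal_posterior(cost_with_obs, cost_without_obs, prior, beta=1.0):
    """cost_with_obs / cost_without_obs: per-goal optimal plan costs complying
    (or not) with the observations; prior: P(G); beta: rationality parameter."""
    delta = np.asarray(cost_with_obs) - np.asarray(cost_without_obs)
    likelihood = 1.0 / (1.0 + np.exp(beta * delta))   # sigmoid of cost difference
    post = likelihood * np.asarray(prior)
    return post / post.sum()                          # normalized P(G | O)

# Goal 2 requires a detour to comply with the observations, so it loses mass.
print(goal_posterior([10, 14, 11], [10, 9, 11], [1/3, 1/3, 1/3]))
```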


Author(s): Kazimierz Kiełkowicz, Damian Grela

The growing popularity of the Bat Algorithm has encouraged researchers to focus on its further improvement. Most work has been done on hybridizing the Bat Algorithm with other metaheuristics or local search methods. Unfortunately, most of these modifications not only improve the quality of the obtained solutions but also increase the number of control parameters that must be set to obtain solutions of the expected quality, which makes such solutions quite impractical. What is more, there is no clear indication of what these parameters do in terms of the search process. In this paper, the authors incorporate a Mamdani-type Fuzzy Logic Controller (FLC) to tackle some of these shortcomings, using the FLC to control the exploration phase of a bio-inspired metaheuristic. The FLC also allows us to incorporate expert knowledge about the problem at hand and to define the expected behavior of the system, here the process of searching a multidimensional search space by modeling how bats hunt for their prey.
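As a sketch of the idea, the snippet below hand-rolls a tiny Mamdani-style FLC with triangular memberships: two inputs (normalized search progress and population diversity) yield an exploration intensity that could scale, e.g., the bats' loudness. The membership functions and rule base are illustrative assumptions, not the paper's controller.

```python
# A minimal Mamdani-style FLC sketch: fuzzify, apply rules, defuzzify by centroid.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

def explore_intensity(progress, diversity):
    # fuzzify both inputs into "low" / "high" degrees
    p_low, p_high = tri(progress, -0.5, 0.0, 1.0), tri(progress, 0.0, 1.0, 1.5)
    d_low, d_high = tri(diversity, -0.5, 0.0, 1.0), tri(diversity, 0.0, 1.0, 1.5)
    # illustrative rules: explore early or when diversity collapses; else exploit
    strong = max(p_low, d_low)        # early search OR low diversity => explore
    weak = min(p_high, d_high)        # late search AND diverse pop   => exploit
    # aggregate output fuzzy sets and defuzzify by centroid over [0, 1]
    y = np.linspace(0, 1, 101)
    agg = np.maximum(np.minimum(strong, tri(y, 0.5, 1.0, 1.5)),
                     np.minimum(weak, tri(y, -0.5, 0.0, 0.5)))
    return float((agg * y).sum() / (agg.sum() + 1e-12))

print(explore_intensity(progress=0.2, diversity=0.3))   # early, low diversity
```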


2017 · Vol 2017 (45) · pp. 96-103
Author(s): V.V. Lytvyn, R.V. Vovnjanka, D.G. Dosyn, et al.

A solution to the applied task of constructing action-planning intelligent agents (IAs) is proposed. Mathematical support for the functioning of ontology-based action-planning agents is developed, which makes it possible to formalize the behavior of such agents in the state space. The use of ontologies narrows the search space for paths from the initial state to the target state by rejecting irrelevant alternatives, and a method for narrowing the search area for optimal IA activity is proposed. To assess the environment's reaction to the IA's behavior, a method based on reinforcement learning is developed. A two-criterion dynamic programming optimization problem is formulated, which is solved by one of two iterative methods, principal component analysis or the multiple-criterion method, depending on whether the target functions of the optimization problem can be estimated numerically. An architecture for a system that plans the actions of specialized intelligent agents is proposed. It consists of an ontology comprising a task ontology, whose solutions drive the functioning of a specialized IA, and a domain ontology, which sets out alternatives for solving individual subtasks. The efficiency of the proposed approach is investigated on the problem of corrosion protection of water supply and gas pipeline pipes. Software for the functioning of intelligent action-planning agents based on the constructed models, methods, and algorithms has been developed, making it possible to implement the individual components and functional modules of ontology-based intelligent action-planning agents.

