Evaluation Methodology
Recently Published Documents





2022, Vol 16 (3), pp. 1-32
Junchen Jin, Mark Heimann, Di Jin, Danai Koutra

While most network embedding techniques model the proximity between nodes in a network, there has recently been significant interest in structural embeddings that are based on node equivalences. This is a notion rooted in sociology: equivalences or positions are collections of nodes that have similar roles (i.e., similar functions, ties, or interactions with nodes in other positions) irrespective of their distance or reachability in the network. Proximity-based methods are rigorously evaluated in the literature, but the evaluation of structural embeddings is less mature. It relies on small synthetic or real networks with imperfectly defined labels, and its connection to sociological equivalences has so far been vague and tenuous. With new node embedding methods being developed at a breakneck pace, proper evaluation and systematic characterization of existing approaches will be essential to progress. To fill this gap, we set out to understand what types of equivalences structural embeddings capture. We are the first to contribute a rigorous intrinsic and extrinsic evaluation methodology for structural embeddings, along with carefully designed, diverse datasets of varying sizes. We identify several evaluation variables that can lead to different results (e.g., the choice of similarity measure, classifier, and label definitions). We find that degree distributions within nodes’ local neighborhoods can serve as simple yet effective baselines in their own right and can guide the future development of structural embeddings. We hope that our findings will influence the design of further node embedding methods and pave the way for more comprehensive and fair evaluation of structural embedding methods.
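The observation that local degree distributions make effective baselines can be illustrated with a small sketch. This is not the authors’ baseline. It assumes one plausible construction: each node is described by log2-binned histograms of the degrees of its 1-hop and 2-hop neighbors, and nodes are compared by cosine similarity. The function names, the binning scheme, and the parameters `k` and `num_bins` are hypothetical.

```python
import math
from collections import deque

# Hypothetical degree-histogram baseline; not the method from the paper.

def khop_layers(adj, source, k):
    """Return node sets at exact BFS distance 1..k from source."""
    dist = {source: 0}
    layers = [set() for _ in range(k)]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # do not expand beyond k hops
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                layers[dist[v] - 1].add(v)
                queue.append(v)
    return layers

def degree_histogram_features(adj, node, k=2, num_bins=5):
    """Concatenate log2-binned degree histograms of each hop layer."""
    features = []
    for layer in khop_layers(adj, node, k):
        hist = [0] * num_bins
        for v in layer:
            deg = len(adj[v])
            b = int(math.log2(deg)) if deg > 0 else 0
            hist[min(b, num_bins - 1)] += 1
        features.extend(hist)
    return features

def cosine(x, y):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

# Two disconnected 3-leaf stars: hubs 0 and 4 play the same role
# but are unreachable from each other.
adj = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0],
    4: [5, 6, 7], 5: [4], 6: [4], 7: [4],
}
print(cosine(degree_histogram_features(adj, 0), degree_histogram_features(adj, 4)))  # 1.0
print(cosine(degree_histogram_features(adj, 0), degree_histogram_features(adj, 1)))  # 0.0
```

The features ignore node identities. As a result, the two hubs match exactly even though no path connects them, which is the distinction between structural roles and proximity that the abstract draws.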

2022, Vol 40 (1), pp. 1-22
Amir H. Jadidinejad, Craig Macdonald, Iadh Ounis

Recommendation systems are often evaluated on users’ interactions collected from an existing, already deployed recommendation system. In this situation, users provide feedback only on the items they were exposed to. They may not leave feedback on other items, because the deployed system never showed those items to them. As a result, the feedback dataset used to evaluate a new model is shaped by the deployed system, a form of closed-loop feedback. In this article, we show that the typical offline evaluation of recommender systems suffers from the so-called Simpson’s paradox. This is a phenomenon observed when a significant trend appears in several different sub-populations of observational data but disappears or is even reversed when these sub-populations are combined. Our in-depth experiments based on stratified sampling reveal that a very small minority of items, frequently exposed by the deployed system, acts as a confounding factor in the offline evaluation of recommendation systems. In addition, we propose a novel evaluation methodology that takes this confounder (i.e., the deployed system’s characteristics) into account. We compare many recommendation models relative to one another, as in typical offline evaluation, and measure agreement using the Kendall rank correlation coefficient. On this basis, we show that our proposed methodology reflects the true ranking of systems under an open-loop (randomised) evaluation better than the standard evaluation. The statistically significant improvements are 14% and 40% on the two examined open-loop datasets (Yahoo! and Coat, respectively).
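The Kendall rank correlation coefficient compares two rankings of the same systems by counting concordant and discordant pairs. Below is a minimal sketch of the tie-free variant (tau-a). It assumes each ranking is a list of system names ordered best-first. The system names are placeholders, and the paper may use a tie-aware variant instead.

```python
from itertools import combinations

def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two tie-free rankings of the same items (best first)."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    if set(pos_a) != set(pos_b) or len(pos_a) < 2:
        raise ValueError("need two rankings of the same >= 2 distinct items")
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # Concordant: both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(ranking_a) * (len(ranking_a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical system names: "true" ranking from an open-loop (randomised)
# evaluation versus the ranking produced by an offline evaluation protocol.
true_ranking = ["sysA", "sysB", "sysC", "sysD"]
offline_ranking = ["sysA", "sysC", "sysB", "sysD"]
print(kendall_tau(true_ranking, offline_ranking))  # 0.6666666666666666
```

A single swapped pair out of six gives (5 − 1) / 6. An evaluation protocol whose ranking sits closer to the open-loop ranking therefore scores a higher tau. With tied scores, tau-b is the usual choice.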

Dimitrios Dimitriou, Maria Sartzetaki

In most cases, the decision to invest in a new airport is not simple, mainly because of three factors: the complexity of the planning process, the amount of capital that must be invested before the business is established, and the number of stakeholders involved in the decision. The decision process is even more complicated under restricted economic and financing conditions. There, the performance of the business plan is strongly tied to regional development prospects and to the airport’s business outputs in the medium and long term. This paper proposes an evaluation methodology to support decisions on airport development projects. It provides an evaluation framework based on an ex ante assessment that considers the airport’s economic impact and its contribution to a specific regional economy. The Input–Output (IO) analysis framework is used to determine the economic footprint of the airport development. A series of key performance indicators (KPIs) is introduced to review the project’s performance within a given economic system. The case study focuses on a new airport at Heraklion in Crete (in the Kasteli valley), one of the most attractive tourist destinations in the southeast Mediterranean. The aim is to present a systematic approach applicable to similar projects, providing essential tools that support decisions at the level of strategic planning. The approach provides key messages to national governments, decision makers, and stakeholders on the contribution of an airport investment to regional economic development and to the business ecosystem in the post-COVID-19 era.
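In a standard Leontief Input–Output model, the total sectoral output x needed to meet final demand d is x = (I − A)⁻¹ d. Here A is the matrix of technical coefficients, where a_ij is the input from sector i required per unit of output of sector j. The sketch applies this to a hypothetical two-sector economy with invented coefficients. It shows the mechanics of an IO footprint estimate only; it is not the paper's Crete data or its KPIs.

```python
def solve_linear(m, b):
    """Solve m @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def leontief_output(a, final_demand):
    """Total output x solving (I - A) x = d for technical-coefficient matrix A."""
    n = len(a)
    i_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)]
                 for i in range(n)]
    return solve_linear(i_minus_a, final_demand)

# Hypothetical coefficients: sector 0 = aviation/tourism, sector 1 = rest of economy.
A = [[0.2, 0.1],
     [0.3, 0.4]]
baseline = leontief_output(A, [100.0, 200.0])
with_airport = leontief_output(A, [120.0, 200.0])  # +20 units of final demand in sector 0
impact = [w - b for w, b in zip(with_airport, baseline)]
print(impact, sum(impact))  # roughly [26.67, 13.33] and 40.0
```

An extra 20 units of demand in sector 0 raises total output by about 40 here. That corresponds to an output multiplier of 2.0, because indirect purchases ripple through the other sector.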

2022, Vol 2022, pp. 1-12
Guangjun Sun, Zhijie Yuan, Bingyan Wu, Fu Zhao

The actual earthquake resistance of bridges and their damage states in future earthquakes are important issues that need to be resolved. The research object is an expressway reinforced concrete (RC) girder bridge in a high-seismic-intensity area of China. First, the damage correlation between the bridge's structural components is analyzed, and the key components that determine its structural safety state are identified. Then, safety evaluation indexes for the bridge pier and bearing are investigated, and a two-stage seismic safety evaluation methodology for RC girder bridges is proposed. The first stage is a rapid, general evaluation using empirical statistical methods. The second stage is a precise evaluation based on the calculated damage indexes of the components. Subsequently, a seismic damage prediction matrix is presented. By applying modifications for the number of spans, service life, and skew angle, the seismic safety evaluation is extended from a typical single bridge to a group of bridges of the same type. Finally, an actual expressway bridge in China is presented as a numerical example to illustrate the application of the method. The results show that damage to the key components (bearings, piers, and abutments) determines the damage state of the bridge. The seismic damage states of piers and bearings can be conveniently assessed from the pier top displacement angle and the bearing shear deformation during earthquakes. Using the suggested seismic damage criteria for RC girder bridges, the seismic safety of the whole bridge structure can be evaluated from the seismic safety evaluations of its individual key components. Based on the evaluation results for individual bridges, and after modifying for the influencing factors, an earthquake performance evaluation of a group of bridges of the same type can be obtained.
The two-stage seismic safety evaluation methodology proposed in this study is effective and efficient.
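The component-to-system step can be sketched as follows, under stated assumptions. Each pier is graded by its top displacement angle (drift ratio) and each bearing by its shear strain against threshold values. The whole bridge then takes the most severe state among its key components. The damage-state names, all threshold values and the max rule are hypothetical placeholders, not the paper's criteria. The group-level modification factors are not modelled.

```python
from bisect import bisect_right

DAMAGE_STATES = ["none", "slight", "moderate", "extensive", "complete"]

# Placeholder thresholds (NOT from the paper): lower bounds of slight..complete.
PIER_DRIFT_LIMITS = [0.005, 0.01, 0.02, 0.04]   # pier top displacement angle (rad)
BEARING_SHEAR_LIMITS = [1.0, 1.5, 2.5, 3.5]     # bearing shear strain (dimensionless)

def grade(value, limits):
    """Index into DAMAGE_STATES: number of thresholds reached or exceeded."""
    return bisect_right(limits, value)

def bridge_state(pier_drifts, bearing_strains):
    """Assumed rule: the bridge state is governed by its most damaged key component."""
    grades = [grade(d, PIER_DRIFT_LIMITS) for d in pier_drifts]
    grades += [grade(g, BEARING_SHEAR_LIMITS) for g in bearing_strains]
    return DAMAGE_STATES[max(grades)]

print(bridge_state(pier_drifts=[0.004, 0.012], bearing_strains=[0.8, 1.2]))  # moderate
```

Here the second pier (drift 0.012) governs the result, even though both bearings are at most slightly damaged.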

2022, pp. 147715352110515
Z Li, F Zhang, X Song, R Dang

Spectral energy radiated by light sources is the primary cause of colour damage in highly photosensitive artworks (HPAs). However, spectral power distributions differ between light sources. The absorption and reflection characteristics of different materials under each narrow spectral band also differ. This can lead to large differences in the degree of radiation damage to materials under the same lighting intensity. In this paper, the suitability of different light sources for illuminating HPAs was investigated in a long-term experiment. Nine types of typical HPA materials were irradiated with 10 different narrow-band light sources. The colour difference data of the illuminated materials were analysed against the amount of exposure. From this analysis, a mathematical model relating spectral composition to colour damage in HPA materials was obtained. Based on this model, a colour damage evaluation equation for light sources used to illuminate HPAs was proposed. Finally, the equation was discussed using an example.
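Colour change in such experiments is commonly quantified as a colour difference in CIELAB space. The sketch below first computes the CIE76 difference ΔE*ab between a sample before and after exposure. It then computes a hypothetical spectrally weighted damage index, in which each narrow band's power is multiplied by a material damage coefficient. The colour-difference formula and the fitted model used in the paper may differ. The readings and coefficients are invented for illustration.

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) colours."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))

def weighted_damage(band_power, band_damage_coeff, exposure_hours):
    """Hypothetical index: exposure time x sum over bands of power x material coefficient."""
    return exposure_hours * sum(p * k for p, k in zip(band_power, band_damage_coeff))

# Invented CIELAB readings of one sample before and after exposure.
before = (62.0, 18.0, 24.0)
after = (64.0, 15.0, 18.0)
print(delta_e_cie76(before, after))  # 7.0

# Invented per-band damage coefficients for one material (blue, green, red bands).
coeff = [0.8, 0.3, 0.1]
source_a = [1.0, 1.0, 1.0]   # same total radiant power (3.0) ...
source_b = [2.0, 0.5, 0.5]   # ... but shifted toward the short-wavelength band
print(weighted_damage(source_a, coeff, 100), weighted_damage(source_b, coeff, 100))
```

Both sources emit the same total power, yet source B scores higher because more of its energy falls in the band to which this material is most sensitive. This is the effect the abstract describes.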

10.2196/30474, 2022, Vol 9 (1), pp. e30474
Alex Mariakakis, Ravi Karkar, Shwetak N Patel, Julie A Kientz, James Fogarty

Background: Developers, designers, and researchers use rapid prototyping methods to project the adoption and acceptability of their health intervention technology (HIT) before the technology becomes mature enough to be deployed. Although these methods are useful for gathering feedback that advances the development of HITs, they rarely provide usable evidence that can contribute to our broader understanding of HITs.
Objective: In this research, we aim to develop and demonstrate a variation of vignette testing that supports developers and designers in evaluating early-stage HIT designs while generating usable evidence for the broader research community.
Methods: We proposed a method called health concept surveying for untangling the causal relationships that people develop around conceptual HITs. In health concept surveying, investigators gather reactions to design concepts through a scenario-based survey instrument. As the investigator manipulates characteristics related to their HIT, the survey instrument also measures proximal cognitive factors according to a health behavior change model. This projects how HIT design decisions may affect the adoption and acceptability of an HIT. Responses to the survey instrument were analyzed using path analysis to untangle the causal effects of these factors on the outcome variables.
Results: We demonstrated health concept surveying in 3 case studies of sensor-based health-screening apps. Our first study (N=54) showed that a wait time incentive could influence more people to see a dermatologist after a positive test for skin cancer. Our second study (N=54) evaluated a similar application design. It showed that visual explanations of algorithmic decisions could increase participant trust in negative test results, but that this trust would not have been enough to affect people’s decision-making. Our third study (N=263) showed that people might prioritize test specificity or sensitivity depending on the nature of the medical condition.
Conclusions: Beyond the findings from our 3 case studies, our research uses the framing of the Health Belief Model to elicit and understand the intrinsic and extrinsic factors that may affect the adoption and acceptability of an HIT without having to build a working prototype. We have made our survey instrument publicly available so that others can leverage it for their own investigations.
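Path analysis splits the effect of a manipulated design characteristic X on an outcome Y into two parts. One is a direct path. The other is an indirect path through a mediating cognitive factor M. Below is a minimal sketch, assuming a single mediator and ordinary least squares point estimates. The slope of M on X is a. The slopes of Y on M and X fitted jointly are b and c′. The indirect effect is a·b. The variable names and data are invented. The paper's model, framed by the Health Belief Model, has more factors and would also test significance.

```python
from statistics import mean

def slope(x, y):
    """OLS slope of y on x (simple regression with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept) via centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data: X = manipulated design factor, M = perceived benefit,
# Y = intention to act.
X = [0, 1, 2, 3, 4]
M = [1, 0, 4, 8, 7]
Y = [8, 6, 19, 32, 30]

a = slope(X, M)                            # path X -> M
direct, b = two_predictor_slopes(X, M, Y)  # paths X -> Y (c') and M -> Y (b)
print(f"indirect={a * b}, direct={direct}, total={slope(X, Y)}")
# indirect=6.0, direct=1.0, total=7.0
```

For OLS with one mediator, the total effect equals the direct effect plus the indirect effect (7 = 1 + 6 here). That decomposition is what lets an investigator see which cognitive factor carries a design manipulation's influence.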

Axioms, 2021, Vol 11 (1), pp. 17
Fuguang Bao, Linghao Mao, Yiling Zhu, Cancan Xiao, Chonghuan Xu

Association rules are now widely used in prediction, personalized recommendation, risk analysis, and other fields. However, it has been pointed out that the traditional framework for evaluating association rules has several drawbacks. That framework uses Support and Confidence as measures of importance and accuracy. Several alternative measures have been proposed, the most typical being Lift, Improvement, Validity, Conviction, and Chi-square analysis. This paper first analyzes the advantages and disadvantages of common association-rule measures. Based on this analysis, it then proposes four new measures (Bi-support, Bi-lift, Bi-improvement, and Bi-confidence). Finally, it proposes a novel bi-directional interestingness measure framework to improve on the traditional one. In conclusion, the bi-directional interestingness measure framework (the Bi-support and Bi-confidence framework) is superior to the traditional one in terms of objective criteria, comprehensiveness of definition, and practical application.
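The traditional measures named above have standard definitions. For a rule X ⇒ Y:

- support is the fraction of transactions containing both X and Y;
- confidence is support divided by the fraction containing X;
- lift is confidence divided by the fraction containing Y;
- conviction is (1 − P(Y)) / (1 − confidence).

The sketch computes these over an invented transaction list. The paper's Bi-support, Bi-lift, Bi-improvement and Bi-confidence measures are deliberately not implemented, since the abstract does not define them.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule antecedent => consequent."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / n
    confidence = n_xy / n_x
    p_y = n_y / n
    lift = confidence / p_y
    # Conviction is unbounded for rules that always hold.
    conviction = float("inf") if confidence == 1 else (1 - p_y) / (1 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}

# Invented market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_measures(baskets, {"bread"}, {"milk"}))
```

The rule bread ⇒ milk has 75% confidence, which looks strong. However, its lift is below 1 (about 0.94), because milk appears in 80% of all baskets anyway. This is one of the Support-Confidence drawbacks the abstract alludes to.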

2021, Vol 39 (6), pp. 826-837
Jaehong PARK, Gunwoo LEE, Cheol OH, Jae Hun KIM, Dukgeun YUN

2021, Vol 11 (3-4), pp. 1-45
Sina Mohseni, Niloofar Zarei, Eric D. Ragan

The need for interpretable and accountable intelligent systems grows with the prevalence of artificial intelligence (AI) applications in everyday life. Explainable AI (XAI) systems are intended to explain the reasoning behind their own decisions and predictions. Researchers from different disciplines work together to define, design, and evaluate explainable systems. However, scholars from different disciplines focus on different objectives and fairly independent topics of XAI research. This makes it challenging to identify appropriate design and evaluation methodology and to consolidate knowledge across efforts. To this end, this article presents a survey and framework intended to share knowledge and experience of XAI design and evaluation methods across multiple disciplines. To support diverse design goals and evaluation methods in XAI research, we thoroughly reviewed XAI-related papers in the fields of machine learning, visualization, and human-computer interaction. From this review, we present a categorization of XAI design goals and evaluation methods. Our categorization maps the design goals for different XAI user groups to their evaluation methods. From our findings, we develop a framework with step-by-step design guidelines paired with evaluation methods to close the iterative design and evaluation cycles in multidisciplinary XAI teams. Further, we provide summarized, ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
