Diagnostic Evaluation of Policy-Gradient-Based Ranking

Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 37
Author(s):  
Hai-Tao Yu ◽  
Degen Huang ◽  
Fuji Ren ◽  
Lishuang Li

Learning-to-rank has been intensively studied and has shown steadily increasing value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, most studies that adopt these techniques focus on showing how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies are often overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on the widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On the one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to the gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the more policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
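To make the role of sampled rankings concrete, here is a minimal sketch of a score-function (REINFORCE-style) gradient estimate for ranking. It is not the paper's implementation: the Plackett-Luce sampler, the DCG reward, the mean-reward baseline, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sample_ranking(scores, rng):
    """Sample a ranking from a Plackett-Luce model (an assumed policy).

    Returns the ranking (list of document indices) and the gradient of its
    log-probability with respect to each document score.
    """
    remaining = list(range(len(scores)))
    ranking = []
    grad = [0.0] * len(scores)
    while remaining:
        top = max(scores[i] for i in remaining)  # for numerical stability
        weights = [math.exp(scores[i] - top) for i in remaining]
        total = sum(weights)
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        # Gradient of log softmax at this step: 1[i == chosen] - p_i
        for i, w in zip(remaining, weights):
            grad[i] -= w / total
        grad[chosen] += 1.0
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking, grad


def dcg(ranking, labels):
    """Discounted cumulative gain of a ranking (assumed reward)."""
    return sum((2 ** labels[d] - 1) / math.log2(pos + 2)
               for pos, d in enumerate(ranking))


def reinforce_gradient(scores, labels, num_samples, rng):
    """Monte Carlo policy-gradient estimate of d E[DCG] / d scores."""
    samples = [sample_ranking(scores, rng) for _ in range(num_samples)]
    rewards = [dcg(ranking, labels) for ranking, _ in samples]
    baseline = sum(rewards) / num_samples  # simple variance-reduction baseline
    grad = [0.0] * len(scores)
    for (_, log_prob_grad), reward in zip(samples, rewards):
        for i, g in enumerate(log_prob_grad):
            grad[i] += (reward - baseline) * g / num_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    # Three documents with graded labels; all scores start equal.
    print(reinforce_gradient([0.0, 0.0, 0.0], [2, 0, 1], 2000, rng))
```

The estimate depends entirely on which rankings happen to be sampled. With many documents per query, the number of possible rankings grows factorially, so samples rarely come close to the ideal ranking, which is the weakness the abstract highlights.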

Nanophotonics ◽  
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Sean Hooten ◽  
Raymond G. Beausoleil ◽  
Thomas Van Vaerenbergh

We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence over competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10× fewer simulations than control cases.


2019 ◽  
Vol 141 (8) ◽  
Author(s):  
Riccardo Da Soghe ◽  
Cosimo Bianchini ◽  
Jacopo D’Errico ◽  
Lorenzo Tarchi

Impinging jet arrays are typically used to cool several gas turbine parts. Some examples of such applications can be found in the internal cooling of high-pressure turbine airfoils or in the turbine blade tip clearance control of aero-engines. The effect of the wall-to-jets temperature ratio (TR) on heat transfer is generally neglected by the correlations available in the open literature. In the present contribution, the impact of the temperature ratio on the heat transfer for a real engine active clearance control system is analyzed by means of validated computational fluid dynamics (CFD) computations. At different jet Reynolds numbers and considering several impingement array arrangements, a wide range of target wall-to-jets temperature ratios is accounted for. Computational results prove that both local and averaged Nusselt numbers reduce with increasing TR. An in-depth analysis of the numerical data shows that this reduction is caused by both the heat transfer occurring between the spent coolant flow and the fresh jets and the variation of gas properties with temperature through the boundary layer. A scaling procedure, based on the TR power law, was proposed to estimate the Nusselt number at different wall temperature levels, as needed to correct available open-literature correlations, typically developed with small temperature differences, for real engine applications.
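A power-law scaling of this kind might be sketched as follows. The abstract gives neither the exponent nor the reference conditions, so the functional form Nu = Nu_ref · (TR / TR_ref)^n, the function name, and the placeholder exponent are all assumptions for illustration. The exponent is taken negative only because the Nusselt number is reported to fall as the temperature ratio rises.

```python
def scale_nusselt(nu_ref, tr, tr_ref=1.0, exponent=-0.1):
    """Rescale a reference Nusselt number to another temperature ratio.

    Assumed form: Nu(TR) = Nu_ref * (TR / TR_ref) ** exponent.
    The default exponent is a hypothetical placeholder, not a published value.
    """
    if tr <= 0 or tr_ref <= 0:
        raise ValueError("temperature ratios must be positive")
    return nu_ref * (tr / tr_ref) ** exponent


if __name__ == "__main__":
    # Correct a correlation value obtained near TR = 1 to a hotter wall.
    print(scale_nusselt(nu_ref=100.0, tr=2.0))
```

In practice the exponent would be fitted to CFD or experimental data for the specific impingement arrangement.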


Author(s):  
Wenjie Shi ◽  
Shiji Song ◽  
Cheng Wu

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. Moreover, the optimality of the desired Boltzmann policy set for a non-optimal soft value function is not sufficiently justified. In this paper, we first derive the soft policy gradient based on the entropy-regularized expected reward objective for RL with continuous actions. Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for soft value functions, we leverage a double sampling approach to make the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms prior off-policy methods.
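For orientation, the entropy-regularized expected reward objective commonly used in maximum entropy RL can be written as below. This is the standard textbook form and may differ from the paper's own notation. The temperature \alpha, the discount factor \gamma, and the trajectory symbol \tau are assumptions not stated in the abstract.

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\big[ \log \pi(a \mid s_t) \big].
```

Setting \alpha = 0 recovers the usual expected discounted return, so the entropy term is what distinguishes a soft policy gradient from the conventional one.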


Author(s):  
Lisa-Maria N. Neudert

As concerns over misinformation, political bots, and the impact of social media on public discourse manifest in Germany, this chapter explores the role of computational propaganda in and around German politics. The research sheds light on how algorithms, automation, and big data are leveraged to manipulate the German public, presenting real-time social media data and rich evidence from interviews with a wide range of German Internet experts—bot developers, policymakers, cyberwarfare specialists, victims of automated attacks, and social media moderators. In addition, the chapter examines how the ongoing public debate surrounding the threats of right-wing political currents and foreign election interference in the Federal Election 2017 has created sentiments of concern and fear. Imposed regulation, multi-stakeholder actionism, and sustained media attention remain unsubstantiated by empirical findings of computational propaganda. The chapter provides an in-depth analysis of social media discourse during the German parliamentary election 2016. Pioneering the methodological assessment of the magnitude of automation and junk news, the author finds limited evidence of computational propaganda in Germany. The author concludes that the impact of computational propaganda, nonetheless, is substantial in Germany, promoting a dispersed civic debate, political vigilance, and restrictive countermeasures that leave a deep imprint on the freedom and openness of the public discourse in Germany.


Author(s):  
Riccardo Da Soghe ◽  
Cosimo Bianchini ◽  
Jacopo D’Errico ◽  
Lorenzo Tarchi

Impinging jet arrays are typically used to cool several gas turbine parts. Some examples of such applications can be found in the internal cooling of high-pressure turbine airfoils or in the turbine blade tip clearance control of aero-engines. The effect of the wall-to-jets temperature ratio on heat transfer is generally neglected by the correlations available in the open literature. In the present contribution, the impact of the temperature ratio on the heat transfer for a real engine Active Clearance Control system is analyzed by means of validated CFD computations. At different jet Reynolds numbers and considering several impingement array arrangements, a wide range of target wall-to-jets temperature ratios (TR) is accounted for. Computational results prove that both local and averaged Nusselt numbers reduce with increasing temperature ratios. An in-depth analysis of the numerical data shows that this reduction is caused by both the heat transfer occurring between the spent coolant flow and the fresh jets and the variation of gas properties with temperature through the boundary layer. A scaling procedure, based on the TR power law, was proposed to estimate the Nusselt number at different wall temperature levels, as needed to correct available open-literature correlations, typically developed with small temperature differences, for real engine applications.


Author(s):  
Muhammad Masood ◽  
Finale Doshi-Velez

Standard reinforcement learning methods aim to master one way of solving a task, whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distributions of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.


2018 ◽  
Vol 56 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Shufang Huang ◽  
Jin Chen ◽  
Liang Liang

Purpose The link between openness and innovative performance has been established as an inverted U-shape relationship, namely, the openness-performance connection is not always positive. The purpose of this paper is to introduce the concept of partner heterogeneity to characterize the influence of “quality” changes in partners on innovative performance, which is the focus of this paper. Given that partner heterogeneity is crucial in explaining open innovative performance, it is also worth examining this key construct in emerging regions such as China. Design/methodology/approach The sample selection of this study covers a wide range of industries, but requires that the sample firms be manufacturing enterprises with an open innovation strategy. The Chinese province of Zhejiang has established its reputation for the opportunities and challenges associated with partner collaboration toward open innovation. Thus, empirical data were collected randomly from the data pool of the Zhejiang Province Economic and Information Commission, as well as from a survey questionnaire. Data were collected using a cross-sectional survey methodology encompassing diverse organizations, industries, and nations. Findings Empirical testing of this assumption in a sample of 217 manufacturing firms indicates that partner heterogeneities, which are classified as organizational heterogeneity, industry heterogeneity, and national heterogeneity, are all positively associated with innovative performance, but the strength of this association is influenced by environmental turbulence. Technological turbulence significantly and positively moderates the relationships of organizational and national heterogeneities with innovative performance. Market turbulence also has a significant positive effect on the relationship between national heterogeneity and innovative performance, while the roles of technological and market turbulence in the relationship between industry heterogeneity and innovative performance are not confirmed. Originality/value This paper refines the connotative dimensions of partner heterogeneity around the core concept of partner heterogeneity in open innovation in the context of an emerging region, China. The study presents a systematic, in-depth analysis, and verifies the impact mechanisms of partner heterogeneity in open innovation on innovative performance by integrating the resource-based view, organizational learning theory, and transaction cost theory.


2018 ◽  
Vol 51 (22) ◽  
pp. 405-411 ◽  
Author(s):  
Tamás Bécsi ◽  
Szilárd Aradi ◽  
Ádám Szabó ◽  
Péter Gáspár
