An Inverse QSAR Method Based on a Two-Layered Model and Integer Programming

2021 ◽  
Vol 22 (6) ◽  
pp. 2847
Author(s):  
Yu Shi ◽  
Jianshen Zhu ◽  
Naveed Ahmed Azam ◽  
Kazuya Haraguchi ◽  
Liang Zhao ◽  
...  

A novel framework for inverse quantitative structure–activity relationships (inverse QSAR) has recently been proposed and developed using both artificial neural networks and mixed integer linear programming. However, the classes of chemical graphs treated by the framework are limited. In order to deal with an arbitrary graph in the framework, we introduce a new model, called a two-layered model, and develop a corresponding method. In this model, each chemical graph is regarded as two parts: the exterior and the interior. The exterior consists of maximal acyclic induced subgraphs with bounded height, and the interior is the connected subgraph obtained by ignoring the exterior. The feature vector consists of the frequency of adjacent atom pairs in the interior and the frequency of chemical acyclic graphs in the exterior. Our method is more flexible than the existing method in the sense that any type of graph can be inferred. We compared the proposed method with an existing method using several data sets obtained from the PubChem database. The new method could infer more general chemical graphs with up to 50 non-hydrogen atoms. The proposed inverse QSAR method can be applied to the inference of more general chemical graphs than before.

2020 ◽  
Vol 19 (2) ◽  
pp. 21-35
Author(s):  
Ryan Beal ◽  
Timothy J. Norman ◽  
Sarvapali D. Ramchurn

This paper outlines a novel approach to optimising teams for Daily Fantasy Sports (DFS) contests. To this end, we propose a number of new models and algorithms to solve the team formation problems posed by DFS. Specifically, we focus on the National Football League (NFL) and predict the performance of real-world players to form the optimal fantasy team using mixed-integer programming. We test our solutions using real-world data sets from across four seasons (2014-2017). We highlight the advantage that can be gained from using our machine-based methods and show that our solutions outperform existing benchmarks, turning a profit in up to 81.3% of DFS game-weeks over a season.


2020 ◽  
Vol 32 (3) ◽  
pp. 763-778
Author(s):  
Zhuqi Miao ◽  
Balabhaskar Balasundaram

A γ-quasi-clique in a simple undirected graph refers to a subset of vertices that induces a subgraph with edge density at least γ. When γ equals one, this definition corresponds to a classical clique. When γ is less than one, it relaxes the requirement of all possible edges by the clique definition. Quasi-clique detection has been used in graph-based data mining to find dense clusters, especially in large-scale error-prone data sets in which the clique model can be overly restrictive. The maximum γ-quasi-clique problem, seeking a γ-quasi-clique of maximum cardinality in the given graph, can be formulated as an optimization problem with a linear objective function and a single quadratic constraint in binary variables. This article investigates the Lagrangian dual of this formulation and develops an upper-bounding technique using the geometry of ellipsoids to bound the Lagrangian dual. The tightness of the upper bound is compared with those obtained from multiple mixed-integer programming formulations of the problem via experiments on benchmark instances.
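The definition above can be made concrete with a small sketch. It checks whether a vertex subset induces edge density at least γ, and finds a maximum γ-quasi-clique by exhaustive search. This is only an illustration of the definition under stated assumptions, not the Lagrangian or ellipsoid-based bounding technique of the article; the function names and the convention that a single vertex has density 1 are assumptions made here.

```python
from itertools import combinations


def edge_density(vertices, edges):
    """Fraction of vertex pairs inside `vertices` that are joined by an edge."""
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        # Assumed convention: a set with fewer than two vertices has density 1.
        return 1.0
    edge_set = {frozenset(e) for e in edges}
    present = sum(
        1 for u, v in combinations(vs, 2) if frozenset((u, v)) in edge_set
    )
    return present / (n * (n - 1) / 2)


def is_quasi_clique(vertices, edges, gamma):
    """True if the induced subgraph has edge density at least gamma."""
    return edge_density(vertices, edges) >= gamma


def max_quasi_clique_bruteforce(all_vertices, edges, gamma):
    """Largest gamma-quasi-clique by exhaustive search (exponential time,
    suitable for toy graphs only)."""
    for size in range(len(all_vertices), 0, -1):
        for subset in combinations(all_vertices, size):
            if is_quasi_clique(subset, edges, gamma):
                return set(subset)
    return set()


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy_vertices = [1, 2, 3, 4]
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

With γ = 1 the search returns the triangle (a classical clique), while with γ = 0.6 all four vertices qualify, since 4 of the 6 possible edges are present.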


2019 ◽  
Vol 17 (1) ◽  
pp. 1269-1280 ◽  
Author(s):  
Csilla Bujtás ◽  
Pakanun Dokyeesun ◽  
Vesna Iršič ◽  
Sandi Klavžar

The connected domination game on a graph G is played by Dominator and Staller according to the rules of the standard domination game with the additional requirement that at each stage of the game the selected vertices induce a connected subgraph of G. If Dominator starts the game and both players play optimally, then the number of vertices selected during the game is the connected game domination number of G. Here this invariant is studied on Cartesian product graphs. A general upper bound is proved and demonstrated to be sharp on Cartesian products of stars with paths or cycles. The connected game domination number is determined for Cartesian products of P_3 with arbitrary paths or cycles, as well as for Cartesian products of an arbitrary graph with K_k for the cases when k is relatively large. A monotonicity theorem is proved for products with one complete factor. A sharp general lower bound on the connected game domination number of Cartesian products is also established.


2016 ◽  
Vol 99 (113) ◽  
pp. 99-108
Author(s):  
Zoran Maksimovic

We give a new mixed integer linear programming (MILP) formulation for the Maximum Degree Bounded Connected Subgraph Problem (MDBCSP). The proposed MILP formulation is the first in the literature with a polynomial number of constraints. Therefore, it will be possible to solve many more instances to optimality than before in a reasonable time.


Author(s):  
John Alasdair Warwicker ◽  
Steffen Rebennack

The problem of fitting continuous piecewise linear (PWL) functions to discrete data has applications in pattern recognition and engineering, amongst many other fields. To find an optimal PWL function, the positions of the breakpoints connecting adjacent linear segments must not be constrained; the breakpoints should be free to be placed anywhere. Although the univariate PWL fitting problem has often been approached from a global optimisation perspective, two mixed-integer linear programming approaches have recently been presented that solve for optimal PWL functions. In this paper, we compare the two approaches: the first was presented by Rebennack and Krasko [Rebennack S, Krasko V (2020) Piecewise linear function fitting via mixed-integer linear programming. INFORMS J. Comput. 32(2):507–530] and the second by Kong and Maravelias [Kong L, Maravelias CT (2020) On the derivation of continuous piecewise linear approximating functions. INFORMS J. Comput. 32(3):531–546]. Both formulations are similar in that they use binary variables and logical implications modelled by big-M constructs to ensure the continuity of the PWL function, yet the former model uses fewer binary variables. We present experimental results comparing the time taken to find optimal PWL functions with differing numbers of breakpoints across 10 data sets for three different objective functions. Although neither of the two formulations is superior on all data sets, the presented computational results suggest that the formulation presented by Rebennack and Krasko is faster. This might be explained by the fact that it contains fewer complicating binary variables and sparser constraints.
Summary of Contribution: This paper presents a comparison of the mixed-integer linear programming models presented in two recent studies published in the INFORMS Journal on Computing. Because of the similarity of the formulations of the two models, it is not clear which one is preferable. We present a detailed comparison of the two formulations, including a series of comparative experimental results across 10 data sets that appeared across both papers. We hope that our results will allow readers to take an objective view as to which implementation they should use.
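To make the object being fitted concrete, the following minimal sketch evaluates a continuous PWL function defined by its breakpoints and scores it against data. It does not reproduce either MILP formulation; the function names are hypothetical, and the three error measures shown (sum of absolute errors, sum of squared errors, maximum absolute error) are common choices assumed here for illustration, since the abstract does not name its three objective functions.

```python
from bisect import bisect_right


def pwl_value(x, breakpoints):
    """Evaluate a continuous PWL function at x.

    `breakpoints` is a list of (x, y) pairs sorted by strictly increasing x,
    with at least two entries. Outside the breakpoint range the first or last
    segment is extended linearly (an assumption of this sketch).
    """
    xs = [p[0] for p in breakpoints]
    # Index of the segment whose left breakpoint is at or before x,
    # clamped so that a valid segment is always selected.
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(breakpoints) - 2))
    (x0, y0), (x1, y1) = breakpoints[i], breakpoints[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fit_errors(data, breakpoints):
    """Return three illustrative error measures of the PWL fit to `data`."""
    residuals = [abs(y - pwl_value(x, breakpoints)) for x, y in data]
    return {
        "sum_abs": sum(residuals),
        "sum_sq": sum(r * r for r in residuals),
        "max_abs": max(residuals),
    }


# Two segments joined at the breakpoint (1, 1): a ramp followed by a plateau.
example_breakpoints = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0)]
example_data = [(0.5, 0.5), (2.0, 2.0)]
example_errors = fit_errors(example_data, example_breakpoints)
```

Because adjacent segments share a breakpoint, continuity holds by construction in this representation; in the MILP models, the breakpoint positions themselves are decision variables.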


2008 ◽  
Vol 86 (5) ◽  
pp. 691-698 ◽  
Author(s):  
J M Holmes ◽  
B V Kozelov ◽  
F Sigernes ◽  
D A Lorentzen ◽  
C S Deehr

Simultaneous optical ground-based observations of auroral Balmer excited hydrogen atoms were performed during the boreal winters of 2002–2003 and 2003–2004 at Ny-Ålesund, Svalbard (NYA 76.26°N 110.98°E geomagnetic) and Longyearbyen, Svalbard (LYR 75.31°N 111.88°E geomagnetic). Balmer α (Hα) with a natural wavelength of 6563 Å was detected at Ny-Ålesund, while Balmer β (Hβ) at 4861 Å was measured at Longyearbyen. The emissions are well known to originate from precipitating protons whose charge exchanges with the neutral atmosphere lead to a diffuse, Doppler-shifted emission region. Measurements are made using Ebert–Fastie spectrometers that are located 118 km apart on a line of roughly constant geomagnetic longitude, thus making this configuration suitable for studying the variation of proton energy with geomagnetic latitude, the so-called “velocity filter” effect. For two different days, analysis of the spectrometer data sets was performed, yielding in both cases positive energy differences between LYR and NYA in support of the velocity filter concept. To reduce uncertainties in the determined energies obtained from the Doppler profile, distributions of energy difference were constructed using the entire time period (up to 2 h) for each case. PACS Nos.: 92.60.hw, 94.20.Ac, 94.30.Aa


2014 ◽  
Vol 70 (a1) ◽  
pp. C286-C286
Author(s):  
Jens Luebben ◽  
Simon Grabowsky ◽  
Alison Edwards ◽  
Wolfgang Morgenroth ◽  
George Sheldrick ◽  
...  

Anisotropic parametrisation of the thermal displacements of hydrogen atoms in single-crystal X-ray structure refinement is not possible with independent atom model (IAM) scattering factors. This is due to the weak scattering contribution of hydrogen atoms. Only when aspherical scattering factors are used can carefully measured Bragg data provide such information. For conventional structure determinations, parameters of "riding" hydrogen atoms are frequently constrained to values of their "parent" heavy atom. Usually values of 1.2 and 1.5 times X-U_eq are assigned to H-U_iso in these cases. Such constraints yield reasonable structural models for room-temperature data. However, today's small-molecule X-ray diffraction experiments are usually carried out at significantly lower temperatures. To further study the temperature dependence of ADPs we have evaluated several data sets of N-Acetyl-L-4-Hydroxyproline Monohydrate at temperatures ranging from 9 K to 250 K. Methods compared were HAR [1], Invariom refinement [2], time-of-flight neutron diffraction and the TLS+ONIOM approach [3]. In the TLS+ONIOM approach, non-hydrogen ADPs from Invariom refinement provided ADPs for the TLS-fit. Hydrogen atoms in all methods were grouped and analyzed according to their Invariom name. We reach a good agreement of the temperature dependence of H-U_iso/X-U_eq. At very low temperatures the ratio H-U_iso/X-U_eq can be as high as 4, e.g. for hydrogen attached to an sp3 carbon atom with three non-hydrogen atom neighbors. Since all methods consistently show that the H-U_iso/X-U_eq ratio is temperature dependent, this effect should be taken into account in conventional structure determinations.


2019 ◽  
Vol 1 (3) ◽  
pp. 221-240
Author(s):  
Hari Bandi ◽  
Dimitris Bertsimas ◽  
Rahul Mazumder

We consider the problem of estimating the parameters of a multivariate Gaussian mixture model (GMM) given access to n samples that are believed to have come from a mixture of multiple subpopulations. State-of-the-art algorithms used to recover these parameters use heuristics to either maximize the log-likelihood of the sample or try to fit the first few moments of the GMM to the sample moments. In contrast, we present here a novel mixed-integer optimization (MIO) formulation that optimally recovers the parameters of the GMM by minimizing a discrepancy measure (either the Kolmogorov–Smirnov or the total variation distance) between the empirical distribution function and the distribution function of the GMM whenever the mixture component weights are known. We also present an algorithm for multidimensional data that optimally recovers corresponding means and covariance matrices. We show that the MIO approaches are practically solvable for data sets with n in the tens of thousands in minutes and achieve an average improvement of 60%–70% and 50%–60% on mean absolute percentage error in estimating the means and the covariance matrices, respectively, over the expectation–maximization (EM) algorithm independent of the sample size n. As the separation of the Gaussians decreases and, correspondingly, the problem becomes more difficult, the edge in performance in favor of the MIO methods widens. Finally, we also show that the MIO methods outperform the EM algorithm with an average improvement of 4%–5% on the out-of-sample accuracy for real-world data sets.
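The discrepancy measure named above can be illustrated in one dimension. The sketch below computes the Kolmogorov–Smirnov distance between the empirical distribution function of a sample and the distribution function of a univariate GMM with given weights, means, and standard deviations. It only evaluates the objective for fixed parameters; it is not the MIO formulation that searches for the optimal parameters, and all function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gmm_cdf(x, weights, means, stds):
    """Distribution function of a univariate Gaussian mixture."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, stds))


def ks_distance(samples, weights, means, stds):
    """Largest gap between the empirical CDF of `samples` and the GMM CDF.

    The empirical CDF is a step function, so the supremum is attained at a
    sample point, either just before or exactly at the jump.
    """
    xs = sorted(samples)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        f = gmm_cdf(x, weights, means, stds)
        worst = max(worst, abs((i + 1) / n - f), abs(i / n - f))
    return worst


# A small sample centred on zero, scored against two candidate one-component models.
sample = [-1.0, 0.0, 1.0]
good_fit = ks_distance(sample, [1.0], [0.0], [1.0])
poor_fit = ks_distance(sample, [1.0], [5.0], [1.0])
```

A parameter choice that matches the sample yields a smaller distance than one that does not, which is the quantity the optimization would drive down.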


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4460 ◽  
Author(s):  
Michael Stadler ◽  
Zack Pecenak ◽  
Patrick Mathiesen ◽  
Kelsey Fahy ◽  
Jan Kleissl

Mixed Integer Linear Programming (MILP) optimization algorithms provide accurate and clear solutions for Microgrid and Distributed Energy Resources projects. Full-scale optimization approaches optimize all time-steps of data sets (e.g., 8760 time-steps and higher resolutions), incurring extreme and unpredictable run-times, often prohibiting such approaches for effective Microgrid designs. Down-sampling approaches exist to reduce run-times. Given that the literature evaluates the full-scale and down-sampling approaches only for limited numbers of case studies, there is a lack of a more comprehensive study involving multiple Microgrids. This paper closes this gap by comparing results and run-times of a full-scale 8760 h time-series MILP to a peak-preserving day-type MILP for 13 real Microgrid projects. The day-type approach reduces the computational time by between 85% and almost 100% (from 2 h computational time to less than 1 min). At the same time, the day-type approach keeps the objective function (OF) differences below 1.5% for 77% of the Microgrids. The other cases show OF differences between 6% and 13%, which can be reduced to 1.5% or less by applying a two-stage hybrid approach that designs the Microgrid based on down-sampled data and then performs a full-scale dispatch algorithm. This two-stage approach results in 20–99% run-time savings.

