State of the Art Iterative Docking with Logistic Regression and Morgan Fingerprints

Author(s):  
Lewis Martin

There is renewed interest in docking campaigns for ligand discovery since the advent of ultra-large virtual libraries. With brute-force search, the scale of these libraries implies that highly parallelized compute is needed to avoid years-long computations. This paper reports a re-analysis of docking data from an ultra-large docking campaign against the D4 receptor and AmpC beta-lactamase, and demonstrates large reductions in the computation time needed to identify the top-ranked ligands. A search of ‘baseline’ featurizations shows that logistic regression on Morgan fingerprints with pharmacophoric atom invariants can match the performance reported on the same task with message-passing networks. With this approach, an ultra-large docking campaign could be performed in a matter of weeks on consumer-grade CPUs with RDKit and scikit-learn. All code and figures are available at https://github.com/ljmartin/dockop
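
A minimal sketch of one surrogate-screening iteration of the kind described above, assuming RDKit and scikit-learn are installed. The molecule lists, fake docking scores, and score cutoff are illustrative placeholders rather than the author's exact pipeline (see the linked repository for that).

```python
# Sketch: logistic regression on Morgan fingerprints with pharmacophoric
# (FCFP-like) atom invariants, used to rank undocked compounds.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.linear_model import LogisticRegression

def featurize(smiles_list, radius=2, n_bits=8192):
    """Morgan fingerprints with pharmacophoric atom invariants (useFeatures=True)."""
    X = np.zeros((len(smiles_list), n_bits), dtype=np.int8)
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(
            mol, radius, nBits=n_bits, useFeatures=True)
        X[i, list(fp.GetOnBits())] = 1
    return X

# Tiny illustrative inputs (a real campaign uses millions of molecules and
# actual docking scores; these stand-ins just make the sketch self-contained).
docked_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]
docked_scores = [-4.2, -7.1, -9.3, -3.8]      # fake docking scores
unseen_smiles = ["c1ccc2[nH]ccc2c1", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]
score_cutoff = -7.0                            # label as "virtual hit" below this

X_train = featurize(docked_smiles)
y_train = (np.array(docked_scores) < score_cutoff).astype(int)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Rank the undocked compounds by predicted hit probability; the top slice
# would be sent to the docking program in the next iteration.
hit_prob = clf.predict_proba(featurize(unseen_smiles))[:, 1]
ranked = np.argsort(hit_prob)[::-1]            # best-first indices
```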


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture remains a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve this problem by interweaving the multi-scale features for crowd density with those of its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn builds a hybrid graph that jointly represents the task-specific feature maps of different scales as nodes and two types of relations as edges: (i) multi-scale relations capturing feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, yielding robust and accurate results. HyGnn performs strongly on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, and UCF_QNRF, outperforming state-of-the-art algorithms by a large margin.
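
An illustrative sketch, not the HyGnn implementation: one round of relation-typed message passing over a tiny "hybrid" graph whose nodes stand in for (flattened) multi-scale counting and localization feature maps. All dimensions, node names, and weights are toy values.

```python
import numpy as np

def message_passing_step(node_feats, edges, W_msg, W_upd):
    """node_feats: node_id -> (d,) vector; edges: list of (src, dst, edge_type);
    W_msg: edge_type -> (d, d) message matrix; W_upd: (d, 2d) update matrix."""
    agg = {n: np.zeros_like(f) for n, f in node_feats.items()}
    for src, dst, etype in edges:
        agg[dst] += W_msg[etype] @ node_feats[src]           # relation-specific message
    return {n: np.tanh(W_upd @ np.concatenate([f, agg[n]]))  # combine state + messages
            for n, f in node_feats.items()}

rng = np.random.default_rng(0)
d = 8
# two counting-scale nodes and one localization node
feats = {"cnt_s1": rng.normal(size=d), "cnt_s2": rng.normal(size=d),
         "loc": rng.normal(size=d)}
edges = [("cnt_s1", "cnt_s2", "multi_scale"), ("cnt_s2", "cnt_s1", "multi_scale"),
         ("loc", "cnt_s1", "mutual"), ("cnt_s1", "loc", "mutual")]
W_msg = {t: rng.normal(size=(d, d)) for t in ("multi_scale", "mutual")}
W_upd = rng.normal(size=(d, 2 * d))
feats = message_passing_step(feats, edges, W_msg, W_upd)
```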


2021 ◽  
Author(s):  
Oluvaseun Owojaiye

Advancements in technology have brought considerable improvements to processor design, and manufacturers now place multiple processors on a single chip. Supercomputers today consist of clusters of interconnected nodes that collaborate to solve complex and advanced computational problems. Message Passing Interface (MPI) and Open Multiprocessing (OpenMP) are the most widely used programming models for optimizing sequential code by parallelizing it on the various multiprocessor architectures that exist today. In this thesis, we parallelize the non-slicing floorplan algorithm based on Multilevel Floorplanning/placement of large-scale modules using B*-trees (MB*-tree) with MPI and OpenMP on distributed- and shared-memory architectures, respectively. In VLSI (Very Large Scale Integration) design automation, floorplanning is a vital task performed in the early design stage. Experimental results on the MCNC benchmark circuits show that our parallel algorithm produced better results than the corresponding sequential algorithm; we achieved speedups of up to 4x, reducing computation time while maintaining floorplan solution quality. We also compared the two parallel versions, and the OpenMP implementation gave slightly better results than the corresponding MPI implementation.
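
A hedged sketch, not the thesis code: distributing candidate-floorplan cost evaluation across MPI ranks with mpi4py. The candidate representation and cost function below are toy placeholders standing in for the MB*-tree perturbation and packing steps.

```python
import random
from mpi4py import MPI

def evaluate_floorplan(order, widths, heights, chip_width=12):
    """Toy cost: pack modules left-to-right in rows of fixed width; return area."""
    x = y = row_h = 0
    for i in order:
        if x + widths[i] > chip_width:        # start a new row
            y += row_h
            x = row_h = 0
        x += widths[i]
        row_h = max(row_h, heights[i])
    return chip_width * (y + row_h)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

widths  = [3, 5, 2, 7, 4, 6]                  # illustrative module dimensions
heights = [4, 2, 6, 3, 5, 2]

if rank == 0:
    random.seed(0)
    candidates = [random.sample(range(len(widths)), len(widths)) for _ in range(1000)]
    chunks = [candidates[i::size] for i in range(size)]   # round-robin split
else:
    chunks = None

local = comm.scatter(chunks, root=0)                      # each rank gets a share
local_best = min(((evaluate_floorplan(c, widths, heights), c) for c in local),
                 key=lambda t: t[0])
all_best = comm.gather(local_best, root=0)                # collect per-rank minima

if rank == 0:
    cost, plan = min(all_best, key=lambda t: t[0])
    print("best cost:", cost, "order:", plan)
```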


2017 ◽  
Vol 2017 (3) ◽  
pp. 147-167 ◽  
Author(s):  
Gilad Asharov ◽  
Daniel Demmler ◽  
Michael Schapira ◽  
Thomas Schneider ◽  
Gil Segev ◽  
...  

Abstract The Border Gateway Protocol (BGP) computes routes between the organizational networks that make up today’s Internet. Unfortunately, BGP suffers from deficiencies, including slow convergence, security problems, a lack of innovation, and the leakage of sensitive information about domains’ routing preferences. To overcome some of these problems, we revisit the idea of centralizing and using secure multi-party computation (MPC) for interdomain routing, which was proposed by Gupta et al. (ACM HotNets’12). We implement two algorithms for interdomain routing with state-of-the-art MPC protocols. On an empirically derived dataset that approximates the topology of today’s Internet (55 809 nodes), our protocols take as little as 6 s of topology-independent precomputation and only 3 s of online time. We show, moreover, that when our MPC approach is applied at country/region-level scale, runtimes can be as low as 0.17 s of online time and 0.20 s of precomputation time. Our results motivate the MPC approach for interdomain routing and furthermore demonstrate that current MPC techniques are capable of efficiently tackling real-world problems at a large scale.
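
A plaintext sketch of the kind of per-AS route selection that such MPC protocols would compute obliviously over secret-shared preferences. The topology, business relations, and simplified policy (export rules omitted) are illustrative only, not the paper's algorithms.

```python
PREF = {"customer": 3, "peer": 2, "provider": 1}   # prefer customer routes, etc.

def best_routes(neighbors, relation, dest):
    """neighbors: AS -> adjacent ASes; relation[(a, n)]: how a classifies n."""
    # best[a] = (preference of chosen neighbor, path length, next hop)
    best = {dest: (max(PREF.values()) + 1, 0, dest)}
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for a in neighbors:
            if a == dest:
                continue
            for n in neighbors[a]:
                if n not in best:
                    continue
                cand = (PREF[relation[(a, n)]], best[n][1] + 1, n)
                cur = best.get(a)
                # higher preference wins; ties broken by shorter path
                if cur is None or (cand[0], -cand[1]) > (cur[0], -cur[1]):
                    best[a] = cand
                    changed = True
    return best

# Tiny example: AS 1 reaches destination 0 via its customer 2, not provider 3.
neighbors = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
relation = {(1, 2): "customer", (1, 3): "provider", (2, 1): "provider",
            (2, 0): "peer", (3, 1): "customer", (3, 0): "peer",
            (0, 2): "peer", (0, 3): "peer"}
print(best_routes(neighbors, relation, dest=0)[1])   # -> (3, 2, 2)
```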


2021 ◽  
Vol 21 (4) ◽  
pp. 1-20
Author(s):  
Zhihan Lv ◽  
Dongliang Chen ◽  
Amit Kumar Singh

To process the node-level big data contained in complex networks and enable efficient computation over such networks, a loosely coupled distributed framework, DCBV, based on voluntary computing is proposed, with ICE middleware as the communication medium. The Master, Worker, and MiddleWare layers of the framework and its development structure are designed, and the task allocation and recovery strategy, message-passing and communication modes, and fault-tolerance handling are discussed. Finally, to compute and verify parameters such as the average shortest path and to shorten calculation time, an improved exact shortest-path algorithm, N-SPFA, is proposed. The node computation and performance of the N-SPFA algorithm are explored on different datasets, and the algorithm is compared with four approximate shortest-path algorithms: Combined Link and Attribute (CLA), Lexicographic Breadth First Search (LBFS), the approximate shortest-path-length algorithm based on center distance of area division (CDZ), and Hub Vertex of area and Core Expressway (HEA-CE). The results show that when the number of CPU threads is 4, the computation time of the DCBV framework is shortest (514.63 ms). As the number of CPU cores increases, the overall computation time of the framework decreases gradually; for every 2 additional CPU cores, the number of tasks increases by 1. When the number of Worker nodes is 8 and the number of nodes is 1, the computation time of the framework is shortest (210,979 ms), and the I/O statistics increase as the number of Worker nodes grows. On the Undirected01 and Undirected02 datasets, the computation time of the N-SPFA algorithm is shortest, at 4520 ms and 7324 ms, respectively; on the ca-condmat_undirected dataset, however, the calculation time is 175,292 ms and the performance is slightly worse. Overall, both the N-SPFA and SPFA algorithms perform well, so the two are combined: for less complex networks, the computational scale coefficient of the SPFA algorithm can be set to 0.06, and for general networks to 0.2. Compared with the other algorithms across the datasets, the preprocessing time, average query time, and overall query time of the N-SPFA algorithm are the shortest, at 49.67 ms, 5.12 ms, and 94,720 ms, respectively, and its accuracy (1.0087) and error rate (0.024) are also the best. In conclusion, voluntary computing can be applied to big-data processing and provides a useful reference for the distributed analysis of large-scale complex networks.
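
The abstract does not spell out the N-SPFA modifications, so this sketch shows the classic SPFA (queue-based Bellman-Ford) on which it builds; the adjacency format and example graph are illustrative.

```python
from collections import deque

def spfa(graph, source):
    """graph: node -> list of (neighbor, weight); returns dict of shortest distances."""
    dist = {source: 0}
    in_queue = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in graph.get(u, []):
            if dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w
                if v not in in_queue:          # re-enqueue only if not already queued
                    queue.append(v)
                    in_queue.add(v)
    return dist

g = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 7)], 1: [(3, 1)], 3: []}
print(spfa(g, 0))   # {0: 0, 1: 3, 2: 1, 3: 4}
```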


2018 ◽  
Author(s):  
Pavel Pokhilko ◽  
Evgeny Epifanovsky ◽  
Anna I. Krylov

Using single precision floating point representation reduces the size of data and computation time by a factor of two relative to the double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, the reduced memory footprint alleviates memory and input/output bottlenecks. The reduced size of data can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both the standard implementation and one using Cholesky decomposition or resolution-of-the-identity representations of the electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant, and a pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation with clean-up iterations, fully recovering double-precision results while retaining significant savings.
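
A hedged analogy, not the coupled-cluster code: mixed-precision iterative refinement for a linear system, where a cheap single-precision solve is "cleaned up" with double-precision residual corrections, mirroring the paper's single-precision iterations followed by double-precision clean-up iterations. The test matrix is a toy example.

```python
import numpy as np

def refine(A, b, n_cleanup=3):
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)        # single-precision solve
    for _ in range(n_cleanup):                               # double-precision clean-up
        r = b - A @ x                                        # residual in double precision
        x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)     # well-conditioned test matrix
b = rng.standard_normal(200)
x = refine(A, b)
print(np.linalg.norm(A @ x - b))   # residual approaches double-precision accuracy
```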


Author(s):  
Michael Withnall ◽  
Edvard Lindelöf ◽  
Ola Engkvist ◽  
Hongming Chen

We introduce Attention and Edge Memory schemes to the existing Message Passing Neural Network framework for graph convolution, and benchmark our approaches against eight different physicochemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our models consistently perform on par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
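
A minimal sketch, not the authors' implementation: attention-weighted message aggregation for one node in a message-passing step. Dimensions and weight matrices are toy values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_messages(h_v, h_neighbors, W_att, W_msg):
    """h_v: (d,) node state; h_neighbors: (k, d) states of neighboring atoms."""
    scores = np.array([h_v @ W_att @ h_u for h_u in h_neighbors])   # compatibilities
    alpha = softmax(scores)                                          # attention weights
    # message = attention-weighted sum of transformed neighbor states
    return sum(a * (W_msg @ h_u) for a, h_u in zip(alpha, h_neighbors))

rng = np.random.default_rng(1)
d, k = 6, 3
h_v, h_nb = rng.normal(size=d), rng.normal(size=(k, d))
msg = attend_messages(h_v, h_nb, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(msg.shape)   # (6,)
```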


2018 ◽  
Vol 14 (12) ◽  
pp. 1915-1960 ◽  
Author(s):  
Rudolf Brázdil ◽  
Andrea Kiss ◽  
Jürg Luterbacher ◽  
David J. Nash ◽  
Ladislava Řezníčková

Abstract. The use of documentary evidence to investigate past climatic trends and events has become a recognised approach in recent decades. This contribution presents the state of the art in its application to droughts. The range of documentary evidence is very wide, including general annals, chronicles, memoirs and diaries kept by missionaries, travellers and those specifically interested in the weather; records kept by administrators tasked with keeping accounts and other financial and economic records; legal-administrative evidence; religious sources; letters; songs; newspapers and journals; pictographic evidence; chronograms; epigraphic evidence; early instrumental observations; society commentaries; and compilations and books. These are available from many parts of the world. This variety of documentary information is evaluated with respect to the reconstruction of hydroclimatic conditions (precipitation, drought frequency and drought indices). Documentary-based drought reconstructions are then addressed in terms of long-term spatio-temporal fluctuations, major drought events, relationships with external forcing and large-scale climate drivers, socio-economic impacts and human responses. Documentary-based drought series are also considered from the viewpoint of spatio-temporal variability for certain continents, and their employment together with hydroclimate reconstructions from other proxies (in particular tree rings) is discussed. Finally, conclusions are drawn, and challenges for the future use of documentary evidence in the study of droughts are presented.


2021 ◽  
Vol 7 (3) ◽  
pp. 50
Author(s):  
Anselmo Ferreira ◽  
Ehsan Nowroozi ◽  
Mauro Barni

The possibility of carrying out a meaningful forensic analysis on printed and scanned images plays a major role in many applications. First of all, printed documents are often associated with criminal activities, such as terrorist plans, child pornography, and even fake packages. Additionally, printing and scanning can be used to hide the traces of image manipulation or the synthetic nature of images, since the artifacts commonly found in manipulated and synthetic images are gone after the images are printed and scanned. A problem hindering research in this area is the lack of large-scale reference datasets for algorithm development and benchmarking. Motivated by this issue, we present a new dataset composed of a large number of synthetic and natural printed face images. To highlight the difficulties associated with the analysis of the images in the dataset, we carried out an extensive set of experiments comparing several printer attribution methods. We also verified that state-of-the-art methods for distinguishing natural from synthetic face images fail when applied to printed and scanned images. We envision that the availability of the new dataset and the preliminary experiments we carried out will motivate and facilitate further research in this area.

