Exploring graph traversal algorithms in graph-based molecular generation

Mapping Intimacies

10.33774/chemrxiv-2021-5c5l1-v2

2021

Author(s): Rocío Mercado, Esben Bjerrum, Ola Engkvist

Keyword(s): Natural Products, Search Algorithm, Molecular Graph, Molecular Shape, Generative Models, Training Data, Depth First Search, Graph Traversal, The Impact, Traversal Algorithm

Here we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined by either a breadth-first or a depth-first search. We observe that a breadth-first traversal leads to better coverage of training-data features than a depth-first traversal. We have quantified these differences on a dataset of natural products using a variety of metrics, including percent validity, molecular coverage, and molecular shape. We also observe that with either a breadth- or a depth-first traversal it is possible to over-train the generative models, at which point the results obtained with either graph traversal algorithm are identical.
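
The abstract contrasts two ways of ordering atoms for sequential generation. As an illustration only, the Python sketch below computes a breadth-first and a depth-first atom order for the heavy-atom graph of 3-methylpentane, stored as a plain adjacency dict. The molecule, the atom numbering, and the ascending-index tie-breaking are assumptions made for this example; the paper's own model and preprocessing are not reproduced here.

```python
from collections import deque

# Heavy-atom graph of 3-methylpentane, CH3-CH2-CH(CH3)-CH2-CH3.
# Atom indices are arbitrary labels chosen for this example (5 is the methyl branch).
MOLECULE = {
    0: [1],
    1: [0, 2],
    2: [1, 3, 5],
    3: [2, 4],
    4: [3],
    5: [2],
}


def bfs_order(graph, start):
    """Breadth-first order: every neighbour of an atom is emitted
    before moving one bond further away from the start atom."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        atom = queue.popleft()
        order.append(atom)
        for nbr in sorted(graph[atom]):  # ascending-index tie-breaking (assumed)
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs_order(graph, start):
    """Depth-first (preorder) order: follow one branch to its end
    before backtracking to the next branch. Rings are handled by `seen`."""
    order, seen, stack = [], set(), [start]
    while stack:
        atom = stack.pop()
        if atom in seen:
            continue
        seen.add(atom)
        order.append(atom)
        # Push in reverse so the smallest neighbour index is explored first.
        for nbr in sorted(graph[atom], reverse=True):
            if nbr not in seen:
                stack.append(nbr)
    return order


print(bfs_order(MOLECULE, 0))  # [0, 1, 2, 3, 5, 4]
print(dfs_order(MOLECULE, 0))  # [0, 1, 2, 3, 4, 5]
```

Breadth-first order emits the methyl branch (atom 5) before continuing down the main chain, whereas depth-first order finishes the chain (atoms 3 and 4) before backtracking to the branch.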

Space-Efficient Fully Dynamic DFS in Undirected Graphs

Algorithms

10.3390/a12030052

2019

Vol 12 (3)

pp. 52

Cited By ~ 1

Author(s): Kengo Nakamura, Kunihiko Sadakane

Keyword(s): Undirected Graph, Undirected Graphs, Edge Connectivity, Worst Case, Depth First Search, Insertions And Deletions, Adjacency List, Graph Traversal, Dynamic Connectivity, Traversal Algorithm

Depth-first search (DFS) is a well-known graph traversal algorithm that can be performed in $O(n + m)$ time for a graph with $n$ vertices and $m$ edges. We consider the dynamic DFS problem, that is, maintaining a DFS tree of an undirected graph $G$ while edges and vertices are gradually inserted into or deleted from $G$. We present an algorithm for this problem that takes worst-case $O(\sqrt{mn} \cdot \mathrm{polylog}(n))$ time per update and requires only $(3m + o(m)) \log n$ bits of space. This reduces the space usage of dynamic DFS to only 1.5 times that of the adjacency list of the graph. We also show applications of our dynamic DFS algorithm to the dynamic connectivity, biconnectivity, and 2-edge-connectivity problems under vertex insertions and deletions.
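
For orientation, the Python sketch below shows the simplest possible way to maintain a DFS tree under updates: an iterative static DFS that builds a DFS forest (as a parent map) in $O(n + m)$ time, wrapped in a hypothetical `NaiveDynamicDFS` class that rebuilds the forest from scratch after every edge insertion or deletion. This is a baseline for illustration, not the authors' algorithm, and the class and function names are assumptions. For the space comparison, a plain adjacency list stores each undirected edge in both endpoint lists, i.e. $2m$ vertex IDs of about $\log n$ bits each, so $(3m + o(m)) \log n$ bits is roughly 1.5 times that.

```python
from collections import defaultdict


def dfs_forest(adj):
    """Iterative DFS over every vertex of an undirected graph.

    adj maps each vertex to a list of neighbours. Returns a parent map
    (roots map to None) describing a DFS forest. Each vertex and each
    adjacency entry is handled a constant number of times: O(n + m).
    """
    parent = {}
    for root in list(adj):
        if root in parent:
            continue
        parent[root] = None
        # Stack entries: (vertex, iterator over its remaining neighbours).
        stack = [(root, iter(adj[root]))]
        while stack:
            v, neighbours = stack[-1]
            for w in neighbours:
                if w not in parent:
                    parent[w] = v  # tree edge v -> w
                    stack.append((w, iter(adj[w])))
                    break
            else:
                stack.pop()  # v fully explored: backtrack
    return parent


class NaiveDynamicDFS:
    """Hypothetical baseline: rebuild the DFS forest after every update."""

    def __init__(self):
        self.adj = defaultdict(list)
        self.parent = {}

    def insert_edge(self, u, v):
        self.adj[u].append(v)
        self.adj[v].append(u)
        self.parent = dfs_forest(self.adj)  # O(n + m) per update

    def delete_edge(self, u, v):
        self.adj[u].remove(v)  # O(deg(u)) list removal
        self.adj[v].remove(u)
        self.parent = dfs_forest(self.adj)


g = NaiveDynamicDFS()
for u, v in [(1, 2), (2, 3), (3, 1), (3, 4)]:
    g.insert_edge(u, v)
print(g.parent)  # {1: None, 2: 1, 3: 2, 4: 3}
g.delete_edge(2, 3)
print(g.parent)  # {1: None, 2: 1, 3: 1, 4: 3}
```

Rebuilding costs $O(n + m)$ time per update and keeps the whole adjacency structure in memory; the paper's contribution is a per-update bound combined with a much tighter space budget.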

StyleGANs and Transfer Learning for Generating Synthetic Images in Industrial Applications

Symmetry

10.3390/sym13081497

2021

Vol 13 (8)

pp. 1497

Author(s): Harold Achicanoy, Deisy Chaves, Maria Trujillo

Keyword(s): Deep Learning, Transfer Learning, Data Augmentation, Industrial Applications, Generative Models, Training Data, Generative Adversarial Networks, Augmentation Strategy, Synthetic Images, The Impact

Deep learning applications in computer vision require large, representative datasets to obtain state-of-the-art results because of the massive number of parameters to optimise in deep models. However, in industrial applications data are limited and asymmetrically distributed owing to rare cases, legal restrictions, and high image-acquisition costs. Data augmentation based on deep learning generative adversarial networks, such as StyleGAN, has arisen as a way to create training data with symmetric distributions that may improve the generalisation capability of the resulting models. StyleGAN generates highly realistic images in a variety of domains as a data augmentation strategy, but it requires a large amount of data to build image generators. Thus, transfer learning is used in conjunction with generative models to build models from small datasets. However, the impact of the pre-trained generative model chosen for transfer learning has not been reported. In this paper, we evaluate a StyleGAN generative model with transfer learning from different source domains (paintings, portraits, Pokémon, bedrooms, and cats) to generate target images with different levels of content variability: bean seeds (low variability), faces of subjects between 5 and 19 years old (medium variability), and charcoal (high variability). We used the first version of StyleGAN because of the large number of publicly available pre-trained models. The Fréchet Inception Distance (FID) was used to evaluate the quality of the synthetic images. We found that StyleGAN with transfer learning produced good-quality images, making it an alternative for generating realistic synthetic images in the evaluated domains.
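
The Fréchet Inception Distance models the feature vectors of real and synthetic images as Gaussians and compares them as $\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)$, where lower values mean closer distributions. A minimal NumPy sketch of that formula follows. It assumes the feature vectors have already been extracted (in standard practice from an Inception-v3 pooling layer, which is not shown); the function name is illustrative.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: arrays of shape (num_images, feature_dim),
    assumed to be precomputed image embeddings.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    # atleast_2d keeps the 1-feature case as a (1, 1) matrix.
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((S_r S_g)^{1/2}) equals the sum of the square roots of the eigenvalues
    # of S_r S_g, which are real and non-negative for PSD covariance matrices;
    # clipping removes tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)
```

Because the estimate depends on the number of images and on the feature extractor, FID scores are only comparable when both are the same.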

Efficient Operative Cost Reduction in Distribution Grids Considering the Optimal Placement and Sizing of D-STATCOMs Using a Discrete-Continuous VSA

Applied Sciences

10.3390/app11052175

2021

Vol 11 (5)

pp. 2175

Author(s): Oscar Danilo Montoya, Walter Gil-González, Jesus C. Hernández

Keyword(s): Objective Function, Reactive Power, Programming Model, Search Algorithm, Solution Space, Distribution Networks, Optimal Placement, Mixed Integer, Solution Vector, The Impact

This research paper addresses the problem of reactive power compensation in electric distribution networks from the point of view of combinatorial optimization, using a new discrete-continuous version of the vortex search algorithm (DCVSA). To explore and exploit the solution space, a discrete-continuous codification of the solution vector is proposed: the discrete part determines the nodes where distribution static compensators (D-STATCOMs) will be installed, and the continuous part determines their optimal sizes. The main advantage of this codification is that the mixed-integer nonlinear programming (MINLP) model representing the optimal placement and sizing of D-STATCOMs in distribution networks only requires a classical power flow method to evaluate the objective function, which means it can be implemented in any programming language. The objective function is the total cost of the grid power losses plus the annualized investment cost of the D-STATCOMs. In addition, to capture the impact of daily load variations, the active and reactive power demand curves are included in the optimization model. Numerical results on two radial test feeders with 33 and 69 buses demonstrate that the proposed DCVSA solves the MINLP model with better results than the MINLP solvers available in the GAMS software. All simulations are implemented in the MATLAB programming environment.
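
The discrete-continuous codification can be illustrated with a short Python sketch. Under stated assumptions, a candidate vector holds k bus indices (discrete part) followed by k sizes in Mvar (continuous part); decoding rounds and clips the indices and clips the sizes. The objective adds the annual cost of energy losses to an annualized investment cost. The `loss_mw` callback is a hypothetical stand-in for the classical power flow, daily load curves are omitted, and all prices, bounds, and the toy loss model are placeholders rather than values from the paper.

```python
from typing import Callable, Dict, Sequence

HOURS_PER_YEAR = 8760


def decode(x: Sequence[float], k: int, n_buses: int, q_max: float) -> Dict[int, float]:
    """Split solution vector x into k D-STATCOM locations and sizes.

    x[:k]   -> discrete part: bus indices, rounded and clipped to 2..n_buses
               (bus 1 is assumed to be the substation/slack bus).
    x[k:2k] -> continuous part: reactive power sizes in Mvar, clipped to [0, q_max].
    Devices decoded onto the same bus are merged by summing their sizes.
    """
    placement: Dict[int, float] = {}
    for raw_bus, raw_q in zip(x[:k], x[k:2 * k]):
        bus = min(max(int(round(raw_bus)), 2), n_buses)
        q = min(max(raw_q, 0.0), q_max)
        placement[bus] = placement.get(bus, 0.0) + q
    return placement


def objective(x, k, n_buses, q_max, loss_mw: Callable[[Dict[int, float]], float],
              energy_price=100.0, cost_per_mvar_year=5000.0) -> float:
    """Annual cost = energy-loss cost + annualized D-STATCOM investment.

    loss_mw: hypothetical power-flow stand-in returning average losses in MW.
    energy_price (per MWh) and cost_per_mvar_year are placeholder values.
    """
    placement = decode(x, k, n_buses, q_max)
    loss_cost = loss_mw(placement) * HOURS_PER_YEAR * energy_price
    investment = cost_per_mvar_year * sum(placement.values())
    return loss_cost + investment


def toy_losses(placement: Dict[int, float]) -> float:
    # Toy model only: losses first fall, then rise, with total compensation.
    q = sum(placement.values())
    return 0.2 - 0.1 * q + 0.05 * q * q


print(objective([18.0, 30.0, 0.6, 0.4], k=2, n_buses=33, q_max=2.0, loss_mw=toy_losses))
```

Because the search algorithm only ever sees real-valued vectors and a scalar cost, any power flow routine can be plugged in as `loss_mw` without changing the optimizer.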

Copy-Move Forgery Detection (CMFD) Using Deep Learning for Image and Video Forensics

Journal of Imaging

10.3390/jimaging7030059

2021

Vol 7 (3)

pp. 59

Author(s): Yohanna Rodriguez-Ortega, Dora M. Ballesteros, Diego Renza

Keyword(s): Deep Learning, Transfer Learning, Training Data, Video Editing, Forgery Detection, Copy Move Forgery Detection, The Impact, And Training, Selection Of, Traditional Image

With the exponential growth of high-quality fake images in social networks and media, recognition algorithms for this type of content are needed. One of the most common types of image and video editing consists of duplicating areas of the image, known as the copy-move technique. Traditional image processing approaches manually look for patterns related to the duplicated content, which limits their use in mass data classification. In contrast, approaches based on deep learning have shown better performance and promising results, but they suffer from generalization problems, with a high dependence on training data and the need for an appropriate selection of hyperparameters. To overcome this, we propose two deep learning approaches: a model with a custom architecture and a model based on transfer learning. In each case, the impact of network depth is analyzed in terms of precision (P), recall (R), and F1 score. Additionally, the generalization problem is addressed using images from eight different open-access datasets. Finally, the models are compared in terms of evaluation metrics and of training and inference times. The transfer-learning model based on VGG-16 achieves metrics about 10% higher than the custom-architecture model; however, it requires approximately twice as much inference time.
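
Precision, recall, and F1 follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN): P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). The Python sketch below assumes a binary labelling of images (forged = 1, original = 0); the eight labels are made up for the example.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating label 1 (forged) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for eight images (1 = copy-move forgery).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```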

Underwater Acoustic Target Recognition Based on Generative Adversarial Network Data Augmentation

INTER-NOISE and NOISE-CON Congress and Conference Proceedings

10.3397/in-2021-2737

2021

Vol 263 (2)

pp. 4558-4564

Author(s): Minghong Zhang, Xinwei Luo

Keyword(s): Data Augmentation, Target Recognition, Training Data, Small Samples, Generative Adversarial Network, Data Set, Underwater Acoustic, Adversarial Network, Acoustic Target, The Impact

Underwater acoustic target recognition is an important aspect of underwater acoustic research. In recent years, machine learning has developed continuously and has been widely and effectively applied to underwater acoustic target recognition. Adequate datasets are essential to obtain good recognition results and to reduce overfitting. However, underwater acoustic samples are relatively rare, which limits recognition accuracy. In this paper, in addition to traditional audio data augmentation methods, we propose a new data augmentation method based on a generative adversarial network, in which a generator and a discriminator learn the characteristics of underwater acoustic samples so as to generate reliable underwater acoustic signals that expand the training dataset. The expanded dataset is fed into a deep neural network, and transfer learning is applied by fixing part of the pre-trained parameters, further reducing the impact of small sample sizes. The experimental results show that this method achieves better recognition results than general underwater acoustic recognition methods, verifying its effectiveness.

Abstract 18020: Identifying Hospitalizations Related to Heart Failure in Dronedarone Users Who Are Supplementary Medicare Beneficiaries

Circulation

10.1161/circ.130.suppl_2.18020

2014

Vol 130 (suppl_2)

Author(s): Chuntao Wu, Andrew Koren, Jane Thammakhoune, Jasmanda Wu, Hayet Kechemir, ...

Keyword(s): Heart Failure, Confidence Interval, Claims Data, Search Algorithm, Medicare Beneficiaries, Current Procedural Terminology, Diagnosis Codes, Medicare Database, The Impact

Background: When using inpatient claims data to identify hospitalizations in supplemental Medicare beneficiaries, e.g., in the MarketScan database, there is a concern that the coverage of hospitalizations in such inpatient claims may be incomplete. However, whether or not hospitalizations are covered by inpatient claims, they incur professional charges that are recorded in the professional claims data in the MarketScan Medicare database. In the context of identifying hospitalizations that might be related to heart failure (HF) in dronedarone users, we compared different approaches to identifying such hospitalizations. Objective: To assess the impact of using professional claims in addition to inpatient claims on identifying hospitalizations that might be related to HF. Methods: A total of 20,834 dronedarone users who were supplemental Medicare beneficiaries between July 2009 (US launch date) and December 2012 were identified in the MarketScan database. Hospitalizations that might be related to HF within 30 days prior to initiating dronedarone were identified by searching (1) inpatient claims and (2) both inpatient and professional claims, using the related ICD-9-CM diagnosis codes for HF and Current Procedural Terminology codes for hospitalizations. Results: A total of 1,162 patients who had HF hospitalizations within 30 days prior to initiating dronedarone were identified by searching inpatient claims between July 2009 and December 2012. Supplementing with professional claims identified an additional 177 patients who had HF hospitalizations, increasing the total to 1,339. Therefore, 13.2% (177/1,339) of the patients who had HF hospitalizations could only be identified in professional claims. Thus, the prevalence of hospitalizations that might be related to HF within 30 days prior to initiating dronedarone was 5.6% (1,162/20,834; 95% confidence interval (CI): 5.3–5.9%) when hospitalizations were identified using inpatient claims alone. When professional claims were added to the search algorithm, the prevalence of HF hospitalizations was 6.4% (1,339/20,834; 95% CI: 6.1–6.8%). Conclusions: Using professional claims in addition to inpatient claims can improve the identification of hospitalizations that might be related to HF.
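
The reported proportions and intervals can be checked with a few lines of Python. The abstract does not state how the confidence intervals were computed; the sketch assumes a normal-approximation (Wald) interval, p ± 1.96·sqrt(p(1 − p)/n), which reproduces the reported bounds after rounding to one decimal place.

```python
import math


def wald_ci(events: int, total: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


for label, events in [("inpatient claims only", 1162),
                      ("inpatient + professional claims", 1339)]:
    p, lo, hi = wald_ci(events, 20834)
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")

# Share of HF-hospitalized patients found only in professional claims.
print(f"{100 * 177 / (1162 + 177):.1f}%")  # 13.2%
```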

Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images

Sensors

10.3390/s18103232

2018

Vol 18 (10)

pp. 3232

Cited By ~ 17

Author(s): Yan Liu, Qirui Ren, Jiahui Geng, Meng Ding, Jiangyun Li

Keyword(s): Remote Sensing, High Resolution, Large Scale, Semantic Segmentation, Remote Sensing Image, Training Data, Land Resources, Remote Sensing Images, Training Strategy, The Impact

Efficient and accurate semantic segmentation is the key technique for automatic remote sensing image analysis. Although many segmentation methods based on traditional hand-crafted feature extractors exist, processing high-resolution, large-scale remote sensing images remains challenging. In this work, a novel patch-wise semantic segmentation method with a new training strategy based on fully convolutional networks is presented to segment common land resources. First, to handle high-resolution images, the images are split into local patches and a patch-wise network is built. Second, the training data are preprocessed in several ways to address specific characteristics of remote sensing images, i.e., color imbalance, object rotation variations, and lens distortion. Third, a multi-scale training strategy is developed to address severe scale variation. In addition, the use of a conditional random field (CRF) is studied to improve precision. The proposed method was evaluated on a dataset collected with the Gaofen-2 satellite over a capital city in West China. The dataset contains ten common land resources (Grassland, Road, etc.). The experimental results show that the proposed algorithm achieves a mean intersection over union (MIoU) of 54.96% and outperforms other state-of-the-art methods in remote sensing image segmentation.
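
The two mechanical pieces of this pipeline, tiling a large image into patches and scoring the result with MIoU, can be sketched in NumPy. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and MIoU averages IoU_c over classes. The non-overlapping tiling with zero padding, the convention of skipping classes absent from both label maps, and the tiny label maps are assumptions for illustration; the paper's exact patching scheme is not specified in the abstract.

```python
import numpy as np


def split_into_patches(image, patch):
    """Tile an (H, W, ...) array into non-overlapping patch x patch tiles,
    zero-padding the bottom/right edges so H and W divide evenly."""
    h, w = image.shape[:2]
    pad_h, pad_w = (-h) % patch, (-w) % patch
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)  # default mode is constant zero padding
    return [padded[i:i + patch, j:j + patch]
            for i in range(0, h + pad_h, patch)
            for j in range(0, w + pad_w, patch)]


def confusion_matrix(gt, pred, num_classes):
    """Rows: ground-truth class, columns: predicted class."""
    idx = num_classes * gt.ravel() + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(gt, pred, num_classes):
    cm = confusion_matrix(gt, pred, num_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as c but belonging elsewhere
    fn = cm.sum(axis=1) - tp  # belonging to c but predicted elsewhere
    union = tp + fp + fn
    valid = union > 0  # skip classes absent from both maps
    return float(np.mean(tp[valid] / union[valid]))


gt = np.array([[0, 0, 1], [1, 1, 2]])
pred = np.array([[0, 1, 1], [1, 1, 2]])
print(mean_iou(gt, pred, num_classes=3))  # 0.75
```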

Regiospecific Methylation of a Dietary Flavonoid Scaffold Selectively Enhances IL-1β Production following Toll-like Receptor 2 Stimulation in THP-1 Monocytes

Journal of Biological Chemistry

10.1074/jbc.m113.453514

2013

Vol 288 (29)

pp. 21126-21135

Cited By ~ 11

Author(s): Eng-Kiat Lim, Paul J. Mitchell, Najmeeyah Brown, Rebecca A. Drummond, Gordon D. Brown, ...

Keyword(s): Innate Immunity, Natural Products, Intestinal Microflora, Rank Order, Inflammasome Activation, Toll Like Receptor, Immune Health, Toll Like Receptor 2, Dietary Flavonoid, The Impact

It is now recognized that innate immunity to intestinal microflora plays a significant role in mediating immune health, and modulation of microbial sensing may underpin the impact of plant natural products in the diet or when used as nutraceuticals. In this context, we have examined five classes of plant-derived flavonoids (flavonols, flavones, flavanones, catechins, and cyanidin) for their ability to regulate cytokine release induced by the Toll-like receptor 2 (TLR2) agonist Pam3CSK4. We found that the flavonols selectively co-stimulated IL-1β secretion but had no impact on the secretion of IL-6. Importantly, this costimulation of TLR2-induced cytokine secretion was dependent on regiospecific methylation of the flavonol scaffold with a rank order of quercetin-3,4′-dimethylether > quercetin-3-methylether > casticin. The mechanism underpinning this costimulation did not involve enhanced inflammasome activation. In contrast, the methylated flavonols enhanced IL-1β gene expression through transcriptional regulation, involving mechanisms that operate downstream of the initial NF-κB and STAT1 activation events. These studies demonstrate an exquisite level of control of scaffold bioactivity by regiospecific methylation, with important implications for understanding how natural products affect innate immunity and for their development as novel immunomodulators for clinical use.

A Hybrid GSA-K-Mean Classifier Algorithm to Predict Diabetes Mellitus

International Journal of Applied Metaheuristic Computing

10.4018/ijamc.2017100106

2017

Vol 8 (4)

pp. 99-112

Cited By ~ 7

Author(s): Rojalina Priyadarshini, Rabindra Kumar Barik, Nilamadhab Dash, Brojo Kishore Mishra, Rachita Misra

Keyword(s): Search Algorithm, Gravitational Search Algorithm, Cluster Head, Training Data, Learning Classifier, Initial Cluster, Positive Class, Initial Placement, Negative Class, Inherent Problem

Much research has been carried out globally to design machine classifiers that can predict diabetes mellitus from physical and biomedical parameters. In this work, a hybrid machine learning classifier is proposed as an artificial predictor to correctly classify diabetic and non-diabetic people. The classifier combines the widely used K-means algorithm with the Gravitational Search Algorithm (GSA). GSA is used as an optimization tool to compute the best centroids of the two classes of training data: the positive class (diabetic) and the negative class (non-diabetic). In the K-means algorithm, instead of using random samples as the initial cluster heads, the centroids optimized by GSA are used as the cluster centers. The inherent problem of the K-means algorithm is the initial placement of the cluster centers, which may delay convergence and thereby degrade overall performance. This work attempts to overcome this problem by combining GSA and K-means.
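
The key change is in how K-means is initialized: centroids are supplied rather than drawn at random. The NumPy sketch below runs standard K-means (Lloyd's algorithm) from given initial centroids. As an assumption for illustration, the per-class means of the training data stand in for the GSA-optimized centroids; GSA itself is not implemented, and the two-feature data are synthetic.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every sample."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, init_centroids, max_iter=100, tol=1e-6):
    """Lloyd's K-means starting from the supplied centroids (no random init)."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(max_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its samples; keep it if the cluster is empty.
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(centroids))])
        converged = np.linalg.norm(new - centroids) < tol
        centroids = new
        if converged:
            break
    return centroids, assign(X, centroids)


rng = np.random.default_rng(0)
neg = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))  # synthetic "non-diabetic"
pos = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))  # synthetic "diabetic"
X = np.vstack([neg, pos])
y = np.array([0] * 50 + [1] * 50)
init = [X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)]  # stand-in for GSA output
centroids, labels = kmeans(X, init)
print("agreement with class labels:", (labels == y).mean())
```

Seeding each cluster from one class keeps cluster index k aligned with class k, so the cluster assignment can be read directly as the predicted class.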

DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science

10.7717/peerj-cs.137

2017

Vol 3

pp. e137

Cited By ~ 7

Author(s): Mona Alshahrani, Othman Soufan, Arturo Magana-Mora, Vladimir B. Bajic

Keyword(s): Neural Network, State Of The Art, Model Performance, Training Data, Classification Problems, Link Type, On Line, Pruning Algorithms, Artificial Neural, The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and a frequent choice for solving classification problems. However, determining the structure of an ANN is not trivial, as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed to simplify ANNs, these algorithms cannot efficiently cope with the intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably reduce the running time of the implemented ANN pruning algorithms. In addition to evaluating the performance of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP sped up ANN pruning by up to eight times on a 32-core machine compared with the serial implementations. To assess the impact of ANN pruning by DANNP, we used 16 datasets from different domains. In eight of the 16 datasets, DANNP significantly reduced the number of weights, by 70%–99%, while maintaining competitive or better model performance compared with the unpruned ANN. Finally, we used a naïve Bayes classifier built with the features selected as a byproduct of ANN pruning and demonstrated that its accuracy is comparable to that of classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available, online-accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
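
To illustrate what pruning connection weights means, and why it yields feature selection as a byproduct, the NumPy sketch below applies simple global magnitude pruning: the smallest-magnitude weights are zeroed until a target fraction is removed, and any input feature whose outgoing weights are all zero no longer influences the network. Magnitude pruning is a generic technique chosen for illustration, not necessarily one of the algorithms DANNP implements; the function names, layer shapes, and the 90% target are assumptions.

```python
import numpy as np


def magnitude_prune(weights, fraction):
    """Zero the `fraction` of weights with the smallest absolute value.

    weights: list of 2-D arrays (one per layer, shape inputs x outputs).
    Returns new arrays; the originals are not modified. Ties at the
    threshold are all pruned, so slightly more than `fraction` may go.
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    n_prune = int(round(fraction * all_mags.size))
    if n_prune == 0:
        return [w.copy() for w in weights]
    threshold = np.sort(all_mags)[n_prune - 1]
    return [np.where(np.abs(w) <= threshold, 0.0, w) for w in weights]


def surviving_inputs(first_layer):
    """Indices of input features that still have at least one nonzero weight."""
    return np.flatnonzero(np.any(first_layer != 0, axis=1))


rng = np.random.default_rng(1)
layers = [rng.normal(size=(8, 5)), rng.normal(size=(5, 2))]  # 8 inputs, 5 hidden, 2 classes
pruned = magnitude_prune(layers, 0.9)
kept = sum(int(np.count_nonzero(w)) for w in pruned)
print(kept, "of", sum(w.size for w in layers), "weights kept")
print("inputs still connected:", surviving_inputs(pruned[0]))
```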
