Development of an ANN-Based Urban Flood Alert Criteria Prediction Model and the Impact of Training Data Augmentation

Urban flooding occurs during short-duration heavy rainfall, so quick and accurate warnings of inundation risk are required. Previous research proposed methods to estimate statistics-based urban flood alert criteria from flood damage records and rainfall data, and developed a Neuro-Fuzzy model for predicting appropriate flood alert criteria. A variety of artificial intelligence algorithms have been applied to the prediction of urban flood alert criteria, and their use and predictive precision have improved with recent advances in artificial intelligence. Therefore, this study predicted flood alert criteria using the Artificial Neural Network (ANN) algorithm and analyzed the effect of augmenting the training data. The predictive performance of the ANN model was RMSE 3.39-9.80 mm; with the augmented training data it was RMSE 1.08-6.88 mm, an improvement of 29.8-82.6%.
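The reported improvement is a relative reduction in RMSE. The sketch below shows one way to compute RMSE and that reduction; the function names are illustrative, and pairing the upper ends of the two reported ranges is an assumption, since the abstract does not say which cases produced which values.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(rmse_before, rmse_after):
    """Percentage reduction in RMSE relative to the baseline model."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before

# Assumption: the upper bounds of the two reported ranges belong together.
print(round(relative_improvement(9.80, 6.88), 1))  # 29.8
```

Under that pairing the result matches the lower end of the reported 29.8-82.6% range. The upper end cannot be reproduced from the range endpoints alone, which suggests the percentages were computed per case.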


Improvement of Urban Flood Alert Criteria Prediction Model based on Neuro-Fuzzy Initial Function and Training Data

Korean Society of Hazard Mitigation ◽

10.9798/kosham.2020.20.1.327 ◽

2020 ◽

Vol 20 (1) ◽

pp. 327-337

Author(s):

Hoseon Kang ◽

Jaewoong Cho ◽

Hanseung Lee ◽

Jeonggeun Hwang

Keyword(s):

Fuzzy Model ◽

Training Data ◽

Average Error ◽

Urban Flood ◽

Model Based ◽

Neuro Fuzzy ◽

Flood Alert ◽

Improved Model ◽

Steep Slopes ◽

And Training

In Korean metropolitan areas, the high density of residential and commercial buildings, highly impervious surfaces, and steep slopes contribute to floods that can occur within a short duration of heavy rainfall. To prepare for this, advance warning measures based on accurate flood alert criteria are needed. In our previous study, we demonstrated the application of a Neuro-Fuzzy model that considers the characteristics of the basin to predict flood alert criteria in areas with no damage records. However, because only 27 training samples were available, the model has not been sufficiently evaluated and verified, and its applicability is limited. Therefore, in this study, we propose an improved model based on the initializing function of the Neuro-Fuzzy algorithm, the construction of training data, and preprocessing. Compared to the existing model, the improved model reduced the average error by 48.1%~65.4% and the RMSE by 50.7%~60.1%. The new model, when applied to actual floods, showed an improvement of 0.7%~19.1% in accuracy.


Geometric Morphometric Data Augmentation Using Generative Computational Learning Algorithms

Applied Sciences ◽

10.3390/app10249133 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9133

Author(s):

Lloyd A. Courtenay ◽

Diego González-Aguilera

Keyword(s):

Sample Size ◽

Data Augmentation ◽

Synthetic Data ◽

Model Performance ◽

Training Data ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Geometric Morphometric ◽

Adversarial Networks ◽

The Impact

The fossil record is notorious for being incomplete and distorted, frequently conditioning the type of knowledge that can be extracted from it. This often leads to issues when performing complex statistical analyses, such as classification tasks, predictive modelling, and variance analyses, such as those used in Geometric Morphometrics. Here, different Generative Adversarial Network architectures are tested to assess the effects of sample size and domain dimensionality on model performance. For model evaluation, robust statistical methods were used. Each of the algorithms was observed to produce realistic data. Generative Adversarial Networks using different loss functions produced multidimensional synthetic data significantly equivalent to the original training data. Conditional Generative Adversarial Networks were not as successful. The methods proposed are likely to reduce the impact of sample size and bias on a number of statistical learning applications. While Generative Adversarial Networks are not the solution to all sample-size related issues, combined with other pre-processing steps these limitations may be overcome. This presents a valuable means of augmenting geometric morphometric datasets for greater predictive visualization.


Geometric Morphometric Data Augmentation using Generative Computational Learning Algorithms

10.20944/preprints202011.0696.v1 ◽

2020 ◽

Author(s):

Lloyd A. Courtenay ◽

Diego González-Aguilera

Keyword(s):

Sample Size ◽

Data Augmentation ◽

Synthetic Data ◽

Model Performance ◽

Training Data ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Geometric Morphometric ◽

Adversarial Networks ◽

The Impact


Artificial Intelligence, 3D Documentation, and Rock Art—Approaching and Reflecting on the Automation of Identification and Classification of Rock Art Images

Journal of Archaeological Method and Theory ◽

10.1007/s10816-021-09518-6 ◽

2021 ◽

Author(s):

Christian Horn ◽

Oscar Ivarsson ◽

Cecilia Lindhé ◽

Rich Potter ◽

Ashely Green ◽

...

Keyword(s):

Artificial Intelligence ◽

Bronze Age ◽

Data Augmentation ◽

Rock Art ◽

Training Data ◽

Future Research ◽

Rock Surface ◽

Southern Scandinavia ◽

Bounding Boxes

Rock art carvings, which are best described as petroglyphs, were produced by removing parts of the rock surface to create a negative relief. This tradition was particularly strong during the Nordic Bronze Age (1700–550 BC) in southern Scandinavia with over 20,000 boats and thousands of humans, animals, wagons, etc. This vivid and highly engaging material provides quantitative data of high potential to understand Bronze Age social structures and ideologies. The ability to provide the technically best possible documentation and to automate identification and classification of images would help to take full advantage of the research potential of petroglyphs in southern Scandinavia and elsewhere. We, therefore, attempted to train a model that locates and classifies image objects using a faster region-based convolutional neural network (Faster-RCNN) based on data produced by a novel method to improve visualizing the content of 3D documentations. A newly created layer of 3D rock art documentation provides the best data currently available and has reduced inscribed bias compared to older methods. Several models were trained based on input images annotated with bounding boxes produced with different parameters to find the best solution. The data included 4305 individual images in 408 scans of rock art sites. To enhance the models and enrich the training data, we used data augmentation and transfer learning. The successful models perform exceptionally well on boats and circles, as well as with human figures and wheels. This work was an interdisciplinary undertaking which led to important reflections about archaeology, digital humanities, and artificial intelligence. The reflections and the success represented by the trained models open novel avenues for future research on rock art.


Underwater Acoustic Target Recognition Based on Generative Adversarial Network Data Augmentation

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2737 ◽

2021 ◽

Vol 263 (2) ◽

pp. 4558-4564

Author(s):

Minghong Zhang ◽

Xinwei Luo

Keyword(s):

Data Augmentation ◽

Target Recognition ◽

Training Data ◽

Small Samples ◽

Generative Adversarial Network ◽

Data Set ◽

Underwater Acoustic ◽

Adversarial Network ◽

Acoustic Target ◽

The Impact

Underwater acoustic target recognition is an important aspect of underwater acoustic research. In recent years, machine learning has developed continuously and has been widely and effectively applied to underwater acoustic target recognition. Adequate data sets are essential for obtaining good recognition results and reducing overfitting. However, underwater acoustic samples are relatively rare, which limits recognition accuracy. In this paper, in addition to the traditional audio data augmentation method, a new data augmentation method using a generative adversarial network is proposed. It uses a generator and a discriminator to learn the characteristics of underwater acoustic samples and generates reliable underwater acoustic signals to expand the training data set. The expanded data set is input into a deep neural network, and transfer learning is applied to further reduce the impact of small samples by fixing part of the pre-trained parameters. The experimental results show that this method achieves better recognition results than the general underwater acoustic recognition method, which verifies its effectiveness.


DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science ◽

10.7717/peerj-cs.137 ◽

2017 ◽

Vol 3 ◽

pp. e137 ◽

Cited By ~ 7

Author(s):

Mona Alshahrani ◽

Othman Soufan ◽

Arturo Magana-Mora ◽

Vladimir B. Bajic

Keyword(s):

Neural Network ◽

State Of The Art ◽

Model Performance ◽

Training Data ◽

Classification Problems ◽

Link Type ◽

On Line ◽

Pruning Algorithms ◽

Artificial Neural ◽

The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of the ANNs is not trivial as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool, that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably enhance the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows the users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
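To make the idea of reducing the number of weights concrete, here is a minimal sketch of magnitude-based pruning, one of the simplest pruning strategies. It is not DANNP's implementation and is not claimed to be one of the algorithms DANNP offers; the function name and weight values are hypothetical.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_prune = round(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical connection weights of a small network layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.1, -0.9, 0.04, 0.5]
pruned = prune_by_magnitude(weights, 0.7)
print(pruned)  # [0.8, 0.0, 0.0, 0.0, -0.6, 0.0, 0.0, -0.9, 0.0, 0.0]
```

In practice a pruned network is usually retrained or re-evaluated after each pruning step to check that performance remains competitive with the unpruned model.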


Pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training

European Radiology ◽

10.1007/s00330-021-07833-w ◽

2021 ◽

Author(s):

Johannes Rueckel ◽

Christian Huemmer ◽

Andreas Fieselmann ◽

Florin-Cristian Ghesu ◽

Awais Mansoor ◽

...

Keyword(s):

Artificial Intelligence ◽

Bias Reduction ◽

Image Features ◽

Chest Radiographs ◽

Training Data ◽

Operating Characteristics ◽

High Quality ◽

Confounding Bias ◽

Public Datasets ◽

The Impact

Objectives: Diagnostic accuracy of artificial intelligence (AI) pneumothorax (PTX) detection in chest radiographs (CXR) is limited by the noisy annotation quality of public training data and confounding thoracic tubes (TT). We hypothesize that in-image annotations of the dehiscent visceral pleura for algorithm training boost the algorithm’s performance and suppress confounders. Methods: Our single-center evaluation cohort of 3062 supine CXRs includes 760 PTX-positive cases with radiological annotations of PTX size and inserted TTs. Three step-by-step improved algorithms (differing in algorithm architecture, training data from public datasets/clinical sites, and in-image annotations included in algorithm training) were characterized by area under the receiver operating characteristics (AUROC) in detailed subgroup analyses and referenced to the well-established “CheXNet” algorithm. Results: Performances of established algorithms exclusively trained on publicly available data without in-image annotations are limited to AUROCs of 0.778 and strongly biased towards TTs that can completely eliminate the algorithm’s discriminative power in individual subgroups. In contrast, our final “algorithm 2”, which was trained on a lower number of images but additionally with in-image annotations of the dehiscent pleura, achieved an overall AUROC of 0.877 for unilateral PTX detection with a significantly reduced TT-related confounding bias. Conclusions: We demonstrated strong limitations of an established PTX-detecting AI algorithm that can be significantly reduced by designing an AI system capable of learning to both classify and localize PTX. Our results are aimed at drawing attention to the necessity of high-quality in-image localization in training data to reduce the risks of unintentionally biasing the training process of pathology-detecting AI algorithms.
Key Points:
• Established pneumothorax-detecting artificial intelligence algorithms trained on public training data are strongly limited and biased by confounding thoracic tubes.
• We used high-quality in-image annotated training data to effectively boost algorithm performance and suppress the impact of confounding thoracic tubes.
• Based on our results, we hypothesize that even hidden confounders might be effectively addressed by in-image annotations of pathology-related image features.
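The AUROC values quoted above can be read as the probability that a randomly chosen PTX-positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way on made-up labels and scores; it does not reproduce the study's data or evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth (1 = pneumothorax) and algorithm scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(auroc(labels, scores))
```

A subgroup analysis like the one described would apply the same function separately to, for example, cases with and without thoracic tubes, and compare the resulting values.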


StyleGANs and Transfer Learning for Generating Synthetic Images in Industrial Applications

Symmetry ◽

10.3390/sym13081497 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1497

Author(s):

Harold Achicanoy ◽

Deisy Chaves ◽

Maria Trujillo

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Data Augmentation ◽

Industrial Applications ◽

Generative Models ◽

Training Data ◽

Generative Adversarial Networks ◽

Augmentation Strategy ◽

Synthetic Images ◽

The Impact

Deep learning applications in computer vision require large and representative datasets to obtain state-of-the-art results because of the massive number of parameters to optimise in deep models. However, in industrial applications data are limited and asymmetrically distributed due to rare cases, legal restrictions, and high image-acquisition costs. Data augmentation based on deep learning generative adversarial networks, such as StyleGAN, has arisen as a way to create training data with symmetric distributions that may improve the generalisation capability of built models. StyleGAN generates highly realistic images in a variety of domains as a data augmentation strategy but requires a large amount of data to build image generators. Thus, transfer learning in conjunction with generative models is used to build models with small datasets. However, there are no reports on the impact of pre-trained generative models used with transfer learning. In this paper, we evaluate a StyleGAN generative model with transfer learning on different application domains (training with paintings, portraits, Pokémon, bedrooms, and cats) to generate target images with different levels of content variability: bean seeds (low variability), faces of subjects between 5 and 19 years old (medium variability), and charcoal (high variability). We used the first version of StyleGAN due to the large number of publicly available pre-trained models. The Fréchet Inception Distance was used for evaluating the quality of synthetic images. We found that StyleGAN with transfer learning produced good quality images, making it an alternative for generating realistic synthetic images in the evaluated domains.
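For reference, the Fréchet Inception Distance is commonly defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are notation introduced here, not taken from the abstract: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated images are statistically closer to the real ones; the distance is zero when both means and covariances coincide.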


The impact of different negative training data on regulatory sequence predictions

PLoS ONE ◽

10.1371/journal.pone.0237412 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0237412

Author(s):

Louisa-Marie Krützfeldt ◽

Max Schubach ◽

Martin Kircher

Keyword(s):

Model Performance ◽

Training Data ◽

Training Dataset ◽

Support Vector ◽

Regulatory Sequence ◽

Open Chromatin ◽

Regulatory Sequences ◽

Cell Type ◽

The Impact ◽

Negative Training

Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training dataset, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization.
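One of the two negative-set strategies described above builds negatives by shuffling the positive sequences. The sketch below shows the simplest variant, a mononucleotide shuffle that preserves base composition but destroys order. The example sequence is made up, and the study's actual shuffling procedure (for instance, how much k-mer content it preserves) is not specified here, so treat this as an assumption.

```python
import random
from collections import Counter

def shuffled_negative(sequence, rng):
    """Return a mononucleotide shuffle of `sequence` (same base composition)."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
positive = "ACGTGGCTAAGCTTACGGAT"  # hypothetical open-chromatin sequence
negative = shuffled_negative(positive, rng)

# The shuffle keeps length and base counts but not motif structure.
print(Counter(negative) == Counter(positive))  # True
```

Because such negatives differ from positives in any higher-order sequence structure, a model can separate the classes by learning artefacts of shuffling, which is consistent with the abstract's observation that models trained on highly shuffled sequences seem to learn features of artificial sequences.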


The impact of different negative training data on regulatory sequence predictions

10.1101/2020.07.28.224485 ◽

2020 ◽

Author(s):

Louisa-Marie Krützfeldt ◽

Max Schubach ◽

Martin Kircher

Keyword(s):

Model Performance ◽

Training Data ◽

Training Dataset ◽

Support Vector ◽

Regulatory Sequence ◽

Open Chromatin ◽

Regulatory Sequences ◽

Cell Type ◽

The Impact ◽

Negative Training

Regulatory regions, like promoters and enhancers, cover an estimated 5-15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding and gapped k-mer support vector machines (gkm-SVMs) or convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training dataset, both learners and two approaches for negative training data are compared. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell-type, predicting cell-type specific elements, and predicting elements’ relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing overall best results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs achieved and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust and best results for typical training dataset sizes without the need of hyperparameter optimization.
