A test development of a data-driven model to simulate chlorophyll data at Tongyeong Bay, Korea

Author(s):  
Sung Dae Kim ◽  
Sang Hwa Choi

A pilot machine learning (ML) program was developed to test ML techniques for simulating biochemical parameters in a coastal area of Korea. Temperature, chlorophyll, solar radiation, daylight time, humidity, and nutrient data were collected as the training dataset from the public domain and from in-house projects of KIOST (Korea Institute of Ocean Science & Technology). Daily satellite chlorophyll data from MODIS (Moderate Resolution Imaging Spectroradiometer) and GOCI (Geostationary Ocean Color Imager) were retrieved from public services. Daily SST (Sea Surface Temperature) data and ECMWF solar radiation data were retrieved from the GHRSST and Copernicus services. Meteorological and marine observation data were collected from KMA (Korea Meteorological Agency) and KIOST. The output of a marine biochemical numerical model of KIOST was also prepared to validate the ML model. The ML program was configured using an LSTM network and TensorFlow. During data processing, some chlorophyll data were interpolated because many values were missing from the satellite dataset. ML training was conducted repeatedly under varying combinations of sequence length, learning rate, number of hidden layers, and number of iterations. 75% of the dataset was used for training and 25% for prediction. The maximum correlation between training data and predicted data was 0.995 when model output data were used as the training dataset. When satellite and observation data were used, correlations were around 0.55. Though the latter correlation is relatively low, the model simulated the periodic variation well, with some differences found at peak values. We conclude that an ML model can be applied to the simulation of chlorophyll data if sufficient reliable observation data can be prepared.
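The gap-filling and sequence-slicing preprocessing described above can be sketched as follows (a minimal NumPy illustration; the 75/25 split mirrors the abstract, while the function names and toy series are our own assumptions):

```python
import numpy as np

def fill_gaps(series):
    """Linearly interpolate missing (NaN) values, e.g. cloud gaps in satellite chlorophyll."""
    series = np.array(series, dtype=float)
    idx = np.arange(len(series))
    mask = np.isnan(series)
    series[mask] = np.interp(idx[mask], idx[~mask], series[~mask])
    return series

def make_sequences(series, seq_len):
    """Slice a time series into (input window, next value) pairs for LSTM training."""
    X = np.array([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X, y

series = fill_gaps([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])
X, y = make_sequences(series, seq_len=3)
split = int(len(X) * 0.75)            # 75% for training, 25% held out for prediction
X_train, X_test = X[:split], X[split:]
```

The windowed arrays would then be fed to an LSTM regressor; the sequence length is one of the hyperparameters the abstract says was varied between runs.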

2021 ◽  
Vol 33 (5) ◽  
pp. 83-104
Author(s):  
Aleksandr Igorevich Getman ◽  
Maxim Nikolaevich Goryunov ◽  
Andrey Georgievich Matskevich ◽  
Dmitry Aleksandrovich Rybolovlev

The paper discusses the training of models for detecting computer attacks using machine learning methods. The results of an analysis of publicly available training datasets, and of tools for analyzing network traffic and extracting features of network sessions, are presented in turn. The drawbacks of existing tools, and possible errors in the datasets formed with their help, are noted. It is concluded that collecting one's own training data is necessary when there is no guarantee of the public datasets' reliability, and that pre-trained models are of limited use in networks whose characteristics differ from those of the network in which the training traffic was collected. A practical approach to generating training data for computer attack detection models is proposed. The proposed solutions have been tested to evaluate the quality of model training on the collected data and the quality of attack detection in a real network infrastructure.
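As a rough illustration of the feature-extraction step such a pipeline needs, the sketch below aggregates raw packet records into per-session feature vectors (the record layout and the four features are illustrative assumptions, not the paper's actual tool):

```python
from collections import defaultdict

def session_features(packets):
    """Aggregate raw packet records into per-session feature vectors.
    Each packet record is (session_id, timestamp, byte_count); the
    features (duration, packet count, total bytes, mean packet size)
    are typical examples, not the paper's exact feature set."""
    sessions = defaultdict(list)
    for sid, ts, nbytes in packets:
        sessions[sid].append((ts, nbytes))
    features = {}
    for sid, pkts in sessions.items():
        times = [t for t, _ in pkts]
        sizes = [b for _, b in pkts]
        features[sid] = {
            "duration": max(times) - min(times),
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": sum(sizes) / len(pkts),
        }
    return features

feats = session_features([
    ("s1", 0.0, 100), ("s1", 1.5, 300), ("s2", 0.2, 60),
])
```

Vectors like these, labeled as benign or attack traffic collected in one's own network, would form the training data the paper argues for.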


2020 ◽  
Vol 13 (10) ◽  
pp. 5459-5480
Author(s):  
Willem J. Marais ◽  
Robert E. Holz ◽  
Jeffrey S. Reid ◽  
Rebecca M. Willett

Abstract. Current cloud and aerosol identification methods for multispectral radiometers, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS), employ multichannel spectral tests on individual pixels (i.e., fields of view). Spatial information has been used in cloud and aerosol algorithms primarily through statistical parameters, such as nonuniformity tests of surrounding pixels, with cloud classification provided by multispectral microphysical retrievals such as phase and cloud-top height. With these methodologies there is uncertainty in identifying optically thick aerosols, since aerosols and clouds have similar spectral properties in coarse-spectral-resolution measurements. Furthermore, identifying cloud regimes (e.g., stratiform, cumuliform) from spectral measurements alone is difficult, since low-altitude cloud regimes have similar spectral properties. Recent advances in computer vision using deep neural networks provide a new opportunity to better leverage the coherent spatial information in multispectral imagery. Using machine learning techniques combined with a new methodology to create the necessary training data, we demonstrate improvements in the discrimination between clouds and severe aerosols and an expanded capability to classify cloud types. The labeled training dataset was created from an adapted NASA Worldview platform that provides an efficient user interface for assembling a human-labeled database of cloud and aerosol types. The convolutional neural network (CNN) labeling accuracy for aerosols and cloud types was quantified using independent Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and MODIS cloud and aerosol products.
By harnessing CNNs with a unique labeled dataset, we demonstrate improved identification of aerosols and distinct cloud types from MODIS and VIIRS images compared to a per-pixel spectral and standard deviation thresholding method. The paper concludes with case studies that compare the CNN methodology results with the MODIS cloud and aerosol products.
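For context, the per-pixel baseline that the CNN is compared against can be sketched roughly as a spectral threshold plus a local standard-deviation nonuniformity test (the threshold values and window size below are illustrative, not the operational MODIS/VIIRS values):

```python
import numpy as np

def stddev_cloud_mask(reflectance, reflect_thresh=0.3, std_thresh=0.05, window=3):
    """Per-pixel baseline: flag a pixel as cloud if its reflectance exceeds
    a spectral threshold OR the local spatial standard deviation
    (a nonuniformity test over the surrounding window) is high."""
    h, w = reflectance.shape
    pad = window // 2
    padded = np.pad(reflectance, pad, mode="edge")
    local_std = np.empty_like(reflectance)
    for i in range(h):
        for j in range(w):
            local_std[i, j] = padded[i:i + window, j:j + window].std()
    return (reflectance > reflect_thresh) | (local_std > std_thresh)

mask = stddev_cloud_mask(np.full((5, 5), 0.1))   # uniform dark scene: no cloud
bright = np.full((5, 5), 0.1)
bright[2, 2] = 0.9                                # one bright, nonuniform pixel
bright_mask = stddev_cloud_mask(bright)
```

Such a test sees only one pixel and its immediate neighborhood, which is precisely the limitation the CNN's larger spatial context addresses.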


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid modification) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Complementing existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, their performance has not been satisfactory, owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: The motivation of this study is to establish a strong, novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics predictor, named DeepSSPred, in which encoded features are obtained via an n-segmented hybrid feature scheme, and the synthetic minority oversampling technique (SMOTE) is employed to cope with the severe imbalance between SC sites (the minority class) and non-SC sites (the majority class). A state-of-the-art 2D convolutional neural network was employed, with rigorous 10-fold and jackknife cross-validation for model validation. Results: With a strong discrete representation of the feature space, a capable learning engine, and an unbiased presentation of the underlying training data, the proposed framework yields a model that outperforms all existing studies, with an MCC 6% higher than the previous best (which did not report sufficient details on an independent dataset).
Compared with the second-best method, the model gains 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model over both training and independent datasets in comparison with existing studies. Conclusion: We have developed a novel sequence-based automated predictor for SC sites, called DeepSSPred. Empirical results on the training and independent validation datasets reveal the efficacy of the proposed model, which stems from the novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through a tuned 2D-CNN classifier. We believe this work provides insight into the further prediction of S-sulfenylation characteristics and functionality, and we hope the predictor will be helpful for large-scale discrimination of unknown SC sites in particular and for designing new pharmaceutical drugs in general.
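The SMOTE resampling step used to balance SC and non-SC sites can be sketched in a few lines (a minimal NumPy implementation; the number of neighbours and samples generated are assumed hyperparameters, not the study's settings):

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between a random minority point and one of its k nearest
    minority neighbours."""
    rng = rng or np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                           # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synth)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = smote(X, 5)
```

Because the synthetic points lie on segments between real minority samples, the oversampled set stays inside the minority class's feature region rather than duplicating points verbatim.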


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1551
Author(s):  
Zihuai Guo ◽  
Yibin Yao ◽  
Jian Kong ◽  
Gang Chen ◽  
Chen Zhou ◽  
...  

Global navigation satellite systems (GNSS) provide dual-frequency observation data, which can be used to effectively calculate total electron content (TEC). Numerous studies have utilized GNSS-derived TEC to evaluate the accuracy of ionospheric empirical models, such as the International Reference Ionosphere model (IRI) and the NeQuick model. However, most studies have evaluated vertical TEC rather than slant TEC (STEC), which introduces projection error. Furthermore, since there are few GNSS observation stations available in the Antarctic region, and most are concentrated at the edge of the Antarctic continent, it is difficult to evaluate modeling accuracy over the entire Antarctic range. Considering these problems, in this study GNSS STEC was calculated using dual-frequency observation data from stations that almost cover the Antarctic continent. By comparison with GNSS STEC, the accuracy of IRI-2016 and NeQuick2 at different latitudes and under different solar radiation levels was evaluated during 2016–2017. The numerical results showed the following. (1) Both IRI-2016 and NeQuick2 underestimate the STEC. Since IRI-2016 utilizes new models to represent the F2-peak height (hmF2) directly, the IRI-2016 STEC is closer to GNSS STEC than that of NeQuick2. This conclusion was also confirmed by the Constellation Observing System for Meteorology Ionosphere and Climate (COSMIC) occultation data. (2) The STEC differences of the two models are both normally distributed, and the NeQuick2 STEC becomes systematically biased as solar radiation increases. (3) The root mean square error (RMSE) of the IRI-2016 STEC is smaller than that of NeQuick2, and the RMSEs of both models increase with solar radiation intensity. Since IRI-2016 relies on new hmF2 models, it is more stable than NeQuick2.
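The STEC calculation from dual-frequency data follows the standard geometry-free combination, which can be sketched as below (GPS L1/L2 frequencies are assumed as defaults; code biases and measurement noise, which a real processing chain must estimate, are ignored in this sketch):

```python
def stec_from_pseudorange(p1, p2, f1=1575.42e6, f2=1227.60e6):
    """Slant TEC (in TECU) from dual-frequency pseudoranges p1, p2 (metres):
    STEC = f1^2 f2^2 / (40.3 (f1^2 - f2^2)) * (p2 - p1)  electrons/m^2,
    using the first-order dispersive ionospheric delay."""
    tec_el_per_m2 = f1**2 * f2**2 / (40.3 * (f1**2 - f2**2)) * (p2 - p1)
    return tec_el_per_m2 / 1e16   # 1 TECU = 1e16 electrons/m^2

# One metre of L1/L2 differential delay corresponds to roughly 9.5 TECU.
tecu_per_metre = stec_from_pseudorange(2.0e7, 2.0e7 + 1.0)
```

Comparing these slant values directly with model STEC along the same line of sight avoids the projection error that a conversion to vertical TEC would introduce.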


Author(s):  
Serkan Kiranyaz ◽  
Junaid Malik ◽  
Habib Ben Abdallah ◽  
Turker Ince ◽  
Alexandros Iosifidis ◽  
...  

Abstract. The recently proposed network model, Operational Neural Networks (ONNs), can generalize conventional Convolutional Neural Networks (CNNs), which are homogenous and use only a linear neuron model. As a heterogenous network model, ONNs are based on a generalized neuron model that can encapsulate any set of non-linear operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. However, the default search method for finding optimal operators in ONNs, the so-called Greedy Iterative Search (GIS), usually takes several training sessions to find a single operator set per layer. This is not only computationally demanding, but it also limits network heterogeneity, since the same set of operators is then used for all neurons in each layer. To address this deficiency and exploit a superior level of heterogeneity, this study focuses on searching for the best possible operator set(s) for the hidden neurons of the network based on the "Synaptic Plasticity" paradigm, which poses the essential learning theory in biological neurons. During training, each operator set in the library can be evaluated by its synaptic plasticity level, ranked from worst to best, and an "elite" ONN can then be configured using the top-ranked operator sets found at each hidden layer. Experimental results over highly challenging problems demonstrate that elite ONNs, even with few neurons and layers, can achieve superior learning performance to GIS-based ONNs, and as a result the performance gap over CNNs widens further.
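The final assembly step, ranking operator sets by their plasticity score and keeping the top-ranked set per hidden layer, can be sketched as follows (layer names, operator names, and scores are invented for illustration):

```python
def configure_elite(plasticity, top_k=1):
    """Given a synaptic-plasticity score for every (layer, operator set)
    pair, rank each hidden layer's operator sets from best to worst and
    keep the top-ranked one(s) to configure the 'elite' network.
    `plasticity` maps layer name -> {operator set name: score}."""
    elite = {}
    for layer, scores in plasticity.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        elite[layer] = ranked[:top_k]
    return elite

elite = configure_elite({
    "hidden1": {"sin": 0.7, "linear": 0.2, "exp": 0.5},
    "hidden2": {"sin": 0.1, "linear": 0.9, "exp": 0.4},
})
```

Because the ranking is computed during a single training run, this replaces the repeated training sessions that GIS needs per operator set.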


2020 ◽  
Vol 12 (9) ◽  
pp. 1418
Author(s):  
Runmin Dong ◽  
Cong Li ◽  
Haohuan Fu ◽  
Jie Wang ◽  
Weijia Li ◽  
...  

Substantial progress has been made in the field of large-area land cover mapping as the spatial resolution of remotely sensed data increases. However, a significant amount of human labor is still required to label images for training and testing purposes, especially in high-resolution (e.g., 3-m) land cover mapping. In this research, we propose a solution that can produce 3-m resolution land cover maps on a national scale without human effort being involved. First, using public 10-m resolution land cover maps as an imperfect training dataset, we propose a deep-learning-based approach that can effectively transfer the existing knowledge. Then, we improve the efficiency of our method through a network pruning process for national-scale land cover mapping. Our proposed method can take the state-of-the-art 10-m resolution land cover maps (with an accuracy of 81.24% for China) as the training data, enable a transfer learning process that can produce 3-m resolution land cover maps, and further improve the overall accuracy (OA) to 86.34% for China. We present detailed results obtained over three megacities in China to demonstrate the effectiveness of our proposed approach for 3-m resolution large-area land cover mapping.
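The efficiency-oriented pruning step can be illustrated with a generic magnitude-based scheme (a common approach; the paper's actual pruning criterion and sparsity level are not stated here, so the 50% figure and the layer-wise threshold are assumptions):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights in each layer,
    a standard way to shrink a network for large-scale inference."""
    pruned = []
    for w in weights:
        k = int(w.size * sparsity)
        if k == 0:
            pruned.append(w.copy())
            continue
        thresh = np.sort(np.abs(w), axis=None)[k - 1]   # k-th smallest magnitude
        pruned.append(np.where(np.abs(w) <= thresh, 0.0, w))
    return pruned

pruned = prune_by_magnitude([np.array([[0.1, -0.9], [0.5, -0.2]])])
```

Zeroed weights can then be skipped or stored sparsely, reducing the cost of mapping a whole country at 3-m resolution.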


2015 ◽  
Vol 32 (7) ◽  
pp. 1341-1355 ◽  
Author(s):  
S. J. Rennie ◽  
M. Curtis ◽  
J. Peter ◽  
A. W. Seed ◽  
P. J. Steinle ◽  
...  

Abstract. The Australian Bureau of Meteorology’s operational weather radar network comprises a heterogeneous radar collection covering diverse geography and climate. A naïve Bayes classifier has been developed to identify a range of common echo types observed with these radars. The success of the classifier has been evaluated against its training dataset and by routine monitoring. The training data indicate that more than 90% of precipitation may be identified correctly. The echo types most difficult to distinguish from rainfall are smoke, chaff, and anomalous propagation ground and sea clutter. Their impact depends on their climatological frequency. Small quantities of frequently misclassified persistent echo (like permanent ground clutter or insects) can also cause quality control issues. The Bayes classifier is demonstrated to perform better than a simple threshold method, particularly for reducing misclassification of clutter as precipitation. However, the result depends on finding a balance between excluding precipitation and including erroneous echo. Unlike many single-polarization classifiers that are only intended to extract precipitation echo, the Bayes classifier also discriminates types of nonprecipitation echo. Therefore, the classifier provides the means to utilize clear air echo for applications like data assimilation, and the class information will permit separate data handling of different echo types.
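A naive Bayes decision of the kind described can be sketched as follows (the feature set, class priors, and Gaussian parameters below are invented for illustration, not the Bureau's operational membership functions):

```python
import math

def gaussian_nb_classify(x, classes):
    """Minimal Gaussian naive Bayes: each class carries a prior and a
    per-feature (mean, std); the class with the highest log posterior
    wins. Features are assumed conditionally independent."""
    best, best_lp = None, -math.inf
    for name, (prior, params) in classes.items():
        lp = math.log(prior)
        for xi, (mu, sigma) in zip(x, params):
            lp += -0.5 * ((xi - mu) / sigma) ** 2 - math.log(sigma)
        if lp > best_lp:
            best, best_lp = name, lp
    return best

# Toy classes: (prior, [(mean, std) per feature]); features here are
# reflectivity (dBZ) and spectrum width, purely as an example.
classes = {
    "precipitation": (0.6, [(30.0, 10.0), (2.0, 1.0)]),
    "clutter":       (0.4, [(45.0, 15.0), (0.3, 0.2)]),
}
echo = gaussian_nb_classify([28.0, 2.5], classes)
```

Because every class gets a posterior, the same machinery that rejects clutter can also label nonprecipitation echo types for separate handling, as the abstract notes.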


2021 ◽  
Author(s):  
Radosław Szostak ◽  
Przemysław Wachniew ◽  
Mirosław Zimnoch ◽  
Paweł Ćwiąkała ◽  
Edyta Puniach ◽  
...  

Unmanned Aerial Vehicles (UAVs) can be an excellent tool for environmental measurements due to their ability to reach inaccessible places and to acquire data quickly over large areas. In particular, drones have a potential application in hydrology, as they can be used to create photogrammetric digital elevation models (DEMs) of the terrain, yielding a high-resolution spatial distribution of river water levels to be fed into hydrological models. Nevertheless, photogrammetric algorithms generate distortions in the DEM at water bodies, due to light penetration below the water surface and the lack of static characteristic points on the water surface that the photogrammetric algorithm can match. These disturbances could be corrected by applying deep learning methods. For this purpose, it is necessary to build a training dataset containing DEMs before and after denoising of the water surfaces, and a method has been developed to prepare such a dataset in several stages. In the first step, photogrammetric surveys and geodetic water-level measurements are performed. The second includes the generation of DEMs and orthomosaics using photogrammetric software. Finally, the measured water levels are interpolated to obtain a plane of the water surface, which is applied to the DEMs to correct the distortion. The resulting dataset was used to train a deep learning model based on convolutional neural networks. The proposed method has been validated on observation data from part of the Kocinka river catchment in central Poland.

This research has been partly supported by the Ministry of Science and Higher Education Project “Initiative for Excellence – Research University” and a Ministry of Science and Higher Education subsidy, project no. 16.16.220.842-B02 / 16.16.150.545.
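The final stage, fitting a plane to the surveyed water levels and overwriting the distorted DEM cells, can be sketched as below (a least-squares plane fit in NumPy; the coordinates, the toy DEM, and the water mask are illustrative):

```python
import numpy as np

def fit_water_plane(points):
    """Least-squares plane z = a*x + b*y + c through surveyed
    water-level points (x, y, z)."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return coeffs   # (a, b, c)

def correct_dem(dem, water_mask, coeffs):
    """Replace DEM heights inside the water mask with the fitted plane."""
    a, b, c = coeffs
    yy, xx = np.indices(dem.shape)
    plane = a * xx + b * yy + c
    return np.where(water_mask, plane, dem)

coeffs = fit_water_plane([(0, 0, 5.0), (10, 0, 6.0), (0, 10, 5.0), (10, 10, 6.0)])
dem = np.full((3, 3), 9.0)                 # toy DEM with distorted water cells
mask = np.zeros((3, 3), dtype=bool)
mask[0, 0] = True                          # one cell lies over water
corrected = correct_dem(dem, mask, coeffs)
```

Pairs of (distorted, plane-corrected) DEMs produced this way form the before/after training examples for the CNN-based denoising model.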


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. Using a limited set of learning data was made possible by developing a detailed scenario of the task, which strictly defined the conditions of detector operation for the considered case of a convolutional neural network. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The research conducted suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model attained a worse AP result; however, the relationship between the number of input samples and the obtained results has a significantly lower influence than in the case of the other CNN models, which, in the authors’ assessment, is a desirable feature for a limited training set.
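For reference, the AP metric quoted above can be computed as follows (a minimal sketch that assumes every ground-truth object appears in the scored detection list; the 11-point and COCO variants of AP differ in detail):

```python
def average_precision(scored):
    """`scored` is a list of (confidence, is_true_positive) detections.
    AP here is precision averaged over each recall step, i.e. at each
    rank where a true positive occurs."""
    scored = sorted(scored, key=lambda s: s[0], reverse=True)
    n_pos = sum(1 for _, tp in scored if tp)
    tp_seen, ap = 0, 0.0
    for rank, (_, tp) in enumerate(scored, start=1):
        if tp:
            tp_seen += 1
            ap += tp_seen / rank      # precision at this recall step
    return ap / n_pos if n_pos else 0.0

ap_perfect = average_precision([(0.9, True), (0.8, True)])
ap_mixed = average_precision([(0.9, True), (0.8, False), (0.7, True)])
```

A detector that ranks all its true positives above its false positives thus scores AP = 1.0, while interleaved false positives pull the value down.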


Author(s):  
Dang Thi Thu Hien ◽  
Hoang Xuan Huan ◽  
Le Xuan Minh Hoang

Radial Basis Function (RBF) neural networks are widely applied in multivariate function regression. However, selecting the number of hidden-layer neurons and defining suitable centres so as to produce a good regression network are still open problems studied by many researchers. This article proposes using equally spaced grid nodes as the centres of the hidden layer. The authors then use the k-nearest-neighbour method to define the value of the regression function at each centre, and an interpolation RBF network training algorithm with equally spaced nodes to train the network. The experiments show the outstanding efficiency of the regression when the training data are corrupted by white Gaussian noise.
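The two proposed steps, assigning k-nearest-neighbour target values to equally spaced grid centres and then interpolating with Gaussian RBFs, can be sketched as follows (the kernel width, the choice of k, and the normalized-weight prediction rule are illustrative assumptions, not the article's exact algorithm):

```python
import numpy as np

def knn_center_values(centers, X, y, k=3):
    """Assign each grid-placed RBF centre the mean target value of its
    k nearest training points (the k-NN step described above)."""
    vals = []
    for c in centers:
        d = np.linalg.norm(X - c, axis=1)
        vals.append(y[np.argsort(d)[:k]].mean())
    return np.array(vals)

def rbf_predict(x, centers, center_vals, width=1.0):
    """Normalized Gaussian RBF interpolation from the centres to a query point."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * width ** 2))
    return np.sum(w * center_vals) / np.sum(w)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])            # equally spaced grid nodes
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 2.0, 2.0, 2.0])                      # noisy targets in general
vals = knn_center_values(centers, X, y)
pred = rbf_predict(np.array([0.5, 0.5]), centers, vals)
```

Averaging over k neighbours at each centre is what smooths out Gaussian white noise in the training targets before the interpolation network is fitted.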

