Generating Fixed-Size Training Sets for Large and Streaming Datasets

The chemical space for novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A General Recurrent Neural Network model was trained on the ChEMBL database to generate chemically valid SMILES strings. The parameters of the General Recurrent Neural Network were then fine-tuned via transfer learning, using the electronic donor-acceptor database from the Computational Material Repository, to generate novel donor-acceptor oligomers. Six different transfer learning models were developed, each with a different subset of the donor-acceptor database as its training set. We concluded that electronic properties of the training sets, such as HOMO-LUMO gaps and dipole moments, can be learned from the SMILES representation with deep generative models, and that the chemical space of the training sets can be explored efficiently. This approach identified approximately 1700 new molecules with promising electronic properties (HOMO-LUMO gap < 2 eV and dipole moment < 2 Debye), six times more than in the original database. Among the molecular transformations, the deep generative model has learned to produce novel molecules by trading off selected atomic substitutions (such as halogenation or methylation) against molecular features such as the spatial extension of the oligomer. The method can be extended as a plausible source of new chemical combinations to effectively explore the chemical space for targeted properties.

FORMATION OF TRAINING SETS FOR NEURAL NETWORKS BASED ON SYNTHESIZED DATA

PROCESSING, TRANSMISSION AND PROTECTION OF INFORMATION IN COMPUTER SYSTEMS

10.31799/978-5-8088-1452-3-2020-1-164-168

2020

Author(s): A. S. Urazov, A. A. Vostrikov

Keyword(s): Neural Networks, Training Sets

Semi-automated classification of colonial Microcystis by FlowCAM imaging flow cytometry in mesocosm experiment reveals high heterogeneity during seasonal bloom

Scientific Reports

10.1038/s41598-021-88661-2

2021

Vol 11 (1)

Author(s): Yersultan Mirasbekov, Adina Zhumakhanova, Almira Zhantuyakova, Kuanysh Sarkytbayev, Dmitry V. Malashenkov, ...

Keyword(s): Machine Learning, Flow Cytometry, Spatial Resolution, Mesocosm Experiment, Imaging Flow Cytometry, Leibler Divergence, Temporal And Spatial, High Level, Training Sets

A machine learning approach was employed to detect and quantify Microcystis colonial morphospecies using FlowCAM-based imaging flow cytometry. The system was trained and tested using samples from a long-term mesocosm experiment (LMWE, Central Jutland, Denmark). The statistical validation of the classification approaches was performed using Hellinger distances, Bray–Curtis dissimilarity, and Kullback–Leibler divergence. The semi-automatic classification based on well-balanced training sets from the Microcystis seasonal bloom provided a high level of intergeneric accuracy (96–100%) but relatively low intrageneric accuracy (67–78%). Our results provide a proof of concept of how machine learning approaches can be applied to analyze colonial microalgae. This approach allowed us to evaluate the Microcystis seasonal bloom in individual mesocosms with a high level of temporal and spatial resolution. The observation that some Microcystis morphotypes completely disappeared and re-appeared along the mesocosm experiment timeline supports the hypothesis of the main transition pathways of colonial Microcystis morphoforms. We demonstrated that significant changes in the training sets of colonial images were required for accurate classification of Microcystis spp. from time points that differed by only two weeks, owing to the high phenotypic heterogeneity of Microcystis during the bloom. We conclude that automatic methods not only reach the performance level of a human taxonomist, and can thus be a valuable time-saving tool in the routine identification of colonial phytoplankton taxa, but can also be applied to increase the temporal and spatial resolution of the study.
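The three validation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `eps` clamp in the Kullback–Leibler divergence, and the example counts are all assumptions introduced here, and the abstract does not say which distributions were compared.

```python
import math


def normalize(counts):
    """Turn raw class counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        raise ValueError("both vectors are empty or all zero")
    return num / den


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumption: zero entries of q are clamped to eps to keep the value finite.
    """
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


# Hypothetical per-class colony counts: manual labels vs. automated classifier.
manual = [120, 45, 30, 5]
automated = [110, 50, 28, 12]

p, q = normalize(manual), normalize(automated)
print("Hellinger:", hellinger(p, q))
print("Bray-Curtis:", bray_curtis(manual, automated))
print("KL(manual || automated):", kl_divergence(p, q))
```

Bray–Curtis is computed here on raw counts, while the other two measures operate on normalized distributions; whether the study normalized first is not stated in the abstract.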

Spatial-temporal variables for swimming coaches: A comparison study between video and TritonWear sensor

International Journal of Sports Science & Coaching

10.1177/17479541211013755

2021

pp. 174795412110137

Author(s): Robin Pla, Thibaut Ledanois, Escobar David Simbana, Anaël Aubry, Benjamin Tranchard, ...

Keyword(s): Swimming Performance, Video Recording, Open Water, Percentage Error, Stroke Index, Wearable Sensor, Stroke Length, Video Recordings, Temporal Variables, Training Sets

The main aim of this study was to evaluate the validity and the reliability of a swimming sensor for assessing swimming performance and spatial-temporal variables. Six international male open-water swimmers completed a protocol consisting of two training sets: a 6 × 100 m individual medley and a continuous 800 m freestyle set. Swimmers were equipped with a wearable sensor, the TritonWear, to automatically collect spatial-temporal variables: speed, lap time, stroke count (SC), stroke length (SL), stroke rate (SR), and stroke index (SI). Video recordings served as the “gold standard” against which the validity and the reliability of the TritonWear sensor were assessed. The results show that the sensor provides accurate results in comparison with video recording measurements. Very high accuracy was observed for lap time, with a mean absolute percentage error (MAPE) under 5% for each stroke (2.2, 3.2, 3.4, 4.1% for butterfly, backstroke, breaststroke and freestyle respectively), but high error ranges indicate a dependence on swimming technique. Stroke count accuracy was higher for symmetric strokes than for alternate strokes (MAPE: 0, 2.4, 7.1 & 4.9% for butterfly, breaststroke, backstroke & freestyle respectively). The other variables (SL, SR & SI), derived from the SC and the lap time, also show good accuracy in all strokes. The wearable sensor provides accurate real-time feedback on spatial-temporal variables in six international open-water swimmers during classical training sets (at low to moderate intensities), and could be a useful tool for coaches, allowing them to monitor training load with no effort.
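The mean absolute percentage error reported above can be computed as in the following sketch, which treats the video measurement as the reference. The function name and the lap times are hypothetical and only illustrate the arithmetic; they are not data from the study.

```python
def mape(reference, measured):
    """Mean absolute percentage error of `measured` against `reference`, in percent."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    return 100.0 * sum(abs((m - r) / r) for r, m in zip(reference, measured)) / len(reference)


# Hypothetical lap times in seconds: video (reference) vs. wearable sensor.
video_lap_times = [62.4, 63.1, 64.0, 63.7]
sensor_lap_times = [61.0, 64.5, 66.2, 62.9]

print(f"Lap-time MAPE: {mape(video_lap_times, sensor_lap_times):.1f}%")
```

Because each error is divided by its own reference value, MAPE is undefined when a reference is zero, which is why the sketch rejects such inputs.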

Fixed-size video summarization over streaming data via non-monotone submodular maximization

Proceedings of the 2nd ACM International Conference on Multimedia in Asia

10.1145/3444685.3446285

2021

Author(s): Ganfeng Lu, Jiping Zheng

Keyword(s): Video Summarization, Streaming Data, Fixed Size, Submodular Maximization

Mining discriminative patches for script identification in natural scene images

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-200260

2021

Vol 40 (1)

pp. 551-563

Author(s): Liqiong Lu, Dong Wu, Ziwei Tang, Yaohua Yi, Faliang Huang

Keyword(s): Neural Networks, Experimental Results, The Other, Natural Scene, Fixed Size, Script Identification, Aspect Ratios, Novel Approach, Public Datasets, Natural Scene Images

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot solve this problem perfectly for two reasons. First, scene images have arbitrary aspect ratios, which makes them difficult for traditional CNNs that take a fixed-size image as input. Second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining Score CNN, Attention CNN and patches. Attention CNN is used to determine whether a patch is a discriminative patch and to calculate the contribution weight of that patch to the script identification of the whole image. Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of the two classifiers on multiple discriminative patches extracted from the same image are fused to obtain the script type of that image. Using same-size patches as CNN inputs avoids the problems caused by the arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. The experimental results show the good performance of our approach on four public datasets.
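The patch pipeline described above can be outlined as a small sketch. It covers only the two non-learned steps: cutting same-size patches from an image of arbitrary shape, and fusing per-patch results into one image-level decision. The sliding-window extraction, the weighted-average fusion rule, and all names and numbers below are assumptions made for illustration; the abstract does not specify how patches are sampled or how the two classifiers' outputs are combined, and the CNNs themselves are replaced by hard-coded values.

```python
def extract_patches(image, size, stride):
    """Slide a size x size window over a 2D image (list of rows) of any aspect ratio."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def fuse_scores(patch_scores, patch_weights):
    """Weighted average of per-patch class scores; returns (best class index, fused scores).

    patch_scores: one list of per-script scores per patch (stand-in for Score CNN output).
    patch_weights: one contribution weight per patch (stand-in for Attention CNN output).
    """
    total = sum(patch_weights)
    if total <= 0:
        raise ValueError("at least one patch must have a positive weight")
    n_classes = len(patch_scores[0])
    fused = [
        sum(w * scores[c] for scores, w in zip(patch_scores, patch_weights)) / total
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused


# A hypothetical 4 x 6 grayscale image (wide aspect ratio) cut into 2 x 2 patches.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = extract_patches(image, size=2, stride=2)
print(len(patches), "patches of size 2 x 2")

# Hypothetical classifier outputs for two patches and two script types.
label, fused = fuse_scores([[0.9, 0.1], [0.2, 0.8]], [1.0, 3.0])
print("predicted script index:", label, "fused scores:", fused)
```

Giving a non-discriminative patch a weight of zero removes it from the decision, which is one simple way to realize the idea that only discriminative patches contribute.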
