Full Border Identification for Reduction of Training Sets

Author(s):  
Guichong Li ◽  
Nathalie Japkowicz ◽  
Trevor J. Stocki ◽  
R. Kurt Ungar
2019 ◽  
Author(s):  
Qi Yuan ◽  
Alejandro Santana-Bonilla ◽  
Martijn Zwijnenburg ◽  
Kim Jelfs

The chemical space for novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A General Recurrent Neural Network model was trained on the ChEMBL database to generate chemically valid SMILES strings. The parameters of the General Recurrent Neural Network were then fine-tuned via transfer learning on the electronic donor-acceptor database from the Computational Material Repository to generate novel donor-acceptor oligomers. Six transfer learning models were developed, each using a different subset of the donor-acceptor database as its training set. We conclude that electronic properties of the training sets, such as HOMO-LUMO gaps and dipole moments, can be learned from the SMILES representation with deep generative models, and that the chemical space of the training sets can be explored efficiently. This approach identified approximately 1700 new molecules with promising electronic properties (HOMO-LUMO gap <2 eV and dipole moment <2 Debye), six times as many as in the original database. Among the molecular transformations, the deep generative model learned to produce novel molecules by trading off selected atomic substitutions (such as halogenation or methylation) against molecular features such as the spatial extension of the oligomer. The method can serve as a plausible source of new chemical combinations for efficiently exploring the chemical space for targeted properties.
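The selection criterion quoted above amounts to a two-threshold filter on predicted properties. A minimal Python sketch, assuming each generated molecule already carries a computed HOMO-LUMO gap and dipole moment (how these were obtained is not described here); the `Candidate` class, the helper names and all values are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    identifier: str          # placeholder label; a real pipeline would store the SMILES string
    homo_lumo_gap_ev: float  # HOMO-LUMO gap in eV
    dipole_debye: float      # dipole moment in Debye


def is_promising(c: Candidate, max_gap_ev: float = 2.0, max_dipole_debye: float = 2.0) -> bool:
    """Strict thresholds, mirroring 'gap < 2 eV and dipole moment < 2 Debye'."""
    return c.homo_lumo_gap_ev < max_gap_ev and c.dipole_debye < max_dipole_debye


def screen(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that pass both property thresholds."""
    return [c for c in candidates if is_promising(c)]


# Hypothetical candidates with made-up property values, for illustration only.
generated = [
    Candidate("mol-1", homo_lumo_gap_ev=1.8, dipole_debye=0.5),
    Candidate("mol-2", homo_lumo_gap_ev=2.4, dipole_debye=0.7),
    Candidate("mol-3", homo_lumo_gap_ev=1.6, dipole_debye=2.1),
]
promising = screen(generated)
print([c.identifier for c in promising])  # ['mol-1']
```

Both comparisons are strict, so a molecule sitting exactly at 2 eV or 2 Debye is rejected, matching the "<" in the stated criterion.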


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yersultan Mirasbekov ◽  
Adina Zhumakhanova ◽  
Almira Zhantuyakova ◽  
Kuanysh Sarkytbayev ◽  
Dmitry V. Malashenkov ◽  
...  

A machine learning approach was employed to detect and quantify Microcystis colonial morphospecies using FlowCAM-based imaging flow cytometry. The system was trained and tested using samples from a long-term mesocosm experiment (LMWE, Central Jutland, Denmark). The classification approaches were statistically validated using Hellinger distances, Bray–Curtis dissimilarity, and Kullback–Leibler divergence. Semi-automatic classification based on well-balanced training sets from the Microcystis seasonal bloom provided high intergeneric accuracy (96–100%) but relatively low intrageneric accuracy (67–78%). Our results provide a proof of concept of how machine learning approaches can be applied to analyze colonial microalgae. This approach allowed us to evaluate the Microcystis seasonal bloom in individual mesocosms with high temporal and spatial resolution. The observation that some Microcystis morphotypes completely disappeared and re-appeared over the course of the mesocosm experiment supports the hypothesized main transition pathways of colonial Microcystis morphoforms. We demonstrated that, owing to the high phenotypic heterogeneity of Microcystis during the bloom, accurate classification of Microcystis spp. at time points only two weeks apart required significant changes to the training sets of colonial images. We conclude that automatic methods can not only reach the performance level of a human taxonomist, and thus serve as a valuable time-saving tool for routine identification of colonial phytoplankton taxa, but can also be applied to increase the temporal and spatial resolution of a study.
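The three validation measures named above compare distributions, for example the relative abundances of morphospecies classes counted by a human versus by the classifier. The abstract does not say exactly which distributions were compared, so the following Python sketch assumes that setup; the helper names and counts are hypothetical. Bray–Curtis dissimilarity is applied to raw counts, while the Hellinger distance and Kullback–Leibler divergence are applied to normalized proportions.

```python
import math


def normalize(counts):
    """Turn raw class counts into relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) / math.sqrt(2)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def kl_divergence(p, q):
    """D_KL(p || q) in nats; p_i = 0 terms contribute nothing, q_i = 0 with p_i > 0 gives inf."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


# Hypothetical colony counts per morphospecies class (three classes), made up for illustration.
manual = [50, 30, 20]     # e.g. counts from a human taxonomist
automatic = [45, 35, 20]  # e.g. counts from the trained classifier

p, q = normalize(manual), normalize(automatic)
print(f"Hellinger:   {hellinger(p, q):.4f}")
print(f"Bray-Curtis: {bray_curtis(manual, automatic):.4f}")
print(f"KL(p||q):    {kl_divergence(p, q):.4f}")
```

Note that Kullback–Leibler divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; Hellinger and Bray–Curtis are symmetric.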


Author(s):  
Robin Pla ◽  
Thibaut Ledanois ◽  
Escobar David Simbana ◽  
Anaël Aubry ◽  
Benjamin Tranchard ◽  
...  

The main aim of this study was to evaluate the validity and reliability of a swimming sensor for assessing swimming performance and spatial-temporal variables. Six international male open-water swimmers completed a protocol consisting of two training sets: a 6 × 100 m individual medley and a continuous 800 m freestyle set. Swimmers were equipped with a wearable sensor, the TritonWear, which automatically collects spatial-temporal variables: speed, lap time, stroke count (SC), stroke length (SL), stroke rate (SR), and stroke index (SI). Video recordings served as the "gold standard" for assessing the validity and reliability of the TritonWear sensor. The results show that the sensor provides accurate results compared with video-based measurements. Very high accuracy was observed for lap time, with a mean absolute percentage error (MAPE) under 5% for each stroke (2.2, 3.2, 3.4 and 4.1% for butterfly, backstroke, breaststroke and freestyle, respectively), but wide error ranges indicate a dependence on swimming technique. Stroke count accuracy was higher for symmetric strokes than for alternating strokes (MAPE: 0, 2.4, 7.1 and 4.9% for butterfly, breaststroke, backstroke and freestyle, respectively). The other variables (SL, SR and SI), derived from SC and lap time, also showed good accuracy in all strokes. The wearable sensor provides accurate real-time feedback on spatial-temporal variables for international open-water swimmers during classical training sets (at low to moderate intensities), and could be a useful tool for coaches, allowing them to monitor training load with minimal effort.
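Two computations underlie this validation: the mean absolute percentage error of the sensor against the video reference, and the derivation of SL, SR and SI from stroke count and lap time. A minimal Python sketch, assuming commonly used definitions (SR in strokes per minute, SL as distance per stroke, SI as speed times SL); the abstract does not give TritonWear's exact formulas, and all names and values here are hypothetical.

```python
def mape(reference, estimate):
    """Mean absolute percentage error (%) of sensor values against reference (video) values."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((r - e) / r) for r, e in zip(reference, estimate)) / len(reference)


def derived_variables(distance_m, lap_time_s, stroke_count):
    """Derive speed, SR, SL and SI from one lap's distance, time and stroke count.

    Uses common textbook definitions (an assumption; the sensor's exact formulas,
    e.g. strokes vs. stroke cycles, are not given in the abstract).
    """
    speed = distance_m / lap_time_s                 # m/s
    stroke_rate = stroke_count / lap_time_s * 60.0  # strokes per minute
    stroke_length = distance_m / stroke_count       # metres per stroke
    stroke_index = speed * stroke_length            # speed x stroke length
    return {"speed": speed, "SR": stroke_rate, "SL": stroke_length, "SI": stroke_index}


# Hypothetical lap times (s): video reference vs. sensor, made up for illustration.
video_lap_times = [70.0, 72.0, 71.0]
sensor_lap_times = [71.4, 70.56, 71.0]
print(f"Lap-time MAPE: {mape(video_lap_times, sensor_lap_times):.2f}%")

# Hypothetical 50 m lap swum in 40 s with 40 strokes.
print(derived_variables(distance_m=50.0, lap_time_s=40.0, stroke_count=40))
```

Because SL, SR and SI are computed from SC and lap time, any error in those two inputs propagates into the derived variables, which is why their accuracy is assessed separately.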


2021 ◽  
Vol 39 (15_suppl) ◽  
p. 2601
Author(s):  
Tao Zhou ◽  
Libin Chen ◽  
Jing Guo ◽  
Mengmeng Zhang ◽  
Huanhuan Liu ◽  
...  

Background: Microsatellite instability (MSI) is a common genomic alteration in several tumors, such as colorectal cancer, endometrial carcinoma, and stomach cancer. Tumors are classified as microsatellite instability-high (MSI-H) or microsatellite stable (MSS) based on the degree of polymorphism in microsatellite lengths. MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in colorectal cancer (CRC) patients. Several computational approaches based on targeted panel sequencing data have been used to detect MSI; however, they are considerably affected by sequencing depth and panel size. Methods: We developed MSIFinder, a Python package for automatic MSI classification from genome sequencing data using a random forest classifier (RFC), a machine learning technique. We included 19 MSI-H and 25 MSS samples in the training set. First, the RFC model was built from 54 feature markers derived from the training set. Second, the classifier was validated on a test set comprising 21 MSI-H and 379 MSS samples. Results: On this test set, MSIFinder achieved a sensitivity (recall) of 1, a specificity of 0.997, an accuracy of 0.998, a positive predictive value (PPV) of 0.954, an F1 score of 0.977, and an area under the curve (AUC) of 0.999. We found that MSIFinder is less affected by low sequencing depth and can achieve a concordance of 0.993 at a sequencing depth of 100×. Furthermore, MSIFinder is less affected by panel size and can achieve a concordance of 0.99 with a panel size of 0.5 Mb (million bases). Conclusions: These results indicate that MSIFinder is a robust MSI classification tool that is little affected by panel size and sequencing depth. MSIFinder can therefore provide reliable MSI detection for research and clinical purposes. [Table: see text]
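The test-set figures can be cross-checked with the standard confusion-matrix formulas. This is a minimal Python sketch, not part of MSIFinder; the counts TP = 21, FN = 0, FP = 1, TN = 378 are an assumption inferred from the stated test set (21 MSI-H, 379 MSS) and are not reported in the abstract. With those counts the formulas give a sensitivity of 1, a specificity of about 0.997 (378/379), an accuracy of 0.9975, a PPV of about 0.9545 (21/22) and an F1 score of about 0.977 (42/43). AUC cannot be derived from a single confusion matrix, so it is not covered.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts
    (positive class = MSI-H, negative class = MSS)."""
    sensitivity = tp / (tp + fn)                 # recall: MSI-H samples correctly called
    specificity = tn / (tn + fp)                 # MSS samples correctly called
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    ppv = tp / (tp + fp)                         # precision
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "ppv": ppv, "f1": f1}


# Assumed counts (inferred, not reported): all 21 MSI-H samples detected,
# 1 of the 379 MSS samples misclassified as MSI-H.
metrics = classification_metrics(tp=21, fn=0, fp=1, tn=378)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With only 21 positives, each missed or spurious MSI-H call shifts sensitivity or PPV by several percentage points, so small test-set class counts are worth keeping in mind when reading the three-decimal figures.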

