Undersampling Approach for Imbalanced Training Sets and Induction from Multi-label Text-Categorization Domains

Author(s):  
Sareewan Dendamrongvit ◽  
Miroslav Kubat
2019 ◽  
Author(s):  
Qi Yuan ◽  
Alejandro Santana-Bonilla ◽  
Martijn Zwijnenburg ◽  
Kim Jelfs

The chemical space for novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A General Recurrent Neural Network model was trained on the ChEMBL database to generate chemically valid SMILES strings. The parameters of the General Recurrent Neural Network were fine-tuned via transfer learning using the electronic donor-acceptor database from the Computational Material Repository to generate novel donor-acceptor oligomers. Six different transfer learning models were developed with different subsets of the donor-acceptor database as training sets. We concluded that electronic properties such as HOMO-LUMO gaps and dipole moments of the training sets can be learned using the SMILES representation with deep generative models, and that the chemical space of the training sets can be efficiently explored. This approach identified approximately 1700 new molecules that have promising electronic properties (HOMO-LUMO gap <2 eV and dipole moment <2 Debye), six times more than in the original database. Among the molecular transformations, the deep generative model has learned how to produce novel molecules by trading off between selected atomic substitutions (such as halogenation or methylation) and molecular features such as the spatial extension of the oligomer. The method can be extended as a plausible source of new chemical combinations to effectively explore the chemical space for targeted properties.


2009 ◽  
Vol 28 (12) ◽  
pp. 3080-3083 ◽  
Author(s):  
Xiu-mei GAO ◽  
Fang CHEN ◽  
Feng-xi SONG ◽  
Zhong JIN

2021 ◽  
Vol 25 (1) ◽  
pp. 21-34
Author(s):  
Rafael B. Pereira ◽  
Alexandre Plastino ◽  
Bianca Zadrozny ◽  
Luiz H.C. Merschmann

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific to the multi-label context. Experimental results show that the proposed technique is competitive with the multi-label feature selection techniques currently used in the literature, and is clearly more scalable as the amount of data increases.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yersultan Mirasbekov ◽  
Adina Zhumakhanova ◽  
Almira Zhantuyakova ◽  
Kuanysh Sarkytbayev ◽  
Dmitry V. Malashenkov ◽  
...  

A machine learning approach was employed to detect and quantify Microcystis colonial morphospecies using FlowCAM-based imaging flow cytometry. The system was trained and tested using samples from a long-term mesocosm experiment (LMWE, Central Jutland, Denmark). The statistical validation of the classification approaches was performed using Hellinger distances, Bray–Curtis dissimilarity, and Kullback–Leibler divergence. The semi-automatic classification based on well-balanced training sets from the Microcystis seasonal bloom provided a high level of intergeneric accuracy (96–100%) but relatively low intrageneric accuracy (67–78%). Our results provide a proof of concept of how machine learning approaches can be applied to analyze colonial microalgae. This approach allowed us to evaluate the Microcystis seasonal bloom in individual mesocosms with a high level of temporal and spatial resolution. The observation that some Microcystis morphotypes completely disappeared and re-appeared along the mesocosm experiment timeline supports the hypothesis of the main transition pathways of colonial Microcystis morphoforms. We demonstrated that significant changes in the training sets of colonial images were required for accurate classification of Microcystis spp. from time points that differed by only two weeks, owing to the high phenotypic heterogeneity of Microcystis during the bloom. We conclude that automatic methods not only reach the performance level of a human taxonomist, and can thus be a valuable time-saving tool in the routine identification of colonial phytoplankton taxa, but can also be applied to increase the temporal and spatial resolution of the study.
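
As an illustration of the three measures named in the abstract above, the following Python sketch computes the Hellinger distance, Bray–Curtis dissimilarity, and Kullback–Leibler divergence between two class-count vectors. It is a minimal sketch under stated assumptions: the abstract does not say how the measures were applied, so the function names and the example counts (manual versus automatic tallies per morphotype) are hypothetical and are not taken from the study.

```python
import math


def normalize(counts):
    """Convert non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Returns infinity when q assigns zero probability to an outcome
    that p considers possible.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # terms with p_i = 0 contribute nothing
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


# Hypothetical counts per morphotype class (illustrative values only).
manual_counts = [40, 35, 25]
automatic_counts = [38, 30, 32]

p = normalize(manual_counts)
q = normalize(automatic_counts)

print("Hellinger:", hellinger(p, q))
print("Bray-Curtis:", bray_curtis(manual_counts, automatic_counts))
print("KL(p || q):", kl_divergence(p, q))
```

All three measures are zero when the two class compositions are identical. Note that the Kullback–Leibler divergence is asymmetric, so the order of the arguments matters.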


Author(s):  
Robin Pla ◽  
Thibaut Ledanois ◽  
Escobar David Simbana ◽  
Anaël Aubry ◽  
Benjamin Tranchard ◽  
...  

The main aim of this study was to evaluate the validity and reliability of a swimming sensor for assessing swimming performance and spatial-temporal variables. Six international male open-water swimmers completed a protocol consisting of two training sets: a 6 × 100 m individual medley and a continuous 800 m freestyle set. Swimmers were equipped with a wearable sensor, the TritonWear, to automatically collect spatial-temporal variables: speed, lap time, stroke count (SC), stroke length (SL), stroke rate (SR), and stroke index (SI). Video recordings served as the “gold standard” against which the validity and reliability of the TritonWear sensor were assessed. The results show that the sensor provides accurate results in comparison with video recording measurements. Very high accuracy was observed for lap time, with a mean absolute percentage error (MAPE) under 5% for each stroke (2.2, 3.2, 3.4 and 4.1% for butterfly, backstroke, breaststroke and freestyle, respectively), but high error ranges indicate a dependence on swimming technique. Stroke count accuracy was higher for symmetric strokes than for alternate strokes (MAPE: 0, 2.4, 7.1 and 4.9% for butterfly, breaststroke, backstroke and freestyle, respectively). The other variables (SL, SR and SI), derived from the SC and the lap time, also show good accuracy in all strokes. The wearable sensor provided accurate real-time feedback on spatial-temporal variables in six international open-water swimmers during classical training sets (at low to moderate intensities), and could be a useful tool for coaches, allowing them to monitor training load with little effort.
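
The mean absolute percentage error (MAPE) reported above can be illustrated with a short Python sketch. This is an assumption-laden example rather than the authors' analysis: the function name `mape` and the lap times are hypothetical, with the video measurement taken as the reference and the sensor reading as the value under test.

```python
def mape(reference, measured):
    """Mean absolute percentage error of `measured` against `reference`, in percent."""
    if len(reference) != len(measured):
        raise ValueError("reference and measured must have the same length")
    if not reference:
        raise ValueError("at least one pair of values is required")
    # Each term is the absolute error relative to the reference value.
    terms = [abs(m - r) / abs(r) for r, m in zip(reference, measured)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical lap times in seconds (illustrative values only).
video_lap_times = [60.0, 62.0, 61.0, 63.0]
sensor_lap_times = [61.2, 62.0, 60.4, 63.9]

print("Lap time MAPE (%):", mape(video_lap_times, sensor_lap_times))
```

Because each error is divided by its reference value, MAPE is undefined when a reference measurement is zero, which is not a concern for lap times but would matter for variables that can legitimately be zero.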


Author(s):  
Nicola Capuano ◽  
Santi Caballé ◽  
Jordi Conesa ◽  
Antonio Greco

Massive open online courses (MOOCs) allow students and instructors to hold discussions through messages posted on a forum. However, instructors must limit their interaction to the most critical tasks during MOOC delivery, so teacher-led scaffolding activities, such as forum-based support, can be very limited or even impossible in such environments. In addition, students who try to clarify concepts through such collaborative tools may not receive useful answers, and the lack of interactivity may cause them to abandon the course permanently. The purpose of this paper is to report the experimental findings obtained by evaluating the performance of a text categorization tool capable of detecting the intent, the subject area, the domain topics, the sentiment polarity, and the level of confusion and urgency of a forum post, so that the results may be used by instructors to carefully plan their interventions. The proposed approach is based on attention-based hierarchical recurrent neural networks, in which both a recurrent network for word encoding and an attention mechanism for word aggregation at the sentence and document levels are used before classification. The integration of the developed classifier into an existing tool for conversational agents, based on the academically productive talk framework, is also presented, together with the accuracy of the proposed method in classifying forum posts.


Author(s):  
Bonthala Prabhanjan Yadav ◽  
Sukhaveerji Ghate ◽  
A Harshavardhan ◽  
G Jhansi ◽  
Komuravelly Sudheer Kumar ◽  
...  
