RIPTIDE: Learning violation prediction models from boarding activity data

Author(s):  
Hans Chalupsky ◽  
Eduard Hovy
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sylvia Kalli ◽  
Carla Araya-Cloutier ◽  
Jos Hageman ◽  
Jean-Paul Vincken

High resistance to traditional antibiotics has spurred the development of new, natural therapeutics against methicillin-resistant Staphylococcus aureus (MRSA). Prenylated (iso)flavonoids, present mainly in the Fabaceae, are promising candidates. Herein, the anti-MRSA properties of 23 prenylated (iso)flavonoids were assessed in vitro. The di-prenylated (iso)flavonoids glabrol (flavanone) and 6,8-diprenyl genistein (isoflavone), together with the mono-prenylated 4′-O-methyl glabridin (isoflavan), were the most active anti-MRSA compounds (minimum inhibitory concentration (MIC) ≤ 10 µg/mL, 30 µM). The in-house activity data were complemented with literature data to yield an extended, curated dataset of 67 molecules for the development of robust in silico prediction models. A QSAR model was obtained with a good fit (R²adj 0.61), low average prediction errors, and good predictive power (Q²) for both the training set (4% and Q²LOO 0.57, respectively) and the test set (5% and Q²test 0.75, respectively). Furthermore, the model predicted well the activity of an external validation set (on average 5% prediction error), as well as the level of activity (low, moderate, high) of prenylated (iso)flavonoids against other Gram-positive bacteria. For the first time, the importance of formal charge, in addition to hydrophobic volume and hydrogen bonding, for anti-MRSA activity was highlighted, suggesting potentially different modes of action for the different prenylated (iso)flavonoids.
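The fit and validation statistics quoted above (R²adj and Q²LOO) can be illustrated with a minimal sketch. It assumes a one-descriptor ordinary least-squares model and the common definition Q² = 1 − PRESS/SS_tot; the abstract does not state which descriptors, regression method, or Q² variant the authors used, and the function names and toy numbers here are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r2_adjusted(xs, ys, n_predictors=1):
    """Adjusted R^2 of the model fitted on all data."""
    a, b = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    n = len(ys)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def q2_loo(xs, ys):
    """Leave-one-out Q^2: refit without each point, then predict that point."""
    my = mean(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot

# Made-up descriptor values and activities, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print(round(r2_adjusted(xs, ys), 3), round(q2_loo(xs, ys), 3))
```

Because each left-out point is predicted by a model that never saw it, Q²LOO is typically lower than the fitted R², which matches the pattern in the reported values (0.57 versus 0.61).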


2021 ◽  
Author(s):  
Angela Lopez-del Rio ◽  
Sergio Picart ◽  
Alexandre Perera-Lluna

In silico analysis of biological activity data has become an essential technique in pharmaceutical development. Specifically, the so-called proteochemometric models aim to share information between targets in machine learning ligand-target activity prediction models. However, bioactivity datasets used in proteochemometrics modeling are usually imbalanced, which could potentially affect the performance of the models. In this work, we explored the effect of different balancing strategies in deep learning proteochemometric target-compound activity classification models while controlling for the compound series bias through clustering. These strategies were: (1) no_resampling, (2) resampling_after_clustering, (3) resampling_before_clustering and (4) semi_resampling. These schemas were evaluated in kinases and GPCRs from BindingDB. We observed that the predicted proportion of positives was driven by the actual data balance in the test set. Additionally, it was confirmed that data balance had an impact on the performance estimates of the proteochemometrics model. We recommend a combination of data augmentation and clustering in the training set (semi_resampling) in order to mitigate the data imbalance effect in a realistic scenario. The code of this analysis is publicly available at https://github.com/b2slab/imbalance_pcm_benchmark.
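The general idea of balancing only the training data while keeping compound clusters intact can be sketched as follows. This is not the authors' implementation (their code is in the repository linked above); the record layout, the function names, and the choice of random oversampling as the balancing step are assumptions made for illustration.

```python
import random
from collections import defaultdict

def cluster_split(records, test_fraction=0.25, seed=0):
    """Assign whole clusters to train or test so no cluster straddles both."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for rec in records:
        by_cluster[rec["cluster"]].append(rec)
    cluster_ids = sorted(by_cluster)
    rng.shuffle(cluster_ids)
    n_test = max(1, round(len(cluster_ids) * test_fraction))
    test = [r for c in cluster_ids[:n_test] for r in by_cluster[c]]
    train = [r for c in cluster_ids[n_test:] for r in by_cluster[c]]
    return train, test

def oversample_minority(train, seed=0):
    """Duplicate random minority-class records until both classes are equal."""
    rng = random.Random(seed)
    pos = [r for r in train if r["active"]]
    neg = [r for r in train if not r["active"]]
    if not pos or not neg:
        return list(train)  # nothing to balance against
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra

# Hypothetical toy data: 8 clusters, each with 1 active and 3 inactive pairs.
records = [
    {"cluster": c, "active": (i == 0), "pair_id": f"c{c}-{i}"}
    for c in range(8)
    for i in range(4)
]
train, test = cluster_split(records)
balanced_train = oversample_minority(train)
```

The test set is left at its natural class balance, so performance estimates reflect a realistic deployment scenario while the model still sees a balanced training signal.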


Cells ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1431
Author(s):  
Réau ◽  
Lagarde ◽  
Zagury ◽  
Montes

The androgen receptor (AR) is a transcription factor that plays a key role in sexual phenotype and neuromuscular development. AR can be modulated by exogenous compounds such as pharmaceuticals or chemicals present in the environment, and particularly by AR agonist compounds that mimic the action of endogenous agonist ligands and either restore or alter the functions of the AR endocrine system. The activation of AR must be correctly balanced, and identifying potent AR agonist compounds is of high interest both to propose treatments for certain diseases and to predict the risk related to exposure to agonist chemicals. The development of in silico approaches and the publication of structural, affinity and activity data provide a good framework for developing rational AR hit prediction models. Herein, we present a docking and a pharmacophore modeling strategy to help identify AR agonist compounds. All models were trained on the NR-DBIND, which provides high-quality binding data on AR, and tested on AR-agonist activity assays from the Tox21 initiative. Both methods display high performance on the NR-DBIND set and could serve as a starting point for biologists and toxicologists. However, the pharmacophore models still need additional data before they can be used as broad-scope prediction models for undesired effects.


2021 ◽  
Author(s):  
Lewis Mervin ◽  
Maria-Anna Trapotsi ◽  
Avid M. Afzal ◽  
Ian Barrett ◽  
Andreas Bender ◽  
...  

In the context of small molecule property prediction, experimental errors are usually a neglected aspect during model generation. The main caveat to binary classification approaches is that they weight minority cases close to the threshold boundary equivalently in distinguishing between activity classes. For example, pXC50 activity values of 5.1 and 4.9 are treated as equally important contributions to opposing activity classes (e.g., with a classification threshold of 5), even though experimental error may not afford such discriminatory accuracy. This is detrimental in practice, and it is therefore equally important to evaluate the presence of experimental error in databases and to apply methodologies that account for variability in experiments and uncertainty near the decision boundary. To improve upon this, we herein present a novel approach toward predicting protein-ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF comprises a modification to the long-established Random Forest (RF) that takes into account uncertainties in the assigned classes (i.e., activity labels). This enables representing the activity in a framework in between the classification and regression architectures, with philosophical differences from either approach. Compared to classification, this approach enables better representation of factors increasing/decreasing inactivity. Conversely, compared to a classical regression framework, one can utilize all data (even delimited/operand/censored data far from a cut-off) while also taking into account the granularity around the cut-off. The algorithm was applied to ~550 target prediction tasks from ChEMBL and PubChem. The largest benefit of incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, where such information is not considered in any way in the original RF algorithm. In comparison, the baseline RF outperformed PRF for cases with high confidence of belonging to the active class (far from the binary decision threshold). The RF models gave errors smaller than the experimental uncertainty, which could indicate that they are overtrained and/or over-confident. Overall, we show that PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold. With this approach, we present, to our knowledge, the first application of probabilistic modelling of activity data for target prediction using the PRF algorithm.
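The idea of replacing a hard active/inactive label with an uncertain one near the threshold can be sketched as follows. This is one plausible formulation, not necessarily the authors': it assumes the measurement error is Gaussian with a fixed standard deviation, and both the value `sd=0.3` and the function name `prob_active` are hypothetical.

```python
from statistics import NormalDist

def prob_active(pxc50, threshold=5.0, sd=0.3):
    """Probability that the true pXC50 lies above the threshold,
    assuming Gaussian measurement error with standard deviation `sd`."""
    return 1.0 - NormalDist(mu=pxc50, sigma=sd).cdf(threshold)

# Values close to the threshold receive soft labels near 0.5, while
# values far from it receive labels close to 0 or 1.
for value in (4.0, 4.9, 5.0, 5.1, 7.0):
    print(value, round(prob_active(value), 3))
```

Under these assumptions, measurements of 5.1 and 4.9 map to labels only slightly above and below 0.5 instead of to opposite hard classes, which is the behaviour the abstract argues for near the decision boundary.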


Atmosphere ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1172
Author(s):  
Hyunsu Hong ◽  
Hyungjin Jeon ◽  
Cheong Youn ◽  
Hyeon-Soo Kim

Air pollution sources and the hazards of high concentrations of particulate matter 2.5 (PM2.5) have been well documented. Shipping emissions have been identified as a source of air pollution; therefore, it is necessary to predict air pollutant concentrations to manage seaport air quality. However, air pollution prediction models rarely consider shipping emissions. Here, the PM2.5 concentrations of the Busan North and Busan New Ports were predicted using a recurrent neural network and long short-term memory model by employing the shipping activity data of Busan Port. In contrast to previous studies that employed only air quality and meteorological data as input data, our model considered shipping activity data as an emission source. The model was trained on data from 1 January 2019 to 31 January 2020, and predictions and verifications were performed for 1–28 February 2020. Verification yielded an index of agreement (IOA) of 0.975 and 0.970 and root mean square errors of 4.88 and 5.87 µg/m³ for Busan North Port and Busan New Port, respectively. A previous study based on activity data reported an IOA of 0.62–0.84, whereas the present model achieved a higher predictive power, with an IOA of 0.970–0.975. Thus, the extended approach offers a useful strategy to prevent PM2.5 air pollutant-induced damage in seaports.
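The two verification metrics reported above can be computed as in the following sketch. The abstract does not give the formulas, so this assumes Willmott's index of agreement, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², and the usual root mean square error; the sample concentrations are made up.

```python
from math import sqrt
from statistics import mean

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    return sqrt(mean((p - o) ** 2 for o, p in zip(observed, predicted)))

def index_of_agreement(observed, predicted):
    """Willmott's index of agreement; 1.0 means perfect agreement."""
    o_bar = mean(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den

# Hypothetical PM2.5 concentrations in µg/m³, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(round(rmse(observed, predicted), 3))
print(round(index_of_agreement(observed, predicted), 3))
```

The IOA is dimensionless and bounded above by 1, which makes it convenient for comparing models across ports, whereas the RMSE keeps the units of the measured concentration.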


2013 ◽  
Vol 1 (1) ◽  
pp. 13
Author(s):  
Javaria Manzoor Shaikh ◽  
JaeSeung Park

Burn patients usually experience prolonged hospitalization, and accurately forecasting patient placement according to healing rate has significant consequences for healthcare supply administration. A substantial amount of evidence suggests that sunlight is essential to burn healing and could be exceptionally beneficial for burn patients and the workforce in healthcare buildings. Adequate UV sunlight is fundamental for a given extent of burn to heal; this delicate and rather complex matrix is obtained by applying pattern classification, for the first time, to the space syntax map of the floor plan and the Browder chart of the burn patient. On the basis of the data determined from this healthcare learning technique, a nurse can decide the location of the patient on the floor plan, so that patient safety is the first priority in the routine tasks of staff in healthcare settings. Because insufficient UV light and vitamin D can retard the healing process, this experiment focuses on a machine learning design in which pattern recognition and technology support patient safety as the primary goal. In this experiment, adverse events, near-miss errors, and medical deaths in 2012-2013 were up to 50% lower than in the 2005-2012 data recorded before this technique was incorporated. In this research paper, three distinct phases of clinical situations are considered (first, admission; second, acute; and third, post-treatment, according to the burn pattern and healing rate) and are validated by AI-based forecasting techniques to hypothesize placement prediction models for each clinical stage with varying percentage of burn, i.e., superficial wound, partial thickness, or full-thickness deep burn. In conclusion, we show that the depth of burn is directly proportional to the depth of the patient's placement in terms of window distance. Our findings support the hypothesis that the windowed wall is the most healing wall; the fundamental suggestion here is the use of support vector machines, which find the most advantageous hyperplane for linearly separable patterns, for the burn depth as well as the depth map.


2017 ◽  
Vol 6 (1) ◽  
pp. 32
Author(s):  
Nismarni Nismarni

This study was motivated by the very low Indonesian-language learning outcomes obtained by students, which were attributed to learning methods that were not relevant. This classroom action research aimed to determine whether implementing the Numbered Heads Together (NHT) cooperative learning model improves Indonesian-language learning outcomes in grade IV A of SD Negeri 78 Pekanbaru. The study was conducted in two cycles, each comprising two meetings and one daily test. Each cycle consisted of four stages: planning, implementation, observation, and reflection. Data on teacher and student activity were obtained from observation sheets, while learning outcomes were obtained from the daily test results. The results showed that teacher and student activity increased. Teacher activity scored 33 (68.75%) in the first meeting of cycle I, 38 (79.17%) in the second meeting of cycle I, 40 (83.33%) in the first meeting of cycle II, and 44 (91.67%) in the second meeting of cycle II. Student activity scored 27 (56.25%) in the first meeting of cycle I, increased to 36 (75.00%) in the second meeting of cycle I, to 41 (85.42%) in the first meeting of cycle II, and to 45 (93.75%) in the second meeting of cycle II. Student learning outcomes also improved: in the preliminary data, 10 students (28.57%) reached the KKM (minimum mastery criterion), with an average learning outcome of 65.37. In the first cycle, 26 students (74.28%) reached it, with an average of 76.00, and in the second cycle, 32 students (91.42%) reached it, with an average of 86.86. Based on these results, it can be concluded that implementing the NHT cooperative learning model can improve Indonesian-language learning outcomes in grade IV A of SD Negeri 78 Pekanbaru.


2012 ◽  
Vol 3 (2) ◽  
pp. 48-50
Author(s):  
Ana Isabel Velasco Fernández ◽  
Ricardo José Rejas Muslera ◽  
Juan Padilla Fernández-Vega ◽  
María Isabel Cepeda González
