Studying and Mitigating the Effects of Data Drifts on ML Model Performance at the Example of Chemical Toxicity Data

Author(s):  
Andrea Morger ◽  
Marina Garcia de Lomana ◽  
Ulf Norinder ◽  
Fredrik Svensson ◽  
Johannes Kirchmair ◽  
...  

Abstract Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
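The recalibration idea described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the nonconformity measure or the CP variant, so the following assumes a class-conditional (Mondrian) inductive conformal predictor for a binary endpoint, with nonconformity defined as one minus the predicted probability of the candidate label. All function names are hypothetical. The underlying model is never retrained; only the calibration scores are swapped.

```python
from bisect import bisect_left


def calibrate(probs_pos, labels):
    """Build per-class sorted nonconformity scores from calibration data.

    probs_pos: predicted probability of class 1 from an already trained model.
    labels:    true labels (0 or 1) of the calibration compounds.
    Assumption: nonconformity = 1 - predicted probability of the true class.
    """
    scores = {0: [], 1: []}
    for prob_pos, label in zip(probs_pos, labels):
        prob_true = prob_pos if label == 1 else 1.0 - prob_pos
        scores[label].append(1.0 - prob_true)
    return {label: sorted(values) for label, values in scores.items()}


def p_value(sorted_cal_scores, score):
    """Smoothed share of calibration scores that are >= the test score."""
    n = len(sorted_cal_scores)
    n_greater_equal = n - bisect_left(sorted_cal_scores, score)
    return (n_greater_equal + 1) / (n + 1)


def prediction_set(prob_pos, calibration, significance):
    """Return every label whose p-value exceeds the significance level."""
    labels = set()
    for label in (0, 1):
        prob_label = prob_pos if label == 1 else 1.0 - prob_pos
        if p_value(calibration[label], 1.0 - prob_label) > significance:
            labels.add(label)
    return labels


def evaluate(probs_pos, labels, calibration, significance):
    """Validity: share of sets containing the true label.
    Efficiency: share of single-label (conclusive) sets."""
    sets = [prediction_set(p, calibration, significance) for p in probs_pos]
    validity = sum(y in s for s, y in zip(sets, labels)) / len(sets)
    efficiency = sum(len(s) == 1 for s in sets) / len(sets)
    return validity, efficiency
```

To recalibrate under these assumptions, one would call `calibrate` again on predictions for compounds closer to the holdout data and pass the new dictionary to `prediction_set`. Larger calibration scores make the predictor more cautious, which is consistent with the trade-off the abstract reports: validity can recover while more predictions become inconclusive.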

Author(s):  
Xiangping Zhu ◽  
Xiatian Zhu ◽  
Minxian Li ◽  
Pietro Morerio ◽  
Vittorio Murino ◽  
...  

Abstract Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand unsupervised re-id methods do not need identity label information, but they usually suffer from much inferior and insufficient model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation efforts. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the cost-effectiveness superiority of our method over the alternative approaches on three large person re-id datasets. For example, MATE yields 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.


Author(s):  
U. Aebi ◽  
L.E. Buhle ◽  
W.E. Fowler

Many important supramolecular structures such as filaments, microtubules, virus capsids and certain membrane proteins and bacterial cell walls exist as ordered polymers or two-dimensional crystalline arrays in vivo. In several instances it has been possible to induce soluble proteins to form ordered polymers or two-dimensional crystalline arrays in vitro. In both cases a combination of electron microscopy of negatively stained specimens with analog or digital image processing techniques has proven extremely useful for elucidating the molecular and supramolecular organization of the constituent proteins. However, from the reconstructed stain exclusion patterns it is often difficult to identify distinct stain-excluding regions with specific protein subunits. To this end, it has been demonstrated that in some cases this ambiguity can be resolved by a combination of stoichiometric labeling of the ordered structures with subunit-specific antibody fragments (e.g. Fab) and image processing of the electron micrographs recorded from labeled and unlabeled structures.


Author(s):  
I. F. Gorlov ◽  
A. A. Mosolov ◽  
G. V. Komlatskiy ◽  
M. A. Nesterenko ◽  
K. D. Nimbona ◽  
...  

The article examines the possibility of reproducing and expanding herds of highly productive cows using embryo transplantation technology. The classical (in vivo) and the more modern, still developing (in vitro) methods of embryo transfer are considered in detail, together with their advantages and disadvantages. Transplantation can accelerate the breeding process: from 10 to 100 calves can be obtained from one cow, which would allow almost any herd (of any size and breed) to be turned into a cattle-breeding enterprise of the most modern level within 4-5 years with the help of biotechnology. At the same time, heifers obtained from unproductive cows can be used as "surrogate" mothers that receive the best donor embryos, which makes it possible to obtain full-fledged offspring adapted to local environmental conditions. A detailed scheme for obtaining, evaluating, and storing embryos is presented; the cost and economic effect of embryo transplantation were calculated; the market was evaluated; and the required annual volume of transplants and the number of donor cows for large livestock farms were determined. As a positive example, the scientific-production enterprise "Centre of biotechnology and embryo transfer" implemented a project in 2014 for the accelerated replacement and genetic improvement of a dairy herd: engraftment averaged 57-69%, and the economic effect for the enterprise of obtaining a single animal by embryo transfer, compared with importing an animal of similar quality, ranged from 60 to 100 thousand rubles per head. It is shown that a developed embryo transplantation service needs to be organized at the state level to reduce the cost of embryo transfer and to make it possible to create, in a short time, the country's own highly productive breeding nucleus of dairy and beef cattle, which will reduce, and in the future completely eliminate, import dependence on cattle products.


2018 ◽  
Vol 18 (2) ◽  
pp. 277-285 ◽  
Author(s):  
Mohsen Mohammadgholi ◽  
Nourollah Sadeghzadeh ◽  
Mostafa Erfani ◽  
Saeid Abediankenari ◽  
Seyed Mohammad Abedi ◽  
...  

Background: Human fibronectin extra-domain B (EDB) is particularly expressed during angiogenesis progression. It is, thus, a promising marker of tumour growth. Aptides are a novel class of peptides with high-affinity binding to specific protein targets. APTEDB is an antagonist-like ligand that especially interacts with human fibronectin EDB. Objective: This study was the first attempt in which the hydrazinonicotinamide (HYNIC)-conjugated APTEDB was labelled with technetium-99m (99mTc), using tricine/EDDA exchange labelling, as an appropriate radiotracer. Methods: Radiochemical purity and stability in normal saline and serum were evaluated by HPLC and radio-isotope TLC scanner. Other examinations, such as protein-binding calculation, dissociation radioligand binding assay, and partition coefficient constant determination, were also carried out. The cellular-specific binding of 99mTc-HYNIC-conjugated APTEDB was assessed in two cell lines, EDB-positive (U87MG) and EDB-negative (U373MG). Bio-distribution was investigated in normal mice as well as in U87MG and U373MG tumour-bearing mice. Eventually, the radiolabelled APTEDB was used for tumour imaging using planar SPECT. Results: Radiolabelling was achieved with high purity (up to 97%) and accompanied by high stability in solution (over 90% overnight) and in serum (80% after 2 hours). The obtained cellular-specific binding ratio was greater than nine-fold. In-vivo experiments showed rapid blood clearance with mainly renal excretion and tumour uptake specificity (0.48±0.03% ID/g after 1h). The results of the imaging also confirmed considerable tumour uptake for the EDB-positive cell line compared with the EDB-negative one. Conclusion: Aptides are considered to be a potent candidate for biopharmaceutical applications. They can be modified with imaging or therapeutic agents. This report shows the capability of 99mTc-HYNIC-APTEDB for detecting human EDB-expressing tumours.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4638
Author(s):  
Bummo Koo ◽  
Jongman Kim ◽  
Yejin Nam ◽  
Youngho Kim

In this study, algorithms to detect post-falls were evaluated using the cross-dataset according to feature vectors (time-series and discrete data), classifiers (ANN and SVM), and four different processing conditions (normalization, equalization, increase in the number of training data, and additional training with external data). Three-axis acceleration and angular velocity data were obtained from 30 healthy male subjects by attaching an IMU to the middle of the left and right anterior superior iliac spines (ASIS). Internal and external tests were performed using our lab dataset and SisFall public dataset, respectively. The results showed that ANN and SVM were suitable for the time-series and discrete data, respectively. The classification performance generally decreased, and thus, specific feature vectors from the raw data were necessary when untrained motions were tested using a public dataset. Normalization made SVM and ANN more and less effective, respectively. Equalization increased the sensitivity, even though it did not improve the overall performance. The increase in the number of training data also improved the classification performance. Machine learning was vulnerable to untrained motions, and data of various movements were needed for the training.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1688
Author(s):  
Luqman Ali ◽  
Fady Alnajjar ◽  
Hamad Al Jassmi ◽  
Munkhjargal Gochoo ◽  
Wasif Khan ◽  
...  

This paper proposes a customized convolutional neural network for crack detection in concrete structures. The proposed method is compared to four existing deep learning methods based on training data size, data heterogeneity, network complexity, and the number of epochs. The performance of the proposed convolutional neural network (CNN) model is evaluated and compared to pretrained networks, i.e., the VGG-16, VGG-19, ResNet-50, and Inception V3 models, on eight datasets of different sizes, created from two public datasets. For each model, the evaluation considered computational time, crack localization results, and classification measures, e.g., accuracy, precision, recall, and F1-score. Experimental results demonstrated that training data size and heterogeneity among data samples significantly affect model performance. All models demonstrated promising performance on a limited number of diverse training data; however, increasing the training data size and reducing diversity reduced generalization performance, and led to overfitting. The proposed customized CNN and VGG-16 models outperformed the other methods in terms of classification, localization, and computational time on a small amount of data, and the results indicate that these two models demonstrate superior crack detection and localization for concrete structures.
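The classification measures named in this abstract (accuracy, precision, recall, and F1-score) can be computed from binary labels as in the sketch below. This is a generic illustration and not the authors' evaluation code; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = crack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # cracks correctly detected
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # non-cracks correctly rejected
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # missed cracks

    accuracy = (tp + tn) / len(pairs)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy while missing most cracks.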


2016 ◽  
Vol 113 (21) ◽  
pp. E2899-E2905 ◽  
Author(s):  
Irina O. Vvedenskaya ◽  
Hanif Vahedian-Movahed ◽  
Yuanchao Zhang ◽  
Deanne M. Taylor ◽  
Richard H. Ebright ◽  
...  

During transcription initiation, RNA polymerase (RNAP) holoenzyme unwinds ∼13 bp of promoter DNA, forming an RNAP-promoter open complex (RPo) containing a single-stranded transcription bubble, and selects a template-strand nucleotide to serve as the transcription start site (TSS). In RPo, RNAP core enzyme makes sequence-specific protein–DNA interactions with the downstream part of the nontemplate strand of the transcription bubble (“core recognition element,” CRE). Here, we investigated whether sequence-specific RNAP–CRE interactions affect TSS selection. To do this, we used two next-generation sequencing-based approaches to compare the TSS profile of WT RNAP to that of an RNAP derivative defective in sequence-specific RNAP–CRE interactions. First, using massively systematic transcript end readout, MASTER, we assessed effects of RNAP–CRE interactions on TSS selection in vitro and in vivo for a library of 4^7 (∼16,000) consensus promoters containing different TSS region sequences, and we observed that the TSS profile of the RNAP derivative defective in RNAP–CRE interactions differed from that of WT RNAP, in a manner that correlated with the presence of consensus CRE sequences in the TSS region. Second, using 5′ merodiploid native-elongating-transcript sequencing, 5′ mNET-seq, we assessed effects of RNAP–CRE interactions at natural promoters in Escherichia coli, and we identified 39 promoters at which RNAP–CRE interactions determine TSS selection. Our findings establish RNAP–CRE interactions are a functional determinant of TSS selection. We propose that RNAP–CRE interactions modulate the position of the downstream end of the transcription bubble in RPo, and thereby modulate TSS selection, which involves transcription bubble expansion or transcription bubble contraction (scrunching or antiscrunching).


1999 ◽  
Vol 147 (6) ◽  
pp. 1275-1286 ◽  
Author(s):  
Conrad L. Leung ◽  
Dongming Sun ◽  
Min Zheng ◽  
David R. Knowles ◽  
Ronald K.H. Liem

We cloned and characterized a full-length cDNA of mouse actin cross-linking family 7 (mACF7) by sequential rapid amplification of cDNA ends–PCR. The completed mACF7 cDNA is 17 kb and codes for a 608-kD protein. The closest relative of mACF7 is the Drosophila protein Kakapo, which shares similar architecture with mACF7. mACF7 contains a putative actin-binding domain and a plakin-like domain that are highly homologous to dystonin (BPAG1-n) at its NH2 terminus. However, unlike dystonin, mACF7 does not contain a coiled–coil rod domain; instead, the rod domain of mACF7 is made up of 23 dystrophin-like spectrin repeats. At its COOH terminus, mACF7 contains two putative EF-hand calcium-binding motifs and a segment homologous to the growth arrest–specific protein, Gas2. In this paper, we demonstrate that the NH2-terminal actin-binding domain of mACF7 is functional both in vivo and in vitro. More importantly, we found that the COOH-terminal domain of mACF7 interacts with and stabilizes microtubules. In transfected cells full-length mACF7 can associate not only with actin but also with microtubules. Hence, we suggest a modified name: MACF (microtubule actin cross-linking factor). The properties of MACF are consistent with the observation that mutations in kakapo cause disorganization of microtubules in epidermal muscle attachment cells and some sensory neurons.


2016 ◽  
Vol 12 (6) ◽  
pp. 1731-1745 ◽  
Author(s):  
Jonathan Lotze ◽  
Ulrike Reinhardt ◽  
Oliver Seitz ◽  
Annette G. Beck-Sickinger

Peptide-tag based labelling can be achieved by (i) enzymes (ii) recognition of metal ions or small molecules and (iii) peptide–peptide interactions and enables site-specific protein visualization to investigate protein localization and trafficking.

