Improved Training of CAE-Based Defect Detectors Using Structural Noise

2021 ◽  
Vol 11 (24) ◽  
pp. 12062
Author(s):  
Reina Murakami ◽  
Valentin Grave ◽  
Osamu Fukuda ◽  
Hiroshi Okumura ◽  
Nobuhiko Yamaguchi

The appearance of products matters to companies because it reflects the quality of their manufacturing to customers. Nowadays, visual inspection is conducted by human inspectors. This research attempts to automate this process using Convolutional AutoEncoders (CAE). Our models were trained using images of non-defective parts. Previous research on autoencoders has reported that the accuracy of image regeneration can be improved by adding noise to the training dataset, but no extensive analysis of the noise factor has been done. Therefore, our method compares the effects of two different noise patterns on the models' efficiency: Gaussian noise and noise made of a known structure. The test datasets comprised “defective” parts. Over the experiments, it was mostly observed that the precision of the CAE improved when noisy data were used during the training phases. The best results were obtained with structural noise, made of defined shapes randomly corrupting the training data. Furthermore, the models were able to process test data that had slightly different positions and rotations compared with those found in the training dataset. However, shortcomings appeared when “regular” spots (in the training data) and “defective” spots (in the test data) partially, or totally, overlapped.
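The abstract contrasts Gaussian noise with structural noise "made of defined shapes randomly corrupting training data," but it does not say which shapes, how many, or how large they are. Below is a minimal sketch of both corruptions. It assumes 2-D grayscale images scaled to [0, 1] and uses filled squares and discs as the "defined shapes"; `n_shapes`, `max_size` and `sigma` are hypothetical parameters. It also assumes the usual denoising setup, in which the CAE learns to reconstruct the clean image from the corrupted one. The CAE itself is not shown.

```python
import numpy as np

def gaussian_noise(img, sigma=0.1, rng=None):
    """Additive Gaussian noise, clipped back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def structural_noise(img, n_shapes=3, max_size=8, rng=None):
    """Paste randomly placed filled squares or discs (hypothetical shape set) onto a copy of img."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_shapes):
        size = int(rng.integers(2, max_size + 1))
        cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
        value = float(rng.uniform(0.0, 1.0))  # constant grey level of the shape
        if rng.random() < 0.5:
            # axis-aligned square; slicing clips it at the image border
            out[max(cy - size // 2, 0):cy + size // 2 + 1,
                max(cx - size // 2, 0):cx + size // 2 + 1] = value
        else:
            # disc of diameter `size` centred on (cy, cx)
            mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (size / 2) ** 2
            out[mask] = value
    return out

def make_denoising_pairs(clean_batch, noise_fn, rng=None):
    """Return (noisy input, clean target) pairs for denoising-autoencoder training."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.stack([noise_fn(img, rng=rng) for img in clean_batch])
    return noisy, clean_batch
```

Swapping `noise_fn` between `gaussian_noise` and `structural_noise` reproduces the kind of comparison the abstract describes. The training images stay the non-defective parts; only the network input is corrupted.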

2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract Background: Predicting whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) can improve early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained for a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Validation was performed on an independent ADNI test data set and on the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions. Results: On the independent ADNI test data set, the XGBoost models trained on the entire training data set reached a mean classification accuracy of 58.54 %. The XGBoost models that excluded 116 of the 467 subjects from the training data set, based on their Logistic Regression (LR) data Shapley values, outperformed them by 14.13 % (8.27 percentage points). On the AIBL data set, the XGBoost models trained on the entire training data set reached a mean accuracy of 60.35 %. Excluding the 72 subjects with the smallest RF data Shapley values from the training data set improved this by 24.86 % (15.00 percentage points).
Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data was associated with the number of ApoEϵ4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
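Data Shapley assigns each training subject i the value φ_i = E_π[v(S_π,i ∪ {i}) − v(S_π,i)]. Here S_π,i is the set of subjects that precede i in a random permutation π, and v is a performance score. The sketch below uses plain Monte Carlo permutation sampling, with no truncation. It is not the authors' implementation: they computed values with LR and RF models, whereas this sketch swaps in 1-nearest-neighbour validation accuracy as the utility v so that it runs with NumPy alone. Setting v(∅) = 0 and `n_perm` are also assumptions.

```python
import numpy as np

def knn1_accuracy(X_tr, y_tr, X_val, y_val):
    """Utility v(S): validation accuracy of a 1-nearest-neighbour classifier fit on S.
    By convention the empty set scores 0.0."""
    if len(X_tr) == 0:
        return 0.0
    # squared Euclidean distance from every validation point to every training point
    d = ((X_val[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    pred = y_tr[d.argmin(axis=1)]
    return float((pred == y_val).mean())

def monte_carlo_data_shapley(X, y, X_val, y_val, n_perm=200,
                             utility=knn1_accuracy, seed=0):
    """Estimate each point's data Shapley value by averaging marginal
    contributions over randomly sampled permutations of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X)
    values = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev = utility(X[:0], y[:0], X_val, y_val)  # v(empty set)
        for k in range(n):
            idx = perm[: k + 1]            # first k+1 points of this permutation
            cur = utility(X[idx], y[idx], X_val, y_val)
            values[perm[k]] += cur - prev  # marginal contribution of point perm[k]
            prev = cur
    return values / n_perm
```

Within each permutation the marginal contributions telescope, so the estimates always sum to v(full set) − v(∅). Mislabeled or atypical points tend to receive low or negative values. `keep = np.argsort(values)[n_drop:]` then drops the `n_drop` lowest-valued subjects, in the spirit of the exclusions reported above.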


2021 ◽  
Vol 33 (5) ◽  
pp. 83-104
Author(s):  
Aleksandr Igorevich Getman ◽  
Maxim Nikolaevich Goryunov ◽  
Andrey Georgievich Matskevich ◽  
Dmitry Aleksandrovich Rybolovlev

The paper discusses issues in training computer attack detection models based on machine learning methods. It presents, in turn, the results of an analysis of publicly available training datasets and of tools for analyzing network traffic and identifying features of network sessions. The drawbacks of existing tools and possible errors in the datasets formed with their help are noted. The authors conclude that it is necessary to collect one's own training data, because the reliability of public datasets is not guaranteed and pre-trained models are of limited use in networks whose characteristics differ from those of the network in which the training traffic was collected. A practical approach to generating training data for computer attack detection models is proposed. The proposed solutions have been tested to evaluate the quality of model training on the collected data and the quality of attack detection in a real network infrastructure.
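One place where traffic-analysis tools can introduce the dataset errors mentioned above is in how packets are grouped into sessions before features are computed. The sketch below groups packets by a direction-independent 5-tuple and splits a session after an idle timeout. It is only an illustration under stated assumptions: the `Packet` record, the 60-second `idle_timeout` and the four features are hypothetical, and none of them are taken from the paper or any specific tool.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    ts: float      # capture time in seconds
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int    # bytes on the wire

@dataclass
class Session:
    key: tuple
    initiator: tuple   # (address, port) of the first packet seen
    start: float
    end: float
    fwd_pkts: int = 0
    bwd_pkts: int = 0
    total_bytes: int = 0

    def features(self):
        return {"duration": self.end - self.start, "fwd_pkts": self.fwd_pkts,
                "bwd_pkts": self.bwd_pkts, "total_bytes": self.total_bytes}

def session_key(p):
    """Direction-independent key: both directions of a conversation map to the same key."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)

def build_sessions(packets, idle_timeout=60.0):
    active, closed = {}, []
    for p in sorted(packets, key=lambda q: q.ts):
        key = session_key(p)
        s = active.get(key)
        if s is not None and p.ts - s.end > idle_timeout:
            closed.append(s)  # idle gap too long: close it and start a new session
            s = None
        if s is None:
            s = Session(key, (p.src, p.sport), p.ts, p.ts)
            active[key] = s
        if (p.src, p.sport) == s.initiator:
            s.fwd_pkts += 1
        else:
            s.bwd_pkts += 1
        s.total_bytes += p.length
        s.end = p.ts
    return closed + list(active.values())
```

Changing `idle_timeout` alone changes how many sessions the same capture yields, and therefore changes every per-session feature. This is one reason why datasets built with different tools need not agree.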


Author(s):  
Michael Auer ◽  
Mark D. Griffiths

Abstract Player protection and harm minimization have become increasingly important in the gambling industry, along with the promotion of responsible gambling (RG). Among the most widespread RG tools that gaming operators provide are limit-setting tools that help players limit the amount of time and/or money they spend gambling. Research suggests that limit-setting significantly reduces the amount of money that players spend. If limit-setting is to be encouraged as a way of facilitating responsible gambling, it is important to know which variables are important in getting individuals to set and change limits in the first place. In the present study, 33 variables assessing player behavior among Norsk Tipping clientele (N = 70,789) from January to March 2017 were computed. These 33 behavioral variables were then used to predict the likelihood of gamblers changing their monetary limit between April and June 2017. The 70,789 players were randomly split into a training dataset of 56,532 and an evaluation set of 14,157 players (corresponding to an 80/20 split). The results demonstrated that it is possible to predict future limit-setting based on player behavior. On the training data, the random forest algorithm appeared to predict limit-changing behavior much better than the other algorithms. However, on the independent test data, the random forest algorithm’s accuracy dropped significantly. The gradient boosting machine algorithm delivered the best performance on the test data, with only a small decrease in accuracy compared with the training data. In the gradient boosting machine model, the most important variables predicting future limit-setting were players receiving feedback that they had reached 80% of their personal monthly global loss limit, personal monthly loss limit, the amount bet, theoretical loss, and whether the players had increased their limits in the past.
With the help of predictive analytics, players with a high likelihood of changing their limits can be proactively approached.
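The random forest result above illustrates a general pattern: training accuracy alone can overstate how well a model generalizes. The sketch below shows a random 80/20 split and the train-versus-test comparison. It uses synthetic data with noisy labels and a 1-nearest-neighbour rule as a deliberately memorizing stand-in model; it does not use the Norsk Tipping data or the authors' algorithms.

```python
import numpy as np

def random_split(n, test_frac=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into disjoint train/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]

def one_nn_predict(X_tr, y_tr, X):
    """Predict with a 1-nearest-neighbour rule, which memorizes its training set."""
    d = ((X[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return y_tr[d.argmin(axis=1)]

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=1000) > 0).astype(int)  # noisy synthetic labels
train_idx, test_idx = random_split(len(X))
train_acc = (one_nn_predict(X[train_idx], y[train_idx], X[train_idx]) == y[train_idx]).mean()
test_acc = (one_nn_predict(X[train_idx], y[train_idx], X[test_idx]) == y[test_idx]).mean()
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

Because every training point is its own nearest neighbour, training accuracy is 1.0 by construction, while test accuracy falls well below it. Comparing the two numbers, as the study does, exposes this gap.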


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 580-581 ◽  
Author(s):  
V. S. Haynes ◽  
J. Curtis ◽  
F. Xie ◽  
I. Lipkovich ◽  
H. Zhao ◽  
...  

Background: Patients with rheumatoid arthritis (RA) experience fluctuating symptoms, increased pain, decreased function and variable quality of life; such changes often occur between visits to clinicians. The Digital Tracking of Arthritis Longitudinally (DIGITAL) study [2] is evaluating the use of electronically captured patient-reported outcomes (ePRO) and passive data collection from a Fitbit device to identify disease worsening in a real-world study of participants (pts) with RA. Objectives: Evaluate agreement between self-reported new-onset flare and ePROs in an interim analysis from DIGITAL using a classification model. Methods: Members of the ArthritisPower registry with RA were invited to participate in DIGITAL. Pts who successfully completed a two-week Lead-in period entered the Main Study, in which they wore a smartwatch and provided daily ePROs (pain and fatigue numeric rating scales (NRS)) and weekly ePROs, including the OMERACT RA Flare Questionnaire (FLARE) and PROMIS measures. This interim analysis covers ePRO data from pts who completed at least 30 days of the Main Study. A “Yes” response to the FLARE item “Are you having a flare now?” identified flare. For modeling the association between new-onset flare and ePROs, the dataset was split into training data (the first 30 days of the Main Study) and test data (Day 31 onward). Within each dataset, repeated binary outcomes (Flare/No Flare) per pt were defined each week. To focus on new-onset flare, outcomes for patient-weeks in which flare was present in the previous week were excluded within each dataset. Candidate variables for the model included baseline and current FLARE score (0-50 scale) and each of its 5 items, daily pain, daily fatigue, and several PROMIS weekly instruments and their lagged values (last week, or last 6 days for daily measures). ‘Baseline’ was calculated in non-flare weeks. The training data were used for logistic regression model selection, combining clinical expertise with backward elimination.
Performance of the final model was evaluated using the test data. Results: The training data comprised outcomes from 128 pts who reported 388 weekly flare assessments as no flare or onset flare over 2800 days during the first month of the Main Study. Of pts in the training dataset, 92.2% were female and 87.5% white, with mean (SD) age 52.7 (11.0) years and mean (SD) time since RA diagnosis 10.4 (10.3) years; 62.5% were on a biologic. Among those in the training dataset, 58 flare outcomes occurred in 50 (39.1%) unique pts. The test data comprised outcomes from 123 pts who reported 442 weekly flare assessments as no flare or onset flare over 3366 days, in which 64 flare outcomes occurred; they primarily included continued observations from pts who contributed to the training dataset. The best-performing model to classify flare in the training data included the current and baseline FLARE instrument activity question (i.e., “Considering how active your rheumatoid arthritis has been, how much difficulty have you had when taking part in activities such as work, family life, social events that are typical for you during the last week”), current daily pain, and baseline daily pain average and standard deviation. In the test data, this model had an area under the receiver operating characteristic (ROC) curve of 0.81 (Figure). At a cut point requiring specificity to be ≥0.80, sensitivity to detect flare was 0.62 and overall accuracy was 0.78. Conclusion: New-onset flare is common among RA patients, and the FLARE instrument and daily pain scores appear effective in classifying it. Evaluation of passive data as a proxy for self-reported new-onset flare is ongoing.
References: [1] Bartlett SJ, et al. J Rheumatol. 2017;44:1536-43. [2] Nowell WB, et al. JMIR Res Protoc. 2019;8:e14665.
Disclosure of Interests: Virginia S. Haynes Shareholder of: Eli Lilly and Company, Employee of: Eli Lilly and Company, Jeffrey Curtis Grant/research support from: AbbVie, Amgen, Bristol-Myers Squibb, Corrona, Janssen, Lilly, Myriad, Pfizer, Regeneron, Roche, UCB, Consultant of: AbbVie, Amgen, Bristol-Myers Squibb, Corrona, Janssen, Lilly, Myriad, Pfizer, Regeneron, Roche, UCB, Fenglong Xie: None declared, Ilya Lipkovich Shareholder of: Eli Lilly and Company, Employee of: Eli Lilly and Company, Hong Zhao: None declared, Carol L. Kannowski Shareholder of: Eli Lilly and Company, Employee of: Eli Lilly and Company, Jiat-Ling Poon Shareholder of: Eli Lilly and Company, Employee of: Eli Lilly and Company, Kelly Gavigan: None declared, David Curtis: None declared, Sandra K. Nolot Shareholder of: Eli Lilly and Company, Employee of: Eli Lilly and Company, W. Benjamin Nowell: None declared
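The RA abstract reports three test-set figures: AUC = 0.81, and, at a cut point chosen so that specificity is at least 0.80, a sensitivity of 0.62 and an accuracy of 0.78. The sketch below shows how such figures are derived from any classifier's predicted scores, such as the fitted logistic regression's probabilities. It does not reproduce the authors' model. The toy scores and labels are invented, and choosing the lowest qualifying threshold (which maximizes sensitivity) is an assumption about how the cut point was picked.

```python
import numpy as np

def auc_rank(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

def threshold_for_specificity(scores, labels, min_spec=0.80):
    """Lowest cut point t (predict flare if score >= t) whose specificity is >= min_spec.
    Returns (t, specificity, sensitivity, accuracy)."""
    neg = scores[labels == 0]
    for t in np.unique(scores):            # ascending; specificity grows with t
        spec = (neg < t).mean()
        if spec >= min_spec:
            pred = scores >= t
            sens = pred[labels == 1].mean()
            acc = (pred == (labels == 1)).mean()
            return float(t), float(spec), float(sens), float(acc)
    raise ValueError("no threshold reaches the requested specificity")

scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
labels = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])
print(auc_rank(scores, labels), threshold_for_specificity(scores, labels))
```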


2021 ◽  
Vol 905 (1) ◽  
pp. 012018
Author(s):  
I Y Prayogi ◽  
Sandra ◽  
Y Hendrawan

Abstract The objective of this study is to classify the quality of dried clove flowers using a deep learning method, the Convolutional Neural Network (CNN), and to perform a sensitivity analysis of CNN hyperparameters to obtain the best model for clove quality classification. The quality of the cloves used as raw material in this study was determined according to SNI 3392-1994 by PT. Perkebunan Nusantara XII Pancusari Plantation, Malang, East Java, Indonesia. In total, 1,600 images of dried clove flowers were divided into 4 quality classes. Each quality class has 225 training images, 75 validation images, and 100 test images. The first step of this study was to build a CNN architecture as the first model, which achieved 65.25% accuracy. The second step was to analyze the sensitivity of the first model to its CNN hyperparameters; the best value found at each step was then used in the next stage. Finally, after this hyperparameter tuning, the accuracy on the test data improved to 87.75%.
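The clove dataset uses a fixed per-class split: 225/75/100 images for each of the 4 quality classes, for 900/300/400 overall. The sketch below shows one way to produce such a split. It assumes a random assignment within each class, which the abstract does not specify, and uses integer class labels 0 to 3 as hypothetical stand-ins for the quality grades.

```python
import numpy as np

def split_per_class(labels, n_train=225, n_val=75, n_test=100, seed=0):
    """Randomly assign exactly n_train/n_val/n_test indices of each class to train/val/test."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < n_train + n_val + n_test:
            raise ValueError(f"class {c!r} has only {len(idx)} images")
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:n_train + n_val + n_test])
    return np.array(train), np.array(val), np.array(test)

labels = np.repeat(np.arange(4), 400)   # 4 quality classes x 400 images each
train_idx, val_idx, test_idx = split_per_class(labels)
```

On the 400 test images, the reported accuracies of 65.25% and 87.75% correspond to 261 and 351 correctly classified images, respectively.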


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
R Hariharan ◽  
P He ◽  
C Hickman ◽  
J Chambost ◽  
C Jacques ◽  
...  

Abstract Study question Is a pre-trained machine learning algorithm able to accurately detect cellular arrangement in 4-cell embryos from a different continent? Summary answer Artificial Intelligence (AI) analysis of 4-cell embryo classification is transferable across clinics globally with 79% accuracy. What is known already Previous studies observing four-cell human embryo configurations have demonstrated that non-tetrahedral embryos (embryos in which cells make contact with fewer than 3 other cells) are associated with compromised blastulation and implantation potential. Previous research by this study group has indicated the efficacy of AI models in classifying tetrahedral and non-tetrahedral embryos with 87% accuracy, using a database comprising 2 clinics, both in the same country (Brazil). This study aims to evaluate the transferability and robustness of this model on blind test data from a different country (France). Study design, size, duration The study was a retrospective cohort analysis in which 909 images of 4-cell embryos (“tetrahedral”, n = 749; “non-tetrahedral”, n = 160) were collected from 3 clinics (2 Brazilian, 1 French). All embryos were captured at the central focal plane using Embryoscope™ time-lapse incubators. The training data consisted solely of embryo images captured in Brazil (586 tetrahedral; 87 non-tetrahedral), and the test data consisted exclusively of embryo images captured in France (163 tetrahedral; 72 non-tetrahedral). Participants/materials, setting, methods The embryo images were labelled as either “tetrahedral” or “non-tetrahedral” at their respective clinics. Annotations were then validated by three operators. A ResNet-50 neural network model pretrained on ImageNet was fine-tuned on the training dataset to predict the correct annotation for each image. We used the cross-entropy loss function and the RMSprop optimiser (lr = 1e-5).
Simple data augmentations (flips and rotations) were used during training to help counteract class imbalance. Main results and the role of chance Our model classified embryos in the blind French test set with 79% accuracy when trained on the Brazilian data. The model had sensitivity of 91% and 51% for tetrahedral and non-tetrahedral embryos, respectively; precision was 81% and 73%; F1 score was 86% and 60%; and AUC was 0.61 and 0.64. This represents a 10% decrease in accuracy compared with when the model was both trained and tested on different data from the same clinics. Limitations, reasons for caution Although strict inclusion and exclusion criteria were used, inter-operator variability may affect the pre-processing stage of the algorithm. Moreover, as only one focal plane was used, ambiguous cases were interpolated and further annotated. Analysing embryos at multiple focal planes may prove crucial in improving the accuracy of the model. Wider implications of the findings: Though the use of machine learning models in the analysis of embryo imagery has grown in recent years, there has been concern over their robustness and transferability. While previous results have demonstrated the utility of locally trained models, our results highlight the potential for models to be implemented across different clinics. Trial registration number Not applicable
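The embryo abstract reports per-class sensitivity, precision and F1 on a test set of 163 tetrahedral and 72 non-tetrahedral images, but it does not give the confusion matrix. The sketch below shows how those metrics follow from a 2×2 confusion matrix. The counts in `cm` are hypothetical: they were chosen only so that the rounded metrics match the reported 79% / 91%, 51% / 81%, 73% / 86%, 60%, and the authors' actual counts may differ.

```python
def per_class_metrics(cm, class_names):
    """cm[i][j] = number of images of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    out = {"accuracy": correct / total}
    for k, name in enumerate(class_names):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                               # class k predicted as something else
        fp = sum(cm[i][k] for i in range(len(cm))) - tp    # other classes predicted as k
        sens = tp / (tp + fn)
        prec = tp / (tp + fp)
        out[name] = {"sensitivity": sens, "precision": prec,
                     "f1": 2 * prec * sens / (prec + sens)}
    return out

# Hypothetical counts: rows = true class, columns = predicted class.
cm = [[149, 14],   # 163 tetrahedral
      [35, 37]]    # 72 non-tetrahedral
metrics = per_class_metrics(cm, ["tetrahedral", "non-tetrahedral"])
```

With only 72 minority-class images, a handful of misclassifications moves the non-tetrahedral sensitivity by several points, which is worth keeping in mind when reading the per-class figures.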


2020 ◽  
Vol 7 (2) ◽  
pp. 379
Author(s):  
Agung Wahyu Setiawan ◽  
Alfie R. Ananda

Abstract One of the main problems in the palm oil industry is the grading (sorting) of Fresh Fruit Bunches (FFB) in palm oil mills. The parameter used for grading is the number of fruitlets detached from the bunch. Nowadays, FFB grading is conducted by graders, which is subjective and often inconsistent because of the limitations of human vision and of the human ability to process information on the number of fruitlets detached per FFB in a very limited time. Therefore, this study developed a grading system to assess and estimate FFB maturity based on spectroscopy and image contrast values. From the literature review, visible light and NIR spectrum at 680 and 780 nm can be used as light sources to detect the maturity level of FFB. In this study, Light-emitting Diode (LED) lamps with wavelengths of 680 and 750 nm were used as light sources, and FFB images were acquired with a modified DSLR camera, yielding two images of each FFB, at 680 and 750 nm. The contrast of both images is then calculated. In this research, 24 FFB (10 ripe and 14 unripe) are used as training data, and 77 FFB (38 ripe and 39 unripe) are used as test data. A Support Vector Machine (SVM) is used to classify the maturity level of FFB. The accuracy on the training dataset is 66.67%, while the accuracy on the test data is 57.14%. Future work will focus on improving the accuracy of the system by increasing the amount of both training and test data and by using machine learning.
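The FFB abstract computes an "image contrast" value from each of the two narrow-band images (680 and 750 nm) and feeds these values to an SVM, but it does not define contrast. The sketch below assumes RMS contrast, defined as the standard deviation of intensities after scaling 8-bit pixel values to [0, 1]. Both the contrast definition and the 8-bit input are assumptions. The resulting two-element vectors could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`, which is not shown here.

```python
import numpy as np

def rms_contrast(img):
    """RMS contrast: standard deviation of intensities after scaling 8-bit values to [0, 1]."""
    x = np.asarray(img, dtype=float) / 255.0
    return float(x.std())

def contrast_features(img_680, img_750):
    """Two-element feature vector for one bunch: contrast at 680 nm and at 750 nm."""
    return np.array([rms_contrast(img_680), rms_contrast(img_750)])
```

For scale, the reported accuracies correspond to 16 of 24 training bunches (66.67%) and 44 of 77 test bunches (57.14%) classified correctly.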


Author(s):  
Katherine V. Whittington

Abstract The electronics supply chain is being increasingly infiltrated by non-authentic, counterfeit electronic parts, whose use poses a great risk to the integrity and quality of critical hardware. Counterfeiting can involve a wide range of part features, such as leads and body molds. The failure analyst has many tools that can be used to investigate counterfeit parts. The key is to follow an investigative path that makes sense for each scenario. External visual inspection is called for whenever the source of supply is questionable. Other methods include the use of solvents, 3D measurement, X-ray fluorescence, C-mode scanning acoustic microscopy, thermal cycle testing, burn-in techniques, and electrical testing. Awareness, vigilance, and effective investigations are the best defense against the threat of counterfeit parts.


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid formation) is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylated cysteine sites. However, the performance of these models was not satisfactory due to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which features are encoded via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique (SMOTE) is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC-sites (majority class). A state-of-the-art 2D Convolutional Neural Network (2D-CNN) was employed, and rigorous 10-fold jackknife cross-validation was used for model validation and authentication. Results: Following the proposed framework, a strong, discriminative representation of the feature space, a capable machine learning engine, and an unbiased representation of the underlying training data yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in Matthews correlation coefficient (MCC) than the best existing method. On the independent dataset, the best existing study did not provide sufficient details for comparison.
Compared with the second-best method, the model obtained increases of 7.5% in accuracy (ACC), 1.22% in sensitivity (Sn), 12.91% in specificity (Sp) and 13.12% in MCC on the training data, and of 12.13% in ACC, 27.25% in Sn, 2.25% in Sp and 30.37% in MCC on the independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and the independent datasets compared with existing studies in the literature. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. Empirical simulations with a training dataset and an independent validation dataset have revealed the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that this work will provide insight for further prediction of S-sulfenylation characteristics and functionalities. We hope that the predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
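DeepSSPred uses SMOTE to balance SC-sites (minority) against non-SC-sites (majority). In standard SMOTE, each synthetic sample lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below implements that interpolation in NumPy. It operates on already-encoded feature vectors; the paper's n-segmented hybrid encoding is not shown, and `k = 5` is simply the common default, not a value stated in the abstract.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating each randomly
    chosen minority sample towards one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours per sample
    base = rng.integers(0, n, size=n_new)       # which sample each synthetic point starts from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

In practice, the imbalanced-learn package provides the same technique as `imblearn.over_sampling.SMOTE(...).fit_resample(X, y)`. Under cross-validation, oversampling is normally applied to the training folds only, so that synthetic points derived from a held-out sample do not leak into its evaluation fold.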


Author(s):  
Sayoni Das ◽  
Harry M Scholes ◽  
Neeladri Sen ◽  
Christine Orengo

Abstract Motivation Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation https://github.com/UCL/cath-funsite-predictor. Supplementary information Supplementary data are available at Bioinformatics online.

