Efficient Multiphase Test Set Embedding for Scan-based Testing

Author(s):  
E. Kalligeros ◽  
X. Kavousianos ◽  
D. Nikolos
1990 ◽  
Vol 29 (03) ◽  
pp. 167-181 ◽  
Author(s):  
G. Hripcsak

A connectionist model for decision support was constructed out of several back-propagation modules. Manifestations serve as input to the model; they may be real-valued, and the confidence in their measurement may be specified. The model produces as its output the posterior probability of disease. The model was trained on 1,000 cases taken from a simulated underlying population with three conditionally independent manifestations. The first manifestation had a linear relationship between value and posterior probability of disease, the second had a stepped relationship, and the third was normally distributed. An independent test set of 30,000 cases showed that the model was better able to estimate the posterior probability of disease (the standard deviation of residuals was 0.046, with a 95% confidence interval of 0.046-0.047) than a model constructed using logistic regression (with a standard deviation of residuals of 0.062, with a 95% confidence interval of 0.062-0.063). The model fitted the normal and stepped manifestations better than the linear one. It accommodated intermediate levels of confidence well.


Author(s):  
William Finnigan ◽  
Lorna J. Hepworth ◽  
Nicholas J. Turner ◽  
Sabine Flitsch

As the enzyme toolbox for biocatalysis has expanded, so has the potential for the construction of powerful enzymatic cascades for efficient and selective synthesis of target molecules. Additionally, recent advances in computer-aided synthesis planning (CASP) are revolutionizing synthesis design in both synthetic biology and organic chemistry. However, the potential for biocatalysis is not well captured by tools currently available in either field. Here we present RetroBioCat, an intuitive and accessible tool for computer-aided design of biocatalytic cascades, freely available at retrobiocat.com. Our approach uses a set of expertly encoded reaction rules encompassing the enzyme toolbox for biocatalysis, and a system for identifying literature precedent for enzymes with the correct substrate specificity where this is available. Applying these rules for automated biocatalytic retrosynthesis, we show our tool to be capable of identifying promising biocatalytic pathways to target molecules, validated using a test-set of recent cascades described in the literature.


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Mohammad Haekal ◽  
Henki Bayu Seta ◽  
Mayanda Mega Santoni

To predict the water quality of the Ciliwung River, data from online monitoring were processed using data mining. In this method, the monitoring data were first tabulated in Microsoft Excel and then processed into a decision tree using the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and highly accurate. A total of 5,476 monitoring records of Ciliwung River water quality were processed. Classification with the decision tree showed that, of these 5,476 records, 1,059 (19.3242%) indicated that the Ciliwung River was Not Polluted and 4,417 (80.6758%) indicated that it was Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options showed very high accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia Number 82 of 2001. The results also show that using the WEKA application with the Decision Tree algorithm to process monitoring data based on three parameters (pH, DO, and nitrate) is highly accurate and appropriate. Keywords: river water quality, data mining, Decision Tree algorithm, WEKA application.
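The two resampling test options named in this abstract (Percentage Split 66% and 10-fold cross-validation) can be illustrated with a minimal sketch of how 5,476 records might be partitioned. This is not WEKA's implementation; the function names, the shuffling seed, and the rounding rule are assumptions made for illustration only.

```python
import random


def percentage_split(n_records, train_percent=66, seed=0):
    """Shuffle record indices and split them into train/test by percentage.

    Hypothetical helper: the rounding rule is an assumption, not WEKA's.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = round(n_records * train_percent / 100)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one record.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


train, test = percentage_split(5476)
print(len(train), len(test))  # 3614 1862

for train_idx, test_idx in k_fold_indices(5476):
    # Every record is tested exactly once across the 10 folds.
    assert not set(train_idx) & set(test_idx)
```

In both schemes the classifier is fitted on the training indices only and scored on the held-out indices, which is why these options give a less optimistic accuracy estimate than Use Training Set.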


2020 ◽  
Vol 16 (8) ◽  
pp. 1088-1105
Author(s):  
Nafiseh Vahedi ◽  
Majid Mohammadhosseini ◽  
Mehdi Nekoei

Background: The poly(ADP-ribose) polymerases (PARP) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors contributing most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided considerably greater predictive capability. Conclusion: Among the constructed models, the GA-SVM approach had the better predictive power in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), and R2 and F-statistic values for the training set. However, compared with MLR and SVM, the GA-ANN model gave better statistical parameters for the test set.
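Two of the validation statistics named in this abstract, RMSEP and the leave-one-out Q2, can be sketched as follows. The sketch assumes a single-descriptor least-squares line rather than the paper's MLR, SVM or ANN models, and it uses the common definition Q2 = 1 - PRESS / SS_total with the full-sample mean; the abstract does not state which variant the authors used. All function names and data values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Made-up descriptor values and activities, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(q2_loo(xs, ys))
```

A Q2 close to 1 means the held-out predictions track the observed values almost as well as a perfect model would; RMSEP reports the same kind of error in the units of the response.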


Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics obtained through co-occurrence analysis. Objective: This frequency-based characteristic is a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: Applied to a self-made test set and a public test set, the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
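The co-occurrence baseline this abstract starts from can be sketched as a PageRank iteration over an undirected word graph. This is a minimal illustration of plain TextRank, not of WESIA: the abstract does not specify how the syntactic tree and word-embedding information are combined, so none of that is modeled here. The window size, damping factor, iteration count and token list are assumptions.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it within the window.
        for other in tokens[i + 1: i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word passes its score equally to all of its neighbors.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy token stream in which "hub" co-occurs with every other word.
ranked = textrank(["hub", "a", "hub", "b", "hub", "c"], window=1)
print(ranked[0][0])  # hub
```

Because the score depends only on how often and with how many words a term co-occurs, frequent terms dominate, which is the frequency-based limitation the abstract describes.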


2009 ◽  
Vol 7 (4) ◽  
pp. 846-856 ◽  
Author(s):  
Andrey Toropov ◽  
Alla Toropova ◽  
Emilio Benfenati

Usually, QSPR is not used to model organometallic compounds. We have modeled the octanol/water partition coefficient for organometallic compounds of Na, K, Ca, Cu, Fe, Zn, Ni, As, and Hg by optimal descriptors calculated with simplified molecular input line entry system (SMILES) notations. The best model is characterized by the following statistics: n=54, r2=0.9807, s=0.677, F=2636 (training set); n=26, r2=0.9693, s=0.969, F=759 (test set). Empirical criteria for the definition of the applicability domain for these models are discussed.


2020 ◽  
pp. 1-8
Author(s):  
Amira Rachah ◽  
Olav Reksen ◽  
Nils Kristian Afseth ◽  
Valeria Tafintseva ◽  
Sabine Ferneborg ◽  
...  

The objective of the study was to evaluate the potential of Fourier transform infrared spectroscopy (FTIR) analysis of milk samples to predict body energy status and related traits (energy balance (EB), dry matter intake (DMI) and efficient energy intake (EEI)) in lactating dairy cows. The data included 2371 milk samples from 63 Norwegian Red dairy cows collected during the first 105 days in milk (DIM). To predict the body energy status traits, calibration models were developed using Partial Least Squares Regression (PLSR). Calibration models were established using a split-sample (leave-one-cow-out) cross-validation approach and validated using an independent external test set. The PLSR method was implemented using either the FTIR spectra alone or the FTIR spectra together with milk yield (MY) or concentrate intake (CONCTR) as predictors of traits. Analyses were conducted for the entire first 105 DIM and separately for two lactation periods: 5 ≤ DIM ≤ 55 and 55 < DIM ≤ 105. Predictions depending on parity (1st, 2nd, and 3rd to 6th parities) in early lactation were also investigated. Accuracy of prediction (r) for both cross-validation and the external test set was defined as the correlation between the predicted and observed values for body energy status traits. Analyzing FTIR in combination with MY by PLSR resulted in relatively high r-values for estimating EB (r = 0.63), DMI (r = 0.83) and EEI (r = 0.84) in external validation. Only moderate correlations between FTIR spectra and traits such as EB, EEI and DMI have so far been published. Our hypothesis was that improvements in the FTIR predictions of EB, EEI and DMI can be obtained by (1) stratification into different stages of lactation and different parities, or (2) adding information on milking and feeding traits. Stratification of the lactation stages improved predictions compared with the analyses including all data (5 ≤ DIM ≤ 105). The accuracy improved when additional data (MY or CONCTR) were included in the prediction model. Furthermore, stratification into parity groups improved the predictions of body energy status. Our results show that, in Norwegian Red dairy cattle, FTIR spectral data combined with MY or CONCTR give a better estimate of body energy status than the FTIR spectra alone. The best prediction results were achieved using FTIR spectra together with MY for early lactation. The results suggest that the modeling approach used in this paper is a viable method for predicting an individual cow's energy status.
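The leave-one-cow-out scheme and the accuracy measure r described in this abstract can be sketched as follows. The sketch covers only the data splitting and the Pearson correlation between observed and predicted values; the PLSR fitting step is omitted, and the cow labels and function names are hypothetical.

```python
import math


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted trait values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical cow label for each milk sample.
cows = ["cow1", "cow1", "cow2", "cow3", "cow3"]
for cow, train_idx, test_idx in leave_one_group_out(cows):
    # All samples from the held-out cow go to the test fold together.
    print(cow, train_idx, test_idx)
```

Holding out all samples of one cow at a time keeps repeated measurements of the same animal from appearing on both sides of a split, which would otherwise inflate r.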


Cancers ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 913
Author(s):  
Johannes Fahrmann ◽  
Ehsan Irajizad ◽  
Makoto Kobayashi ◽  
Jody Vykoukal ◽  
Jennifer Dennison ◽  
...  

MYC is an oncogenic driver in the pathogenesis of ovarian cancer. We previously demonstrated that MYC regulates polyamine metabolism in triple-negative breast cancer (TNBC) and that a plasma polyamine signature is associated with TNBC development and progression. We hypothesized that a similar plasma polyamine signature may associate with ovarian cancer (OvCa) development. Using mass spectrometry, four polyamines were quantified in plasma from 116 OvCa cases and 143 controls (71 healthy controls + 72 subjects with benign pelvic masses) (Test Set). Findings were validated in an independent plasma set from 61 early-stage OvCa cases and 71 healthy controls (Validation Set). Complementarity of polyamines with CA125 was also evaluated. Receiver operating characteristic area under the curve (AUC) of individual polyamines for distinguishing cases from healthy controls ranged from 0.74 to 0.88. A polyamine signature consisting of diacetylspermine + N-(3-acetamidopropyl)pyrrolidin-2-one in combination with CA125, developed in the Test Set, yielded improved sensitivity at >99% specificity relative to CA125 alone (73.7% vs 62.2%; McNemar exact test, 2-sided P = 0.019) in the Validation Set and captured 30.4% of cases that were missed with CA125 alone. Our findings reveal a MYC-driven plasma polyamine signature associated with OvCa that complemented CA125 in detecting early-stage ovarian cancer.
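The paired comparison of sensitivities reported in this abstract relies on an exact McNemar test, which depends only on the discordant pairs: cases detected by one marker panel but not the other. A minimal sketch of the two-sided exact p-value follows. The abstract does not report the discordant counts, so the counts used below are hypothetical and do not reproduce the published P value.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases detected only by the first test (hypothetical input)
    c: cases detected only by the second test (hypothetical input)
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    # Probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 1 case found only by test A, 9 only by test B.
print(mcnemar_exact(1, 9))  # 0.021484375
```

Concordant pairs (cases detected by both panels or by neither) do not enter the calculation, which is why the test suits a comparison of two markers measured on the same subjects.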

