Efficient Multiphase Test Set Embedding for Scan-based Testing

A connectionist model for decision support was constructed out of several back-propagation modules. Manifestations serve as input to the model; they may be real-valued, and the confidence in their measurement may be specified. The model produces as its output the posterior probability of disease. The model was trained on 1,000 cases taken from a simulated underlying population with three conditionally independent manifestations. The first manifestation had a linear relationship between value and posterior probability of disease, the second had a stepped relationship, and the third was normally distributed. An independent test set of 30,000 cases showed that the model was better able to estimate the posterior probability of disease (the standard deviation of residuals was 0.046, with a 95% confidence interval of 0.046-0.047) than a model constructed using logistic regression (with a standard deviation of residuals of 0.062, with a 95% confidence interval of 0.062-0.063). The model fitted the normal and stepped manifestations better than the linear one. It accommodated intermediate levels of confidence well.

RetroBioCat: Computer-Aided Synthesis Planning for Biocatalytic Reactions and Cascades

10.26434/chemrxiv.12571235.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

William Finnigan ◽

Lorna J. Hepworth ◽

Nicholas J. Turner ◽

Sabine Flitsch

Keyword(s):

Organic Chemistry ◽

Computer Aided Design ◽

Selective Synthesis ◽

Test Set ◽

Synthesis Design ◽

Computer Aided ◽

Synthesis Planning ◽

Enzymatic Cascades ◽

Target Molecules ◽

Aided Design

As the enzyme toolbox for biocatalysis has expanded, so has the potential for the construction of powerful enzymatic cascades for efficient and selective synthesis of target molecules. Additionally, recent advances in computer-aided synthesis planning (CASP) are revolutionizing synthesis design in both synthetic biology and organic chemistry. However, the potential for biocatalysis is not well captured by tools currently available in either field. Here we present RetroBioCat, an intuitive and accessible tool for computer-aided design of biocatalytic cascades, freely available at retrobiocat.com. Our approach uses a set of expertly encoded reaction rules encompassing the enzyme toolbox for biocatalysis, and a system for identifying literature precedent for enzymes with the correct substrate specificity where this is available. Applying these rules for automated biocatalytic retrosynthesis, we show our tool to be capable of identifying promising biocatalytic pathways to target molecules, validated using a test-set of recent cascades described in the literature.

Prediction of Ciliwung River Water Quality Using the Decision Tree Algorithm (original title: PREDIKSI KUALITAS AIR SUNGAI CILIWUNG DENGAN MENGGUNAKAN ALGORITMA POHON KEPUTUSAN)

Jurnal Air Indonesia ◽

10.29122/jai.v12i2.4364 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Mohammad Haekal ◽

Henki Bayu Seta ◽

Mayanda Mega Santoni

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Online Monitoring ◽

Training Set ◽

Microsoft Excel ◽

Test Set

To predict the water quality of the Ciliwung River, data collected through Online Monitoring were processed using a data mining method. The monitoring data were first tabulated in Microsoft Excel and then processed into a decision tree with the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and highly accurate. A total of 5,476 Ciliwung River water quality monitoring records were processed. Classification with the decision tree showed that 1,059 of these records (19.3242%) indicate that the Ciliwung River is Not Polluted and 4,417 records (80.6758%) indicate that it is Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split at 66%. All four test options showed very high accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001, and that using the WEKA application with the Decision Tree algorithm to process monitoring data on three parameters (pH, DO, and nitrate) is very accurate and appropriate. Keywords: river water quality, data mining, Decision Tree algorithm, WEKA application.
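
Two of the test options named in this abstract, a 66% percentage split and 10-fold cross-validation, can be illustrated with a minimal Python sketch. This is not WEKA's implementation; the function names `percentage_split` and `k_fold_indices`, the fixed seed, and the rounding rule are assumptions made for illustration only.

```python
import random

def percentage_split(records, train_fraction=0.66, seed=0):
    """Shuffle the records and split them into (train, test) by train_fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)  # rounding rule is an assumption
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Placeholder record IDs standing in for the 5,476 monitoring records.
train, test = percentage_split(range(5476))
print(len(train), len(test))
```

A classifier would be fitted on each training portion and scored on the matching test portion; averaging the per-fold accuracies gives the cross-validated estimate.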

Test Set Generation for Pairwise Testing Using Genetic Algorithms

Journal of Information Processing Systems ◽

10.3745/jips.04.0019 ◽

2015 ◽

Cited By ~ 2

Keyword(s):

Genetic Algorithms ◽

Test Set ◽

Pairwise Testing

Registration of a Dynamic Multimodal Target Image Test Set for the Evaluation of Image Fusion Techniques

10.21236/ada598370 ◽

2013 ◽

Cited By ~ 1

Author(s):

Alexander Toet

Keyword(s):

Image Fusion ◽

Target Image ◽

Test Set ◽

Image Test

QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

Current Analytical Chemistry ◽

10.2174/1573411016999200518083359 ◽

2020 ◽

Vol 16 (8) ◽

pp. 1088-1105

Author(s):

Nafiseh Vahedi ◽

Majid Mohammadhosseini ◽

Mehdi Nekoei

Keyword(s):

Present Report ◽

Principal Component ◽

Parp Inhibitors ◽

Support Vector ◽

Ann Model ◽

Statistical Parameters ◽

Qsar Study ◽

Data Set ◽

Test Set ◽

Non Linear

Background: The poly(ADP-ribose) polymerases (PARPs) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, linear and non-linear methods including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN) were used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used to divide the whole data set rationally and to select the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors that contribute most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set, and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN, provided considerably better predictive capability. Conclusion: Among the constructed models, and in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), as well as R2 and F-statistic values for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were more favorable for the GA-ANN model.
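
The Y-randomization check mentioned in this abstract can be sketched as follows. The sketch assumes a single descriptor and an ordinary least-squares fit, which is much simpler than the GA-MLR, GA-SVM and GA-ANN models of the paper; the function names and the toy data are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. r^2 of a one-descriptor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_rounds=100, seed=0):
    """Return r^2 on the true responses and the r^2 values after shuffling y."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-response pairing
        shuffled_scores.append(r_squared(x, y_perm))
    return r_squared(x, y), shuffled_scores

# Toy data (not from the study): a perfectly linear descriptor-response relation.
x = list(range(20))
y = [2 * v + 1 for v in x]
true_r2, shuffled = y_randomization(x, y)
print(true_r2, max(shuffled))
```

If the shuffled responses produced fits nearly as good as the true ones, the original model would be suspected of resting on chance correlation.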

Key phrase Extraction by Improving TextRank with an Integration of Word Embedding and Syntactic Information

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200820155846 ◽

2020 ◽

Vol 13 ◽

Author(s):

Sheng Zhang ◽

Qi Luo ◽

Yukun Feng ◽

Ke Ding ◽

Daniela Gifu ◽

...

Keyword(s):

Semantic Information ◽

Performance Enhancement ◽

Word Embedding ◽

The Other ◽

Test Set ◽

Pagerank Algorithm ◽

Phrase Extraction ◽

Extraction Algorithm ◽

Syntactic Information ◽

Key Phrase Extraction

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic is a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: On a self-made test set and a public test set, the results implied that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
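
The baseline that this abstract builds on, plain TextRank over a word co-occurrence graph, can be sketched in a few lines. This is the unweighted baseline only, not WESIA; the window size, damping factor, iteration count and the toy token list are assumptions chosen for illustration.

```python
from collections import defaultdict

def build_cooccurrence_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph

def textrank(graph, damping=0.85, iterations=50):
    """Run the PageRank update on an undirected, unweighted graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return scores

# Toy token sequence (hypothetical, already tokenized and filtered).
tokens = ["key", "phrase", "extraction", "ranks", "candidate",
          "phrase", "nodes", "in", "a", "graph"]
scores = textrank(build_cooccurrence_graph(tokens))
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Because the edges come only from co-occurrence, a word's score depends largely on how often and how widely it appears, which is the frequency dependence the abstract describes as a bottleneck.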

QSPR modelling of the octanol/water partition coefficient of organometallic substances by optimal SMILES-based descriptors

Open Chemistry ◽

10.2478/s11532-009-0095-y ◽

2009 ◽

Vol 7 (4) ◽

pp. 846-856 ◽

Cited By ~ 6

Author(s):

Andrey Toropov ◽

Alla Toropova ◽

Emilio Benfenati

Keyword(s):

Partition Coefficient ◽

Organometallic Compounds ◽

Applicability Domain ◽

Training Set ◽

Input Line ◽

Test Set ◽

Water Partition Coefficient ◽

Definition Of

Usually, QSPR is not used to model organometallic compounds. We have modeled the octanol/water partition coefficient for organometallic compounds of Na, K, Ca, Cu, Fe, Zn, Ni, As, and Hg by optimal descriptors calculated with simplified molecular input line entry system (SMILES) notations. The best model is characterized by the following statistics: n=54, r2=0.9807, s=0.677, F=2636 (training set); n=26, r2=0.9693, s=0.969, F=759 (test set). Empirical criteria for the definition of the applicability domain for these models are discussed.

Fourier transform infrared spectroscopy of milk samples as a tool to estimate energy balance, energy- and dry matter intake in lactating dairy cows

Journal of Dairy Research ◽

10.1017/s0022029920001004 ◽

2020 ◽

pp. 1-8

Author(s):

Amira Rachah ◽

Olav Reksen ◽

Nils Kristian Afseth ◽

Valeria Tafintseva ◽

Sabine Ferneborg ◽

...

Keyword(s):

Dairy Cows ◽

Dry Matter ◽

External Validation ◽

Ftir Spectra ◽

Dry Matter Intake ◽

Energy Status ◽

Test Set ◽

Milk Samples ◽

External Test ◽

Body Energy

The objective of the study was to evaluate the potential of Fourier transform infrared spectroscopy (FTIR) analysis of milk samples to predict body energy status and related traits (energy balance (EB), dry matter intake (DMI) and efficient energy intake (EEI)) in lactating dairy cows. The data included 2371 milk samples from 63 Norwegian Red dairy cows collected during the first 105 days in milk (DIM). To predict the body energy status traits, calibration models were developed using Partial Least Squares Regression (PLSR). Calibration models were established using a split-sample (leave-one-cow-out) cross-validation approach and validated using an external test set. The PLSR method was implemented using just the FTIR spectra or using the FTIR spectra together with milk yield (MY) or concentrate intake (CONCTR) as predictors of traits. Analyses were conducted for the entire first 105 DIM and separately for the two lactation periods: 5 ≤ DIM ≤ 55 and 55 < DIM ≤ 105. To test the models, an external validation using an independent test set was performed. Predictions depending on the parity (1st, 2nd and 3rd to 6th parities) in early lactation were also investigated. Accuracy of prediction (r) for both cross-validation and the external test set was defined as the correlation between the predicted and observed values for body energy status traits. Analyzing FTIR in combination with MY by PLSR resulted in relatively high r-values for estimating EB (r = 0.63), DMI (r = 0.83) and EEI (r = 0.84) in the external validation. Only moderate correlations between FTIR spectra and traits like EB, EEI and DMI have so far been published. Our hypothesis was that improvements in the FTIR predictions of EB, EEI and DMI can be obtained by (1) stratification into different stages of lactation and different parities, or (2) adding information on milking and feeding traits. Stratification of the lactation stages improved predictions compared with the analyses including all data (5 ≤ DIM ≤ 105). The accuracy was improved if additional data (MY or CONCTR) were included in the prediction model. Furthermore, stratification into parity groups improved the predictions of body energy status. Our results show that FTIR spectral data combined with MY or CONCTR can be used to obtain improved estimation of body energy status compared to using only the FTIR spectra in Norwegian Red dairy cattle. The best prediction results were achieved using FTIR spectra together with MY for early lactation. The results obtained in the study suggest that the modeling approach used in this paper can be considered a viable method for predicting an individual cow's energy status.
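
The leave-one-cow-out scheme and the accuracy measure r described in this abstract can be sketched in plain Python. The sketch covers only the data splitting and the correlation; the PLSR model itself is omitted, and the function names and cow identifiers are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def leave_one_cow_out(cow_ids):
    """Yield (held_out_cow, train_idx, test_idx) with one cow held out per fold."""
    by_cow = defaultdict(list)
    for idx, cow in enumerate(cow_ids):
        by_cow[cow].append(idx)
    for cow, test_idx in by_cow.items():
        train_idx = [i for i, c in enumerate(cow_ids) if c != cow]
        yield cow, train_idx, test_idx

def pearson_r(predicted, observed):
    """Correlation between predicted and observed values (the accuracy r)."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / sqrt(var_p * var_o)

# Hypothetical sample-to-cow assignment: one entry per milk sample.
cow_ids = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_b"]
for cow, train_idx, test_idx in leave_one_cow_out(cow_ids):
    print(cow, train_idx, test_idx)
```

Holding out all samples from one cow at a time keeps repeated measurements of the same animal from appearing in both the calibration and the validation data.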

A MYC-Driven Plasma Polyamine Signature for Early Detection of Ovarian Cancer

Cancers ◽

10.3390/cancers13040913 ◽

2021 ◽

Vol 13 (4) ◽

pp. 913

Author(s):

Johannes Fahrmann ◽

Ehsan Irajizad ◽

Makoto Kobayashi ◽

Jody Vykoukal ◽

Jennifer Dennison ◽

...

Keyword(s):

Ovarian Cancer ◽

Early Stage ◽

Area Under The Curve ◽

Polyamine Metabolism ◽

Healthy Controls ◽

Test Set ◽

Exact Test ◽

Oncogenic Driver ◽

Characteristic Area ◽

Validation Set

MYC is an oncogenic driver in the pathogenesis of ovarian cancer. We previously demonstrated that MYC regulates polyamine metabolism in triple-negative breast cancer (TNBC) and that a plasma polyamine signature is associated with TNBC development and progression. We hypothesized that a similar plasma polyamine signature may be associated with ovarian cancer (OvCa) development. Using mass spectrometry, four polyamines were quantified in plasma from 116 OvCa cases and 143 controls (71 healthy controls + 72 subjects with benign pelvic masses) (Test Set). Findings were validated in an independent plasma set from 61 early-stage OvCa cases and 71 healthy controls (Validation Set). Complementarity of polyamines with CA125 was also evaluated. Receiver operating characteristic area under the curve (AUC) of individual polyamines for distinguishing cases from healthy controls ranged from 0.74 to 0.88. A polyamine signature consisting of diacetylspermine + N-(3-acetamidopropyl)pyrrolidin-2-one in combination with CA125 developed in the Test Set yielded improvement in sensitivity at >99% specificity relative to CA125 alone (73.7% vs 62.2%; McNemar exact test 2-sided P: 0.019) in the validation set and captured 30.4% of cases that were missed with CA125 alone. Our findings reveal a MYC-driven plasma polyamine signature associated with OvCa that complemented CA125 in detecting early-stage ovarian cancer.
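
The McNemar exact test cited in this abstract compares two paired classifiers using only the discordant pairs, that is, the cases detected by one method but not the other. A minimal sketch follows; the function name is hypothetical and the counts in the usage line are invented for illustration, not taken from the study.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases positive by method A only; c: cases positive by method B only.
    Under the null hypothesis, each discordant pair falls either way with p = 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented counts: 1 case found only by method A, 8 found only by method B.
print(mcnemar_exact(1, 8))
```

Cases that both methods detect, or both miss, do not enter the calculation, which is why the test suits a comparison of sensitivity on the same set of patients.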