Erratum: A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR

2013 ◽  
Vol 32 (9-10) ◽  
pp. 866-866 ◽  
Author(s):  
Martin Gütlein ◽  
Christoph Helma ◽  
Andreas Karwath ◽  
Stefan Kramer
2013 ◽  
Vol 32 (5-6) ◽  
pp. 516-528 ◽  
Author(s):  
Martin Gütlein ◽  
Christoph Helma ◽  
Andreas Karwath ◽  
Stefan Kramer

2021 ◽  
Author(s):  
Zhilong Yi ◽  
Siqi Hu ◽  
Xiaofeng Lin ◽  
Qiong Zou ◽  
MinHong Zou ◽  
...  

Abstract Purpose: 68Ga-PSMA PET/CT has high specificity and sensitivity for the detection of both intraprostatic tumor focal lesions and metastasis. However, approximately 10% of primary prostate cancers are invisible on PSMA-PET (exhibiting no or minimal uptake). In this work, we investigated whether machine learning-based radiomics models derived from PSMA-PET images could predict invisible intraprostatic lesions on 68Ga-PSMA-11 PET in patients with primary prostate cancer. Methods: In this retrospective study, patients with or without prostate cancer who underwent 68Ga-PSMA PET/CT and presented as negative on PSMA-PET images at either of two institutions were included: institution 1 (2017 to 2020) for the training set and institution 2 (2019 to 2020) for the external test set. Three random forest (RF) models were built using selected features extracted from standard PET images, delayed PET images, and both standard and delayed PET images. Then, 10-fold cross-validation was performed. In the test phase, the three RF models and PSA density (PSAD, cut-off value: 0.15 ng/ml/ml) were tested with the external test set. The area under the receiver operating characteristic curve (AUC) was calculated for the models and PSAD, and the AUCs of the radiomics models and PSAD were compared. Results: A total of 64 patients (39 with prostate cancer and 25 with benign prostate disease) were in the training set, and 36 (21 with prostate cancer and 15 with benign prostate disease) were in the test set. The average AUCs of the three RF models from 10-fold cross-validation were 0.87 (95% CI: 0.72, 1.00), 0.86 (95% CI: 0.63, 1.00) and 0.91 (95% CI: 0.69, 1.00), respectively. In the test set, the AUCs of the three trained RF models and PSAD were 0.903 (95% CI: 0.830, 0.975), 0.856 (95% CI: 0.748, 0.964), 0.925 (95% CI: 0.838, 1.00), and 0.662 (95% CI: 0.510, 0.813), respectively. The AUCs of the three radiomics models were higher than that of PSAD (0.903, 0.856 and 0.925 vs 0.662; P = .007, P = .045 and P = .005, respectively). Conclusion: Random forest models developed from 68Ga-PSMA-11 PET-based radiomics features proved useful for accurate prediction of invisible intraprostatic lesions on 68Ga-PSMA-11 PET in patients with primary prostate cancer and showed better diagnostic performance than PSAD.
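
The AUC used above to compare the radiomics models with PSAD can be illustrated with a minimal sketch. It uses the rank-based (Mann-Whitney) definition of AUC and is not the authors' pipeline; the function name `auc` and the four example scores are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four cases (1 = cancer, 0 = benign).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values (0.662 for PSAD, up to 0.925 for the radiomics models) should be read.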


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Mohammad Haekal ◽  
Henki Bayu Seta ◽  
Mayanda Mega Santoni

To predict the water quality of the Ciliwung River, data from online monitoring were processed using data mining methods. The monitoring data were first tabulated in Microsoft Excel and then used to build a decision tree with the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simple, easy to understand, and highly accurate. A total of 5,476 water quality monitoring records for the Ciliwung River were processed. The decision tree classified 1,059 of these records (19.3242%) as indicating that the river was Not Polluted and 4,417 (80.6758%) as indicating that it was Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split at 66%. All four test options yielded very high accuracy, above 99%. From these results, the Ciliwung River can be predicted to be a polluted river with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001, and the use of WEKA with the Decision Tree algorithm to process the monitoring data using three parameters (pH, DO, and nitrate) was found to be highly accurate and appropriate. Keywords: river water quality, data mining, Decision Tree algorithm, WEKA application.
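
Two of the test options named above, a 66% percentage split and 10-fold cross-validation, can be sketched as index partitions. This is an illustration in plain Python, not WEKA's own implementation: the function names, the fixed seed, the truncation used to size the training set, and the lack of class stratification are all assumptions.

```python
import random


def percentage_split(n_records, train_fraction=0.66, seed=0):
    """Shuffle record indices and split them into train and test parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = int(n_records * train_fraction)  # truncation is an assumption
    return indices[:n_train], indices[n_train:]


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 5476 is the number of monitoring records reported in the abstract.
train, test = percentage_split(5476)
print(len(train), len(test))
print([len(test_fold) for _, test_fold in k_fold_indices(5476)])
```

Evaluating on the training set itself, the first of the four options, reuses the same records for fitting and scoring, so its accuracy is usually the most optimistic of the four.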


2021 ◽  
pp. 095679762097751
Author(s):  
Li Zhao ◽  
Jiaxin Zheng ◽  
Haiying Mao ◽  
Xinyi Yu ◽  
Jiacheng Ye ◽  
...  

Morality-based interventions designed to promote academic integrity are being used by educational institutions around the world. Although many such approaches have a strong theoretical foundation and are supported by laboratory-based evidence, they often have not been subjected to rigorous empirical evaluation in real-world contexts. In a naturalistic field study (N = 296), we evaluated a recent research-inspired classroom innovation in which students are told, just prior to taking an unproctored exam, that they are trusted to act with integrity. Four university classes were assigned to a proctored exam or one of three types of unproctored exam. Students who took unproctored exams cheated significantly more, which suggests that it may be premature to implement this approach in college classrooms. These findings point to the importance of conducting ecologically valid and well-controlled field studies that translate psychological theory into practice when introducing large-scale educational reforms.


2020 ◽  
pp. 1-8
Author(s):  
Amira Rachah ◽  
Olav Reksen ◽  
Nils Kristian Afseth ◽  
Valeria Tafintseva ◽  
Sabine Ferneborg ◽  
...  

Abstract The objective of the study was to evaluate the potential of Fourier transform infrared spectroscopy (FTIR) analysis of milk samples to predict body energy status and related traits (energy balance (EB), dry matter intake (DMI) and efficient energy intake (EEI)) in lactating dairy cows. The data included 2371 milk samples from 63 Norwegian Red dairy cows collected during the first 105 days in milk (DIM). To predict the body energy status traits, calibration models were developed using Partial Least Squares Regression (PLSR). Calibration models were established using a split-sample (leave-one-cow-out) cross-validation approach and validated using an external test set. The PLSR method was implemented using just the FTIR spectra or using the FTIR spectra together with milk yield (MY) or concentrate intake (CONCTR) as predictors of traits. Analyses were conducted for the entire first 105 DIM and separately for the two lactation periods: 5 ≤ DIM ≤ 55 and 55 < DIM ≤ 105. To test the models, an external validation using an independent test set was performed. Predictions depending on the parity (1st, 2nd and 3rd to 6th parities) in early lactation were also investigated. Accuracy of prediction (r) for both cross-validation and the external test set was defined as the correlation between the predicted and observed values for body energy status traits. Analyzing FTIR spectra in combination with MY by PLSR resulted in relatively high r-values for EB (r = 0.63), DMI (r = 0.83) and EEI (r = 0.84) in external validation. Only moderate correlations between FTIR spectra and traits like EB, EEI and DMI have so far been published. Our hypothesis was that improvements in the FTIR predictions of EB, EEI and DMI can be obtained by (1) stratification into different stages of lactation and different parities, or (2) adding information on milking and feeding traits. Stratification of the lactation stages improved predictions compared with the analyses including all data (5 ≤ DIM ≤ 105). The accuracy was improved if additional data (MY or CONCTR) were included in the prediction model. Furthermore, stratification into parity groups improved the predictions of body energy status. Our results show that FTIR spectral data combined with MY or CONCTR can be used to obtain improved estimation of body energy status compared to only using the FTIR spectra in Norwegian Red dairy cattle. The best prediction results were achieved using FTIR spectra together with MY for early lactation. The results obtained in the study suggest that the modeling approach used in this paper can be considered a viable method for predicting an individual cow's energy status.
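
The leave-one-cow-out splitting and the accuracy measure r described above can be sketched as follows. This is a minimal illustration, not the authors' code: the PLSR fitting step is omitted, and the function names, cow identifiers and example values are hypothetical.

```python
import math


def leave_one_group_out(groups):
    """Yield (held_out, train, test) so that every sample from one group
    (here, one cow) is held out together in the same fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted trait values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical sample-to-cow assignment: several milk samples per cow.
cow_ids = ["cow1", "cow1", "cow2", "cow3", "cow3"]
for cow, train, test in leave_one_group_out(cow_ids):
    print(cow, train, test)

# Hypothetical observed and predicted values for one trait.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Holding out whole cows rather than individual samples keeps repeated measurements of the same animal from appearing on both sides of a split, which would otherwise inflate r.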


2021 ◽  
Vol 13 (11) ◽  
pp. 2220
Author(s):  
Yanbing Bai ◽  
Wenqi Wu ◽  
Zhengxin Yang ◽  
Jinze Yu ◽  
Bo Zhao ◽  
...  

Efficient identification of permanent water and temporary water in flood disasters has mainly relied on change detection methods applied to multi-temporal remote sensing imagery, and estimating the water type in flood events from post-flood imagery alone remains challenging. Research in recent years has demonstrated the excellent potential of multi-source data fusion and deep learning algorithms for improving flood detection, but the field has only been explored in a preliminary way because of the lack of large-scale labelled remote sensing images of flood events. Here, we present new deep learning algorithms and a multi-source data fusion driven flood inundation mapping approach that leverages the large-scale, publicly available Sen1Flood11 dataset, consisting of roughly 4831 labelled Sentinel-1 SAR and Sentinel-2 optical images gathered from flood events worldwide in recent years. Specifically, we propose an automatic segmentation method for surface water, permanent water, and temporary water identification, in which all tasks share the same convolutional neural network architecture. We use focal loss to deal with the class (water/non-water) imbalance problem. Thorough ablation experiments and analysis confirmed the effectiveness of the proposed designs. In comparison experiments, the proposed method is superior to other classical models. Our model achieves a mean Intersection over Union (mIoU) of 52.99%, Intersection over Union (IoU) of 52.30%, and Overall Accuracy (OA) of 92.81% on the Sen1Flood11 test set. On the Sen1Flood11 Bolivia test set, our model also achieves very high mIoU (47.88%), IoU (76.74%), and OA (95.59%) and shows good generalization ability.
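
The focal loss and IoU metric mentioned above can be sketched per pixel in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the focal loss hyperparameters, so `gamma=2.0` and `alpha=0.25` are the commonly used defaults, and the function names and example masks are hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one pixel.

    p: predicted probability of the water class, strictly between 0 and 1.
    y: true label (1 = water, 0 = non-water).
    gamma down-weights easy pixels; alpha re-weights the two classes.
    """
    p_t = p if y == 1 else 1.0 - p
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def iou(pred_mask, true_mask):
    """Intersection over Union of two flat binary masks."""
    inter = sum(1 for a, b in zip(pred_mask, true_mask) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(pred_mask, true_mask) if a == 1 or b == 1)
    return inter / union if union else 1.0


# A confidently correct pixel contributes far less loss than an uncertain one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))

# Hypothetical 4-pixel masks: one pixel overlaps, three are in the union.
print(iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

With `gamma=0` and `alpha=None` the expression reduces to ordinary binary cross-entropy, which makes the effect of the focusing term easy to check.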


2002 ◽  
Vol 10 (3) ◽  
pp. 203-214 ◽  
Author(s):  
N. Gierlinger ◽  
M. Schwanninger ◽  
B. Hinterstoisser ◽  
R. Wimmer

The feasibility of Fourier transform near infrared (FT-NIR) spectroscopy to rapidly determine extractive and phenolic content in heartwood of larch trees (Larix decidua MILL., L. leptolepis (LAMB.) CARR. and the hybrid L. x eurolepis) was investigated. FT-NIR spectra were collected from wood powder and solid wood using a fibre-optic probe. Partial Least Squares (PLS) regression analyses were carried out describing relationships between the data sets of wet laboratory chemical data and the FT-NIR spectra. Besides cross-validation and test set validation, the established models were subjected to a further evaluation step by means of additional wood samples with unknown extractive content. Extractive and phenol contents of these additional samples were predicted and outliers detected through Mahalanobis distance calculations. Models based on the whole spectral range and without data pre-processing performed well in cross-validation and test set validation, but failed in the evaluation test, which is based on spectral outlier detection. However, selection of data pre-processing methods and manual as well as automatic restriction of wavenumber ranges considerably improved the model predictability. High coefficients of determination (R²) and low root mean square errors of cross-validation (RMSECV) were obtained for hot water extractives (R² = 0.96, RMSECV = 0.86%, range = 4.9–20.4%), acetone extractives (R² = 0.86, RMSECV = 0.32%, range = 0.8–3.6%) and phenolic substances (R² = 0.98, RMSECV = 0.21%, range = 0.7–4.9%) from wood powder. The models derived from wood powder spectra were more precise than those obtained from solid wood strips. Overall, NIR spectroscopy has proven to be an easy to facilitate, reliable, accurate and fast method for non-destructive wood extractive determination.
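
The Mahalanobis-distance outlier check mentioned above can be illustrated with a two-dimensional sketch. It assumes each sample has already been reduced to two score values (for example, two latent-variable scores), which is a simplification and not the authors' procedure; the function name and the calibration points are hypothetical.

```python
import math


def mahalanobis_2d(point, calibration):
    """Mahalanobis distance of a 2-D point from a calibration set,
    using the sample mean and the sample covariance matrix."""
    n = len(calibration)
    mean_x = sum(p[0] for p in calibration) / n
    mean_y = sum(p[1] for p in calibration) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in calibration) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in calibration) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in calibration) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2  # must be non-zero for the inverse to exist
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # Quadratic form d^T S^{-1} d with the closed-form 2x2 inverse.
    d2 = (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical calibration scores and two new samples.
calibration = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis_2d((1.0, 1.0), calibration))  # at the calibration centre
print(mahalanobis_2d((3.0, 1.0), calibration))  # farther from the centre
```

A new sample whose distance exceeds a chosen threshold would be flagged as a spectral outlier, meaning its predicted extractive content lies outside the range the calibration set can support.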


2021 ◽  
Vol 09 (06) ◽  
pp. E955-E964
Author(s):  
Ganggang Mu ◽  
Yijie Zhu ◽  
Zhanyue Niu ◽  
Shigang Ding ◽  
Honggang Yu ◽  
...  

Abstract Background and study aims: Endoscopy plays a crucial role in the diagnosis of gastritis. Endoscopists have low accuracy in diagnosing atrophic gastritis with white-light endoscopy (WLE). High-risk factors for carcinogenesis, such as atrophic gastritis (AG), demand early detection. Deep learning (DL)-based gastritis classification with WLE has rarely been reported. We built a system for improving the accuracy of diagnosis of AG with WLE to assist with this common gastritis diagnosis and help lessen endoscopist fatigue. Methods: We collected a total of 8141 endoscopic images of common gastritis, other gastritis, and non-gastritis in 4587 cases and built a DL-based system constructed with UNet++ and ResNet-50. The system was developed to sort common gastritis images layer by layer: the first layer included non-gastritis/common gastritis/other gastritis, the second layer contained AG/non-atrophic gastritis, and the third layer included atrophy/intestinal metaplasia and erosion/hemorrhage. The convolutional neural networks were tested with three separate test sets. Results: Rates of accuracy for classifying non-atrophic gastritis/AG, atrophy/intestinal metaplasia, and erosion/hemorrhage were 88.78%, 87.40%, and 93.67% in the internal test set, 91.23%, 85.81%, and 92.70% in the external test set, and 95.00%, 92.86%, and 94.74% in the video set, respectively. The hit ratio with the segmentation model was 99.29%. The accuracy for detection of non-gastritis/common gastritis/other gastritis was 93.6%. Conclusions: The system had decent specificity and accuracy in classification of gastritis lesions. DL has great potential in WLE gastritis classification for assisting with achieving accurate diagnoses after endoscopic procedures.


2015 ◽  
Vol 26 (7) ◽  
pp. 1887-1899 ◽  
Author(s):  
Zhen Ling ◽  
Junzhou Luo ◽  
Wei Yu ◽  
Ming Yang ◽  
Xinwen Fu

2014 ◽  
Vol 12 (3) ◽  
pp. 365-376 ◽  
Author(s):  
Teodora Harsa ◽  
Alexandra Harsa ◽  
Beata Szefler

Abstract A novel QSAR approach based on correlation weighting and alignment over a hypermolecule that mimics the investigated correlational space was performed on a set of 40 caffeines downloaded from the PubChem database. The best models describing log P and LD50 values of this set of caffeine derivatives were validated against the external test set and in a new predictive model by using clusters of similarity.

