A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios

Most industrial parts are instantiated from different parametric templates. 6DoF (6D) pose estimation for such parts is challenging, since some part objects instantiated from a known template may never have been seen before. This paper releases a new, well-annotated 6D pose estimation dataset for multiple parametric templates in stacked scenarios, denoted the Multi-Parametric Dataset, in which a training set (50K scenes) and a test set (2K scenes) are obtained by automatic labeling techniques. The test set is further divided into a TEST-L dataset for learning evaluation and a TEST-G dataset for generalization evaluation. Since part objects from the same template are regarded as one class in the Multi-Parametric Dataset and the number of possible part objects is infinite, we propose a new 6D pose estimation network as our baseline method, the Multi-templates Parametric Pose Network (MPP-Net), which aims to generalize well to parametric part objects in stacked scenarios. To the best of our knowledge, our dataset and method are the first to jointly achieve 6D pose estimation and parameter value prediction for multiple parametric templates. Extensive experiments are conducted on the Multi-Parametric Dataset. The mIoU and Overall Accuracy of foreground segmentation and template segmentation on the two test datasets exceed 99.0%. On the two test sets, respectively, MPP-Net achieves mAP of 92.9% and 90.8% under a threshold of 0.5 cm for translation prediction, 41.9% and 36.8% under a threshold of 5° for rotation prediction, and 51.0% and 6.0% under a threshold of 5% for parameter value prediction. The results show that our dataset has exploratory value for 6D pose estimation and parameter value prediction tasks.

Prediction of Ciliwung River Water Quality Using the Decision Tree Algorithm [PREDIKSI KUALITAS AIR SUNGAI CILIWUNG DENGAN MENGGUNAKAN ALGORITMA POHON KEPUTUSAN]

Jurnal Air Indonesia ◽

10.29122/jai.v12i2.4364 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Mohammad Haekal ◽

Henki Bayu Seta ◽

Mayanda Mega Santoni

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Online Monitoring ◽

Training Set ◽

Microsoft Excel ◽

Test Set

To predict the water quality of the Ciliwung River, data obtained from Online Monitoring were processed using a Data Mining method. In this method, the monitoring data were first tabulated in Microsoft Excel and then processed into a Decision Tree with the Decision Tree algorithm, using the WEKA application. The Decision Tree method was chosen because it is simpler, easy to understand, and has a very high level of accuracy. A total of 5,476 Ciliwung River water quality monitoring records were processed. In the Decision Tree classification of these 5,476 records, 1,059 records (19.3242%) indicated that the Ciliwung River was Not Polluted and 4,417 records (80.6758%) indicated that it was Polluted. The monitoring data were then evaluated using 4 Test Options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options showed a very high level of accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated to be a polluted river with reference to Government Regulation of the Republic of Indonesia Number 82 of 2001, and it was also found that using the WEKA application with the Decision Tree algorithm to process the monitoring data with three parameters (pH, DO, and Nitrate) is highly accurate and appropriate. Keywords: River water quality, Data Mining, Decision Tree Algorithm, WEKA application.

QSPR modelling of the octanol/water partition coefficient of organometallic substances by optimal SMILES-based descriptors

Open Chemistry ◽

10.2478/s11532-009-0095-y ◽

2009 ◽

Vol 7 (4) ◽

pp. 846-856 ◽

Cited By ~ 6

Author(s):

Andrey Toropov ◽

Alla Toropova ◽

Emilio Benfenati

Keyword(s):

Partition Coefficient ◽

Organometallic Compounds ◽

Applicability Domain ◽

Training Set ◽

Input Line ◽

Test Set ◽

Water Partition Coefficient ◽

Definition Of

Abstract: Usually, QSPR is not used to model organometallic compounds. We have modeled the octanol/water partition coefficient for organometallic compounds of Na, K, Ca, Cu, Fe, Zn, Ni, As, and Hg by optimal descriptors calculated with simplified molecular input line entry system (SMILES) notations. The best model is characterized by the following statistics: n = 54, r² = 0.9807, s = 0.677, F = 2636 (training set); n = 26, r² = 0.9693, s = 0.969, F = 759 (test set). Empirical criteria for the definition of the applicability domain for these models are discussed.

Feature-Weighted Sampling for Proper Evaluation of Classification Models

Applied Sciences ◽

10.3390/app11052039 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2039

Author(s):

Hyunseok Shin ◽

Sejong Oh

Keyword(s):

Random Sampling ◽

Sampling Method ◽

Classification Model ◽

Training Set ◽

Test Set ◽

Feature Importance ◽

Proper Training ◽

Machine Learning Applications ◽

Test Sets ◽

The Given

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Random sampling is traditionally used to divide datasets. The problem, however, is that the measured performance of the model depends on how the training and test sets are divided. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generated numerous candidate train/test splits using the R-value-based sampling method. We then evaluated how similar the distribution of each candidate was to that of the whole dataset, and the candidate with the smallest distribution difference was selected as the final train/test split. Histograms and feature importance were used to evaluate the similarity of distributions. The proposed method produces more appropriate training and test sets than previous sampling methods, including random and non-random sampling.
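
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' method: candidates are drawn by plain random shuffling rather than R-value-based sampling, the similarity score uses unweighted per-feature histograms (no feature importance), and the names `histogram`, `distribution_difference`, and `select_split` are hypothetical.

```python
import random


def histogram(values, bins, lo, hi):
    """Return normalized bin frequencies of values over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def distribution_difference(subset, whole, bins=5):
    """Sum, over features, of the L1 distance between subset and whole histograms."""
    diff = 0.0
    for j in range(len(whole[0])):
        col_whole = [row[j] for row in whole]
        col_sub = [row[j] for row in subset]
        lo, hi = min(col_whole), max(col_whole)
        h_whole = histogram(col_whole, bins, lo, hi)
        h_sub = histogram(col_sub, bins, lo, hi)
        diff += sum(abs(a - b) for a, b in zip(h_whole, h_sub))
    return diff


def select_split(data, test_fraction=0.3, n_candidates=200, seed=0):
    """Draw candidate splits (assumption: random, not R-value-based) and keep
    the one whose train and test sets are jointly closest to the whole dataset."""
    rng = random.Random(seed)
    n_test = max(1, int(len(data) * test_fraction))
    best = None
    for _ in range(n_candidates):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        score = (distribution_difference(train, data)
                 + distribution_difference(test, data))
        if best is None or score < best[0]:
            best = (score, sorted(train_idx), sorted(test_idx))
    return best


if __name__ == "__main__":
    toy = [[float(i), float(i % 7)] for i in range(40)]  # made-up data
    score, train_idx, test_idx = select_split(toy)
    print(len(train_idx), len(test_idx), round(score, 4))
```

Fixing the seed makes the chosen split reproducible, which matters when the whole point of the procedure is to reduce evaluation variance caused by the split.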

Weakly supervised deep learning for determining the prognostic value of 18F-FDG PET/CT in extranodal natural killer/T cell lymphoma, nasal type

European Journal of Nuclear Medicine and Molecular Imaging ◽

10.1007/s00259-021-05232-3 ◽

2021 ◽

Author(s):

Rui Guo ◽

Xiaobin Hu ◽

Haoming Song ◽

Pengpeng Xu ◽

Haoping Xu ◽

...

Keyword(s):

Deep Learning ◽

Fdg Pet ◽

Cell Lymphoma ◽

Training Set ◽

Test Set ◽

Natural Killer T Cell ◽

Pet Ct ◽

Weakly Supervised ◽

Fdg Pet Ct ◽

Killer T Cell

Abstract. Purpose: To develop a weakly supervised deep learning (WSDL) method that could utilize incomplete/missing survival data to predict the prognosis of extranodal natural killer/T cell lymphoma, nasal type (ENKTL) based on pretreatment 18F-FDG PET/CT results. Methods: One hundred and sixty-seven patients with ENKTL who underwent pretreatment 18F-FDG PET/CT were retrospectively collected. Eighty-four patients were followed up for at least 2 years (training set = 64, test set = 20). A WSDL method was developed to enable the integration of the remaining 83 patients with incomplete/missing follow-up information in the training set. To test generalization, these data were derived from three types of scanners. A prediction similarity index (PSI) was derived from deep learning features of images. Its discriminative ability was calculated and compared with that of a conventional deep learning (CDL) method. Univariate and multivariate analyses helped explore the significance of PSI and clinical features. Results: PSI achieved area under the curve scores of 0.9858 and 0.9946 (training set) and 0.8750 and 0.7344 (test set) in the prediction of progression-free survival (PFS) with the WSDL and CDL methods, respectively. A PSI threshold of 1.0 could significantly differentiate the prognosis. In the test set, WSDL and CDL achieved prediction sensitivity of 87.50% and 62.50%, specificity of 83.33% and 83.33%, and accuracy of 85.00% and 75.00%, respectively. Multivariate analysis confirmed PSI to be an independent significant predictor of PFS for both methods. Conclusion: The WSDL-based framework was more effective for extracting 18F-FDG PET/CT features and predicting the prognosis of ENKTL than the CDL method.
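
As a worked check of how sensitivity, specificity, and accuracy relate, the sketch below recomputes the reported test-set percentages from confusion-matrix counts. The counts are an assumption: the abstract does not report them, and the values used here (8 positive and 12 negative cases among the 20 test patients) are simply integers that reproduce the published percentages. `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positives among all positives
    specificity = tn / (tn + fp)            # true negatives among all negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (not stated in the abstract), chosen to match the percentages.
wsdl = binary_metrics(tp=7, fn=1, tn=10, fp=2)
cdl = binary_metrics(tp=5, fn=3, tn=10, fp=2)

for name, metrics in (("WSDL", wsdl), ("CDL", cdl)):
    print(name, [f"{100 * m:.2f}%" for m in metrics])
```

With only 20 test patients, a single reclassified case moves accuracy by 5 percentage points, which is worth keeping in mind when comparing the two methods.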

Prediction of the Toxicity of Binary Mixtures by QSAR Approach Using the Hypothetical Descriptors

International Journal of Molecular Sciences ◽

10.3390/ijms19113423 ◽

2018 ◽

Vol 19 (11) ◽

pp. 3423 ◽

Cited By ~ 12

Author(s):

Ting Wang ◽

Lili Tang ◽

Feng Luan ◽

M. Natália D. S. Cordeiro

Keyword(s):

Correlation Coefficient ◽

Binary Mixtures ◽

Quantitative Structure Activity Relationship ◽

Training Set ◽

Statistical Parameters ◽

Test Set ◽

Qsar Models ◽

Forward Stepwise ◽

Leave One Out ◽

External Test

Organic compounds often reach the environment, and adversely affect the environment and human health, in the form of mixtures rather than as single chemicals. In this paper, we aim to establish reliable classical quantitative structure–activity relationship (QSAR) models to evaluate the toxicity of 99 binary mixtures. The QSAR models were built by forward stepwise multiple linear regression (MLR) and by nonlinear radial basis function neural networks (RBFNNs), both using the hypothetical descriptors. The statistical parameters of the MLR model were N (number of compounds in training set) = 79, R² (the correlation coefficient between the predicted and observed activities) = 0.869, LOO q² (leave-one-out correlation coefficient) = 0.864, F (Fisher’s test) = 165.494, and RMS (root mean square) = 0.599 for the training set, and N_ext (number of compounds in external test set) = 20, R² = 0.853, q²_ext (leave-one-out correlation coefficient for test set) = 0.825, F = 30.861, and RMS = 0.691 for the external test set. The RBFNN model gave N = 79, R² = 0.925, LOO q² = 0.924, F = 950.686, and RMS = 0.447 for the training set, and N_ext = 20, R² = 0.896, q²_ext = 0.890, F = 155.424, and RMS = 0.547 for the external test set. Both the MLR and RBFNN models were evaluated by several statistical parameters and methods. The results confirm that the built models are acceptable and can be used to predict the toxicity of the binary mixtures.
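
To make the difference between the fitted R² and the leave-one-out q² concrete, here is a minimal sketch. It assumes a single descriptor and ordinary least squares, not the paper's forward stepwise MLR or its descriptors, and it defines q² as 1 − PRESS/SS_tot with SS_tot taken about the mean of all observations (one common convention). The toy data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """1 - SS_res / SS_tot for the given predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(xs, ys):
    """Leave-one-out q^2: each point is predicted by a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up descriptor values
    ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]   # made-up activities
    slope, intercept = fit_line(xs, ys)
    fitted = [slope * x + intercept for x in xs]
    print(round(r_squared(ys, fitted), 4), round(loo_q_squared(xs, ys), 4))
```

For least-squares fits the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so q² cannot exceed R²; a small gap between the two, as in the statistics above, suggests the fit is not driven by a few influential compounds.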

Identification of Multi-omics Biomarkers and Construction of the Novel Prognostic Model for Hepatocellular Carcinoma

10.21203/rs.3.rs-452644/v1 ◽

2021 ◽

Author(s):

Xiaokai Yan ◽

Chiying Xiao ◽

Kunyan Yue ◽

Min Chen ◽

Hang Zhou

Keyword(s):

Hepatocellular Carcinoma ◽

Survival Analysis ◽

Prognostic Model ◽

Prognostic Models ◽

Prognostic Indicators ◽

Omics Data ◽

Training Set ◽

Test Set ◽

Model Based ◽

Cox Analysis

Abstract. Background: Changes in the genome play a crucial role in carcinogenesis, and many biomarkers can be used as effective prognostic indicators in diverse tumors. Although many studies have constructed predictive models for hepatocellular carcinoma (HCC) based on molecular signatures, their performance is unsatisfactory. To address this shortcoming, we aim to construct a novel and accurate prognostic model with multi-omics data to guide prognostic assessment of HCC. Methods: The TCGA training set was used to identify crucial biomarkers and construct single-omic prognostic models through difference analysis, univariate Cox analysis, and LASSO/stepwise Cox analysis. The performance of the single-omic models was then evaluated and validated through survival analysis, Harrell’s concordance index (C-index), and receiver operating characteristic (ROC) curves in the TCGA test set and external cohorts. In addition, a comprehensive model based on multi-omics data was constructed via multiple Cox analysis, and its performance was evaluated in the TCGA training set and TCGA test set. Results: We identified 16 key mRNAs, 20 key lncRNAs, 5 key miRNAs, 5 key CNV genes, and 7 key SNPs that were significantly associated with the prognosis of HCC, and constructed 5 single-omic models that showed relatively good performance in prognostic prediction, with C-index ranging from 0.63 to 0.75 in the TCGA training set and test set. We also validated the mRNA model and the SNP model in two independent external datasets, respectively, and good discriminating abilities were observed through survival analysis (P < 0.05). Moreover, the multi-omics model based on mRNA, lncRNA, miRNA, CNV, and SNP information presented strong predictive ability, with C-index over 0.80 and all AUC values at 1, 3, and 5 years above 0.84. Conclusion: In this study, we identified many biomarkers that may help in studying the underlying carcinogenesis mechanisms of HCC, and constructed five single-omic models and an integrated multi-omics model that may provide effective and reliable guidance for prognosis assessment and treatment decision-making.
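
For readers unfamiliar with Harrell's concordance index used above, the following is a minimal sketch of how it can be computed from survival times, event indicators, and model risk scores. It is a simplified illustration, not the authors' pipeline: pairs with tied survival times are skipped, and the toy values are made up.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: the fraction of comparable pairs in which the subject
    with the shorter observed survival time has the higher predicted risk.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) and times[i] < times[j]. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]             # made-up follow-up times
    events = [1, 1, 0, 1]            # 1 = event observed, 0 = censored
    risks = [0.9, 0.7, 0.5, 0.1]     # made-up model risk scores
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported range of 0.63 to 0.75 for the single-omic models and over 0.80 for the multi-omics model in context.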

Radiomics-based model for predicting early recurrence of intrahepatic mass-forming cholangiocarcinoma after curative tumor resection

Scientific Reports ◽

10.1038/s41598-021-97796-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yong Zhu ◽

Yingfan Mao ◽

Jun Chen ◽

Yudong Qiu ◽

Yue Guan ◽

...

Keyword(s):

Regression Analysis ◽

Multivariate Logistic Regression Analysis ◽

Early Recurrence ◽

Prediction Performance ◽

Multivariate Logistic Regression ◽

Training Set ◽

Combined Model ◽

Test Set ◽

Pathological Model ◽

Radiomics Signature

Abstract: To investigate the ability of a CT-based radiomics signature to predict, pre- and postoperatively, the early recurrence of intrahepatic mass-forming cholangiocarcinoma (IMCC), and to develop radiomics-based prediction models. The institutional review board approved this study. Clinicopathological characteristics, contrast-enhanced CT images, and radiomics features of 125 IMCC patients (35 with early recurrence and 90 with non-early recurrence) were retrospectively reviewed. In the training set of 92 patients, a preoperative model, a pathological model, and a combined model were developed by multivariate logistic regression analysis to predict the early recurrence (≤ 6 months) of IMCC, and the prediction performance of the different models was compared using the Delong test. The developed models were validated by assessing their prediction performance in a test set of 33 patients. Multivariate logistic regression analysis identified solitary, differentiation, energy-arterial phase (AP), inertia-AP, and percentile50th-portal venous phase (PV) to construct the combined model for predicting early recurrence of IMCC [area under the curve (AUC) = 0.917; 95% CI 0.840–0.965]. The AUCs of the pathological model and the preoperative model were 0.741 (95% CI 0.637–0.828) and 0.844 (95% CI 0.751–0.912), respectively. The AUC of the combined model was significantly higher than that of the preoperative model (p = 0.049) or the pathological model (p = 0.002) in the training set. In the test set, the combined model also showed higher prediction performance. The CT-based radiomics signature is a powerful predictor of early recurrence of IMCC. The preoperative model (constructed with homogeneity-AP and standard deviation-AP) and the combined model (constructed with solitary, differentiation, energy-AP, inertia-AP, and percentile50th-PV) can improve the accuracy of pre- and postoperative prediction of early recurrence of IMCC.

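Posyandu Information System for Pregnant Women Applying Pregnancy Risk Classification with the Naïve Bayes Method [Sistem Informasi Posyandu Ibu Hamil dengan Penerapan Klasifikasi Resiko Kehamilan Menggunakan Metode Naïve Bayes]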

BERKALA SAINSTEK ◽

10.19184/bst.v6i1.7554 ◽

2018 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Qomariyatul Hasanah ◽

Anang Andrianto ◽

Muhammad Arief Hidayat

Keyword(s):

Cross Validation ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Set ◽

Test Set ◽

Fold Cross Validation

The posyandu (integrated health service post) information system for pregnant women can manage health data of pregnant women related to pregnancy risk factors. Pregnancy risk factors defined by the Kartu Skor Poedji Rochyati (KSPR) are used by midwives to determine pregnancy risk by assigning a score to each parameter. A weakness of KSPR is that it cannot score parameters that are not yet certain, so a parameter that is not known with certainty is treated as not occurring. The concept of reading data patterns, adopted from data mining techniques and implemented with the naïve Bayes classification method, can be an alternative that addresses this weakness by classifying pregnancy risk. The naïve Bayes method computes the probability of particular parameters based on data from a previous period designated as training data; from this computation, pregnancy risk can be determined according to the parameters that are known. Naïve Bayes was chosen because it has a fairly high level of accuracy compared with other classification methods. The information system is web-based so that it can be accessed easily by several posyandu in different locations. The system was built following the Waterfall model. It was designed and built with three (3) access roles, namely admin, midwife, and cadre, each with features that make the system easier to use. The result of this research is a posyandu information system for pregnant women that applies pregnancy risk classification using the naïve Bayes method, with an accuracy of 53.913% when using 17 attributes, 54.348% with 19 attributes, 54.783% with 21 attributes, and 56.957% with 22 attributes. Classification accuracy was measured using Ten-Fold Cross Validation, in which the training set is divided into 10 groups; when group 1 is used as the test set, groups 2 to 10 form the training set.
Keywords: Posyandu, Pregnancy Risk, Waterfall, Data Mining, Classification, Naïve Bayes
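
The ten-fold cross-validation scheme described in the abstract can be sketched as an index generator. This is a minimal illustration under assumptions: folds are taken as contiguous blocks without shuffling or stratification, the sample count of 25 is hypothetical, and `k_fold_splits` is not a name from the paper.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Each group serves as the test set exactly once while the remaining
    k - 1 groups form the training set. Fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    for fold, (train_idx, test_idx) in enumerate(k_fold_splits(25, k=10), start=1):
        print(fold, len(train_idx), test_idx)
```

The reported accuracy would then be the average over the 10 folds of the accuracy obtained by training on `train_idx` and scoring on `test_idx`.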

Application of Multi-Scale Fusion Attention U-Net to Segment the Thyroid Gland on CT Localization Images for Radiotherapy

10.21203/rs.3.rs-949323/v1 ◽

2021 ◽

Author(s):

Xiaobo Wen ◽

Biao Zhao ◽

Meifang Yuan ◽

Jinzhi Li ◽

Mengzhen Sun ◽

...

Keyword(s):

Thyroid Gland ◽

Clinical Work ◽

Similarity Coefficient ◽

Dice Similarity Coefficient ◽

Training Set ◽

Data Set ◽

Test Set ◽

Noise Interference ◽

Multi Scale ◽

Validation Set

Abstract. Objectives: To explore the performance of the Multi-scale Fusion Attention U-net (MSFA-U-net) in thyroid gland segmentation on CT localization images for radiotherapy. Methods: CT localization images for radiotherapy of 80 patients with breast cancer or head and neck tumors were selected; label images were manually delineated by experienced radiologists. The data set was randomly divided into a training set (n = 60), a validation set (n = 10), and a test set (n = 10). Data expansion was performed on the training set, and the performance of the MSFA-U-net model was evaluated using the Dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), positive predictive value (PPV), sensitivity (SE), and Hausdorff distance (HD). Results: With the MSFA-U-net model, the DSC, JSC, PPV, SE, and HD of the segmented thyroid gland in the test set were 0.8967±0.0935, 0.8219±0.1115, 0.9065±0.0940, 0.8979±0.1104, and 2.3922±0.5423, respectively. Compared with U-net, HR-net, and Attention U-net, MSFA-U-net increased DSC by 0.052, 0.0376, and 0.0346, respectively; increased JSC by 0.0569, 0.0805, and 0.0433, respectively; increased SE by 0.0361, 0.1091, and 0.0831, respectively; and decreased HD by 0.208, 0.1952, and 0.0548, respectively. The test set image results showed that the thyroid edges segmented by the MSFA-U-net model were closer to the reference thyroid contours delineated by the experts than those segmented by the other three models. Moreover, the edges were smoother, resistance to noise interference was stronger, and oversegmentation and undersegmentation were reduced. Conclusion: The MSFA-U-net model can meet basic clinical requirements and improve the efficiency of physicians' clinical work.
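
The overlap metrics named in the abstract (DSC, JSC, PPV, SE) can be computed from a predicted and a reference binary mask as in the sketch below. It assumes flattened masks of equal length with truthy values marking thyroid voxels; the toy masks and the helper name `overlap_metrics` are hypothetical, and the Hausdorff distance is omitted because it needs voxel coordinates.

```python
def overlap_metrics(pred, truth):
    """Return DSC, JSC, PPV, and SE for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    pred_set = {i for i, v in enumerate(pred) if v}
    truth_set = {i for i, v in enumerate(truth) if v}
    inter = len(pred_set & truth_set)
    union = len(pred_set | truth_set)
    total = len(pred_set) + len(truth_set)
    return {
        "DSC": 2 * inter / total if total else 1.0,          # 2|A∩B| / (|A|+|B|)
        "JSC": inter / union if union else 1.0,              # |A∩B| / |A∪B|
        "PPV": inter / len(pred_set) if pred_set else 0.0,   # TP / (TP + FP)
        "SE": inter / len(truth_set) if truth_set else 0.0,  # TP / (TP + FN)
    }


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0]   # made-up segmentation output
    reference = [0, 1, 1, 1, 0, 0]   # made-up expert delineation
    print(overlap_metrics(predicted, reference))
```

For a single pair of masks the two overlap scores are tied by JSC = DSC / (2 − DSC). The reported means (0.8967 and 0.8219) need not satisfy this exactly, because averaging over patients does not commute with the nonlinear relation.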

The Importance of the Test Set Size in Quantification Assessment

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/366 ◽

2020 ◽

Cited By ~ 1

Author(s):

André Maletzke ◽

Waqar Hassan ◽

Denis dos Reis ◽

Gustavo Batista

Keyword(s):

Performance Measures ◽

Training Set ◽

Test Set ◽

Test Size ◽

Critical Variable ◽

Set Size ◽

Quantification Method ◽

Class Distribution ◽

Cherry Picking ◽

Test Sets

Quantification is a task similar to classification in the sense that it learns from a labeled training set. However, quantification is not interested in predicting the class of each observation, but rather in measuring the class distribution in the test set. The community has developed performance measures and experimental setups tailored to quantification tasks. Nonetheless, we argue that a critical variable, the size of the test sets, remains ignored. Such disregard has three main detrimental effects. First, it implicitly assumes that quantifiers will perform equally well for different test set sizes. Second, it increases the risk of cherry-picking by selecting a test set size for which a particular proposal performs best. Finally, it disregards the importance of designing methods that are suitable for different test set sizes. We discuss these issues with the support of one of the broadest experimental evaluations ever performed, with three main outcomes. (i) We empirically demonstrate the importance of the test set size to assess quantifiers. (ii) We show that current quantifiers generally have a mediocre performance on the smallest test sets. (iii) We propose a metalearning scheme to select the best quantifier based on the test size that can outperform the best single quantification method.
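
To illustrate what estimating a class distribution means in practice, the sketch below implements two standard baseline quantifiers from the wider quantification literature, Classify and Count and Adjusted Count. The abstract does not name the quantifiers it evaluates, so treating these two as examples is an assumption, as are the toy predictions and the classifier rates.

```python
def classify_and_count(predictions):
    """Estimate the positive-class prevalence as the fraction of positive predictions."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct Classify and Count using the classifier's true and false positive
    rates (assumed to have been estimated on held-out labeled data)."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        return cc  # the correction is undefined; fall back to the raw count
    estimate = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


if __name__ == "__main__":
    predictions = [1] * 40 + [0] * 60   # made-up classifier outputs on a test set
    print(classify_and_count(predictions))
    print(round(adjusted_count(predictions, tpr=0.8, fpr=0.2), 4))
```

On a small test set the raw count can only take a few distinct values (multiples of 1/n), which gives some intuition for why test set size matters for quantifiers.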
