Application of Multi-Scale Fusion Attention U-Net to Segment the Thyroid Gland on CT Localization Images for Radiotherapy

Abstract Objectives: To explore the performance of Multi-scale Fusion Attention U-net (MSFA-U-net) in thyroid gland segmentation on CT localization images for radiotherapy. Methods: CT localization images for radiotherapy of 80 patients with breast cancer or head and neck tumors were selected; label images were manually delineated by experienced radiologists. The data set was randomly divided into a training set (n=60), a validation set (n=10), and a test set (n=10). Data augmentation was performed on the training set, and the performance of the MSFA-U-net model was evaluated using the Dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), positive predictive value (PPV), sensitivity (SE), and Hausdorff distance (HD). Results: With the MSFA-U-net model, the DSC, JSC, PPV, SE, and HD of the segmented thyroid gland in the test set were 0.8967±0.0935, 0.8219±0.1115, 0.9065±0.0940, 0.8979±0.1104, and 2.3922±0.5423, respectively. Compared with U-net, HR-net, and Attention U-net, MSFA-U-net increased DSC by 0.052, 0.0376, and 0.0346, respectively; increased JSC by 0.0569, 0.0805, and 0.0433, respectively; increased SE by 0.0361, 0.1091, and 0.0831, respectively; and decreased HD by 0.208, 0.1952, and 0.0548, respectively. In the test set images, the thyroid edges segmented by the MSFA-U-net model were closer to the standard thyroid delineated by the experts than those segmented by the other three models. Moreover, the edges were smoother, resistance to noise interference was stronger, and oversegmentation and undersegmentation were reduced. Conclusion: The MSFA-U-net model can meet basic clinical requirements and improve the efficiency of physicians' clinical work.
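
The overlap indicators named in this abstract (DSC, JSC, PPV, SE) can all be derived from the pixel-wise true positives, false positives, and false negatives of a predicted mask against a reference mask. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `overlap_metrics` and the toy masks are hypothetical, masks are assumed to be flat binary sequences, and the Hausdorff distance is left out because it needs contour coordinates.

```python
def overlap_metrics(pred, truth):
    """Return (DSC, JSC, PPV, SE) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixel-wise counts: predicted foreground vs. reference foreground.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def ratio(num, den):
        # Assumption: an undefined ratio (empty denominator) is reported as 0.0.
        return num / den if den else 0.0

    dsc = ratio(2 * tp, 2 * tp + fp + fn)  # Dice similarity coefficient
    jsc = ratio(tp, tp + fp + fn)          # Jaccard similarity coefficient
    ppv = ratio(tp, tp + fp)               # positive predictive value
    se = ratio(tp, tp + fn)                # sensitivity
    return dsc, jsc, ppv, se


# Toy example: 6 pixels, 2 overlapping, 1 over-segmented, 1 under-segmented.
dsc, jsc, ppv, se = overlap_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(dsc, jsc, ppv, se)
```

On a single pair of masks DSC and JSC are linked by DSC = 2·JSC / (1 + JSC), so they rank individual segmentations identically; averaged values such as those reported above need not satisfy this relation exactly.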

Diagnostic Classification of Cystoscopic Images Using Deep Convolutional Neural Networks

JCO Clinical Cancer Informatics ◽

10.1200/cci.17.00126 ◽

2018 ◽

pp. 1-8 ◽

Cited By ~ 9

Author(s):

Okyaz Eminaga ◽

Nurettin Eminaga ◽

Axel Semjonow ◽

Bernhard Breil

Keyword(s):

Deep Learning ◽

Harmonic Series ◽

Diagnostic Classification ◽

Training Set ◽

Deep Convolutional Neural Networks ◽

Data Set ◽

Test Set ◽

Filter Size ◽

Validation Set

Purpose The recognition of cystoscopic findings remains challenging for young colleagues and depends on the examiner’s skills. Computer-aided diagnosis tools using feature extraction and deep learning show promise as instruments to perform diagnostic classification. Materials and Methods Our study considered 479 patient cases that represented 44 urologic findings. Image color was linearly normalized and was equalized by applying contrast-limited adaptive histogram equalization. Because these findings can be viewed via cystoscopy from every possible angle and side, we generated images rotated in 10-degree increments and flipped them vertically or horizontally, which resulted in 18,681 images. After image preprocessing, we developed deep convolutional neural network (CNN) models (ResNet50, VGG-19, VGG-16, InceptionV3, and Xception) and evaluated these models using F1 scores. Furthermore, we proposed two CNN concepts: 90%-previous-layer filter size and harmonic-series filter size. A training set (60%), a validation set (10%), and a test set (30%) were randomly generated from the study data set. All models were trained on the training set, validated on the validation set, and evaluated on the test set. Results The Xception-based model achieved the highest F1 score (99.52%), followed by models that were based on ResNet50 (99.48%) and the harmonic-series concept (99.45%). All images with cancer lesions were correctly determined by these models. When the focus was on the images misclassified by the model with the best performance, 7.86% of images that showed bladder stones with indwelling catheter and 1.43% of images that showed bladder diverticulum were falsely classified. Conclusion The results of this study show the potential of deep learning for the diagnostic classification of cystoscopic images. Future work will focus on integration of artificial intelligence–aided cystoscopy into clinical routines and possibly expansion to other clinical endoscopy applications.
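
The random 60%/10%/30% partition described in the methods can be sketched as a shuffle followed by slicing. This is an illustrative sketch only: the function name `split_indices`, the fixed seed, and the rounding rule are assumptions, and the abstract does not say whether the split was drawn at the level of images or of patient cases.

```python
import random


def split_indices(n_items, fractions=(0.6, 0.1, 0.3), seed=0):
    """Shuffle item indices and cut them into train/validation/test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    # The test part takes whatever remains, so no item is lost to rounding.
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Item count taken from the abstract (18,681 images after augmentation).
train, val, test = split_indices(18681)
print(len(train), len(val), len(test))
```

If augmented copies of one case can land in different parts, test scores may be optimistic; splitting by case before augmenting is one way to avoid that, though the abstract does not state which order was used.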

An artificial intelligence model (euploid prediction algorithm) can predict embryo ploidy status based on time-lapse data

Reproductive Biology and Endocrinology ◽

10.1186/s12958-021-00864-4 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Bo Huang ◽

Wei Tan ◽

Zhou Li ◽

Lei Jin

Keyword(s):

Artificial Intelligence ◽

Time Lapse ◽

Training Data ◽

Prediction Algorithm ◽

Preliminary Evaluation ◽

Training Set ◽

Data Set ◽

Test Set ◽

Validation Set ◽

Ploidy Status

Abstract Background The association between time-lapse technology (TLT) and embryo ploidy status is not yet fully understood. TLT generates large amounts of data and is non-invasive. Artificial intelligence (AI) technology is a good choice for accurately predicting embryo ploidy status from TLT data. However, current work on AI in this field needs to be strengthened. Methods A total of 469 preimplantation genetic testing (PGT) cycles and 1803 blastocysts from April 2018 to November 2019 were included in the study. All embryo images were captured by a time-lapse microscope system during the 5 or 6 days after fertilization, before biopsy. All euploid and aneuploid embryos were used as the data set, which was divided into a training set, a validation set, and a test set. The training set was mainly used for model training, the validation set for adjusting the hyperparameters of the model and for preliminary evaluation of the model, and the test set for evaluating the generalization ability of the model. For better verification, we used data other than the training data for external verification: a total of 155 PGT cycles and 523 blastocysts from December 2019 to December 2020 were included in the verification process. Results The euploid prediction algorithm (EPA) was able to predict euploidy on the test data set with an area under the curve (AUC) of 0.80. Conclusions The TLT incubator has gradually become the choice of reproductive centers. Our AI model, named EPA, can predict embryo ploidy well based on TLT data. We hope that this system can serve all in vitro fertilization and embryo transfer (IVF-ET) patients in the future, allowing embryologists to have more non-invasive aids when selecting the best embryo to transfer.
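
The AUC reported here can be read as the probability that a randomly chosen positive case (for example, a euploid embryo) receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `auc_score` and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives and 3 negatives with made-up model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
print(auc_score(labels, scores))
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; rank-based formulas give the same value in O(n log n).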

Genome-Wide Identification of a Novel Autophagy-Related Signature for Colorectal Cancer

Dose-Response ◽

10.1177/1559325819894179 ◽

2019 ◽

Vol 17 (4) ◽

pp. 155932581989417 ◽

Cited By ~ 6

Author(s):

Zhi Huang ◽

Jie Liu ◽

Liang Luo ◽

Pan Sheng ◽

Biao Wang ◽

...

Keyword(s):

Colorectal Cancer ◽

Signaling Pathway ◽

Risk Score ◽

Low Risk ◽

Training Data ◽

The Cancer Genome Atlas ◽

Training Set ◽

Data Set ◽

Validation Set ◽

Cox Analysis

Background: Plenty of evidence has suggested that autophagy plays a crucial role in the biological processes of cancers. This study aimed to screen autophagy-related genes (ARGs) and establish a novel scoring system for colorectal cancer (CRC). Methods: Autophagy-related gene sequencing data and the corresponding clinical data of CRC in The Cancer Genome Atlas were used as the training data set. The GSE39582 data set from the Gene Expression Omnibus was used as the validation set. An autophagy-related signature was developed in the training set using univariate Cox analysis followed by stepwise multivariate Cox analysis and assessed in the validation set. Then we analyzed the functions and pathways of ARGs using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Finally, a prognostic nomogram combining the autophagy-related risk score and clinicopathological characteristics was developed according to multivariate Cox analysis. Results: After univariate and multivariate analysis, 3 ARGs were used to construct the autophagy-related signature. The KEGG pathway analyses showed several significantly enriched oncological signatures, such as p53 signaling pathway, apoptosis, human cytomegalovirus infection, platinum drug resistance, necroptosis, and ErbB signaling pathway. Patients were divided into high- and low-risk groups, and patients with high risk had significantly shorter overall survival (OS) than low-risk patients in both the training set and the validation set. Furthermore, the nomogram for predicting 3- and 5-year OS was established based on the autophagy-based risk score and clinicopathologic factors. The area under the curve and calibration curves indicated that the nomogram showed good predictive accuracy. Conclusions: Our proposed autophagy-based signature has important prognostic value and may provide a promising tool for the development of personalized therapy.
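
A gene-signature risk score of the kind described here is commonly computed as a weighted sum of expression values, with the weights taken from the multivariate Cox model, and patients are then split at a cutoff. The sketch below illustrates that general pattern only: the gene names, coefficients, expression values, and the median cutoff are all hypothetical, since the abstract names neither the 3 ARGs nor the cutoff rule.

```python
from statistics import median


def risk_scores(samples, coefficients):
    """Weighted sum of expression values, one score per patient sample."""
    return [
        sum(coefficients[gene] * sample[gene] for gene in coefficients)
        for sample in samples
    ]


def split_by_median(scores):
    """Label each score 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical Cox coefficients for three placeholder genes.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
# Hypothetical expression values for four patients.
samples = [
    {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    {"GENE_A": 4.0, "GENE_B": 0.0, "GENE_C": 2.0},
    {"GENE_A": 0.0, "GENE_B": 4.0, "GENE_C": 0.0},
    {"GENE_A": 2.0, "GENE_B": 0.0, "GENE_C": 2.0},
]
scores = risk_scores(samples, coefficients)
print(scores, split_by_median(scores))
```

To keep the validation honest, a cutoff chosen on the training set would normally be reused unchanged on the validation set rather than recomputed there.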

Evaluation of Tree-Based Ensemble Machine Learning Models in Predicting Stock Price Direction of Movement

Information ◽

10.3390/info11060332 ◽

2020 ◽

Vol 11 (6) ◽

pp. 332

Author(s):

Ernest Kwame Ampomah ◽

Zhiguang Qin ◽

Gabriel Nyame

Keyword(s):

Machine Learning ◽

Stock Market ◽

Stock Price ◽

Superior Performance ◽

Operating Characteristics ◽

Training Set ◽

Data Set ◽

Test Set ◽

Ensemble Machine Learning ◽

Better Than

Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions in the stock market. Investment in the stock market carries substantial risk, and minimizing prediction error reduces that risk. Machine learning (ML) models typically perform better than statistical and econometric models. In addition, ensemble ML models have been shown in the literature to outperform single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight different stock data sets from three stock exchanges (NYSE, NASDAQ, and NSE) are randomly collected and used for the study. Each data set is split into training and test sets. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC). The Kendall W test of concordance is used to rank the performance of the tree-based ML algorithms. For the training set, the AdaBoost model performed better than the rest of the models. For the test set, the accuracy, precision, F1-score, and AUC metrics generated results that were significant for ranking the models, and the Extra Trees classifier outperformed the other models in all the rankings.
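
Kendall's W measures how consistently several judges (here, for instance, the eight stock data sets) rank the same items (the six models), from 0 for no agreement to 1 for identical rankings. The sketch below uses the standard formula without tie correction, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the rank totals from their mean. The function name `kendalls_w` and the example ranks are made up for illustration and are not results from the paper.

```python
def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance, no tie correction.

    rank_matrix[i][j] is the rank that judge i assigns to item j (1 = best).
    """
    m = len(rank_matrix)        # number of judges
    n = len(rank_matrix[0])     # number of ranked items
    if any(len(row) != n for row in rank_matrix):
        raise ValueError("every judge must rank the same number of items")

    totals = [sum(row[j] for row in rank_matrix) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three judges ranking four items identically: complete agreement.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
# Two judges with exactly reversed rankings: no agreement.
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))
```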

Multiclass Classifier for P-Glycoprotein Substrates, Inhibitors, and Non-Active Compounds

Molecules ◽

10.3390/molecules24102006 ◽

2019 ◽

Vol 24 (10) ◽

pp. 2006 ◽

Cited By ~ 1

Author(s):

Liadys Mora Lagares ◽

Nikola Minovski ◽

Marjana Novič

Keyword(s):

In Silico ◽

Transmembrane Protein ◽

External Validation ◽

Assessment Process ◽

Classification Model ◽

Training Set ◽

Test Set ◽

Active Compounds ◽

P Glycoprotein ◽

Validation Set

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is highly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs and drug candidates and contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in the drug discovery and toxicological assessment process it is advisable to check whether a compound under development could be transported by P-gp. In this study, an in silico multiclass classification model capable of predicting the probability that a compound interacts with P-gp was developed using a counter-propagation artificial neural network (CP ANN) based on a set of 2D molecular descriptors and an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model provided good classification performance, producing non-error rate (NER) values of 0.93 for the training set and 0.85 for the test set, while the average precision (AvPr) was 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model’s performance. On the external validation set, the NER and AvPr values were both 0.70. We believe that this in silico classifier could be effectively used as a reliable virtual screening tool for identifying potential P-gp ligands.
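
For a multiclass model, the non-error rate is usually taken to be the mean of the per-class sensitivities and the average precision the mean of the per-class precisions, both read off a confusion matrix. The abstract does not spell out its definitions, so the sketch below assumes these conventional ones; the function name `ner_and_avpr` and the 3×3 matrix are hypothetical.

```python
def ner_and_avpr(confusion):
    """Return (NER, AvPr) from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    Assumes every class occurs at least once as a true and a predicted label.
    """
    k = len(confusion)
    # Sensitivity of class i: correct predictions over all true members (row sum).
    sensitivities = [confusion[i][i] / sum(confusion[i]) for i in range(k)]
    # Precision of class j: correct predictions over all predicted members (column sum).
    precisions = [
        confusion[j][j] / sum(confusion[i][j] for i in range(k)) for j in range(k)
    ]
    return sum(sensitivities) / k, sum(precisions) / k


# Made-up counts for three classes (e.g. inhibitor, substrate, non-active).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
ner, avpr = ner_and_avpr(confusion)
print(round(ner, 4), round(avpr, 4))
```

Because both indices average over classes rather than over samples, they are not dominated by the largest class, which matters for an unbalanced set such as 1178 inhibitors against 477 substrates.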

Dataset Splitting Techniques Comparison For Face Classification on CCTV Images

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.58092 ◽

2020 ◽

Vol 14 (4) ◽

pp. 341

Author(s):

Ade Nurhopipah ◽

Uswatun Hasanah

Keyword(s):

Splitting Method ◽

Machine Learning Algorithms ◽

Support Vector ◽

Training Set ◽

Test Set ◽

Face Classification ◽

Lower Accuracy ◽

Svm Algorithm ◽

Stable Performance ◽

Validation Set

The performance of classification models in machine learning is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques, namely Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV) and Moralis Lima Martin Validation (MLMV). The comparison is carried out for face classification on CCTV images using the Convolutional Neural Network (CNN) algorithm and the Support Vector Machine (SVM) algorithm, and is applied to two image datasets. The results are reviewed using model accuracy on the training set, validation set and test set, as well as the bias and variance of the model. The experiment shows that the k-FCV technique has more stable performance and provides high accuracy on the training set as well as good generalization on the validation set and test set. Meanwhile, data splitting using the MLMV technique performs worse than the other three techniques, since it yields lower accuracy. This technique also shows higher bias and variance values and builds overfitted models, especially when it is applied to the validation set.
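
Of the four techniques compared, k-fold cross-validation is the simplest to illustrate: the shuffled samples are cut into k folds, and each fold serves once as the held-out part while the remaining folds form the training part. The following is a minimal sketch of that index bookkeeping only; the function name `k_fold_indices`, the choice k=5, and the seed are assumptions, and no classifier is trained.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, held_out_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


for train, held_out in k_fold_indices(10, k=5):
    print(sorted(held_out), len(train))
```

Every sample is held out exactly once, which is one plausible reason for the more stable estimates reported for k-FCV compared with a single random sub-sample.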

An Early Biomarker Algorithm Predicts Lethal Graft-Versus-Host Disease and Survival after Allogeneic Hematopoietic Cell Transplantation

Blood ◽

10.1182/blood.v128.22.509.509 ◽

2016 ◽

Vol 128 (22) ◽

pp. 509-509 ◽

Cited By ~ 1

Author(s):

Matthew J Hartwell ◽

Umut Ozbek ◽

Ernst Holler ◽

Anne S. Renteria ◽

Pavan R. Reddy ◽

...

Keyword(s):

High Risk ◽

Research Funding ◽

Acute Gvhd ◽

Conditioning Regimen ◽

Training Set ◽

Test Set ◽

Graft Versus Host ◽

Independent Validation ◽

Blood Biomarker ◽

Validation Set

Abstract No laboratory test can predict non-relapse mortality (NRM) after hematopoietic cellular transplantation (HCT) prior to the onset of graft-versus-host disease (GVHD). Recently, we have shown that a signature of three GVHD plasma biomarkers (TNFR1, ST2, and REG3α) can predict response to GVHD therapy and NRM at the onset of clinical GVHD (Levine, Lancet Haem, 2015). Our goal in the current study was to identify a blood biomarker signature that could predict lethal GVHD and six-month NRM well in advance of the onset of GVHD symptoms. Patient samples on day +7 after HCT were obtained from 1,287 patients from 11 HCT centers in the Mount Sinai Acute GVHD International Consortium (MAGIC). Samples from two large centers (n = 929) were combined and randomly assigned to a training set (n = 620) and a test set (n = 309). A further 358 patients from nine other centers constituted an independent validation set. The overall cumulative incidences of 6-month NRM were 11%, 12%, and 13% for the training, test, and validation sets, respectively. The incidences of lethal GVHD, defined as death without preceding relapse while under steroid treatment for acute GVHD, were 18%, 24%, and 14% in the same groups, respectively. The median day of GVHD onset was 28 days in the training set and 29 days in the test and validation sets. We measured four GVHD-related biomarkers [ST2, REG3α, TNFR1, and IL2Rα] in all samples and used the training set alone to develop competing risks regression models that used all 13 possible combinations of one to four biomarkers to predict 6-month NRM. The best algorithm, which we rigorously confirmed through Monte Carlo cross-validation of 75 different combinations of training sets, included ST2 and REG3α. No combination of one, three, or four biomarkers was superior to the combination of these two biomarkers. The day 7 algorithm identified high risk (HR) and low risk (LR) groups with 6-month NRMs of 28% and 7%, respectively (p<0.001) (Fig 1A). The relapse rates did not differ between risk groups, so overall survival (OS) was 60% for HR and 84% for LR (p<0.001) (Fig 1B). When applied to the test set (Fig 1C/D), the algorithm identified 54/309 (17%) of the patients as HR with an NRM of 33% vs 7% for LR patients (p<0.001) and 6-month OS of 57% and 81% for HR and LR patients, respectively (p<0.001). In the independent validation set (Fig 1E/F), the algorithm identified 72/358 (20%) of the patients as HR with an NRM of 26% vs 10% for LR patients (p<0.001) and OS of 68% and 85% for HR and LR patients, respectively (p<0.001). High risk patients were three times more likely to die from GVHD than LR patients in each cohort (p<0.001) (Fig 2). The GI tract is the GVHD target organ that is most resistant to treatment and represents a major cause of NRM, and we observed twice as much severe GI GVHD (stage 3 or 4) in HR patients as in LR patients (p<0.001, data not shown). The algorithm successfully separated HR and LR strata for 6-month NRM in several groups with differing risks for GVHD and NRM, including donor type, degree of HLA match, age group, and conditioning regimen intensity (Fig 3). In conclusion, we have developed a blood biomarker algorithm that predicts the development of lethal GVHD seven days after HCT, which performed successfully in large multicenter validation sets. The GVH reaction is already in progress by day +7, even though clinical symptoms may not occur until days or weeks later. We speculate that the blood biomarker concentrations at this early time point reflect subclinical GI pathology, a notion that is reinforced by the fact that ST2 and REG3α, the two biomarkers in the algorithm, are closely associated with GI GVHD. The algorithm identified HR and LR strata in several patient groups with different overall risk for lethal GVHD (donor, HLA match, conditioning regimen intensity, age). This day +7 algorithm should prove useful in clinical BMT research by identifying patients at high risk for lethal GVHD who might benefit from aggressive preemptive treatment strategies. Disclosures Chen: Novartis: Research Funding; Incyte Corporation: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding. Jagasia: Therakos: Consultancy. Kitko: Therakos: Honoraria, Speakers Bureau. Kroeger: Novartis: Honoraria, Research Funding. Levine: Viracor: Patents & Royalties: GVHD biomarkers patent. Ferrara: Viracor: Patents & Royalties: GVHD biomarkers patent.
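
Monte Carlo cross-validation, as mentioned in this abstract, repeats a random train/test partition many times so that a model choice can be checked against many different training sets. The sketch below shows only the repeated partitioning; the function name `monte_carlo_splits` and the seed are hypothetical, and reusing the abstract's sizes (929 samples, 620 for training, 75 repeats) is an assumption about how the repeats were drawn, which the abstract does not detail.

```python
import random


def monte_carlo_splits(n_samples, n_train, n_repeats, seed=0):
    """Yield n_repeats independent random (train, test) index partitions."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must leave at least one test sample")

    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]      # fresh copy so repeats are independent draws
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Sizes borrowed from the abstract: 929 pooled samples, 620 used for training.
splits = list(monte_carlo_splits(929, 620, 75))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Unlike k-fold cross-validation, the test parts of different repeats may overlap, and a given sample is not guaranteed to be tested the same number of times.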

Identification of 5 Gene Signatures in Survival Prediction for Patients with Lung Squamous Cell Carcinoma Based on Integrated Multiomics Data Analysis

BioMed Research International ◽

10.1155/2020/6427483 ◽

2020 ◽

Vol 2020 ◽

pp. 1-19

Author(s):

Hongxia Ma ◽

Lihong Tong ◽

Qian Zhang ◽

Wenjun Chang ◽

Fengsen Li

Keyword(s):

Squamous Cell Carcinoma ◽

Data Analysis ◽

Cell Carcinoma ◽

Squamous Cell ◽

Lung Squamous Cell Carcinoma ◽

Training Set ◽

Test Set ◽

Gene Signatures ◽

Genomic Variants ◽

Validation Set

Background. Lung squamous cell carcinoma (LSCC) is a frequently diagnosed cancer worldwide, and it has a poor prognosis. The current study aimed to develop a prognostic prediction model for LSCC by integrating multiomics data, including transcriptome, copy number variation, and mutation data, so as to predict patients’ survival and discover new therapeutic targets. Methods. RNASeq, SNP, CNV data, and LSCC patients’ clinical follow-up information were downloaded from The Cancer Genome Atlas (TCGA), and the samples were randomly divided into two groups, namely, the training set and the validation set. In the training set, the genes related to prognosis and those with different copy numbers or with different SNPs were integrated to extract features using random forests, and finally, robust biomarkers were screened. In addition, a gene-related prognostic model was established and further verified in the test set and GEO validation set. Results. We obtained a total of 804 prognosis-related genes, 535 copy amplification genes, 621 copy deletion genes, and 388 significantly mutated genes in genomic variants; notably, these genomic variant genes were found to be closely related to tumor development. A total of 51 candidate genes were obtained by integrating genomic variants and prognostic genes, and 5 characteristic genes (HIST1H2BH, SERPIND1, COL22A1, LCE3C, and ADAMTS17) were screened through random forest feature selection; we found that many of those genes had been reported to be related to LSCC progression. Cox regression analysis was performed to establish a 5-gene signature that could serve as an independent prognostic factor for LSCC patients and could stratify risk samples in the training set, test set, and external validation set (p<0.01), and the 5-year survival areas under the curve (AUC) of both the training set and the validation set were > 0.67. Conclusion. In the current study, a 5-gene signature was constructed as a novel prognostic marker to predict the survival of LSCC patients. The present findings provide new diagnostic and prognostic biomarkers and therapeutic targets for LSCC treatment.

Neural network-based sperm whale click classification

Journal of the Marine Biological Association of the United Kingdom ◽

10.1017/s0025315407054756 ◽

2007 ◽

Vol 87 (1) ◽

pp. 35-38 ◽

Cited By ~ 12

Author(s):

M. van der Schaar ◽

E. Delory ◽

A. Català ◽

M. André

Keyword(s):

Neural Network ◽

Wavelet Packet ◽

Radial Basis Function Network ◽

Sperm Whale ◽

Data Sets ◽

Sperm Whales ◽

Training Set ◽

Data Set ◽

Validation Set ◽

Function Network

Recordings of a group of foraging sperm whales usually result in a mixture of clicks from different animals. To analyse the click sequences of individual whales, these clicks need to be separated, and for this an automatic classifier would be preferred. Here we study the use of a radial basis function network to perform the separation. The neural network's ability to discriminate between different whales was tested with six data sets of individually diving males. The data consisted of five shorter click trains and one complete dive, which was especially important for evaluating the capacity of the network to generalize. The network was trained with characteristics extracted from the six click series with the help of a wavelet packet-based local discriminant basis. The selected features were split into a training set containing 50 clicks of each data set and a validation set with the remaining clicks. After the network was trained, it could correctly classify around 90% of the short click series, while for the entire dive this percentage was around 78%.

Machine Learning-Based Prediction of Survival Prognosis in Cervical Cancer

10.21203/rs.3.rs-134659/v1 ◽

2020 ◽

Author(s):

Dongyan Ding ◽

Tingyuan Lang ◽

Dongling Zou ◽

Jiawei Tan ◽

Jia Chen ◽

...

Keyword(s):

Cervical Cancer ◽

Survival Rate ◽

Prediction Model ◽

Missing Values ◽

Prediction Models ◽

Survival Prediction ◽

Training Set ◽

Data Set ◽

Test Set ◽

Group 1

Abstract Background: Accurately forecasting prognosis could improve the therapeutic management of cancer patients; however, the clinical features currently in use do not readily provide enough information. The purpose of this study is to develop a survival prediction model for cervical cancer patients with big data and machine learning algorithms. Results: The Cancer Genome Atlas cervical cancer data, including the expression of 1046 microRNAs and the clinical information of 309 cervical and endocervical cancer samples and 3 control samples, were downloaded. Missing-value and outlier imputation, sample normalization, log transformation and feature scaling were performed for preprocessing, and 3 control samples, 2 metastatic samples and 707 microRNAs with ≥ 20% missing values were excluded. By Cox proportional-hazards analysis, 55 prognosis-related microRNAs (20 positively and 35 negatively correlated with survival) were identified. K-means clustering analysis showed that the cervical cancer samples could be separated into two and three subgroups using the top 20 identified survival-related microRNAs for best stratification. Using the Support Vector Machine algorithm, two prediction models were developed that can segment the patients into two and three groups with different survival rates, respectively. The models exhibit high performance: for two classes, area under the curve (AUC) = 0.976 (training set), 0.972 (test set), 0.974 (whole data set); for three classes, AUC = 0.983, 0.996 and 0.991 (group 1, 2 and 3 in training set), 0.955, 0.989 and 0.991 (group 1, 2 and 3 in test set), 0.974, 0.993 and 0.991 (group 1, 2 and 3 in whole data set). Conclusion: Survival prediction models for cervical cancer were developed. The patients with a very low survival rate (≤ 40%) can be separated first by the three-class prediction model. The remaining patients can then be identified by the two-class prediction model as having a high survival rate (≈ 75%) or a low survival rate (≈ 50%).
