Machine learning-based CT radiomics features for the prediction of pulmonary metastasis in osteosarcoma

2021 ◽  
pp. 20201391
Author(s):  
Helcio Mendonça Pereira ◽  
Maria Eugenia Duarte Leite ◽  
Igor R Damasceno ◽  
Luiz Afonso Santos, OM ◽  
Marcello Henrique Nogueira-Barbosa

Objective: This study aims to build a machine learning model based on CT radiomic features to predict which patients will develop metastasis after osteosarcoma diagnosis. Methods and materials: This retrospective study included 81 patients with a histopathological diagnosis of osteosarcoma. The entire dataset was divided randomly into training (60%) and test (40%) sets. Data augmentation for the minority class was performed in the training set, along with feature selection and model training. The radiomic features were extracted from CT images of the local osteosarcoma. Three frequently used machine learning models were trained to distinguish patients with lung metastasis (MT) from those without lung metastasis (non-MT). The classifier with the highest area under the curve (AUC) was chosen and applied to the test set of unseen data to provide an unbiased evaluation of the final model. Results: The best classifier for predicting MT and non-MT groups used a Random Forest algorithm. On the test set, it reached an accuracy of 73% [95% confidence interval (CI): 54%; 87%] and an AUC of 0.79 [95% CI: 0.62; 0.96]. The features retained in the model (the radiomics signature) were derived from Laplacian of Gaussian and wavelet filters. Conclusions: A machine learning-based CT radiomics approach can provide a non-invasive method with fair predictive accuracy for the risk of developing pulmonary metastasis in osteosarcoma patients. Advances in knowledge: Models based on CT radiomic analysis help assess the risk of developing pulmonary metastases in patients with osteosarcoma, allowing further studies for those with a worse prognosis.
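The split-then-augment order described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: plain random oversampling stands in for the unspecified augmentation technique, and the function name `split_and_oversample`, the seed, and any class counts used with it are hypothetical.

```python
import random


def split_and_oversample(labels, train_fraction=0.6, seed=0):
    """Return (train_indices, test_indices).

    The split happens first; minority-class samples are then duplicated
    in the training part only, so the test set stays untouched.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]

    # Group the training indices by class label.
    by_class = {}
    for i in train:
        by_class.setdefault(labels[i], []).append(i)

    # Assumption: random oversampling (sampling with replacement) until
    # every class is as large as the largest one.
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced, test
```

Balancing after the split matters because duplicating samples before splitting could place copies of the same patient in both sets and inflate the test score.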

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Huu-Thanh Duong ◽  
Tram-Anh Nguyen-Thi

Abstract In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the target domain. Such datasets are tedious, expensive and time-consuming to build, and such approaches struggle to handle unseen data. This paper applies semi-supervised learning to Vietnamese sentiment analysis, for which datasets are limited. We summarize the preprocessing techniques used to clean and normalize the data, together with negation handling and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention, are also presented. In the experiments, we evaluate various aspects and obtain competitive results that may motivate further proposals.


2021 ◽  
pp. 1-10
Author(s):  
I. Krug ◽  
J. Linardon ◽  
C. Greenwood ◽  
G. Youssef ◽  
J. Treasure ◽  
...  

Abstract Background Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.


2020 ◽  
Vol 117 (31) ◽  
pp. 18869-18879 ◽  
Author(s):  
Christopher Culley ◽  
Supreeta Vijayakumar ◽  
Guido Zampieri ◽  
Claudio Angione

Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype–phenotype–environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning–based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features improves the predictions also for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 2581-2581 ◽  
Author(s):  
Paul Johannet ◽  
Nicolas Coudray ◽  
George Jour ◽  
Douglas MacArthur Donnelly ◽  
Shirin Bajaj ◽  
...  

Background: There is growing interest in optimizing patient selection for treatment with immune checkpoint inhibitors (ICIs). We postulate that phenotypic features present in metastatic melanoma tissue reflect the biology of tumor cells, immune cells, and stromal tissue, and hence can provide predictive information about tumor behavior. Here, we test the hypothesis that machine learning algorithms can be trained to predict the likelihood of response and/or toxicity to ICIs. Methods: We examined 124 stage III/IV melanoma patients who received anti-CTLA-4 (n = 81), anti-PD-1 (n = 25), or combination (n = 18) therapy as first line. The tissue analyzed was resected before treatment with ICIs. In total, 340 H&E slides were digitized and annotated for three regions of interest: tumor, lymphocytes, and stroma. The slides were then partitioned into training (n = 285), validation (n = 26), and test (n = 29) sets. Slides were tiled (299x299 pixels) at 20X magnification. We trained a deep convolutional neural network (DCNN) to automatically segment the images into each of the three regions and then deconstruct images into their component features to detect non-obvious patterns with objectivity and reproducibility. We then trained the DCNN for two classifications: 1) complete/partial response versus progression of disease (POD), and 2) severe versus no immune-related adverse events (irAEs). Predictive accuracy was estimated by area under the curve (AUC) of receiver operating characteristics (ROC). Results: The DCNN identified tumor within LN with AUC 0.987 and within ST with AUC 0.943. Prediction of POD based on ST-only always performed better than prediction based on LN-only (AUC 0.84 compared to 0.61, respectively). The DCNN had an average AUC 0.69 when analyzing only tumor regions from both LN and ST data sets and AUC 0.68 when analyzing tumor and lymphocyte regions. Severe irAEs were predicted with limited accuracy (AUC 0.53). Conclusions: Our results support the potential application of machine learning on pre-treatment histologic slides to predict response to ICIs. They also reveal its limited value in predicting toxicity. We are currently investigating whether the predictive capability of the algorithm can be further improved by incorporating additional immunologic biomarkers.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Hooman Zabeti ◽  
Nick Dexter ◽  
Amir Hosein Safari ◽  
Nafiseh Sedaghat ◽  
Maxwell Libbrecht ◽  
...  

Abstract Motivation Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 15-16
Author(s):  
Pablo A S Fonseca ◽  
Massimo Tornatore ◽  
Angela Cánovas

Abstract Reduced fertility is one of the main causes of economic losses in dairy farms. The cost of a stillbirth is estimated at US$938 per case in Holstein herds. Machine learning (ML) is gaining popularity in the livestock sector as a means to identify hidden patterns and because of its potential to address dimensionality problems. Here we investigate the application of ML algorithms for the prediction of cows with higher stillbirth susceptibility in two scenarios: cows with >25% and >33.33% of stillbirths among birth records. These thresholds correspond to percentiles 75 (still_75) and 90 (still_90), respectively. A total of 10,570 cows and 50,541 birth records were collected to perform a haplotype-based genome-wide association study. Five hundred significant pseudo single nucleotide polymorphisms (pseudo-SNPs) (false discovery rate < 0.05) were used as input features of ML-based predictions to determine whether a cow falls above the 75th or the 90th percentile. Table 1 shows the classification performance of the investigated ML and linear models. The ML models outperformed linear models for both thresholds. In general, still_75 showed higher F1 values compared to still_90, suggesting a lower misclassification ratio when a less stringent threshold is used. We observe that the accuracy of the models in our study is higher than ML-based prediction accuracies in other breeds, e.g. the accuracies of 0.46 and 0.67 that were achieved using SNPs for body weight in Brahman and fertility traits in Nellore, respectively. The XGBoost algorithm shows the highest balanced accuracy (BA; 0.625), F1-score (0.588) and area under the curve (AUC; 0.688), suggesting that XGBoost can achieve the highest predictive performance and the lowest difference in misclassification ratio between classes. ML applied to haplotype libraries is an interesting approach for detecting animals with higher susceptibility to stillbirths, owing to its higher predictive accuracy and relatively lower misclassification ratio.
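For readers unfamiliar with the two class-sensitive metrics reported in this abstract, the standard definitions can be sketched from confusion-matrix counts. The function names are illustrative only, and the abstract does not give the underlying counts, so none are assumed here.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Both metrics are less flattering than plain accuracy when classes are imbalanced, which is the situation created by thresholds at the 75th and 90th percentiles.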


2018 ◽  
Vol 8 (10) ◽  
pp. 1949 ◽  
Author(s):  
Yagya Raj Pandeya ◽  
Dongwhoon Kim ◽  
Joonwhoan Lee

The domestic cat (Felis catus) is one of the most attractive pets in the world, and it generates mysterious kinds of sound according to its mood and situation. In this paper, we deal with the automatic classification of cat sounds using machine learning. A machine learning approach to classification requires class-labeled data, so our work starts with building a small dataset named CatSound across 10 categories. Along with the original dataset, we increase the amount of data with various audio data augmentation methods to help our classification task. In this study, we use two types of learned features from deep neural networks: one from a convolutional neural net (CNN) pre-trained on music data by transfer learning, and the other from an unsupervised convolutional deep belief network (CDBN) that is trained solely on a collected set of cat sounds. In addition to conventional global average pooling (GAP), we propose an effective pooling method called FDAP to explore a number of meaningful features. In FDAP, the frequency dimension is roughly divided and then average pooling is applied in each division. For the classification, we exploited five different machine learning algorithms and an ensemble of them. We compare the classification performances with respect to the following factors: the amount of data increased by augmentation, the learned features from pre-trained CNN or unsupervised CDBN, conventional GAP or FDAP, and the machine learning algorithms used for the classification. As expected, the proposed FDAP features, with a larger amount of data increased by augmentation and combined with the ensemble approach, have produced the best accuracy. Moreover, both learned features from pre-trained CNN and unsupervised CDBN produce good results in the experiment. Therefore, with the combination of all those positive factors, we obtained the best result of 91.13% in accuracy, 0.91 in f1-score, and 0.995 in area under the curve (AUC) score.
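The pooling idea in this abstract (divide the frequency axis, then average within each division) can be sketched on a single feature map. This is an illustration under stated assumptions, not the paper's implementation: the function names, the equal-width bands, and the single-channel layout (rows are frequency bins, columns are time frames) are all hypothetical choices.

```python
def global_average_pool(feature_map):
    """Conventional GAP: one average over the whole frequency-by-time map."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def frequency_divided_average_pool(feature_map, n_bands):
    """Average each band of consecutive frequency rows separately.

    feature_map: list of frequency rows, each a list of values over time.
    n_bands must not exceed the number of rows, otherwise a band is empty.
    Returns a list with one average per band.
    """
    n_freq = len(feature_map)
    pooled = []
    for band in range(n_bands):
        # Assumption: bands of (nearly) equal width along the frequency axis.
        start = band * n_freq // n_bands
        stop = (band + 1) * n_freq // n_bands
        values = [v for row in feature_map[start:stop] for v in row]
        pooled.append(sum(values) / len(values))
    return pooled
```

Compared with a single global average, the per-band output keeps coarse information about where in the spectrum the activation sits, at the cost of a longer feature vector.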


Tomography ◽  
2021 ◽  
Vol 7 (3) ◽  
pp. 301-312
Author(s):  
Annette Erle ◽  
Sobhan Moazemi ◽  
Susanne Lütje ◽  
Markus Essler ◽  
Thomas Schultz ◽  
...  

The importance of machine learning (ML) in the clinical environment increases constantly. Differentiation of pathological from physiological tracer uptake in positron emission tomography/computed tomography (PET/CT) images is considered time-consuming and attention-intensive, yet crucial for diagnosis and treatment planning. This study aimed at comparing and validating supervised ML algorithms to classify pathological uptake in prostate cancer (PC) patients based on prostate-specific membrane antigen (PSMA)-PET/CT. Retrospective analysis of 68Ga-PSMA-PET/CTs of 72 PC patients yielded a total of 77 radiomics features from 2452 manually delineated hotspots for training, each labeled pathological (1629) or physiological (823) as ground truth (GT). As the held-out test dataset, 331 hotspots (pathological: 128, physiological: 203) were delineated in 15 other patients. Three ML classifiers were trained and ranked to assess classification performance. As a result, a high overall average performance (area under the curve (AUC) of 0.98) was achieved, especially in detecting pathological uptake (0.97 mean sensitivity). However, there is still room for improvement in detecting physiological uptake (0.82 mean specificity), especially for glands. The ML algorithm applied to manually delineated lesions predicts hotspot labels with high accuracy on unseen data and may be an important tool to assist in clinical diagnosis.
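The AUC figure quoted here has a simple rank-based reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that definition follows; the function name is illustrative and no scores from the study are assumed.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a few hundred hotspots; library routines typically use a sort-based computation instead.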


Dermatology ◽  
2021 ◽  
pp. 1-11
Author(s):  
Tao Zhang ◽  
Yingli Nie

Background: Alopecia areata (AA) is an autoimmune disease typified by nonscarring hair loss with a variable clinical course. Although there is an increased understanding of AA pathogenesis and progress in its treatments, the outcome of AA patients remains unfavorable, especially when they are progressing to the subtypes of alopecia totalis (AT) or alopecia universalis (AU). Thus, identifying biomarkers that reflect the risk of AA progressing to AT or AU could lead to better interventions for AA patients. Methods: In this study, we conducted bioinformatics analyses to select key genes that correlated to AU or AT based on the whole-genome gene expression of 122 human scalp skin biopsy specimens obtained from NCBI-GEO GSE68801. Then, we built a biomarker using 8 different machine learning (ML) algorithms based on the key genes selected by bioinformatics analyses. Results: We identified 4 key genes that significantly increased (CD28) or decreased (HOXC13, KRTAP1-3, and GPRC5D) in AA tissues, especially in the subtypes of AT and AU. Besides, the predictive accuracy (area under the curve [AUC] value) of the prediction models for forecasting AA patients progressing to AT/AU reached 90.7% (87.9%) by logistic regression, 93.8% (79.9%) by classification trees, 100.0% (76.3%) by random forest, 96.9% (76.3%) by support vector machine, 83.5% (79.9%) by K-nearest neighbors, 97.1% (87.3%) by XGBoost, and 93.3% (80.6%) by neural network algorithms for the training (internal validation) cohort. In addition, 2 drugs, azacitidine and anisomycin, were identified using the Cmap database. They might have potential therapeutic effects on AA patients at high risk of progressing to AT/AU. Conclusions: In the present study, we constructed high-accuracy models for predicting the risk of AA patients progressing to AT or AU, which may be important in facilitating personalized therapeutic strategies and clinical management for different AA patients.


Healthcare ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 1107
Author(s):  
Tayla Anthony ◽  
Amit Kumar Mishra ◽  
Willem Stassen ◽  
Jarryd Son

This paper presents the application of machine learning for classifying time-critical conditions, namely sepsis, myocardial infarction and cardiac arrest, based on transcriptions of emergency calls from emergency services dispatch centers in South Africa. In this study we present results from the application of four multi-class classification algorithms: Support Vector Machine (SVM), Logistic Regression, Random Forest and K-Nearest Neighbor (kNN). The application of machine learning for classifying time-critical diseases may allow for earlier identification, adequate telephonic triage, and quicker response times of the appropriate cadre of emergency care personnel. The original data set consisted of 93 examples and was further expanded through the use of data augmentation. Two feature extraction techniques were investigated, namely TF-IDF and handcrafted features. The results were further improved using hyper-parameter tuning and feature selection. In our work, within the constraints of a small data set, classification yielded an accuracy of up to 100% when training with 10-fold cross validation, and 95% accuracy when predicting on unseen data. The results are encouraging and show that automated diagnosis based on emergency dispatch centre transcriptions is feasible. When implemented in real time, this can have multiple utilities, e.g. enabling the call-takers to take the right action with the right priority.
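As a rough illustration of the TF-IDF feature extraction mentioned in this abstract, the sketch below computes one common unsmoothed variant: term frequency within a document multiplied by the log of the inverse document frequency. It is not the authors' code; the function name, the whitespace tokenizer, and the exact weighting are assumptions, and library implementations often use smoothed variants.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(n_docs / docs containing term)
    """
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Terms that occur in every transcript get a weight of zero under this variant, so the classifier sees only the words that help separate one call from another.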

