A Quantitative Assessment of Pre-Operative MRI Reports in Glioma Patients: Report Metrics and IDH Prediction Ability

2021 ◽  
Vol 10 ◽  
Author(s):  
Hang Cao ◽  
E. Zeynep Erson-Omay ◽  
Murat Günel ◽  
Jennifer Moliterno ◽  
Robert K. Fulbright

Objectives: To measure the metrics of glioma pre-operative MRI reports and build IDH prediction models. Methods: Pre-operative MRI reports of 144 glioma patients in a single institution were collected retrospectively. Words were transformed to lowercase letters. White spaces, punctuation, and stop words were removed. Stemming was performed. A word cloud method applied to the processed text matrix visualized language behavior. Spearman’s rank correlation assessed the correlation between the subjective descriptions of the enhancement pattern. The T1-contrast images associated with enhancement descriptions were selected. The keywords associated with IDH status were evaluated by χ2 value ranking. Random forest, k-nearest neighbors (KNN), and support vector machine algorithms were used to train models based on report features and age. All statistical analyses used two-tailed tests with significance at p < .05. Results: Longer word counts occurred in reports of older patients, higher grade gliomas, and wild type IDH gliomas. We identified 30 glioma enhancement descriptions, eight of which were commonly used: peripheral, heterogeneous, irregular, nodular, thick, rim, large, and ring. Five of eight patterns were correlated. IDH mutant tumors were characterized by words related to normal, symmetric or negative findings. IDH wild type tumors were characterized by words related to pathological MR findings like enhancement, necrosis and FLAIR foci. An integrated KNN model based on report features and age demonstrated high performance (AUC: 0.89, 95% CI: 0.88–0.90). Conclusion: Report length depended on age, glioma grade, and IDH status. Description of glioma enhancement was varied. Report descriptions differed for IDH wild type and mutant gliomas. Report features can be used to predict glioma IDH status.
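
The text-cleaning steps named in the Methods (lowercasing, punctuation and white-space removal, stop-word removal) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `preprocess` function, the tiny `STOP_WORDS` set, and the sample sentence are all hypothetical, and stemming is left out because it would normally rely on a third-party stemmer.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "with", "there", "no"}


def preprocess(report: str) -> list[str]:
    """Lowercase a report, strip punctuation, split on white space, drop stop words."""
    lowered = report.lower()
    # Map every punctuation character to a space so hyphenated terms split into two tokens.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = lowered.translate(table).split()
    return [token for token in tokens if token not in STOP_WORDS]


# Invented example sentence, not taken from any report in the study.
tokens = preprocess("There is a thick, irregular rim of enhancement.")
print(tokens)
```

The resulting token lists are the kind of input from which a word cloud or a keyword-by-report matrix could then be built.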

2020 ◽  
Vol 11 ◽  
Author(s):  
Llibertat Tusell ◽  
Rob Bergsma ◽  
Hélène Gilbert ◽  
Daniel Gianola ◽  
Miriam Piles

This research assessed the ability of a Support Vector Machine (SVM) regression model to predict pig crossbred (CB) performance from various sources of phenotypic and genotypic information for improving crossbreeding performance at reduced genotyping cost. Data consisted of average daily gain (ADG) and residual feed intake (RFI) records and genotypes of 5,708 purebred (PB) boars and 5,007 CB pigs. Prediction models were fitted using individual PB genotypes and phenotypes (trn.1); genotypes of PB sires and average of CB records per PB sire (trn.2); and individual CB genotypes and phenotypes (trn.3). The average of CB offspring records was the trait to be predicted from the PB sire’s genotype using cross-validation. Single nucleotide polymorphisms (SNPs) were ranked based on the Spearman rank correlation with the trait. Subsets with an increasing number (from 50 to 2,000) of the most informative SNPs were used as predictor variables in SVM. Prediction performance was the median of the Spearman correlation (SC, interquartile range in brackets) between observed and predicted phenotypes in the testing set. The best predictive performances were obtained when sire phenotypic information was included in trn.1 (0.22 [0.03] for RFI with SVM and 250 SNPs, and 0.12 [0.05] for ADG with SVM and 500–1,000 SNPs) or when trn.3 was used (0.29 [0.16] with Genomic best linear unbiased prediction (GBLUP) for RFI, and 0.15 [0.09] for ADG with just 50 SNPs). Animals from the last two generations were assigned to the testing set and remaining animals to the training set. A PB individual’s own phenotype and genotype improved the prediction ability of CB offspring of young animals for ADG but not for RFI. The highest SC was 0.34 [0.21] and 0.36 [0.22] for RFI and ADG, respectively, with SVM and 50 SNPs. Predictive performance using CB data for training led to an SC of 0.34 [0.19] with GBLUP and 0.28 [0.18] with SVM and 250 SNPs for RFI and 0.34 [0.15] with SVM and 500 SNPs for ADG.
Results suggest that PB candidates could be evaluated for CB performance with SVM and low-density SNP chip panels after collecting their own RFI or ADG performances or even earlier, after being genotyped using a reference population of CB animals.
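
The SNP pre-selection step described above (rank predictors by their Spearman correlation with the trait, then keep the top subset) can be sketched in plain Python. This is an illustrative sketch only: the function names, the SNP labels, and the toy genotype and trait values are invented, ranking by absolute correlation is an assumption, and treating a constant column as having zero correlation is a convenience choice rather than anything stated in the abstract.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uninformative
    return cov / (var_x * var_y) ** 0.5


def top_k_features(columns, trait, k):
    """Return the k column names with the largest |Spearman correlation| to the trait."""
    ranked = sorted(columns, key=lambda name: abs(spearman(columns[name], trait)), reverse=True)
    return ranked[:k]


# Toy data: genotype codes (0/1/2) for three hypothetical SNPs across five animals.
snps = {
    "snp_a": [0, 1, 1, 2, 2],
    "snp_b": [2, 2, 1, 0, 1],
    "snp_c": [1, 1, 1, 1, 1],
}
trait = [0.5, 0.8, 0.9, 1.2, 1.4]
selected = top_k_features(snps, trait, k=2)
print(selected)
```

In a cross-validated setting, a ranking like this would usually be computed on the training animals only, so that the testing set does not influence which SNPs are kept.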


Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.


Algorithms ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 282
Author(s):  
Di Wu ◽  
Wanying Zhang ◽  
Heming Jia ◽  
Xin Leng

Chimp Optimization Algorithm (ChOA), a novel meta-heuristic algorithm, has been proposed in recent years. It divides the population into four different levels for the purpose of hunting. However, there are still some defects that lead to the algorithm falling into a local optimum. To overcome these defects, an Enhanced Chimp Optimization Algorithm (EChOA) is developed in this paper. Highly Disruptive Polynomial Mutation (HDPM) is introduced to further explore the population space and increase the population diversity. Then, the Spearman’s rank correlation coefficient between the chimps with the highest fitness and the lowest fitness is calculated. To avoid local optima, the Beetle Antenna Search Algorithm (BAS) is applied to the chimps with low fitness values to give them visual ability. Through the introduction of the above three strategies, the ability of population exploration and exploitation is enhanced. On this basis, this paper proposes an EChOA-SVM model, which can optimize parameters while selecting the features. Thus, the maximum classification accuracy can be achieved with as few features as possible. To verify its effectiveness, the proposed method is compared with seven common methods, including the original algorithm. Seventeen benchmark datasets from the UCI machine learning library are used to evaluate the accuracy, number of features, and fitness of these methods. Experimental results show that the classification accuracy of the proposed method is better than that of the other methods on most data sets, and the number of features required by the proposed method is also smaller than for the other algorithms.


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-4 ◽  
Author(s):  
Vani Chandrashekar

Hb A1c measurement is subject to interference by hemoglobin traits, and this is dependent on the method used for determination. In this paper we studied the difference between Hb A1c measured by HPLC in hemoglobin traits and normal chromatograms. We also studied the correlation of Hb A1c with age. Hemoglobin analysis was carried out by high performance liquid chromatography. Spearman’s rank correlation was used to study the correlation between A1c levels and age. The Mann-Whitney U test was used to study the difference in Hb A1c between patients with normal hemoglobin and hemoglobin traits. A total of 431 patients were studied. There was a positive correlation with age in patients with normal chromatograms only. No correlation was seen in Hb E trait or beta thalassemia trait. No significant difference in Hb A1c of patients with normal chromatograms and patients with hemoglobin traits was seen. There is no interference by abnormal hemoglobin in the detection of A1c by high performance liquid chromatography. This method cannot be used for detection of A1c in compound heterozygous and homozygous disorders.


2021 ◽  
Vol 12 (1) ◽  
pp. 89
Author(s):  
Ruiqi Chen ◽  
Tianyu Wu ◽  
Yuchen Zheng ◽  
Ming Ling

In Internet of Things (IoT) scenarios, it is challenging to deploy Machine Learning (ML) algorithms on low-cost Field Programmable Gate Arrays (FPGAs) in a real-time, cost-efficient, and high-performance way. This paper introduces Machine Learning on FPGA (MLoF), a series of ML IP cores implemented on low-cost FPGA platforms, aiming to help more IoT developers achieve comprehensive performance in various tasks. With Verilog, we deploy and accelerate Artificial Neural Networks (ANNs), Decision Trees (DTs), K-Nearest Neighbors (k-NNs), and Support Vector Machines (SVMs) on 10 different FPGA development boards from seven producers. Additionally, we analyze and evaluate our design with six datasets, and compare the best-performing FPGAs with traditional SoC-based systems including NVIDIA Jetson Nano, Raspberry Pi 3B+, and STM32L476 Nucleo. The results show that Lattice’s ICE40UP5 achieves the best overall performance with low power consumption, on which MLoF on average reduces power by 891% and increases performance by 9 times. Moreover, its cost, power, Latency Production (CPLP) outperforms SoC-based systems by 25 times, which demonstrates the significance of MLoF in endpoint deployment of ML algorithms. Furthermore, we make all of the code open-source in order to promote future research.


2019 ◽  
Author(s):  
Hannes Rosenbusch ◽  
Felix Soldner ◽  
Anthony M Evans ◽  
Marcel Zeelenberg

Machine learning methods for pattern detection and prediction are increasingly prevalent in psychological research. We provide a comprehensive overview of machine learning, its applications, and how to implement models for research. We review fundamental concepts of machine learning, such as prediction accuracy and out-of-sample evaluation, and summarize four standard prediction algorithms: linear regressions, ridge regressions, decision trees, and random forests (plus k-nearest neighbors, Naïve Bayes classifiers, and support vector machines in the supplementary material). This selection provides a set of powerful models that are implemented regularly in machine learning projects. We demonstrate each method with examples and annotated R code, and discuss best practices for determining sample sizes; comparing model performances; tuning prediction models; preregistering prediction models; and reporting results. Finally, we discuss the value of machine learning methods in maintaining psychology’s status as a predictive science.


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3417 ◽  
Author(s):  
Longtu Zhu ◽  
Honglei Jia ◽  
Yibing Chen ◽  
Qi Wang ◽  
Mingwei Li ◽  
...  

Soil organic matter (SOM) is a major indicator of soil fertility and nutrients. In this study, a soil organic matter measuring method based on an artificial olfactory system (AOS) was designed. An array composed of 10 identical gas sensors controlled at different temperatures was used to collect soil gases. From the response curve of each sensor, four features were extracted (maximum value, mean differential coefficient value, response area value, and the transient value at the 20th second). Then, soil organic matter regression prediction models were built based on back-propagation neural network (BPNN), support vector regression (SVR), and partial least squares regression (PLSR). The prediction performance of each model was evaluated using the coefficient of determination (R2), root-mean-square error (RMSE), and the ratio of performance to deviation (RPD). It was found that the R2 values between prediction (from BPNN, SVR, and PLSR) and observation were 0.880, 0.895, and 0.808. RMSEs were 14.916, 14.094, and 18.890, and RPDs were 2.837, 3.003, and 2.240, respectively. SVR had higher prediction ability than BPNN and PLSR and can be used to accurately predict organic matter contents. Thus, our findings offer brand new methods for predicting SOM.
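
The three evaluation metrics named above can be sketched as follows. The definitions are assumptions, since the abstract does not give formulas: R2 is taken as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the observed values divided by RMSE. The function name and the observed/predicted numbers are hypothetical toy values, not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Compute R2, RMSE and RPD for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    sd_obs = math.sqrt(ss_tot / (n - 1))  # sample standard deviation of the observations
    return {"r2": r2, "rmse": rmse, "rpd": sd_obs / rmse}


# Invented toy values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Some studies report R2 as the squared Pearson correlation between observed and predicted values instead, so numbers from different papers are only comparable when the definition matches.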


2020 ◽  
Vol 10 (8) ◽  
pp. 2941 ◽  
Author(s):  
Lifei Wei ◽  
Yangxi Zhang ◽  
Ziran Yuan ◽  
Zhengxiang Wang ◽  
Feng Yin ◽  
...  

Soil total arsenic (TAs) contamination caused by human activities—such as mining, smelting, and agriculture—is a problem of global concern. Visible/near-infrared (VNIR), X-ray fluorescence spectroscopy (XRF), and laser-induced breakdown spectroscopy (LIBS) do not need too much sample preparation and utilization of chemicals to evaluate total arsenic (TAs) concentration in soil. VNIR with hyperspectral imaging has the potential to predict TAs concentration in soil. In this study, 59 soil samples were collected from the Daye City mining area of China, and hyperspectral imaging of the soil samples was undertaken using a visible/near-infrared hyperspectral imaging system (wavelength range 470–900 nm). Spectral preprocessing included standard normal variate (SNV) transformation, multivariate scatter correction (MSC), first derivative (FD) preprocessing, and second derivative (SD) preprocessing. Characteristic bands were then identified based on Spearman’s rank correlation coefficients. Four regression models were used for the modeling prediction: partial least squares regression (PLSR) (R2 = 0.71, RMSE = 0.48), support vector machine regression (SVMR) (R2 = 0.78, RMSE = 0.42), random forest (RF) (R2 = 0.78, RMSE = 0.42), and extremely randomized trees regression (ETR) (R2 = 0.81, RMSE = 0.38). The prediction results were compared with the results of atomic fluorescence spectrometry methods. In the prediction results of the models, the accuracy of ETR using FD preprocessing was the highest. The results confirmed that hyperspectral imaging combined with Spearman’s rank correlation with machine learning models can be used to estimate soil TAs content.


Author(s):  
Amitesh Gupta ◽  
Biswajeet Pradhan

The COVID-19 pandemic has spread aggressively in India. As of June 04, 2020, more than 2 lakh cases have been confirmed, with a death rate of 2.81%. Out of every 1000 tests, 53 return a positive result. To investigate the impact of weather conditions on daily transmission in India, daily data on Maximum (TMax), Minimum (TMin), Mean (TMean) and Dew Point Temperature (TDew), Diurnal Temperature Range (TRange), Average Relative Humidity, Range in Relative Humidity, and Wind Speed (WS) over the 9 most affected cities are analysed in several time frames: weather of that day and 7, 10, 12, 14, and 16 days before transmission. Spearman’s rank correlation (r) shows significant but low correlation with most of the weather parameters; however, a comparatively better association exists at a 14-day lag. Diurnal range in Temperature and Relative Humidity shows non-significant correlation. The analysis shows that COVID-19 cases are likely to increase with increasing air temperature; however, the role of humidity is not clear. Among weather parameters, Minimum Temperature was relatively better correlated than the others. 80% of the total confirmed cases were registered when TMax, TMean, TMin, TRange, TDew, and WS 12-16 days earlier varied within ranges of 33.6-41.3° C, 29.8-36.5° C, 24.8-30.4° C, 7.5-15.2° C, 18.7-23.6° C, and 4.2-5.75 m/s respectively, which gives an idea of the weather conditions susceptible to such transmission in India. Using Support Vector Machine based regression, the daily cases are estimated with more than 80% accuracy, which indicates that coronavirus transmission cannot be well linearly correlated with any single weather parameter; rather, a multivariate non-linear approach must be employed. Accounting for a lag of 12-16 days, the association was found to be excellent, suggesting an incubation period of 14 ± 2 days for coronavirus transmission in the Indian scenario.


2020 ◽  
Vol 98 (11) ◽  
Author(s):  
Wilson Barragán-Hernández ◽  
Liliana Mahecha-Ledesma ◽  
William Burgos-Paz ◽  
Martha Olivera-Angel ◽  
Joaquín Angulo-Arizala

This study aimed to predict fat and fatty acids (FA) contents in beef using near-infrared spectroscopy and prediction models based on partial least squares (PLS) and support vector machine regression in radial kernel (R-SVR). Fat and FA were assessed in 200 longissimus thoracis samples, and spectra were collected in reflectance mode from ground meat. The analyses were performed for PLS and R-SVR with and without wavelength selection based on genetic algorithms (GAs). The GA application reduced the prediction error by 15% and 68% for PLS and R-SVR, respectively. Models based on GA plus R-SVR showed a prediction ability for fat and FA with an average coefficient of determination of 0.92 and ratio performance deviation of 4.8.

