Classification of vulnerability levels using multivariate biomarkers in schizophrenia: a machine-learning approach

Abstract Background: Schizophrenia is a heterogeneous neurodevelopmental disease involving cognitive and motor impairments. Motor dysfunctions, such as eye movement abnormalities or neurological soft signs, are proposed as endophenotypic markers. Methods: Supervised machine-learning methods (Support Vector Machines) were applied to oculomotor performance, measured with comprehensive testing (prosaccades, antisaccades, memory-guided saccade tasks and smooth pursuit), and to a neurological soft signs assessment, in order to discriminate patients with schizophrenia (SZ, N=53), full siblings of patients (FS, N=45) and healthy volunteers (C, N=48). 80% of patients were used in a training/validation set and 20% in a test set. Discrimination was measured using the classification error (rate of misclassified patients). Results: The most reliable classification was between C and SZ, with error rates of only 15% and 12% for validation and test, respectively, whereas the SZ vs. FS classification gave the highest error rates (32% in both validation and test). Interestingly, neurological soft signs were selected as the best predictor, together with a combination of measures, for two classifications: C vs. SZ and SZ vs. FS. In addition, memory-guided saccades were consistently selected among the best two multimodal features for the classifications involving the control group (C vs. SZ or FS). Conclusions: Taken together, these results emphasize the importance of neurological soft signs and sensitive oculomotor parameters, especially memory-guided saccades. This classification provides promising avenues for improving early detection of, and early intervention in, psychosis.
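The evaluation protocol described in this abstract (an 80/20 split and an error rate defined as the share of misclassified cases) can be sketched in a few lines. This is a minimal illustration only: the function names, the seed, and the use of a plain random split are assumptions, since the abstract does not say how the split was drawn or whether it was stratified.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists.

    Assumption: a simple unstratified random split; the study's actual
    procedure is not described in the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_error(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Example: 101 hypothetical participant IDs (48 C + 53 SZ).
train, test = split_train_test(range(101))
print(len(train), len(test))  # 81 20
```

With 101 participants in the C vs. SZ comparison, a 20% hold-out leaves roughly 20 test cases, so a single misclassification shifts the test error by about five percentage points.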

Classification of vulnerability levels using multivariate biomarkers in schizophrenia: a machine-learning approach

10.21203/rs.3.rs-15842/v1 ◽

2020 ◽

Author(s):

Simona Caldani ◽

François-Benoît Vialatte ◽

Aurélien Baelde ◽

Maria Pia Bucci ◽

Narjes Bendjemaa ◽

...

Keyword(s):

Machine Learning ◽

Error Rates ◽

Control Group ◽

Motor Impairments ◽

Neurological Soft Signs ◽

Neurodevelopmental Disease ◽

Multimodal Features ◽

Machine Learning Approach ◽

Reliable Classification ◽

Early Intervention In Psychosis

Abstract Background: Schizophrenia is a heterogeneous neurodevelopmental disease involving cognitive and motor impairments. Motor dysfunctions, such as eye movement abnormalities or neurological soft signs (NSS), are proposed as endophenotypic markers. Methods: A machine-learning method was applied to oculomotor performance, measured with comprehensive testing (prosaccades, antisaccades, memory-guided saccade tasks and smooth pursuit), and to an NSS assessment, in order to discriminate patients with schizophrenia (SZ), full siblings of patients (FS) and healthy volunteers (C). Results: The most reliable classification was between C and SZ, with error rates of only 15% and 12% for validation and test, respectively, whereas the SZ vs. FS classification gave the highest error rates (32% in both validation and test). Interestingly, NSS were selected as the best predictor, together with a combination of measures, for two classifications: C vs. SZ and SZ vs. FS. In addition, memory-guided saccades were consistently selected among the best two multimodal features for the classifications involving the control group (C vs. SZ or FS). Conclusions: Taken together, these results emphasize the importance of neurological soft signs and sensitive oculomotor parameters, especially memory-guided saccades. This classification provides promising avenues for improving early detection of, and early intervention in, psychosis.

Reliability and data density in high capacity color barcodes

Computer Science and Information Systems ◽

10.2298/csis131218054q ◽

2014 ◽

Vol 11 (4) ◽

pp. 1595-1615 ◽

Cited By ~ 5

Author(s):

Marco Querini ◽

Giuseppe Italiano

Keyword(s):

Machine Learning ◽

Error Rate ◽

Clustering Algorithms ◽

High Capacity ◽

Error Rates ◽

Computational Time ◽

Support Vector ◽

Color Classification ◽

Computational Overhead ◽

Data Density

2D color barcodes have been introduced to obtain larger storage capabilities than traditional black and white barcodes. Unfortunately, the data density of color barcodes is substantially limited by the redundancy needed for correcting errors, which are due not only to geometric but also to chromatic distortions introduced by the printing and scanning process. The higher the expected error rate, the more redundancy is needed to avoid failures in barcode reading, and thus the lower the actual data density. Our work addresses this trade-off between reliability and data density in 2D color barcodes and aims at identifying the most effective algorithms, in terms of byte error rate and computational overhead, for decoding 2D color barcodes. In particular, we perform a thorough experimental study to identify the most suitable color classifiers for converting analog barcode cells to digital bit streams. To accomplish this task, we implemented a prototype capable of decoding 2D color barcodes by using different methods, including clustering algorithms and machine learning classifiers. We show that, even if state-of-the-art methods for color classification could be successfully used for decoding color barcodes in the desktop scenario, there is an emerging need for new color classification methods in the mobile scenario. In desktop scenarios, our experimental findings show that complex techniques, such as support vector machines, do not seem to pay off, as they do not achieve better accuracy in classifying color barcode cells. The lowest error rates are indeed obtained by means of clustering algorithms and probabilistic classifiers. From the computational viewpoint, classification with clustering seems to be the method of choice. In mobile scenarios, simple and efficient methods (in terms of computational time) such as the Euclidean and the K-means classifiers are not effective (in terms of error rate), while more complex methods are effective but not efficient. Even if a few color barcode designs have been proposed in recent studies, to the best of our knowledge, there is no previous research that addresses a comparative and experimental analysis of clustering and machine learning methods for color classification in 2D color barcodes.
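To make the "Euclidean classifier" mentioned in this abstract concrete, the sketch below assigns a scanned cell to the nearest reference color in RGB space. It is an illustration under stated assumptions: the four-color palette and the function name are hypothetical, and the paper's prototype may use a different color space or reference set.

```python
import math

# Hypothetical 4-color palette (RGB); real barcode designs define their own.
PALETTE = {
    "black": (0, 0, 0),
    "cyan": (0, 255, 255),
    "magenta": (255, 0, 255),
    "yellow": (255, 255, 0),
}


def classify_cell(rgb, palette=PALETTE):
    """Return the palette color name closest to rgb in Euclidean RGB distance."""
    return min(palette, key=lambda name: math.dist(rgb, palette[name]))


# A washed-out cyan cell, as might result from printing and scanning.
print(classify_cell((20, 230, 240)))  # cyan
```

A fixed-palette rule like this is cheap, which fits the abstract's remark that such methods are efficient; it has no way to adapt to chromatic distortion, which is consistent with their reported weakness in the mobile scenario.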

The TVGH-NYCU Thal-Classifier: Development of a Machine-Learning Classifier for Differentiating Thalassemia and Non-Thalassemia Patients

Diagnostics ◽

10.3390/diagnostics11091725 ◽

2021 ◽

Vol 11 (9) ◽

pp. 1725

Author(s):

Yi-Kai Fu ◽

Hsueng-Mei Liu ◽

Li-Hsuan Lee ◽

Ying-Ju Chen ◽

Sheng-Hsuan Chien ◽

...

Keyword(s):

Machine Learning ◽

Error Rate ◽

Primary Care Physicians ◽

Microcytic Anemia ◽

Machine Learning Techniques ◽

Average Error ◽

Classification Error ◽

Support Vector ◽

Predictive Values ◽

Cell Counts

Thalassemia and iron deficiency are the most common etiologies of microcytic anemia, and several indices derived from simple automated laboratory cell counters have been proposed to discriminate between them. In this study, a new classifier for discriminating thalassemia from non-thalassemia microcytic anemia was generated by combining existing indices with machine-learning techniques. The anemia diagnoses, complete blood cell counts, and hemoglobin gene profiles of a total of 350 Taiwanese adult patients were retrospectively reviewed. Thirteen previously established indices were applied to the current cohort, and their sensitivity, specificity, and positive and negative predictive values were calculated. A support vector machine (SVM) with a Monte-Carlo cross-validation procedure was adopted to generate the classifier. The performance of our classifier was compared with that of the original indices by calculating the average classification error rate and area under the curve (AUC) for the sampled datasets. The SVM model showed an average AUC of 0.76 and an average error rate of 0.26, which surpassed all other indices. In conclusion, we developed a convenient tool for primary-care physicians to use when the differential diagnosis includes thalassemia in the Taiwanese adult population. This approach needs to be validated in other studies or larger databases.
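Monte-Carlo cross-validation, as named in this abstract, repeatedly draws a random train/test split and averages the test error over all draws. The sketch below shows only that resampling loop. Everything specific in it is an assumption: the number of rounds, the test fraction, and the toy threshold classifier, which stands in for the SVM so that the example runs with the standard library alone.

```python
import random
from statistics import mean


def monte_carlo_cv(samples, labels, fit, predict,
                   n_rounds=100, test_fraction=0.3, seed=0):
    """Average test error over n_rounds independent random splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, round(len(indices) * test_fraction))
    errors = []
    for _ in range(n_rounds):
        rng.shuffle(indices)  # fresh random split on every round
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        wrong = sum(predict(model, samples[i]) != labels[i] for i in test_idx)
        errors.append(wrong / n_test)
    return mean(errors)


def fit_threshold(xs, ys):
    """Toy stand-in for an SVM: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Invented, perfectly separable single-feature data (not from the study).
xs = list(range(1, 11)) + list(range(21, 31))
ys = [0] * 10 + [1] * 10
print(monte_carlo_cv(xs, ys, fit_threshold, predict_threshold))  # 0.0
```

Unlike k-fold cross-validation, the test sets of different rounds may overlap, so the number of rounds can be chosen independently of the test-set size.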

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model that predicts employee attrition and gives organizations the opportunity to address issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including each employee's employment status (the response variable) at the time of collection. The confusion matrix for the SVM model showed an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will stay.
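The two findings reported here (overall accuracy, and better performance on leavers than on stayers) both come from the same confusion matrix. The sketch below shows the arithmetic; the function name is hypothetical and the counts in the example are invented for illustration, not taken from the study.

```python
def confusion_summary(tp, fn, fp, tn):
    """Accuracy plus per-class recall for a binary confusion matrix.

    tp: leavers predicted as leavers    fn: leavers predicted as stayers
    fp: stayers predicted as leavers    tn: stayers predicted as stayers
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_leave": tp / (tp + fn),
        "recall_stay": tn / (tn + fp),
    }


# Invented counts, chosen only to show the calculation.
print(confusion_summary(tp=45, fn=5, fp=15, tn=35))
# {'accuracy': 0.8, 'recall_leave': 0.9, 'recall_stay': 0.7}
```

In this made-up example the model recovers 90% of leavers but only 70% of stayers, the same kind of asymmetry the abstract describes, while a single accuracy figure would hide it.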

Supervised Machine Learning Methods and Hyperspectral Imaging Techniques Jointly Applied for Brain Cancer Classification

Sensors ◽

10.3390/s21113827 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3827

Author(s):

Gemma Urbanos ◽

Alberto Martín ◽

Guillermo Vázquez ◽

Marta Villanueva ◽

Manuel Villa ◽

...

Keyword(s):

Machine Learning ◽

Blood Vessel ◽

Hyperspectral Imaging ◽

Imaging Techniques ◽

Venous Blood ◽

Healthy Tissue ◽

Supervised Machine Learning ◽

Support Vector ◽

Arterial Blood

Hyperspectral imaging (HSI) techniques do not require contact with patients and are non-ionizing as well as non-invasive. As a consequence, they have been extensively applied in the medical field. HSI is being combined with machine learning (ML) processes to obtain models to assist in diagnosis. In particular, the combination of these techniques has proven to be a reliable aid in the differentiation of healthy and tumor tissue during brain tumor surgery. ML algorithms such as support vector machine (SVM), random forest (RF) and convolutional neural networks (CNN) are used to make predictions and provide in-vivo visualizations that may assist neurosurgeons in being more precise, hence reducing damage to healthy tissue. In this work, thirteen in-vivo hyperspectral images from twelve different patients with high-grade gliomas (grade III and IV) have been selected to train SVM, RF and CNN classifiers. Five different classes have been defined during the experiments: healthy tissue, tumor, venous blood vessel, arterial blood vessel and dura mater. Overall accuracy (OACC) results vary from 60% to 95% depending on the training conditions. Finally, as far as the contribution of each band to the OACC is concerned, the results obtained in this work are 3.81 times greater than those reported in the literature.

Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial analysis is not limited to the analysis of enterprise performance. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public's opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and for financial texts exist and are accurate, whereas work on the more complex Lithuanian language has mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
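Grid search, as used in this study for hyperparameter optimization, simply evaluates every combination of candidate values and keeps the best-scoring one. A minimal sketch follows. The parameter names (`alpha`, `ngram`) and the toy scoring function are hypothetical placeholders; in practice the score would be a cross-validated accuracy of the classifier being tuned.

```python
from itertools import product


def grid_search(param_grid, score):
    """Exhaustively score every parameter combination; return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at alpha=0.1, ngram=2."""
    return -abs(params["alpha"] - 0.1) - abs(params["ngram"] - 2)


grid = {"alpha": [0.01, 0.1, 1.0], "ngram": [1, 2, 3]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'alpha': 0.1, 'ngram': 2}
```

The cost grows with the product of the list lengths (nine evaluations here), which is why grids are usually kept coarse.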

Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing

Scientific Reports ◽

10.1038/s41598-021-83315-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hyeon-Kyu Park ◽

Jae-Hyeok Lee ◽

Jehyun Lee ◽

Sang-Koog Kim

Keyword(s):

Machine Learning ◽

Simulated Annealing ◽

Permanent Magnets ◽

Supervised Machine Learning ◽

Support Vector ◽

Micromagnetic Simulations ◽

Ndfeb Magnets ◽

Average Grain Size ◽

Macroscopic Properties ◽

Very Fast Simulated Annealing

Abstract: The macroscopic properties of permanent magnets and the resultant performance required for real implementations are determined by the magnets' microscopic features. However, earlier micromagnetic simulations and experimental studies required a relatively large amount of work to gain a complete and comprehensive understanding of the relationships between magnets' macroscopic properties and their microstructures. Here, by means of supervised learning, we predict reliable values of coercivity (μ0Hc) and maximum magnetic energy product (BHmax) of granular NdFeB magnets according to their microstructural attributes (e.g. inter-grain decoupling, average grain size, and misalignment of easy axes) based on numerical datasets obtained from micromagnetic simulations. We conducted several tests of a variety of supervised machine learning (ML) models including kernel ridge regression (KRR), support vector regression (SVR), and artificial neural network (ANN) regression. The hyper-parameters of these models were optimized by a very fast simulated annealing (VFSA) algorithm with an adaptive cooling schedule. In our datasets of 1,000 randomly generated polycrystalline NdFeB cuboids with different microstructural attributes, all of the models yielded similar results in predicting both μ0Hc and BHmax. Furthermore, some outliers, which deteriorated the normality of residuals in the prediction of BHmax, were detected and further analyzed. Based on all of our results, we can conclude that our ML approach combined with micromagnetic simulations provides a robust framework for optimal design of microstructures for high-performance NdFeB magnets.
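For readers unfamiliar with VFSA, the sketch below follows the standard formulation attributed to Ingber: a temperature that decays as T_k = T_0 exp(-c k^(1/D)) for D parameters, and a heavy-tailed step distribution that narrows as the temperature falls. It is not the adaptive cooling schedule used in the paper, whose details the abstract does not give. The cost function, bounds, constants, and the temperature floor (a numerical guard) are all assumptions made for illustration.

```python
import math
import random


def vfsa_minimize(cost, bounds, t0=1.0, c=1.0, n_iter=2000, seed=0):
    """Minimize cost over box bounds with a basic VFSA loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = cost(x)
    best_x, best_f = list(x), fx
    for k in range(1, n_iter + 1):
        # Cooling schedule T_k = T_0 * exp(-c * k**(1/D)); the floor only
        # prevents division by zero once the temperature underflows.
        temp = max(t0 * math.exp(-c * k ** (1.0 / dim)), 1e-12)
        candidate = []
        for xi, (lo, hi) in zip(x, bounds):
            while True:  # redraw until the proposal stays inside the bounds
                u = rng.random()
                y = (math.copysign(1.0, u - 0.5) * temp
                     * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0))
                proposal = xi + y * (hi - lo)
                if lo <= proposal <= hi:
                    break
            candidate.append(proposal)
        fc = cost(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def sphere(v):
    """Toy cost with its minimum at (1, 1); stands in for a validation loss."""
    return sum((vi - 1.0) ** 2 for vi in v)


best_x, best_f = vfsa_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a hyper-parameter setting, `cost` would wrap a model's cross-validated error and `bounds` would hold the search range of each hyper-parameter.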

Prediction of CO2 Minimum Miscibility Pressure Using an Augmented Machine-Learning-Based Model

SPE Journal ◽

10.2118/200326-pa ◽

2021 ◽

pp. 1-13

Author(s):

Utkarsh Sinha ◽

Birol Dindoruk ◽

Mohamed Soliman

Keyword(s):

Machine Learning ◽

Phase Behavior ◽

Hybrid Method ◽

Gas Injection ◽

Supervised Machine Learning ◽

Design Parameters ◽

Support Vector ◽

Hydrocarbon Gases ◽

Minimum Miscibility Pressure ◽

Analytical Correlation

Summary Minimum miscibility pressure (MMP) is one of the key design parameters for gas injection projects. It is a physical parameter that is a measure of local displacement efficiency while subject to some constraints due to its definition. Also, the MMP value is used to tune compositional models along with proper fluid description constrained with other available basic phase behavior data, such as bubble point pressure and volumetric properties. In general, carbon dioxide (CO2) and hydrocarbon gases are the most common gases used for (or screened for) gas injection processes, and because of recent focus, they are used to screen for the coupling of CO2-sequestration and CO2-enhanced oil recovery (EOR) projects. Because the CO2/oil phase behavior is quite different than the hydrocarbon gas/oil phase behavior, researchers developed specialized correlations for CO2 or CO2-rich streams. Therefore, there is a need for a tool with expanded range capabilities for the estimation of MMP for CO2 gas streams. The only known and widely accepted measurement technique for MMP that is coherent with its formal definition is the use of a slimtube apparatus. However, the use of slimtube restricts the amount of data available, even though there are other alternative techniques presented over the last three decades, which all have various limitations (Dindoruk et al. 2021). Due to some of the complexities highlighted in Dindoruk et al. (2021) and time and resource requirements, there have been a number of correlations developed in the literature using mostly classical regression techniques with relatively sparse data using various combinations of limited input data (Cronquist 1978; Lee 1979; Yellig and Metcalfe 1980; Alston et al. 1985; Glaso 1985; Jaubert et al. 1998; Emera and Sarma 2005; Yuan et al. 2005; Ahmadi et al. 2010; Ahmadi and Johns 2011). 
In this paper, we present two separate approaches for the calculation of the MMP of an oil for CO2 injection: analytical correlation in which the correlation coefficients were tuned using linear support vector machines (SVMs) (Press et al. 2007; MathWorks 2020; RDocumentation 2020b; Cortes and Vapnik 1995) and using a hybrid method (i.e., superlearner model), which consists of the combination of random forest (RF) regression (Breiman 2001) and the proposed analytical correlation. Both models take the compositional analysis of oils up to heptane plus fraction, molecular weight of oil, and the reservoir temperature as input parameters. Based on statistical and data analysis techniques in combination with the help of corresponding crossplots, we showed that the performance of the final proposed method (hybrid method) is superior to all the leading correlations (Cronquist 1978; Lee 1979; Yellig and Metcalfe 1980; Alston et al. 1985; Glaso 1985; Emera and Sarma 2005; Yuan et al. 2005) and supervised machine-learning (Metcalfe 1982) methods considered in the literature (Altman 1992; Chambers and Hastie 1992; Chapelle and Vapnik 2000; Breiman 2001; Press et al. 2007; MathWorks 2020). The proposed model works for the widest spectrum of MMPs from 1,000 to 4,900 psia, which covers the entire range of oils within the scope of CO2 EOR based on the widely used screening criteria (Taber et al. 1997a, 1997b).

Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning

F1000Research ◽

10.12688/f1000research.14048.1 ◽

2018 ◽

Vol 7 ◽

pp. 233

Author(s):

Jonathan Z.L. Zhao ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Machine Learning ◽

Ionizing Radiation ◽

Radiation Exposure ◽

Large Scale ◽

Nearest Neighbor ◽

Error Rates ◽

Support Vector ◽

Dose Estimation ◽

Gene Signatures ◽

Ionizing Radiation Exposure

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. 
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyse text patterns faster. Supervised machine learning is the most widely used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques (multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree) as well as deep learning techniques (Long Short-Term Memory and Convolutional Neural Network). The work examines these learning methods on a standard data set. The experimental results report the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time and k-fold cross-validation. They help in appreciating the novelty of the deep learning techniques and give the user an overview for choosing the right technique for their application.
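Precision, recall and F1-score, three of the metrics listed in this abstract, can be computed directly from paired true and predicted labels. The sketch below is a minimal binary version; the function name and the example labels are made up for illustration, and the convention of returning 0.0 when a denominator is zero is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here two of the three positive predictions are correct and two of the three true positives are found, so all three metrics equal 2/3.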
