Genes and Mechanisms Associated With Experimentally Induced Bovine Respiratory Disease Identified With Supervised Machine Learning Methodology on Integrated Transcriptomic Datasets

Abstract Bovine respiratory disease (BRD) is a multifactorial disease involving complex host immune interactions shaped by pathogenic agents and environmental factors. Advancements in RNA sequencing and associated analytical methods are improving our understanding of host response related to BRD pathophysiology. Supervised machine learning (ML) approaches present one such method for analyzing new and previously published transcriptome data to identify novel genes and mechanisms. Our objective was to apply ML models to lung and immunological tissue datasets acquired from previous clinical BRD experiments to identify genes that classify disease with high accuracy. Raw mRNA sequencing reads from 151 bovine datasets (n=123 BRD, n=28 control) were downloaded from NCBI-GEO. Quality filtered reads were assembled in a HISAT2/Stringtie2 pipeline. Raw gene counts for ML analysis were normalized, transformed, and analyzed with MLSeq, utilizing six ML models. Cross-validation parameters (5-fold, repeated 10 times) were applied in a 70:30 training/testing ratio. Downstream analysis of genes identified by the top sparse classifiers for each etiological association was performed within WebGestalt and Reactome (FDR < 0.05). Support vector machines was routinely the top non-sparse classifier for predicting etiological disease versus sham control. Nearest shrunken centroid and Poisson linear discriminant analysis with power transformation could reliably classify IBR and BRSV with 100% accuracy. Genes identified in IBR and BRSV, but not BVDV, were related to type I interferon production and IL-8 secretion, specifically in lymphoid tissue and not lung. Genes identified in Mannheimia haemolytica infections were involved in activating classical and alternative pathways of complement. Novel findings, including expression of genes related to reduced mitochondrial oxygenation and ATP synthesis in consolidated lung tissue, were discovered. 
Genes identified in each analysis represent distinct genomic events relevant to understanding and predicting clinical BRD. The few genes shared across analyses may be reliably associated with clinical BRD. Our analysis demonstrates the utility of ML with published datasets for discovering functional information to support prediction and understanding BRD acquisition.


Genes and regulatory mechanisms associated with experimentally-induced bovine respiratory disease identified using supervised machine learning methodology

Scientific Reports ◽

10.1038/s41598-021-02343-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Matthew A. Scott ◽

Amelia R. Woolums ◽

Cyprianna E. Swiderski ◽

Andy D. Perkins ◽

Bindu Nanduri

Keyword(s):

Machine Learning ◽

Respiratory Disease ◽

Lung Tissue ◽

Bovine Respiratory Disease ◽

Atp Synthesis ◽

Parameter Tuning ◽

Supervised Machine Learning ◽

Type I ◽

Linear Discriminant ◽

Validation Parameters

Abstract Bovine respiratory disease (BRD) is a multifactorial disease involving complex host immune interactions shaped by pathogenic agents and environmental factors. Advancements in RNA sequencing and associated analytical methods are improving our understanding of host response related to BRD pathophysiology. Supervised machine learning (ML) approaches present one such method for analyzing new and previously published transcriptome data to identify novel disease-associated genes and mechanisms. Our objective was to apply ML models to lung and immunological tissue datasets acquired from previous clinical BRD experiments to identify genes that classify disease with high accuracy. Raw mRNA sequencing reads from 151 bovine datasets (n = 123 BRD, n = 28 control) were downloaded from NCBI-GEO. Quality filtered reads were assembled in a HISAT2/Stringtie2 pipeline. Raw gene counts for ML analysis were normalized, transformed, and analyzed with MLSeq, utilizing six ML models. Cross-validation parameters (fivefold, repeated 10 times) were applied to 70% of the compiled datasets for ML model training and parameter tuning; optimized ML models were tested with the remaining 30%. Downstream analysis of significant genes identified by the top ML models, based on classification accuracy for each etiological association, was performed within WebGestalt and Reactome (FDR ≤ 0.05). Nearest shrunken centroid and Poisson linear discriminant analysis with power transformation models identified 154 and 195 significant genes for IBR and BRSV, respectively; from these genes, the two ML models discriminated IBR and BRSV with 100% accuracy compared to sham controls. Significant genes classified by the top ML models in IBR (154) and BRSV (195), but not BVDV (74), were related to type I interferon production and IL-8 secretion, specifically in lymphoid tissue and not homogenized lung tissue. 
Genes identified in Mannheimia haemolytica infections (97) were involved in activating classical and alternative pathways of complement. Novel findings, including expression of genes related to reduced mitochondrial oxygenation and ATP synthesis in consolidated lung tissue, were discovered. Genes identified in each analysis represent distinct genomic events relevant to understanding and predicting clinical BRD. Our analysis demonstrates the utility of ML with published datasets for discovering functional information to support the prediction and understanding of clinical BRD.
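The resampling scheme described in this abstract (a 70:30 training/testing split, with fivefold cross-validation repeated 10 times on the training portion) can be sketched as follows. This is a minimal, unstratified illustration in plain Python, not the MLSeq pipeline used in the study; the function names `train_test_split_indices` and `repeated_kfold`, the seeds, and the rounding rule for the split size are assumptions made for the example.

```python
import random

def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]

def repeated_kfold(train_indices, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, fit_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)  # a fresh shuffle for every repeat
        # A stride of k gives k folds whose sizes differ by at most one.
        folds = [shuffled[i::k] for i in range(k)]
        for fold_number, validation in enumerate(folds):
            fit = [idx for j, fold in enumerate(folds) if j != fold_number for idx in fold]
            yield repeat, fold_number, fit, validation

# 151 datasets, as in the abstract; the held-out 30% is never touched during tuning.
train_idx, test_idx = train_test_split_indices(151)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

With a class imbalance like the one reported (123 BRD versus 28 control), a stratified split would usually be preferable; the sketch omits stratification only to keep the resampling logic visible.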


A Classification Approach for Predicting COVID-19 Patient Survival Outcome with Machine Learning Techniques

10.1101/2020.08.02.20129767 ◽

2020 ◽

Author(s):

Abdulhameed Ado Osi ◽

Hussaini Garba Dikko ◽

Mannir Abdu ◽

Auwalu Ibrahim ◽

Lawan Adamu Isma'il ◽

...

Keyword(s):

Machine Learning ◽

Survival Outcome ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

P Value ◽

Kappa Index ◽

Quality Health Care ◽

Linear Discriminant ◽

Learning Techniques

COVID-19 is an infectious disease that was identified after an outbreak began in Wuhan, China, in December 2019. It remains a growing global threat to public health, and the virus has spread to many countries across the globe. This paper analyzed and compared the performance of three supervised machine learning techniques, Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine (SVM), on a COVID-19 dataset. The best of the three algorithms was determined by comparing metrics for assessing predictive performance, such as accuracy, sensitivity, specificity, F-score, Kappa index, and ROC. RF was found to be the best algorithm, with 100% prediction accuracy, compared with 95.2% for LDA and 90.9% for SVM. Our analysis shows that, of these three classification models, RF predicts a COVID-19 patient's survival outcome with the highest accuracy. A chi-square test reveals that all seven features except sex were significantly correlated with the COVID-19 patient's outcome (P-value < 0.005). Therefore, RF is recommended for COVID-19 patient outcome prediction, which will help in the early identification of possible sensitive cases for quick provision of quality health care, support and supervision.
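Several of the metrics named in this abstract (accuracy, sensitivity, specificity, F-score, and the Kappa index) follow directly from a binary confusion matrix. The sketch below shows the standard definitions in plain Python; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "kappa": kappa,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting kappa alongside accuracy is useful when classes are imbalanced, because a classifier that always predicts the majority class can score high accuracy while its kappa stays near zero.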


Classification of Sentiment of Reviews using Supervised Machine Learning Techniques

International Journal of Rough Sets and Data Analysis ◽

10.4018/ijrsda.2017010104 ◽

2017 ◽

Vol 4 (1) ◽

pp. 56-74 ◽

Cited By ~ 14

Author(s):

Abinash Tripathy ◽

Santanu Kumar Rath

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Performance Parameters ◽

Linear Discriminant ◽

Learning Techniques

Sentiment analysis helps to determine the hidden intention of an author on a topic and provides an evaluation report on the polarity of a document. The polarity may be positive, negative or neutral. Very often, the data associated with sentiment analysis consist of feedback given by various specialists on a topic or product. Reviews may therefore be categorized into classes based on their polarity, in order to gain a good knowledge of the product. This article proposes an approach to classify a review dataset into different polarity groups on the basis of sentiment analysis. Four machine learning algorithms, viz., Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, and Linear Discriminant Analysis (LDA), have been considered in this paper for the classification process. The accuracy values obtained by the algorithms are critically examined using different performance parameters, applied on two different datasets.


Classification of Sentiment of Reviews using Supervised Machine Learning Techniques

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch009 ◽

2020 ◽

pp. 143-163

Author(s):

Abinash Tripathy ◽

Santanu Kumar Rath

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Evaluation Report ◽

Linear Discriminant ◽

Learning Techniques


Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has, therefore, become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal Components Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes a lot to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.


Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide organizations with opportunities to address any issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (response variable) at the time of collection. Results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better in predicting who will leave the firm than in predicting who will not leave.


Supervised Machine Learning Methods and Hyperspectral Imaging Techniques Jointly Applied for Brain Cancer Classification

Sensors ◽

10.3390/s21113827 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3827

Author(s):

Gemma Urbanos ◽

Alberto Martín ◽

Guillermo Vázquez ◽

Marta Villanueva ◽

Manuel Villa ◽

...

Keyword(s):

Machine Learning ◽

Blood Vessel ◽

Hyperspectral Imaging ◽

Imaging Techniques ◽

Venous Blood ◽

Healthy Tissue ◽

Supervised Machine Learning ◽

Support Vector ◽

Arterial Blood

Hyperspectral imaging techniques (HSI) do not require contact with patients and are non-ionizing as well as non-invasive. As a consequence, they have been extensively applied in the medical field. HSI is being combined with machine learning (ML) processes to obtain models to assist in diagnosis. In particular, the combination of these techniques has proven to be a reliable aid in the differentiation of healthy and tumor tissue during brain tumor surgery. ML algorithms such as support vector machine (SVM), random forest (RF) and convolutional neural networks (CNN) are used to make predictions and provide in-vivo visualizations that may assist neurosurgeons in being more precise, hence reducing damages to healthy tissue. In this work, thirteen in-vivo hyperspectral images from twelve different patients with high-grade gliomas (grade III and IV) have been selected to train SVM, RF and CNN classifiers. Five different classes have been defined during the experiments: healthy tissue, tumor, venous blood vessel, arterial blood vessel and dura mater. Overall accuracy (OACC) results vary from 60% to 95% depending on the training conditions. Finally, as far as the contribution of each band to the OACC is concerned, the results obtained in this work are 3.81 times greater than those reported in the literature.


Financial Context News Sentiment Analysis for the Lithuanian Language

Applied Sciences ◽

10.3390/app11104443 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4443

Author(s):

Rokas Štrimaitis ◽

Pavel Stefanovič ◽

Simona Ramanauskaitė ◽

Asta Slotkienė

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Experimental Investigations ◽

Support Vector ◽

Applied Machine Learning ◽

Bayes Algorithm ◽

Website Content

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and for financial texts exist and are accurate, whereas work on the more complex Lithuanian language has mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial context news gathered from Lithuanian language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset, via the multinomial Naive Bayes algorithm (71.1%). The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
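The grid search mentioned in this abstract evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. A minimal sketch in plain Python is given below; the helper `grid_search`, the parameter names `alpha` and `ngram_max`, and the scoring function `toy_score` (a stand-in for cross-validated accuracy) are all hypothetical and do not come from the paper.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical stand-in for cross-validated accuracy.

    By construction it peaks at alpha=0.1 and ngram_max=2.
    """
    return 1.0 - abs(params["alpha"] - 0.1) - 0.05 * abs(params["ngram_max"] - 2)

# Hypothetical search space for a text classifier.
grid = {"alpha": [0.01, 0.1, 1.0], "ngram_max": [1, 2, 3]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In practice `score_fn` would train the classifier with the given parameters and return a validation metric, so the cost grows with the product of the grid sizes.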


Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing

Scientific Reports ◽

10.1038/s41598-021-83315-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hyeon-Kyu Park ◽

Jae-Hyeok Lee ◽

Jehyun Lee ◽

Sang-Koog Kim

Keyword(s):

Machine Learning ◽

Simulated Annealing ◽

Permanent Magnets ◽

Supervised Machine Learning ◽

Support Vector ◽

Micromagnetic Simulations ◽

Ndfeb Magnets ◽

Average Grain Size ◽

Macroscopic Properties ◽

Very Fast Simulated Annealing

Abstract The macroscopic properties of permanent magnets and the resultant performance required for real implementations are determined by the magnets’ microscopic features. However, earlier micromagnetic simulations and experimental studies required a considerable amount of work to gain a complete and comprehensive understanding of the relationships between magnets’ macroscopic properties and their microstructures. Here, by means of supervised learning, we predict reliable values of coercivity (μ0Hc) and maximum magnetic energy product (BHmax) of granular NdFeB magnets according to their microstructural attributes (e.g. inter-grain decoupling, average grain size, and misalignment of easy axes) based on numerical datasets obtained from micromagnetic simulations. We conducted several tests of a variety of supervised machine learning (ML) models including kernel ridge regression (KRR), support vector regression (SVR), and artificial neural network (ANN) regression. The hyper-parameters of these models were optimized by a very fast simulated annealing (VFSA) algorithm with an adaptive cooling schedule. In our datasets of 1,000 randomly generated polycrystalline NdFeB cuboids with different microstructural attributes, all of the models yielded similar results in predicting both μ0Hc and BHmax. Furthermore, some outliers, which deteriorated the normality of residuals in the prediction of BHmax, were detected and further analyzed. Based on all of our results, we can conclude that our ML approach combined with micromagnetic simulations provides a robust framework for optimal design of microstructures for high-performance NdFeB magnets.
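To illustrate the kind of optimizer named in this abstract, the sketch below implements a basic very fast simulated annealing loop in plain Python. It assumes the commonly cited VFSA formulation (a heavy-tailed generating distribution and a temperature of the form T_k = T0 · exp(−c · k^(1/D)) for D parameters) with a fixed schedule; the paper's adaptive cooling schedule is not reproduced. The function `vfsa_minimize`, the temperature floor, and the quadratic `toy_objective` (standing in for a model's validation error) are assumptions for the example.

```python
import math
import random

def vfsa_minimize(objective, bounds, start, t0=1.0, c=1.0, steps=2000, seed=0):
    """Minimize `objective` inside box `bounds` with a basic VFSA loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    current, current_cost = list(start), objective(start)
    best, best_cost = current[:], current_cost
    for k in range(1, steps + 1):
        # Fixed exponential schedule; floored to avoid division by zero.
        temperature = max(t0 * math.exp(-c * k ** (1.0 / dim)), 1e-12)
        candidate = []
        for x, (lo, hi) in zip(current, bounds):
            while True:  # resample until the proposal stays inside the bounds
                u = rng.random()
                sign = 1.0 if u >= 0.5 else -1.0
                y = sign * temperature * ((1.0 + 1.0 / temperature) ** abs(2.0 * u - 1.0) - 1.0)
                proposal = x + y * (hi - lo)
                if lo <= proposal <= hi:
                    candidate.append(proposal)
                    break
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost

def toy_objective(params):
    """Hypothetical error surface with its minimum at (1, -2)."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2

search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_cost = vfsa_minimize(toy_objective, search_bounds, start=[4.0, 4.0])
print(best, best_cost)
```

For hyper-parameter tuning, `objective` would wrap model training and return a validation error, and `bounds` would hold the allowed range of each hyper-parameter.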


Prediction of CO2 Minimum Miscibility Pressure Using an Augmented Machine-Learning-Based Model

SPE Journal ◽

10.2118/200326-pa ◽

2021 ◽

pp. 1-13

Author(s):

Utkarsh Sinha ◽

Birol Dindoruk ◽

Mohamed Soliman

Keyword(s):

Machine Learning ◽

Phase Behavior ◽

Hybrid Method ◽

Gas Injection ◽

Supervised Machine Learning ◽

Design Parameters ◽

Support Vector ◽

Hydrocarbon Gases ◽

Minimum Miscibility Pressure ◽

Analytical Correlation

Summary Minimum miscibility pressure (MMP) is one of the key design parameters for gas injection projects. It is a physical parameter that is a measure of local displacement efficiency while subject to some constraints due to its definition. Also, the MMP value is used to tune compositional models along with proper fluid description constrained with other available basic phase behavior data, such as bubble point pressure and volumetric properties. In general, carbon dioxide (CO2) and hydrocarbon gases are the most common gases used for (or screened for) gas injection processes, and because of recent focus, they are used to screen for the coupling of CO2-sequestration and CO2-enhanced oil recovery (EOR) projects. Because the CO2/oil phase behavior is quite different than the hydrocarbon gas/oil phase behavior, researchers developed specialized correlations for CO2 or CO2-rich streams. Therefore, there is a need for a tool with expanded range capabilities for the estimation of MMP for CO2 gas streams. The only known and widely accepted measurement technique for MMP that is coherent with its formal definition is the use of a slimtube apparatus. However, the use of slimtube restricts the amount of data available, even though there are other alternative techniques presented over the last three decades, which all have various limitations (Dindoruk et al. 2021). Due to some of the complexities highlighted in Dindoruk et al. (2021) and time and resource requirements, there have been a number of correlations developed in the literature using mostly classical regression techniques with relatively sparse data using various combinations of limited input data (Cronquist 1978; Lee 1979; Yellig and Metcalfe 1980; Alston et al. 1985; Glaso 1985; Jaubert et al. 1998; Emera and Sarma 2005; Yuan et al. 2005; Ahmadi et al. 2010; Ahmadi and Johns 2011). 
In this paper, we present two separate approaches for the calculation of the MMP of an oil for CO2 injection: analytical correlation in which the correlation coefficients were tuned using linear support vector machines (SVMs) (Press et al. 2007; MathWorks 2020; RDocumentation 2020b; Cortes and Vapnik 1995) and using a hybrid method (i.e., superlearner model), which consists of the combination of random forest (RF) regression (Breiman 2001) and the proposed analytical correlation. Both models take the compositional analysis of oils up to heptane plus fraction, molecular weight of oil, and the reservoir temperature as input parameters. Based on statistical and data analysis techniques in combination with the help of corresponding crossplots, we showed that the performance of the final proposed method (hybrid method) is superior to all the leading correlations (Cronquist 1978; Lee 1979; Yellig and Metcalfe 1980; Alston et al. 1985; Glaso 1985; Emera and Sarma 2005; Yuan et al. 2005) and supervised machine-learning (Metcalfe 1982) methods considered in the literature (Altman 1992; Chambers and Hastie 1992; Chapelle and Vapnik 2000; Breiman 2001; Press et al. 2007; MathWorks 2020). The proposed model works for the widest spectrum of MMPs from 1,000 to 4,900 psia, which covers the entire range of oils within the scope of CO2 EOR based on the widely used screening criteria (Taber et al. 1997a, 1997b).
