Using machine learning to identify clotted specimens in coagulation testing

Abstract Objectives: A clotted blood sample may produce inaccurate coagulation test results, which may mislead clinicians into making improper clinical decisions. Currently, there is no efficient method to detect clots automatically. This study demonstrates the feasibility of using machine learning (ML) to identify clotted specimens. Methods: Coagulation testing results for 192 clotted samples and 2,889 no-clot-detected (NCD) samples were retrospectively retrieved from a laboratory information system to form the training and testing datasets. Standard and momentum backpropagation neural networks (BPNNs) were trained and validated on the training dataset using five-fold cross-validation. The predictive performance of the models was then assessed on the testing dataset. Results: The clotted and NCD specimens were intrinsically distinct, as shown by differences in their testing results and by the separation of the two groups in the t-SNE analysis. The standard and momentum BPNNs identified the sample status (clotted or NCD) with areas under the ROC curve of 0.966 (95% CI, 0.958–0.974) and 0.971 (95% CI, 0.9641–0.9784), respectively. Conclusions: We have described the application of ML algorithms to identifying sample status from the results of coagulation testing. This approach provides a proof of concept for using ML algorithms to evaluate sample quality, and it has the potential to facilitate clinical laboratory automation.
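The area under the ROC curve reported above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are invented for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (here 1 could stand for clotted, 0 for NCD)
    scores: model outputs; a higher score means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.1, 0.5, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The double loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would scale better on datasets of a few thousand samples such as the one described.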

Prediction of Tumor Shrinkage Pattern to Neoadjuvant Chemotherapy Using a Multiparametric MRI-Based Machine Learning Model in Patients With Breast Cancer

Frontiers in Bioengineering and Biotechnology ◽ 10.3389/fbioe.2021.662749 ◽ 2021 ◽ Vol 9

Author(s): Yuhong Huang ◽ Wenben Chen ◽ Xiaoling Zhang ◽ Shaofu He ◽ Nan Shao ◽ ...

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Cross Validation ◽ Learning Model ◽ Training Dataset ◽ Tumor Shrinkage ◽ Clinicopathologic Characteristics ◽ Testing Dataset ◽ Machine Learning Model ◽ Fold Cross Validation

Aim: After neoadjuvant chemotherapy (NACT), tumor shrinkage pattern is a more reasonable outcome than pathological complete response (pCR) for deciding whether breast-conserving surgery (BCS) is possible. The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) with clinicopathologic characteristics for early prediction of tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and subsequently underwent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences such as T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with tumor shrinkage pattern as follows: (1) reducing the feature dimension by using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing the model performance through area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model across different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than the other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When clinicopathologic characteristics were incorporated, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer, as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: Our machine learning model combining radiomics features and clinical characteristics could feasibly provide a tool to predict tumor shrinkage patterns prior to NACT. Our prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.
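Accuracy, sensitivity, and specificity, the threshold-based metrics named in step (3), all derive from the four cells of a binary confusion matrix. A minimal sketch follows; the function name `binary_metrics` and the toy labels are assumptions made for illustration, not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # recall on positives
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # recall on negatives
    }


# Toy labels, invented for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike AUC, these three values depend on the decision threshold used to turn a score into a 0/1 prediction, so they are usually reported alongside it rather than instead of it.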

High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.c8403.0110321 ◽ 2021 ◽ Vol 10 (2) ◽ pp. 105-110

Author(s): D. Mabuni ◽ S. Aquter Babu

Keyword(s): Machine Learning ◽ Decision Tree ◽ Classification Accuracy ◽ Cross Validation ◽ Training Dataset ◽ Decision Tree Classification ◽ Testing Dataset ◽ Tree Classifier ◽ Validation Technique ◽ Fold Cross Validation

In machine learning, how the data are used is a more important criterion than the logic of the program. With very large and moderately sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets. In particular, only large training datasets can produce robust decision tree classification results. Classification results obtained from a single pair of training and testing datasets are not reliable. The cross-validation technique uses many random folds of the same dataset for training and validation. To obtain reliable and statistically sound classification results, the same algorithm must be applied to different pairs of training and validation datasets. To overcome the problem of using only a single training dataset and a single testing dataset, the existing k-fold cross-validation technique follows a cross-validation plan to obtain improved decision tree classification accuracy. In this paper, a new cross-validation technique called prime fold is proposed, tested experimentally, and verified using many benchmark UCI machine learning datasets. The decision tree classification accuracies obtained with prime fold are observed to be far better than those obtained with existing techniques.
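The standard k-fold scheme that the abstract takes as its baseline can be sketched in a few lines. This is plain k-fold, not the proposed prime fold variant, which the abstract does not describe; the names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for standard k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(n_samples, k, evaluate, seed=0):
    """Average a user-supplied score over the k train/validation splits.

    evaluate(train, validation) is assumed to fit a model on the train
    indices and return a score (e.g. accuracy) on the validation indices.
    """
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(10, 5))
for train, validation in splits:
    print(sorted(validation), "held out;", len(train), "samples used for training")
```

Every sample is held out exactly once, so the averaged score uses all of the data for validation while never scoring a model on samples it was trained on.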

Deep learning based DNA:RNA triplex forming potential prediction

10.21203/rs.3.rs-41662/v2 ◽ 2020

Author(s): Yu ZHANG ◽ Yahui Long ◽ Chee Keong Kwoh

Keyword(s): Machine Learning ◽ Cross Validation ◽ Roc Curves ◽ Triplex Dna ◽ Triplex Formation ◽ Machine Learning Model ◽ Non Coding Rnas ◽ High Level ◽ Integrated Program ◽ Fold Cross Validation

Abstract Background: Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: i) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice, and ii) their predictions consider only the theoretical relationship and lack features from experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In 5-fold cross-validation, the average areas under the ROC and PRC curves are 0.9649 and 0.9996 for the redundancy-removed triplex-forming lncRNA dataset (threshold 0.8), and 0.8705 and 0.9671 for triplex DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions: TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to become a triplex. It may provide insights into the exploration of lncRNA functions.

Deep learning based DNA:RNA triplex forming potential prediction

10.21203/rs.3.rs-41662/v1 ◽ 2020

Author(s): Yu ZHANG ◽ Yahui Long ◽ Chee Keong Kwoh

Keyword(s): Machine Learning ◽ Deep Neural Networks ◽ Cross Validation ◽ Roc Curves ◽ Triplex Formation ◽ Machine Learning Model ◽ Non Coding Rnas ◽ High Level ◽ Integrated Program ◽ Fold Cross Validation

Abstract Background: Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: i) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice, and ii) their predictions consider only the theoretical relationship and lack features from experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by deep neural networks. In 5-fold cross-validation, the average areas under the ROC and PRC curves are 0.9949 and 0.9999 for triplex-forming lncRNA prediction, and 0.8775 and 0.9692 for DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions: TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, and can predict the potential of a DNA site to become a triplex. It may provide insights into the exploration of lncRNA functions.

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽ 10.2174/1381612825666191107092214 ◽ 2020 ◽ Vol 25 (40) ◽ pp. 4296-4302 ◽ Cited By ~ 2

Author(s): Yuan Zhang ◽ Zhenyan Han ◽ Qian Gao ◽ Xiaoyi Bai ◽ Chi Zhang ◽ ...

Keyword(s): Machine Learning ◽ Inclusion Bodies ◽ Cross Validation ◽ Independent Set ◽ K562 Cells ◽ Machine Learning Algorithms ◽ Learning Approaches ◽ Validation Test ◽ Excess Number ◽ Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in the β-globin gene, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. The resulting inclusion bodies are deposited on the cell membrane and reduce the ability of red blood cells to deform, leading to their massive destruction in the spleen and to a group of hereditary haemolytic diseases. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of AdaBoost was 83.1% in a 10-fold cross-validation test and 78.0% in an independent set test, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN, and Bagging. Conclusion: This study indicated that AdaBoost could be applied to build a learning model for the prediction of inhibitors against K562 cells.

A proof of concept for machine learning-based virtual knapping using neural networks

Scientific Reports ◽ 10.1038/s41598-021-98755-6 ◽ 2021 ◽ Vol 11 (1)

Author(s): Jordy Didier Orellana Figueroa ◽ Jonathan Scott Reeves ◽ Shannon P. McPherron ◽ Claudio Tennie

Keyword(s): Machine Learning ◽ Raw Materials ◽ Training Sample ◽ Stone Tool ◽ Proof Of Concept ◽ Surface Information ◽ Archaeological Data ◽ Testing Dataset ◽ Intact Core ◽ Key Variables

Abstract Prehistoric stone tools are an important source of evidence for the study of human behavioural and cognitive evolution. Archaeologists use insights from the experimental replication of lithics to understand phenomena such as the behaviours and cognitive capacities required to manufacture them. However, such experiments can require large amounts of time and raw materials, and achieving sufficient control of key variables can be difficult. A computer program able to accurately simulate stone tool production would make lithic experimentation faster, more accessible, more reproducible, and less biased, and may lead to reliable insights into the factors that structure the archaeological record. We present here a proof of concept for a machine learning-based virtual knapping framework capable of quickly and accurately predicting flake removals from 3D cores using a conditional adversarial neural network (CGAN). We programmatically generated a testing dataset of standardised 3D cores with flakes knapped from them. After training, the CGAN accurately predicted the length, volume, width, and shape of these flake removals using the intact core surface information alone. This demonstrates the feasibility of machine learning for investigating lithic production virtually. With a larger training sample and validation against archaeological data, virtual knapping could enable fast, cheap, and highly reproducible virtual lithic experimentation.

Emotion Detection using Social Media and Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽ 10.22214/ijraset.2021.36117 ◽ 2021 ◽ Vol 9 (VI) ◽ pp. 4491-4494

Author(s): Mr. Bhavar Shivam S.

Keyword(s): Machine Learning ◽ Social Networking ◽ Social Networking Sites ◽ Training Dataset ◽ Emotion Detection ◽ Pos Tagging ◽ Testing Dataset ◽ Svm Algorithm ◽ The Right ◽ Negative Sentiment

Today we do many things online, from shopping to sharing data on social networking sites (SNS). Social networking can help relieve stress and depression by letting people share their thoughts, so emotion detection has become a popular topic. Analyzing emotions on an SNS such as Twitter is difficult, however: the platform generates hundreds of thousands of tweets each day, and no human can read them all and judge the emotion behind each one. To help understand the emotion behind such texts, we designed a project that tracks tweets and predicts whether each one carries a positive or a negative sentiment. The project integrates an SNS with natural language processing (NLP) and machine learning. We use Twitter as the SNS because it generates a large amount of data that is freely accessible through an API. First, we enter a keyword and fetch matching tweets from Twitter. Stop words are then removed from these tweets using the NLTK stop-word list. Next, the tweets are passed through part-of-speech (POS) tagging; only words of the appropriate grammatical form are kept, and the rest are removed. We then create a training dataset with two classes, positive and negative, and train a support vector machine (SVM) on it. Finally, each tweet is passed to the SVM as testing data, and the SVM classifies the tweet as a whole as positive or negative. Our application thus helps recognize the emotion behind a tweet.
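The preprocessing stage of such a pipeline (cleaning, tokenizing, and stop-word removal) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the author's implementation: `STOP_WORDS` is a tiny hand-written stand-in for the NLTK list, `preprocess` is a hypothetical name, and the POS-tagging and SVM stages are omitted.

```python
import re

# Tiny illustrative stand-in for a real stop-word list such as NLTK's
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "i", "this"}


def preprocess(tweet):
    """Lower-case a tweet, strip URLs and @mentions, and drop stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    tokens = re.findall(r"[a-z']+", text)      # keep alphabetic tokens
    return [t for t in tokens if t not in STOP_WORDS]


example = "I love this phone, it is amazing! http://example.com @friend"
tokens = preprocess(example)
print(tokens)
```

The remaining tokens would then be filtered by part of speech and converted to numeric features before being passed to a classifier.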

Deep learning based DNA:RNA triplex forming potential prediction

BMC Bioinformatics ◽ 10.1186/s12859-020-03864-0 ◽ 2020 ◽ Vol 21 (1)

Author(s): Yu Zhang ◽ Yahui Long ◽ Chee Keong Kwoh

Keyword(s): Machine Learning ◽ Neural Networks ◽ Cross Validation ◽ Roc Curves ◽ Triplex Dna ◽ Triplex Formation ◽ Machine Learning Model ◽ Non Coding Rnas ◽ High Level ◽ Integrated Program

Abstract Background: Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice, and (2) their predictions consider only the theoretical relationship and lack features from experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In fivefold cross-validation, the average areas under the ROC and PRC curves are 0.9649 and 0.9996 for the redundancy-removed triplex-forming lncRNA dataset (threshold 0.8), and 0.8705 and 0.9671 for triplex DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions: TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to become a triplex. It may provide insights into the exploration of lncRNA functions.

Clinical risk factors and predictive tool of bacteremia in patients with cirrhosis

Journal of International Medical Research ◽ 10.1177/0300060520919220 ◽ 2020 ◽ Vol 48 (5) ◽ pp. 030006052091922

Author(s): Qiao Yang ◽ Xian Zhong Jiang ◽ Yong Fen Zhu ◽ Fang Fang Lv

Keyword(s): Risk Factors ◽ Risk Assessment ◽ Logistic Regression ◽ Cross Validation ◽ Bloodstream Infections ◽ Roc Curves ◽ Predictive Tool ◽ Reactive Protein ◽ Significant Difference ◽ Fold Cross Validation

Objective: We aimed to analyze the risk factors for bloodstream infections (BSI) in patients with cirrhosis and to establish a predictive tool for their occurrence. Methods: A total of 2888 patients with cirrhosis were retrospectively included. Risk factors for BSI were analyzed using multivariate logistic regression, and the model was validated using five-fold cross-validation. Results: Variables independently associated with higher odds of BSI were white blood cell count (odds ratio [OR] = 1.094, 95% confidence interval [CI] 1.063–1.127), C-reactive protein (OR = 1.005, 95% CI 1.002–1.008), total bilirubin (OR = 1.003, 95% CI 1.002–1.004), and previous antimicrobial exposure (OR = 4.556, 95% CI 3.369–6.160); albumin (OR = 0.904, 95% CI 0.883–0.926), platelet count (OR = 0.996, 95% CI 0.994–0.998), and serum creatinine (OR = 0.989, 95% CI 0.985–0.994) were associated with lower odds of BSI. The area under the receiver operating characteristic (ROC) curve of the risk assessment scale was 0.850, and its sensitivity and specificity were 0.762 and 0.801, respectively. There was no significant difference between the ROC curves of cross-validation and risk assessment. Conclusions: We developed a predictive tool for BSI in patients with cirrhosis, which could help with early identification of such episodes at admission and thereby improve outcomes in these patients.
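As a reading aid for the odds ratios above, the standard relationship between a logistic regression coefficient and its odds ratio is written out below. The symbols $\beta_j$, $x_j$, and $k$ are notation introduced here, not taken from the paper, and the abstract does not state the measurement units, so the worked number only shows how a per-unit odds ratio compounds.

```latex
% Logistic regression models the log-odds of BSI as a linear function of the predictors:
\log\frac{p}{1-p} = \beta_0 + \sum_j \beta_j x_j
% The odds ratio for a one-unit increase in x_j, with the other predictors held fixed, is
\mathrm{OR}_j = e^{\beta_j}
% and a k-unit increase multiplies the odds by
\mathrm{OR}_j^{\,k} = e^{k\beta_j}
% Example with the reported white blood cell count OR of 1.094 and k = 5 units:
1.094^{5} \approx 1.567
```

An odds ratio close to 1, such as 1.005 for C-reactive protein, can therefore still correspond to a substantial change in odds when the predictor ranges over many units.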

Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽ 10.1155/2019/6847685 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-14

Author(s): Shilun Yang ◽ Yanjia Shen ◽ Wendan Lu ◽ Yinglin Yang ◽ Haigang Wang ◽ ...

Keyword(s): Machine Learning ◽ Chinese Medicine ◽ Traditional Chinese Medicine ◽ Cross Validation ◽ Bayesian Models ◽ Machine Learning Algorithms ◽ Therapeutic Effects ◽ Test Set ◽ Screening Experiments ◽ Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used in clinical practice to treat stroke for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. A phenotype-based classification method was designed using machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among the single classifiers, the RF algorithm performed best, with average Matthews correlation coefficient (MCC) values of 0.725±0.014 from 5-fold cross-validation and 0.774±0.042 on the test set. The probability values calculated by the four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models achieved MCC values of 0.968 and 0.993 in 5-fold cross-validation and of 0.874 and 0.959 on the test set, respectively. Furthermore, the two models were used in virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two significantly inhibited both H2O2-induced and Na2S2O4-induced neurotoxicity. Together, our findings suggest that machine learning algorithms such as combined Bayesian models can feasibly predict neuroprotective compounds and give a preliminary indication of the pharmacological mechanisms of TCM.
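The Matthews correlation coefficient used to compare the models above is computed from the four confusion-matrix counts and ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). A minimal sketch follows; the function name and the counts are illustrative and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Counts invented for illustration only
mcc = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
print(f"MCC = {mcc:.3f}")
```

Because all four cells enter the formula, MCC is less easily inflated by class imbalance than plain accuracy.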
