Generative tensor network classification model for supervised machine learning

2020 ◽  
Vol 101 (7) ◽  
Author(s):  
Zheng-Zhi Sun ◽  
Cheng Peng ◽  
Ding Liu ◽  
Shi-Ju Ran ◽  
Gang Su
2019 ◽  
pp. 1-8 ◽  
Author(s):  
Tomasz Oliwa ◽  
Steven B. Maron ◽  
Leah M. Chase ◽  
Samantha Lomnicki ◽  
Daniel V.T. Catenacci ◽  
...  

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
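As a hedged illustration of the pipeline stages described in this abstract (not the authors' implementation), the sketch below pairs a supervised note-type classifier with a simple rule-based stand-in for the named-entity step; the toy reports and the accession-number pattern are assumptions introduced for illustration only.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy pathology notes standing in for real reports (illustrative only).
notes = [
    "Surgical pathology report. Specimen received in formalin, gastric biopsy.",
    "Outside consultation. Slides reviewed from referring hospital, esophagus biopsy.",
]
labels = ["internal", "external"]

# Stage 1: supervised classification of note type (internal vs. external).
note_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
note_clf.fit(notes, labels)

# Stage 2: rule-based stand-in for the named-entity recognition step.
ACCESSION = re.compile(r"\b[A-Z]{1,3}-?\d{2}-\d{3,6}\b")  # hypothetical accession format

def extract_accessions(text: str) -> list:
    """Return candidate accession-number strings found in a report."""
    return ACCESSION.findall(text)

print(note_clf.predict(["Outside consultation, slides reviewed for second opinion."]))
print(extract_accessions("Specimen S-21-04567, block A3, received for review."))
```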


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Marcin Grochowina ◽  
Lucyna Leniowska ◽  
Agnieszka Gala-Błądzińska

Abstract Pattern recognition and automatic decision support methods provide significant advantages in the area of health protection. The aim of this work is to develop a low-cost tool for monitoring arteriovenous fistula (AVF) with the use of the phono-angiography method. This article presents a diagnostic device that implements classification algorithms to identify which of 38 patients with end-stage renal disease, chronically hemodialysed using an AVF, are at risk of vascular access stenosis. We report on the design, fabrication, and preliminary testing of a prototype device for non-invasive diagnosis, which is very important for hemodialysed patients. The system includes three sub-modules: AVF signal acquisition, information processing and classification, and a unit for presenting results. This is a non-invasive and inexpensive procedure for evaluating the sound pattern of the bruit produced by an AVF. The sound signal from the fistula was recorded with a special head that has greater sensitivity than a conventional stethoscope. The process of signal acquisition was performed by dedicated software written specifically for the purpose of our study. From the obtained phono-angiogram, 23 features were isolated for the vectors used in the decision-making algorithm: 6 features based on the time-domain waveform and 17 features based on the frequency spectrum. The final composition of the feature vector was determined using several selection methods: feature-class correlation, forward search, Principal Component Analysis and the Joined-Pairs method. Supervised machine learning was then applied to develop the best classification model.
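A minimal sketch of the feature-vector idea, under stated assumptions: a few time-domain and spectrum-based features are computed from synthetic acoustic signals (stand-ins for the recorded bruit) and passed to a supervised classifier. The specific features, sampling rate and signals here are illustrative, not those of the study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fs = 8000  # assumed sampling rate in Hz

def features(x: np.ndarray) -> np.ndarray:
    """A small mix of time-domain and frequency-domain features."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    centroid = (freqs * spec).sum() / spec.sum()           # spectral centroid
    return np.array([x.mean(), x.std(), np.abs(x).max(),   # time-domain part
                     centroid, spec.max(), spec.std()])    # frequency part

# Synthetic "bruit" recordings: two classes differing in dominant frequency.
X = np.array([
    features(np.sin(2 * np.pi * f * np.arange(fs) / fs) + 0.1 * rng.standard_normal(fs))
    for f in [120] * 20 + [300] * 20
])
y = np.array([0] * 20 + [1] * 20)

print(cross_val_score(SVC(), X, y, cv=5).mean())
```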


2020 ◽  
Vol 11 ◽  
Author(s):  
Yi Guo ◽  
Yushan Liu ◽  
Wenjie Ming ◽  
Zhongjin Wang ◽  
Junming Zhu ◽  
...  

Purpose: We aim to build a supervised machine learning-based classifier to preoperatively distinguish focal cortical dysplasia (FCD) from glioneuronal tumors (GNTs) in patients with epilepsy. Methods: This retrospective study comprised 96 patients who underwent epilepsy surgery with a final neuropathologic diagnosis of either FCD or GNTs. Seven classical machine learning algorithms (i.e., Random Forest, SVM, Decision Tree, Logistic Regression, XGBoost, LightGBM, and CatBoost) were employed and trained on our dataset to obtain the classification model. Ten features [i.e., gender, past history, age at seizure onset, course of disease, seizure type, seizure frequency, scalp EEG biomarkers, MRI features, lesion location, and number of antiepileptic drugs (AEDs)] were analyzed in our study. Results: We enrolled 56 patients with FCD and 40 patients with GNTs, the latter including 29 with gangliogliomas (GGs) and 11 with dysembryoplastic neuroepithelial tumors (DNTs). Our study demonstrated that the Random Forest-based machine learning model offered the best predictive performance in distinguishing FCD from GNTs, with an F1-score of 0.9180 and an AUC of 0.9340. Furthermore, the most discriminative factor between FCD and GNTs was the feature "age at seizure onset," with a chi-square value of 1,213.0, suggesting that patients with a younger age at seizure onset were more likely to be diagnosed with FCD. Conclusion: The Random Forest-based machine learning classifier can accurately differentiate FCD from GNTs in patients with epilepsy before surgery. This might lead to improved clinician confidence in appropriate surgical planning and treatment outcomes.
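The model-comparison step could look roughly like the sketch below, which uses scikit-learn classifiers only (the XGBoost, LightGBM and CatBoost libraries are omitted) on a synthetic ten-feature dataset of the same size; it illustrates the procedure, not the study's data or code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 96 synthetic "patients" with 10 tabular features and a binary FCD-vs-GNT style target.
X, y = make_classification(n_samples=96, n_features=10, n_informative=6, random_state=42)

models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}

# Cross-validated F1 and AUC for each candidate model.
for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:>18}: F1={f1:.3f}  AUC={auc:.3f}")
```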


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Abstract Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
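A self-contained sketch of the self-labeling idea (not the published software): dense optical flow between two synthetic frames provides provisional cell/background pixel labels, which then train a conventional per-pixel classifier. The frames, the per-pixel features and the motion threshold are assumptions made for illustration.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Two synthetic grayscale frames: a bright "cell" blob that shifts a few pixels.
frame0 = np.zeros((128, 128), np.uint8)
frame1 = np.zeros((128, 128), np.uint8)
cv2.circle(frame0, (50, 50), 15, 180, -1)
cv2.circle(frame1, (55, 53), 15, 180, -1)

# Self-labeling step: pixels with appreciable optical-flow magnitude -> "cell" (1).
flow = cv2.calcOpticalFlowFarneback(frame0, frame1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
motion = np.linalg.norm(flow, axis=2)
labels = (motion > 0.5).astype(int)                 # assumed motion threshold

# Per-pixel feature vectors: raw intensity plus a local-mean context feature.
local_mean = cv2.blur(frame1, (7, 7)).astype(float)
features = np.stack([frame1.ravel(), local_mean.ravel()], axis=1)

# Train on the self-generated labels, then predict a full segmentation mask.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features, labels.ravel())
mask = clf.predict(features).reshape(frame1.shape)
print("segmented cell pixels:", int(mask.sum()))
```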


Author(s):  
Ivan Cvitić ◽  
Dragan Peraković ◽  
Marko Periša ◽  
Brij Gupta

Abstract The emergence of the Internet of Things (IoT) concept as a new direction of technological development raises new problems such as valid and timely identification of such devices, security vulnerabilities that can be exploited for malicious activities, and management of such devices. The communication of IoT devices generates traffic that has specific features and differences with respect to conventional devices. This research analyzes the possibility of applying such features to classify devices regardless of their functionality or purpose. This kind of classification is necessary for a dynamic and heterogeneous environment, such as a smart home where the number and types of devices grow daily. This research uses a total of 41 IoT devices. The LogitBoost algorithm, a supervised machine learning method that combines boosting with logistic regression, was used to develop the classification model. A multiclass classification model was developed using 13 network traffic features generated by IoT devices. The research has shown that it is possible to classify devices into four previously defined classes with high performance and accuracy (99.79%) based on the traffic flow features of such devices. Model performance measures such as precision, F-measure, True Positive Ratio, False Positive Ratio and Kappa coefficient all indicate strong results (0.997–0.999, 0.997–0.999, 0.997–0.999, 0–0.001 and 0.9973, respectively). Such a model can serve as a foundation for monitoring and management solutions in large, heterogeneous IoT environments such as the Industrial IoT and the smart home.
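A hedged sketch of the traffic-classification setup: scikit-learn has no LogitBoost implementation, so a gradient-boosting classifier is used here as a stand-in, and the 13 flow features and four device classes are simulated rather than taken from real IoT captures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Simulated traffic flows: 13 features per flow, 4 device classes.
X, y = make_classification(n_samples=2000, n_features=13, n_informative=8,
                           n_classes=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=1)

# Gradient boosting as a stand-in for LogitBoost (boosted logistic regression).
model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))            # per-class precision/recall/F1
print("Kappa:", round(cohen_kappa_score(y_test, y_pred), 4))
```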


2021 ◽  
pp. 1063293X2199180
Author(s):  
Babymol Kurian ◽  
VL Jyothi

The application of artificial intelligence to cancer prediction and detection using Next Generation Sequencing (NGS) is of great interest in the current medical field. Next generation sequences were extracted from the NCBI (National Center for Biotechnology Information) gene repository. Sequences of normal Homo sapiens (Class 1), BRCA1 (Class 2) and BRCA2 (Class 3) were extracted for Machine Learning (ML) purposes. A total of 1580 sequences were extracted for the process, organized into four categories of 50, 100, 150 and 200 sequences. The breast cancer prediction process was carried out in three major steps: feature extraction, machine learning classification and performance evaluation. Features were extracted with the sequences as input. Ten DNA sequence features, namely ORF (Open Reading Frame) count, individual average counts of the nucleobases A, T, C and G, AT and GC content, AT/GC composition, G-quadruplex occurrence and MR (Mutation Rate), were extracted from the three types of sequences for the classification process. The sequence type was also included in the feature set as the target variable, with values 0, 1 and 2 for classes 1, 2 and 3, respectively. Nine supervised machine learning techniques, LR (Logistic Regression), LDA (Linear Discriminant Analysis), k-NN (k-Nearest Neighbours), DT (Decision Tree), NB (Naive Bayes), SVM (Support Vector Machine), RF (Random Forest), AdaBoost (AB) and Gradient Boosting (GB), were employed on the four categories of datasets. Of all the supervised models, the decision tree performed best, with a maximum classification accuracy of 94.03%. Classification model performance was evaluated using precision, recall, F1-score and support values, with the F1-score closely matching the classification accuracy.
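The sketch below illustrates the general idea, with assumptions throughout: a handful of composition features are computed from toy DNA strings (random stand-ins, not NCBI BRCA1/BRCA2 sequences) and fed to the decision-tree classifier that the study found to perform best.

```python
import random

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

random.seed(0)

def sequence_features(seq: str) -> list:
    """Simple composition features of a DNA string (a subset of those listed above)."""
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n      # GC content
    at = (seq.count("A") + seq.count("T")) / n      # AT content
    orf_starts = seq.count("ATG") / n               # crude proxy for ORF count
    return [gc, at, orf_starts, seq.count("A") / n, seq.count("G") / n]

def random_seq(length: int, gc_prob: float) -> str:
    """Random toy sequence with a given per-base probability of G/C."""
    return "".join(random.choice("GC") if random.random() < gc_prob else random.choice("AT")
                   for _ in range(length))

# Two toy classes differing only in GC bias (stand-ins for the real sequence classes).
seqs = [random_seq(300, 0.6) for _ in range(40)] + [random_seq(300, 0.4) for _ in range(40)]
y = [0] * 40 + [1] * 40
X = [sequence_features(s) for s in seqs]

print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())
```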


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sergio Duban Morales Dussan ◽  
Mauricio Leon ◽  
Olmer Garcia-Bedoya ◽  
Ixent Galpin

Purpose This study aims to explore the digital divide between students living in metropolitan and non-metropolitan areas in the Antioquia region of Colombia. This is achieved by collecting data about student interactions from the Moodle learning management system (LMS) and subsequently applying supervised machine learning models to infer the gap between students in metropolitan and non-metropolitan areas. Design/methodology/approach This work uses the well-established Cross-Industry Standard Process for Data Mining methodology, which comprises six phases, viz., problem understanding, data understanding, data preparation, modelling, evaluation and implementation. In this case, student data were collected from the Moodle platform of the Antioquia campus of the UNAD distance learning university. Findings The digital divide is evident in the classification model when observing differences in variables such as the number of accesses to the LMS, the total time spent, the number of distinct IP addresses used and the number of system modification events. Originality/value This study provides conclusions regarding the problems that students in virtual education may face as a result of the digital divide in Colombia, which have become increasingly visible since the implementation of machine learning methodologies on LMSs such as Moodle. However, these practices may be replicated in any virtual educational context and, furthermore, be extended to enable personalisation of various aspects of the Moodle platform to meet the individual needs of students.
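As a rough illustration of the modelling phase, the sketch below feeds the interaction variables named in the findings (access counts, total time, distinct IP addresses, modification events) to a supervised classifier predicting the area type; all values and column names are illustrative placeholders, not the study's Moodle data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder interaction records (hypothetical values, one row per student).
df = pd.DataFrame({
    "lms_accesses":        [120, 80, 200, 45, 150, 60, 30, 175],
    "total_time_minutes":  [900, 400, 1500, 200, 1100, 350, 150, 1300],
    "distinct_ips":        [3, 1, 5, 1, 4, 2, 1, 4],
    "modification_events": [40, 10, 70, 5, 55, 12, 4, 60],
    "metropolitan":        [1, 0, 1, 0, 1, 0, 0, 1],   # assumed target label
})

X, y = df.drop(columns="metropolitan"), df["metropolitan"]
print(cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=4).mean())
```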


Author(s):  
Michael Kögel ◽  
Sebastian Brand ◽  
Frank Altmann

Abstract Signal processing and data interpretation in scanning acoustic microscopy are often challenging and based on the subjective decisions of the operator, making the defect classification results prone to human error. The aim of this work was to combine unsupervised and supervised machine learning techniques for feature extraction and image segmentation, allowing automated classification and predictive failure analysis of scanning acoustic microscopy (SAM) data. In the first part, conspicuous signal components of the time-domain echo signals and their weighting matrices are extracted using independent component analysis. The applicability was shown by the assisted separation of signal patterns into intact and defective bumps in a dataset from a CPU device manufactured in flip-chip technology. The high success rate was verified by physical cross-sectioning and high-resolution imaging. In the second part, the aforementioned signal separation was employed to generate a labeled dataset for training and fine-tuning a classification model based on a one-dimensional convolutional neural network. The learned model was sensitive to the critical features of the given task without human intervention, classifying intact bumps, defective bumps and background. The approach was evaluated on two individual test samples containing multiple solder bump defects and was verified by physical inspection. The classification model reached an accuracy of more than 97% in this verification and was successfully applied to an unknown sample, demonstrating the high potential of machine learning concepts for further developments in assisted failure analysis.
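A hedged, synthetic-data sketch of the two stages described here: independent component analysis separates mixed time-domain echo signals, and a small one-dimensional convolutional network classifies raw traces into three classes. The signal model, network architecture and labels are illustrative assumptions, not the authors' setup.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import FastICA

# --- Stage 1: unmix overlapping echo components with ICA ---------------------
t = np.linspace(0, 1, 512)
sources = np.stack([np.sin(2 * np.pi * 40 * t),            # e.g. surface echo
                    np.exp(-((t - 0.5) ** 2) / 0.001)])     # e.g. defect echo
mixed = np.array([[1.0, 0.4], [0.6, 1.0]]) @ sources        # two mixed A-scans
components = FastICA(n_components=2, random_state=0).fit_transform(mixed.T)

# --- Stage 2: a minimal 1D CNN classifier for echo traces --------------------
class EchoCNN(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(8, n_classes))

    def forward(self, x):            # x: (batch, 1, signal_length)
        return self.net(x)

model = EchoCNN()
x = torch.randn(16, 1, 512)          # a batch of synthetic echo traces
y = torch.randint(0, 3, (16,))       # placeholder labels from the ICA-based step
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                      # one illustrative optimisation step
print(components.shape, float(loss))
```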


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Xiangong Li ◽  
Yu Li ◽  
Yuzhi Zhang ◽  
Feng Liu ◽  
Yu Fang

Belt conveyors are widely used for material transportation over both short and long distances, while the failure of a single component may cause severe consequences. Accordingly, the use of machine learning for timely fault diagnosis is an efficient way to ensure the safe operation of belt conveyors. The support vector machine is a powerful supervised machine learning algorithm for classification in fault diagnosis. Before classification, principal component analysis is used for data reduction across the variety of features. To optimize the parameters of the support vector machine, this paper presents a grey wolf optimizer approach. The diagnostic model is applied to fault diagnosis of an underground mine belt conveyor transportation system, on the basis of monitoring data collected by sensors of the mine Internet of Things. The results show that fault recognition accuracy reaches 97.22% on the mine site dataset, demonstrating that the combined classification model performs better in intelligent fault diagnosis.
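A minimal sketch of the diagnostic pipeline under stated assumptions: PCA for data reduction feeding an SVM whose C and gamma are tuned by a search. The grey wolf optimizer is replaced here by scikit-learn's grid search as a simpler stand-in, and the monitoring data are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic "monitoring data": 300 samples, 20 sensor-derived features, 3 fault classes.
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=7)

# PCA for data reduction, then an SVM classifier; parameters tuned by grid search.
pipe = Pipeline([("pca", PCA(n_components=8)), ("svm", SVC())])
search = GridSearchCV(pipe,
                      {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1]},
                      cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 4))
```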

