Machine learning for cell classification and neighborhood analysis in glioma tissue

Multiplexed and spatially resolved single-cell analyses that aim to study tissue heterogeneity and cell organization invariably face as a first step the challenge of cell classification. Accuracy and reproducibility are important for the downstream process of counting cells, quantifying cell-cell interactions, and extracting information on disease-specific localized cell niches. Novel staining techniques make it possible to visualize and quantify large numbers of cell-specific molecular markers in parallel. However, due to variations in sample handling and artefacts from staining and scanning, cells of the same type may present different marker profiles both within and across samples. We address multiplexed immunofluorescence data from tissue microarrays of low-grade gliomas and present a methodology using two different machine learning architectures and features insensitive to illumination to perform cell classification. The fully automated cell classification provides a measure of confidence for the decision and requires a comparatively small annotated dataset for training, which can be created using freely available tools. Using the proposed method, we reached an accuracy of 83.1% on cell classification without the need for standardization of samples. Using our confidence measure, cells with low-confidence classifications could be excluded, pushing the classification accuracy to 94.5%. Next, we used the cell classification results to search for cell niches with an unsupervised learning approach based on graph neural networks. We show that the approach can re-detect specialized tissue niches in previously published data, and that our proposed cell classification leads to niche definitions that may be relevant for sub-groups of glioma, if applied to larger datasets.
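
The confidence-based exclusion described in the abstract above can be illustrated with a minimal sketch. It assumes that the confidence of a decision is simply the highest predicted class probability; the abstract does not state how the authors define their measure. The class names, probabilities, and threshold are hypothetical toy values.

```python
from typing import Dict, List, Optional, Tuple


def classify_with_confidence(
    probs: Dict[str, float], threshold: float
) -> Tuple[Optional[str], float]:
    """Pick the most probable class; the label is None when the top
    probability (used here as the confidence) is below the threshold."""
    label, confidence = max(probs.items(), key=lambda item: item[1])
    return (label if confidence >= threshold else None), confidence


def accuracy_on_retained(
    predictions: List[Optional[str]], truths: List[str]
) -> Tuple[float, float]:
    """Return (accuracy over retained cells, fraction of cells retained)."""
    kept = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    if not kept:
        return 0.0, 0.0
    correct = sum(p == t for p, t in kept)
    return correct / len(kept), len(kept) / len(truths)


# Hypothetical per-cell class probabilities and annotated labels.
probabilities = [
    {"tumor": 0.90, "immune": 0.10},
    {"tumor": 0.55, "immune": 0.45},
    {"tumor": 0.20, "immune": 0.80},
    {"tumor": 0.60, "immune": 0.40},
]
truths = ["tumor", "immune", "immune", "tumor"]

for threshold in (0.0, 0.7):
    preds = [classify_with_confidence(p, threshold)[0] for p in probabilities]
    accuracy, retained = accuracy_on_retained(preds, truths)
    print(f"threshold={threshold}: accuracy={accuracy:.2f}, retained={retained:.2f}")
```

In this toy example, raising the threshold removes the two uncertain cells, so accuracy on the retained cells rises while fewer cells are classified. That is the same trade-off the abstract reports between 83.1% and 94.5%.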


Machine Learning Classification of Low-grade and High-grade Chondrosarcomas Based on MRI-based Texture Analysis

10.1055/s-0039-1692575 ◽

2019 ◽

Author(s):

S. Gitto ◽

D. Albano ◽

V. Chianca ◽

R. Cuocolo ◽

L. Ugga ◽

...

Keyword(s):

Machine Learning ◽

Texture Analysis ◽

Low Grade ◽

High Grade ◽

Machine Learning Classification


Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification using data mining techniques has, in recent years, been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with a non-fraudulent sample (1:1 data mapping) or be unequally mapped using a 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.
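
The 1:1 and 1:many mappings mentioned in the abstract above can be sketched as follows. This is only an illustration: it assumes controls are drawn at random without replacement, whereas published studies typically match on criteria such as industry, size, or year. The function name `map_samples` and the company identifiers are hypothetical.

```python
import random
from typing import Dict, List


def map_samples(
    fraud: List[str], non_fraud: List[str], ratio: int = 1, seed: int = 0
) -> Dict[str, List[str]]:
    """Assign `ratio` distinct non-fraudulent companies to each fraudulent one.

    ratio=1 gives a 1:1 mapping; ratio>1 gives a 1:many mapping.
    """
    needed = len(fraud) * ratio
    if needed > len(non_fraud):
        raise ValueError("not enough non-fraudulent companies for this ratio")
    rng = random.Random(seed)
    controls = rng.sample(non_fraud, needed)  # sampling without replacement
    return {
        company: controls[i * ratio:(i + 1) * ratio]
        for i, company in enumerate(fraud)
    }


fraud_firms = ["F1", "F2", "F3"]
clean_firms = [f"N{i}" for i in range(1, 13)]

one_to_one = map_samples(fraud_firms, clean_firms, ratio=1)
one_to_many = map_samples(fraud_firms, clean_firms, ratio=3)
print(len(fraud_firms) + sum(len(v) for v in one_to_one.values()))   # total sample size, 1:1
print(len(fraud_firms) + sum(len(v) for v in one_to_many.values()))  # total sample size, 1:3
```

With three fraudulent firms, the 1:1 mapping yields six companies in total and the 1:3 mapping yields twelve, which shows how the ratio scales the sample while making the classes unbalanced.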


Amide proton transfer weighted (APTw) imaging based radiomics allows for the differentiation of gliomas from metastases

Scientific Reports ◽

10.1038/s41598-021-85168-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Elisabeth Sartoretti ◽

Thomas Sartoretti ◽

Michael Wyss ◽

Carolin Reischauer ◽

Luuk van Smoorenburg ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brain Tumors ◽

Proton Transfer ◽

Multilayer Perceptron ◽

Random Forest Classifier ◽

Amide Proton ◽

Low Grade ◽

WHO Grade ◽

Amide Proton Transfer

Abstract. We sought to evaluate the utility of radiomics for Amide Proton Transfer weighted (APTw) imaging by assessing its value in differentiating brain metastases from high- and low-grade glial brain tumors. We retrospectively identified 48 treatment-naïve patients (10 WHO grade 2, 1 WHO grade 3, 10 WHO grade 4 primary glial brain tumors and 27 metastases) with either primary glial brain tumors or metastases who had undergone APTw MR imaging. After image analysis with radiomics feature extraction and post-processing, machine learning algorithms (multilayer perceptron machine learning algorithm; random forest classifier) with stratified tenfold cross validation were trained on features and were used to differentiate the brain neoplasms. The multilayer perceptron achieved an AUC of 0.836 (receiver operating characteristic curve) in differentiating primary glial brain tumors from metastases. The random forest classifier achieved an AUC of 0.868 in differentiating WHO grade 4 from WHO grade 2/3 primary glial brain tumors. For the differentiation of WHO grade 4 tumors from grade 2/3 tumors and metastases an average AUC of 0.797 was achieved. Our results indicate that the use of radiomics for APTw imaging is feasible and the differentiation of primary glial brain tumors from metastases is achievable with a high degree of accuracy.
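
Two evaluation ingredients named in the abstract above, stratified tenfold cross-validation and the AUC, can be sketched in plain Python. This is an illustrative sketch rather than the authors' pipeline: the fold assignment is a simple round-robin per class without shuffling, the AUC is computed as the fraction of correctly ordered positive/negative pairs, and the scores are made-up values. Only the class counts (21 glial tumors, 27 metastases) come from the abstract.

```python
from typing import Dict, Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int = 10) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        # Deal each class out round-robin, continuing where the last class stopped.
        for j, index in enumerate(indices):
            folds[(offset + j) % k].append(index)
        offset += len(indices)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Class counts from the abstract: 21 primary glial tumors (0), 27 metastases (1).
cohort_labels = [0] * 21 + [1] * 27
folds = stratified_folds(cohort_labels, k=10)
print([len(fold) for fold in folds])

# Hypothetical classifier scores for four held-out patients.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In practice the indices would be shuffled within each class before the folds are assigned, and the AUC would be averaged over the ten held-out folds.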


Towards Robust Representations of Spatial Networks Using Graph Neural Networks

Applied Sciences ◽

10.3390/app11156918 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6918

Author(s):

Chidubem Iddianozie ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Model Performance ◽

Network Models ◽

Data Representation ◽

Spatial Networks ◽

Neural Network Models ◽

Improve Model ◽

Graph Neural Networks ◽

Spatial Entities ◽

Improve Model Performance

The effectiveness of a machine learning model is impacted by the data representation used. Consequently, it is crucial to investigate robust representations for efficient machine learning methods. In this paper, we explore the link between data representations and model performance for inference tasks on spatial networks. We argue that representations which explicitly encode the relations between spatial entities would improve model performance. Specifically, we consider homogeneous and heterogeneous representations of spatial networks. We recognise that the expressive nature of the heterogeneous representation may benefit spatial networks and could improve model performance on certain tasks. Thus, we carry out an empirical study using Graph Neural Network models for two inference tasks on spatial networks. Our results demonstrate that heterogeneous representations improve model performance for downstream inference tasks on spatial networks.


Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.


Learning deep features for dead and living breast cancer cell classification without staining

Scientific Reports ◽

10.1038/s41598-021-89895-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gisela Pattarone ◽

Laura Acion ◽

Marina Simian ◽

Emmanuel Iarussi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Drug Treatment ◽

Cancer Cells ◽

Cancer Cell ◽

Breast Cancer Cell ◽

Cancer Biology ◽

Breast Cancer Cells ◽

Bright Field ◽

Cell Classification

Abstract. Automated cell classification in cancer biology is a challenging topic in computer vision and machine learning research. Breast cancer is the most common malignancy in women and usually involves phenotypically diverse populations of breast cancer cells and a heterogeneous stroma. In recent years, automated microscopy technologies have allowed the study of live cells over extended periods of time, simplifying the task of compiling large image databases. For instance, there have been several studies oriented towards building machine learning systems capable of automatically classifying images of different cell types (e.g. motor neurons, stem cells). In this work we were interested in classifying breast cancer cells as live or dead, based on a set of automatically retrieved morphological characteristics using image processing techniques. Our hypothesis is that live-dead classification can be performed without any staining and using only bright-field images as input. We tackled this problem using the JIMT-1 breast cancer cell line that grows as an adherent monolayer. First, a vast image set composed of JIMT-1 human breast cancer cells that had been exposed to a chemotherapeutic drug treatment (doxorubicin and paclitaxel) or vehicle control was compiled. Next, several classifiers were trained based on well-known convolutional neural network (CNN) backbones to perform supervised classification using labels obtained from fluorescence microscopy images associated with each bright-field image. Model performances were evaluated and compared on a large number of bright-field images. The best model reached an AUC = 0.941 for classifying breast cancer cells without treatment. Furthermore, it reached AUC = 0.978 when classifying breast cancer cells under drug treatment. Our results highlight the potential of machine learning and computational image analysis to build new diagnostic tools that benefit the biomedical field by reducing cost and time and by improving the reproducibility of work. More importantly, we analyzed the way our classifiers cluster bright-field images in the learned high-dimensional embedding and linked these groups to salient visual characteristics in live-dead cell biology observed by trained experts.


Long-term follow-up after colorectal endoscopic submucosal dissection in 182 cases

Endoscopy International Open ◽

10.1055/a-1321-1271 ◽

2021 ◽

Vol 09 (02) ◽

pp. E258-E262

Author(s):

Christian Suchy ◽

Moritz Berger ◽

Ingo Steinbrück ◽

Tsuneo Oyama ◽

Naohisa Yahagi ◽

...

Keyword(s):

Case Series ◽

En Bloc Resection ◽

Bloc Resection ◽

Published Data ◽

Low Grade ◽

Recurrence Rates ◽

En Bloc ◽

Long Term Follow Up

Abstract. Background and study aims: We previously reported a case series of our first 182 colorectal endoscopic submucosal dissections (ESDs). In the initial series, 155 ESDs had been technically feasible, with 137 en bloc resections and 97 en bloc resections with free margins (R0). Here, we present long-term follow-up data, with particular emphasis on cases where either en bloc resection was not achieved or en bloc resection resulted in positive margins (R1). Patients and methods: Between September 2012 and October 2015, we performed 182 consecutive ESD procedures in 178 patients (median size 41.0 ± 17.4 mm; localization rectum vs. proximal rectum 63 vs. 119). Data on follow-up were obtained from our endoscopy database and from referring physicians. Results: Of the initial cohort, 11 patients underwent surgery; follow-up data were available for 141 of the remaining 171 cases (82.5 %) with a median follow-up of 2.43 years (range 0.15–6.53). Recurrent adenoma was observed in 8 patients (n = 2 after margin positive en bloc ESD; n = 6 after fragmented resection). Recurrence rates were lower after en bloc resection, irrespective of involved margins (1.8 vs. 18.2 %; P < 0.01). All recurrences were low-grade adenomas and could be managed endoscopically. Conclusions: The rate of recurrence is low after en bloc ESD, in particular if a one-piece resection can be achieved. Recurrence after fragmented resection is comparable to published data on piecemeal mucosal resection.


Real-Time AI-Based Informational Decision-Making Support System Utilizing Dynamic Text Sources

Applied Sciences ◽

10.3390/app11136237 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6237

Author(s):

Azharul Islam ◽

KyungHi Chang

Keyword(s):

Machine Learning ◽

Decision Making ◽

Random Forest ◽

Support System ◽

Classification Accuracy ◽

Short Term Memory ◽

Learning Algorithm ◽

Unstructured Data ◽

Stochastic Gradient Descent ◽

Decision Making Support

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data from data mining into labeled data, and builds an informational and decision-making support system (DMSS). We often have assortments of information collected by mining data from various sources, where the key challenge is to extract valuable information. We observe substantial classification accuracy enhancement for our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed with a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. This research’s main objective is to increase the application’s robustness by integrating intelligence into the developed DMSS, which provides insight into the user’s intent, despite dealing with a noisy dataset. Our designed model selects between the random forest and stochastic gradient descent (SGD) algorithms by F1 score; the RF method performs better, improving accuracy by 2 percentage points (from 81% to 83%) compared with a conventional method.


The Classification of Skateboarding Tricks: A Transfer Learning and Machine Learning Approach

Mekatronika ◽

10.15282/mekatronika.v2i2.6683 ◽

2020 ◽

Vol 2 (2) ◽

pp. 1-12

Author(s):

Muhammad Nur Aiman Shapiee ◽

Muhammad Ar Rahim Ibrahim ◽

Muhammad Amirul Abdullah ◽

Rabiu Muazu Musa ◽

Noor Azuan Abu Osman ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Olympic Games ◽

Learning Approach ◽

K Nearest Neighbor ◽

Test Dataset ◽

Machine Learning Approach ◽

Competitive Games

Skateboarding has reached new heights, particularly with its first appearance at the now-delayed Tokyo Summer Olympic Games. Given the scale of such competitive events, advanced and innovative assessment approaches have attracted growing attention from relevant stakeholders, particularly those interested in more objective evaluation. This study aims to classify skateboarding tricks, specifically Frontside 180, Kickflip, Ollie, Nollie Front Shove-it, and Pop Shove-it, by integrating image processing, Transfer Learning (TL) for feature extraction, and a traditional Machine Learning (ML) classifier. A male skateboarder performed each type of trick five times, and a YI Action camera captured the movement from a distance of 1.26 m. Features were then extracted from the image dataset by three TL models and classified with a k-Nearest Neighbor (k-NN) classifier. Initial experiments showed that MobileNet, NASNetMobile, and NASNetLarge coupled with optimized k-NN classifiers attain a classification accuracy (CA) of 95%, 92%, and 90%, respectively, on the test dataset. In addition, the robustness evaluation showed that the MobileNet+k-NN pipeline is more robust, as it provided a better average CA than the other pipelines. The results suggest that the proposed approach can classify the skateboarding tricks adequately and could, in the long run, support judges in making more objective decisions.
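
The final stage of the pipeline in the abstract above, a k-NN classifier applied to extracted feature vectors, can be sketched in a few lines. This is an illustration under stated assumptions: the two-dimensional vectors stand in for the much larger feature vectors a transfer-learning model would produce, Euclidean distance and k=3 are arbitrary choices, and the training data are invented.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_features: Sequence[Sequence[float]],
    train_labels: Sequence[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Label a query vector by majority vote among its k nearest neighbours."""
    # Pair every training vector's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical low-dimensional stand-ins for features extracted by a TL model.
features: List[List[float]] = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.0],   # Ollie
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],   # Kickflip
]
labels = ["Ollie", "Ollie", "Ollie", "Kickflip", "Kickflip", "Kickflip"]

print(knn_predict(features, labels, [0.2, 0.1]))
print(knn_predict(features, labels, [4.9, 5.0]))
```

Because k-NN has no training step beyond storing the vectors, the quality of the extracted features largely determines the result, which is consistent with the abstract comparing pipelines by the feature extractor used.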


Machine learning for psychiatric patient triaging: an investigation of cascading classifiers

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy109 ◽

2018 ◽

Vol 25 (11) ◽

pp. 1481-1487 ◽

Cited By ~ 1

Author(s):

Vivek Kumar Singh ◽

Utkarsh Shrivastava ◽

Lina Bouayad ◽

Balaji Padmanabhan ◽

Anna Ialynytchev ◽

...

Keyword(s):

Machine Learning ◽

Psychiatric Patient ◽

Classification Accuracy ◽

Psychiatric Patients ◽

Patient Records ◽

Classification Technique ◽

Machine Learning Approach ◽

The One ◽

Unique Dataset

Abstract. Objective: Develop an approach, One-class-at-a-time, for triaging psychiatric patients using machine learning on textual patient records. Our approach aims to automate the triaging process and reduce expert effort while providing high classification reliability. Materials and Methods: The One-class-at-a-time approach is a multistage cascading classification technique that achieves higher triage classification accuracy compared to traditional multiclass classifiers through 1) classifying one class at a time (or stage), and 2) identification and application of the highest accuracy classifier at each stage. The approach was evaluated using a unique dataset of 433 psychiatric patient records with a triage class label provided by the “I2B2 challenge,” a recent competition in the medical informatics community. Results: The One-class-at-a-time cascading classifier outperformed state-of-the-art classification techniques with overall classification accuracy of 77% among 4 classes, exceeding accuracies of existing multiclass classifiers. The approach also enabled highly accurate classification of individual classes: the severe and mild classes with 85% accuracy, moderate with 64% accuracy, and absent with 60% accuracy. Discussion: The triaging of psychiatric cases is a challenging problem due to the lack of clear guidelines and protocols. Our work presents a machine learning approach using psychiatric records for triaging patients based on the severity of their condition. Conclusion: The One-class-at-a-time cascading classifier can be used as a decision aid to reduce triaging effort of physicians and nurses, while providing a unique opportunity to involve experts at each stage to reduce false positives and further improve the system’s accuracy.
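
The cascading structure described in the abstract above, where one class is decided per stage, can be sketched as a chain of binary decisions. This is a structural illustration only: in the paper each stage uses the best-performing trained classifier, while the keyword rules, the stage order, and the function name `cascade_classify` below are hypothetical stand-ins.

```python
from typing import Callable, Sequence, Tuple

# A stage pairs a class label with a binary decision function for that class.
Stage = Tuple[str, Callable[[str], bool]]


def cascade_classify(record: str, stages: Sequence[Stage], default: str) -> str:
    """Run the stages in order; the first stage that accepts the record decides.

    Records rejected by every stage receive the default (last remaining) class.
    """
    for label, is_member in stages:
        if is_member(record):
            return label
    return default


# Hypothetical keyword rules standing in for trained per-stage classifiers.
stages: Sequence[Stage] = [
    ("severe", lambda text: "hospitalized" in text),
    ("moderate", lambda text: "medication" in text),
    ("mild", lambda text: "counseling" in text),
]

records = [
    "patient hospitalized last month and on medication",
    "stable on medication",
    "attends counseling weekly",
    "no current complaints",
]
for record in records:
    print(cascade_classify(record, stages, default="absent"))
```

Because earlier stages remove their class from consideration, each later stage faces a simpler problem, and a domain expert can review the output of any single stage before the next one runs.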
