Machine-learning with a small training set for classification of quantitative phase images of cancer cells (Conference Presentation)

Author(s):  
Natan T. Shaked


2020 ◽  
Author(s):  
L. Sheneman ◽  
G. Stephanopoulos ◽  
A. E. Vasdekis

Abstract: We report the application of supervised machine learning to the automated classification of lipid droplets in label-free, quantitative-phase images. By comparing various machine learning methods commonly used in biomedical imaging and remote sensing, we found convolutional neural networks to outperform the others, both quantitatively and qualitatively. We describe our imaging approach, all machine learning methods that we implemented, and their performance in terms of computational requirements, training resource needs, and accuracy. Overall, our results indicate that quantitative-phase imaging coupled to machine learning enables accurate lipid droplet classification in single living cells. As such, the present paradigm presents an excellent alternative to the more common fluorescent and Raman imaging modalities by enabling label-free, ultra-low-phototoxicity imaging and deeper insight into the thermodynamics of metabolism of single cells.

Author Summary: Recently, quantitative-phase imaging (QPI) has demonstrated the ability to elucidate novel parameters of cellular physiology and metabolism without the need for fluorescent staining. Here, we apply label-free, low-phototoxicity QPI to yeast cells in order to identify lipid droplets (LDs), an important organelle with key implications in human health and biofuel development. Because QPI yields low specificity, we explore the use of modern machine learning methods to rapidly identify intracellular LDs with high discriminatory power and accuracy. In recent years, machine learning has demonstrated exceptional abilities to recognize and segment objects in biomedical imaging, remote sensing, and other areas. Trained machine learning classifiers can be combined with QPI within high-throughput analysis pipelines, allowing for efficient and accurate identification and quantification of cellular components. Non-invasive, accurate, and high-throughput classification of these organelles will accelerate research and improve our understanding of cellular functions, with beneficial applications in biofuels, biomedicine, and more.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249196
Author(s):  
Luke Sheneman ◽  
Gregory Stephanopoulos ◽  
Andreas E. Vasdekis

We report the application of supervised machine learning to the automated classification of lipid droplets in label-free, quantitative-phase images. By comparing various machine learning methods commonly used in biomedical imaging and remote sensing, we found convolutional neural networks to outperform the others, both quantitatively and qualitatively. We describe our imaging approach, all implemented machine learning methods, and their performance with respect to computational efficiency, required training resources, and classification accuracy measured across multiple metrics. Overall, our results indicate that quantitative-phase imaging coupled to machine learning enables accurate lipid droplet classification in single living cells. As such, the present paradigm presents an excellent alternative to the more common fluorescent and Raman imaging modalities by enabling label-free operation, ultra-low phototoxicity, and deeper insight into the thermodynamics of metabolism of single cells.
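As a rough illustration of the kind of classifier discussed in the two abstracts above, the sketch below builds a small convolutional network that labels fixed-size patches of a quantitative-phase image as lipid droplet versus background. The patch size, architecture, training settings, and placeholder data are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a small CNN that classifies patches
# of a quantitative-phase image as "lipid droplet" vs "background".
import numpy as np
from tensorflow.keras import layers, models

PATCH = 32  # assumed patch size in pixels

def build_patch_cnn():
    model = models.Sequential([
        layers.Input(shape=(PATCH, PATCH, 1)),            # single-channel phase values
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),            # P(patch contains a lipid droplet)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Placeholder arrays standing in for phase-image patches and binary labels.
    x = np.random.rand(256, PATCH, PATCH, 1).astype("float32")
    y = np.random.randint(0, 2, size=(256, 1))
    model = build_patch_cnn()
    model.fit(x, y, epochs=2, batch_size=32, validation_split=0.2)
```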


Author(s):  
K Sooknunan ◽  
M Lochner ◽  
Bruce A Bassett ◽  
H V Peiris ◽  
R Fender ◽  
...  

Abstract: With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity to the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.
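A hedged sketch of the three-step pipeline described above (Gaussian-process interpolation, wavelet feature extraction, random-forest classification) is given below using standard Python libraries; the kernel choice, time grid, wavelet, and toy light curves are illustrative assumptions rather than the authors' settings.

```python
# Sketch of the GP -> wavelet -> random forest pipeline on synthetic light curves.
import numpy as np
import pywt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.ensemble import RandomForestClassifier

def interpolate_light_curve(times, fluxes, grid):
    """Step 1: Gaussian-process interpolation onto a regular time grid."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(times.reshape(-1, 1), fluxes)
    return gp.predict(grid.reshape(-1, 1))

def wavelet_features(curve, wavelet="sym2", level=2):
    """Step 2: wavelet decomposition, flattened into a feature vector."""
    coeffs = pywt.wavedec(curve, wavelet, level=level)
    return np.concatenate(coeffs)

# Step 3: random forest trained on the wavelet features of toy light curves.
grid = np.linspace(0.0, 8.0, 64)                      # e.g. an 8-hour observing window
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):                                   # stand-in for transient classes
    for _ in range(20):
        t = np.sort(rng.uniform(0.0, 8.0, 30))
        f = np.sin(t + label) + 0.1 * rng.normal(size=t.size)
        X.append(wavelet_features(interpolate_light_curve(t, f, grid)))
        y.append(label)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```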


2020 ◽  
Vol 25 (02) ◽  
pp. 1 ◽  
Author(s):  
Van K. Lam ◽  
Thanh C. Nguyen ◽  
Vy Bui ◽  
Byung Min Chung ◽  
Lin-Ching Chang ◽  
...  

2015 ◽  
Vol 9s1 ◽  
pp. CMC.S18746 ◽  
Author(s):  
Amparo Alonso-Betanzos ◽  
Verónica Bolón-Canedo ◽  
Guy R. Heyndrickx ◽  
Peter L.M. Kerkhof

Background: Heart failure (HF) manifests as at least two subtypes. The current paradigm distinguishes the two by using both the metric ejection fraction (EF) and a constraint on end-diastolic volume. About half of all HF patients exhibit preserved EF; in contrast, the classical type of HF shows a reduced EF. Common practice sets the cut-off point at or near EF = 50%, thus defining a linear divider. However, a rationale for this safe choice is lacking, and the assumption of strict linearity has not been justified. Additionally, some studies opt to exclude patients from consideration for HF if 40% < EF < 50% (gray zone). Thus, there is a need for documented classification guidelines that resolve the gray-zone ambiguity and formulate a crisp delineation of the transitions between phenotypes.

Methods: Machine learning (ML) models are applied to classify HF subtypes within the ventricular volume domain, rather than by the single use of EF. Various ML models, both unsupervised and supervised, are employed to establish a foundation for classification. Data regarding 48 HF patients are employed as a training set for subsequent classification of Monte Carlo-generated surrogate HF patients (n = 403). Next, we map the consequences when the EF cut-off differs from 50% (as proposed for women) and analyze HF candidates not covered by current rules.

Results: On the training set, the Support Vector Machine method yields the best results (test error 4.06%) and covers the gray zone and other clinically relevant HF candidates. End-systolic volume (ESV) emerges as a logical discriminator, rather than EF as in the prevailing paradigm.

Conclusions: Selected ML models offer promise for classifying HF patients (including the gray zone) when driven by ventricular volume data. ML analysis indicates that ESV has a role in the development of guidelines to parse HF subtypes. The documented curvilinear relationship between EF and ESV suggests that the assumption of a linear EF divider may not be of general utility over the complete clinically relevant range.
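A minimal sketch of the volume-domain idea follows: a support vector machine trained on end-diastolic and end-systolic volumes (EDV, ESV) rather than on EF alone. The synthetic values and the EF-based labels below are placeholders for illustration, not the study's patient or surrogate data.

```python
# Illustrative SVM on (EDV, ESV) feature pairs; data and labels are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
edv = rng.uniform(80, 250, 200)                # end-diastolic volume in mL (toy values)
esv = edv * rng.uniform(0.2, 0.9, 200)         # end-systolic volume as a fraction of EDV
X = np.column_stack([edv, esv])                # classification in the volume domain
ef = 100.0 * (edv - esv) / edv
y = (ef < 50.0).astype(int)                    # 1 = reduced-EF phenotype under the 50% cut-off

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```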


2021 ◽  
Vol 13 (16) ◽  
pp. 3176
Author(s):  
Beata Hejmanowska ◽  
Piotr Kramarczyk ◽  
Ewa Głowienka ◽  
Sławomir Mikrut

The study presents an analysis of the possible use of a limited number of Sentinel-2 and Sentinel-1 images to check whether the crop declarations that EU farmers submit to receive subsidies are true. The declarations used in the research were randomly divided into two independent sets (training and test). Based on the training set, supervised classification of both single images and their combinations was performed using the random forest algorithm in SNAP (ESA) and our own Python scripts. A comparative accuracy analysis was performed on the basis of two forms of the confusion matrix (the full confusion matrix commonly used in remote sensing and the binary confusion matrix used in machine learning) and various accuracy metrics (overall accuracy, accuracy, specificity, sensitivity, etc.). The highest overall accuracy (81%) was obtained in the simultaneous classification of multitemporal images (three Sentinel-2 and one Sentinel-1). An unexpectedly high accuracy (79%) was achieved in the classification of a single Sentinel-2 image from the end of May 2018. Noteworthy is the fact that the accuracy of the random forest method trained on the entire training set equals 80%, while with the sampling method it is about 50%. Based on the analysis of various accuracy metrics, it can be concluded that the metrics used in machine learning, for example specificity and accuracy, are always higher than the overall accuracy. These metrics should be used with caution because, unlike the overall accuracy, they count not only true positives but also true negatives as correct results, giving the impression of higher accuracy. Correct calculation of overall accuracy values is essential for comparative analyses. Reporting the mean accuracy value for the classes as overall accuracy gives a false impression of high accuracy. In our case, the difference was 10–16% for the validation data and 25–45% for the test data.
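The metric issue raised above can be illustrated with a short Python fragment: overall accuracy computed from the full multi-class confusion matrix is compared against per-class binary accuracy and specificity, which are inflated by true negatives. The labels below are toy values, not the Sentinel classification results.

```python
# Overall accuracy (remote-sensing convention) vs per-class binary metrics.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3, 3, 3])     # e.g. four crop classes
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 3, 3, 2])

cm = confusion_matrix(y_true, y_pred)                  # full confusion matrix
overall_accuracy = accuracy_score(y_true, y_pred)
print(cm)
print("overall accuracy:", overall_accuracy)

# Per-class binary metrics: true negatives inflate these relative to overall accuracy.
for cls in np.unique(y_true):
    tp = np.sum((y_pred == cls) & (y_true == cls))
    tn = np.sum((y_pred != cls) & (y_true != cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    per_class_accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    print(f"class {cls}: accuracy={per_class_accuracy:.2f}, specificity={specificity:.2f}")
```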


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Guorong Wu ◽  
Zichen Liu ◽  
Xuhui Chen

This paper presents a new method to recognize human activities based on weighted classification of features extracted from the human body. Towards this end, the proposed descriptor uses new features that depend on weights taken from the image or video. Human pose plays an important role in the extracted features, which are then used as weighted input to the classifier. We use machine learning in the two steps of training and testing on images from a standard dataset suitable for benchmarking the system. Unlike previous methods that rely mainly on the size or length of shapes to represent the cues when machine learning is used to recognize human activities, accurate experimental results obtained from appropriate segments of the human body prove the worth of the proposed method. Twelve activities from a challenging, publicly available dataset are used for comparison to demonstrate our method. The results show that we achieved 87.3% precision on the training set and 94% on the test set.
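Because the abstract gives no implementation details, the following is only a generic illustration of weighted feature classification: per-body-segment features are scaled by assumed importance weights before a standard classifier is trained. The weights, feature layout, classifier choice, and data are all hypothetical.

```python
# Generic sketch of weighted per-segment features fed to a classifier (synthetic data).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_samples, n_segments, feats_per_segment = 240, 6, 4
X = rng.normal(size=(n_samples, n_segments * feats_per_segment))
y = rng.integers(0, 12, size=n_samples)                # twelve activity classes

# Assumed per-segment weights (e.g. limbs weighted more heavily than torso).
segment_weights = np.array([1.0, 1.5, 1.5, 2.0, 2.0, 0.5])
weights = np.repeat(segment_weights, feats_per_segment)
Xw = X * weights                                       # weighted feature representation

X_train, X_test, y_train, y_test = train_test_split(Xw, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy on synthetic data:", clf.score(X_test, y_test))
```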

