A reference library for assigning protein subcellular localizations by image-based machine learning

2020 ◽  
Vol 219 (3) ◽  
Author(s):  
Wiebke Schormann ◽  
Santosh Hariharan ◽  
David W. Andrews

Confocal micrographs of EGFP fusion proteins localized at key cell organelles in murine and human cells were acquired for use as subcellular localization landmarks. For each of the respective 789,011 and 523,319 optically validated cell images, morphology and statistical features were measured. Machine learning algorithms using these features permit automated assignment of the localization of other proteins and dyes in both cell types with very high accuracy. Automated assignment of subcellular localizations for model tail-anchored (TA) proteins with randomly mutated C-terminal targeting sequences allowed the discovery of motifs responsible for targeting to mitochondria, endoplasmic reticulum, and the late secretory pathway. Analysis of directed mutants enabled refinement of these motifs and characterization of protein distributions within cellular subcompartments.

2021 ◽  
Author(s):  
Rammohan Shukla ◽  
Nicholas D Henkel ◽  
Marissa A Smail ◽  
Xiaojun Wu ◽  
Heather A Enright ◽  
...  

We probed a transcriptomic dataset of pilocarpine-induced TLE using various ontological, machine-learning, and systems-biology approaches. We showed that, underneath the complex and penetrant changes, moderate-to-subtle upregulated homeostatic and downregulated synaptic changes associated with the dentate gyrus and hippocampal subfields could predict not only TLE but also various other forms of epilepsy. At the cellular level, pyramidal neurons and interneurons showed disparate changes, whereas the proportion of non-neuronal cells increased steadily. A probabilistic Bayesian network demonstrated an aberrant and oscillating physiological interaction between oligodendrocytes and interneurons in driving seizures. Validating the Bayesian inference, we showed that the cell types driving the seizures were associated with known antiepileptic and epileptic drugs. These findings provide predictive biomarkers of epilepsy, insights into the cellular connections and causal changes associated with TLE, and a drug discovery method focusing on these events.


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 45-46
Author(s):  
Christian Pohlkamp ◽  
Kapil Jhalani ◽  
Niroshan Nadarajah ◽  
Inseok Heo ◽  
William Wetton ◽  
...  

Background: Cytomorphology is the gold standard for quick assessment of peripheral blood and bone marrow samples in hematological neoplasms. It is a broadly accepted method for orchestrating more specific diagnostics, including immunophenotyping or genetics. Inter-/intra-observer reproducibility of single-cell classification is only 75 to 90%. Only a limited number of cells (100-500 cells/smear) is read in a time-consuming procedure. Machine learning (ML) is more reliable where human skills are limited, i.e. in handling large amounts of data or images. Here we tested ML to differentiate peripheral blood leukocytes in a high-throughput hematology laboratory. Aim: To establish an ML-based cell classifier capable of identifying healthy and pathologic cells in digitalized peripheral blood smear scans at an accuracy competitive with or outperforming human expert level. Methods: We selected >2,600 smears out of our unique archive of >250,000 peripheral blood smears from hematological neoplasms. Depending on quality, we scanned up to 1,000 single-cell images per smear. For image acquisition, a Metafer Scanning System (Zeiss Axio Imager.Z2 microscope, automatic slide feeder and automatic oiling device) from MetaSystems (Altlussheim, GER) was used. Areas of interest were defined by a pre-scan at 10x magnification, followed by a high-resolution scan at 40x to generate cell images for analysis. Average capture times for 300/500 cells were 3:43/4:37 min. We set up a supervised ML model using colour images (144x144 pixels) as input, outputting predicted probabilities of 21 predefined classes. We used ImageNet-pretrained Xception as our base model. We trained, evaluated and deployed the model using Amazon SageMaker on a subset of 82,974 images randomly selected from 514,183 cells captured and labelled for this study. 20 different cell types and one garbage class were classified.
We included cell type categories reflecting the critical importance of detecting rare leukemia subtypes (e.g. APL). Numbers of images from the respective 21 classes ranged from 1,830 to 14,909 (median: 2,945). Minority classes were up-sampled to handle imbalances. Each picture was labelled by highly skilled technicians (median years practicing in this laboratory: 5) and two independent hematologists (median years at microscope: 20). Results: On a separate test set of 8,297 cells, our classifier was able to predict any of the five cell types occurring in the peripheral blood of healthy individuals (PMN, lymphocytes, monocytes, eosinophils, basophils) at very high median accuracy (97.0%). Median prediction accuracy of 15 rare or pathological cell types was 91.3%. For six critical pathological cell forms (myeloblasts, atypical/bilobulated promyelocytes in APL/APLv, hairy cells, lymphoma cells, plasma cells), median accuracy was 93.4% (sensitivity 93.8%). We saw a very high "T98 accuracy" for these cell types (98.5%), which is the accuracy of cell type predictions with prediction probability >0.98 (achieved in 2231/2417 cases), implying that critical cells predicted with probability <0.98 should be flagged for human expert validation with priority. For all 21 classes, median accuracy was 91.7%. Accuracy was lower for cells representing consecutive steps of maturation, e.g. promyelo-/myelo-/metamyelocytes, reproducing inconsistencies from the human-built phenotypic classification system (see Fig.). Conclusions: We demonstrate an automated workflow using automatic microscopic cell capturing and ML-driven cell differentiation in samples of hematologic patients. Reproducibility, accuracy, sensitivity and specificity are above 90%, and for many cell types above 98%. By flagging suspicious cells for human validation, this tool can support even experienced hematology professionals, especially in detecting rare cell types.
Given an appropriate scanning speed, it clearly outperforms human investigators in terms of examination time and number of differentiated cells. An ML-based intelligence can make its skills accessible to hematology laboratories on site or after upload of scanned cell images, independent of time/location. A cloud-based infrastructure is available. A prospective head-to-head challenge between the ML-based classifier and human experts, comparing sensitivity and accuracy for detection of all cell classes in peripheral blood, will be conducted to prove suitability for routine use (NCT 4466059). Disclosures: Heo: AWS: Current Employment. Wetton: AWS: Current Employment. Drescher: MetaSystems: Current Employment. Hänselmann: MetaSystems: Current Employment. Lörch: MetaSystems: Current equity holder in private company.
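The "T98 accuracy" described in this abstract (accuracy restricted to predictions whose probability exceeds 0.98, with the remaining cells flagged for expert review) can be illustrated with a minimal Python sketch. The function name, the tuple layout, and the toy data are hypothetical and are not taken from the study; treating a probability of exactly 0.98 as "flag for review" is also an assumption, since the abstract does not specify that case.

```python
from typing import List, Tuple

# Each record: (predicted_label, predicted_probability, true_label).
Prediction = Tuple[str, float, str]


def thresholded_accuracy(
    predictions: List[Prediction], threshold: float = 0.98
) -> Tuple[float, List[int]]:
    """Return (accuracy over confident predictions, indices flagged for review).

    A prediction counts as confident when its probability is strictly above
    the threshold; everything else is flagged (assumption for the boundary case).
    """
    confident = [(pred, true) for pred, prob, true in predictions if prob > threshold]
    flagged = [i for i, (_, prob, _) in enumerate(predictions) if prob <= threshold]
    if not confident:
        return float("nan"), flagged
    correct = sum(1 for pred, true in confident if pred == true)
    return correct / len(confident), flagged


# Hypothetical toy data, for illustration only.
cells = [
    ("myeloblast", 0.995, "myeloblast"),
    ("hairy cell", 0.990, "hairy cell"),
    ("plasma cell", 0.970, "lymphoma cell"),
    ("myeloblast", 0.999, "promyelocyte"),
    ("lymphoma cell", 0.600, "lymphoma cell"),
]

t98, to_review = thresholded_accuracy(cells)
print(f"T98 accuracy: {t98:.3f}; flagged indices: {to_review}")
```

In this toy set, three predictions clear the threshold and two of them are correct, while the two low-confidence cells are returned for human validation.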


2021 ◽  
Author(s):  
Ali Nadernezhad ◽  
Jürgen Groll

With the continuous growth of extrusion bioprinting techniques, ink formulations based on rheology modifiers are becoming increasingly popular, as they enable 3D printing of non-printable biologically-favored materials. However, benchmarking and characterization of such systems are inherently complicated due to the variety of rheology modifiers and differences in mechanisms of inducing printability. This study tries to explain induced printability in formulations by incorporating machine learning algorithms that describe the underlying basis for decision-making in classifying a printable formulation. For this purpose, a library of rheological data and printability scores for 180 different formulations of hyaluronic acid solutions with varying molecular weights and concentrations and three rheology modifiers was produced. A feature screening methodology was applied to collect and separate the impactful features, which consisted of physically interpretable and easily measurable properties of formulations. In the final step, all relevant features influencing the model’s output were analyzed by advanced yet explainable statistical methods. The outcome provides a guideline for designing new formulations based on data-driven correlations from multiple systems.


2020 ◽  
Vol 12 (23) ◽  
pp. 3926
Author(s):  
Martina Deur ◽  
Mateo Gašparović ◽  
Ivan Balenović

Spatially explicit information on tree species composition is important for both the forest management and conservation sectors. In combination with machine learning algorithms, very high-resolution satellite imagery may provide an effective solution to reduce the need for labor-intensive and time-consuming field-based surveys. In this study, we evaluated the possibility of using multispectral WorldView-3 (WV-3) satellite imagery for the classification of three main tree species (Quercus robur L., Carpinus betulus L., and Alnus glutinosa (L.) Gaertn.) in a lowland, mixed deciduous forest in central Croatia. The pixel-based supervised classification was performed using two machine learning algorithms: random forest (RF) and support vector machine (SVM). Additionally, the contribution of gray level co-occurrence matrix (GLCM) texture features from WV-3 imagery in tree species classification was evaluated. Principal component analysis confirmed GLCM variance to be the most significant texture feature. Of the 373 visually interpreted reference polygons, 237 were used as training polygons and 136 were used as validation polygons. The validation results show relatively high overall accuracy (85%) for tree species classification based solely on WV-3 spectral characteristics and the RF classification approach. As expected, an improvement in classification accuracy was achieved by a combination of spectral and textural features. With the additional use of GLCM variance, the overall accuracy improved by 10% and 7% for RF and SVM classification approaches, respectively.
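To make the GLCM variance feature concrete, the following minimal Python sketch builds a symmetric, normalized gray level co-occurrence matrix for one pixel offset and computes its variance. It assumes the common Haralick-style definition, variance = Σ (i − μ)² p(i, j) with μ = Σ i p(i, j); the abstract does not state which definition, offset, window size, or quantization the authors used, so every function name and parameter below is hypothetical.

```python
from typing import List

Image = List[List[int]]  # gray levels already quantized to 0 .. levels-1


def glcm(image: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def glcm_variance(p: List[List[float]]) -> float:
    """Variance of gray level i under the joint distribution p(i, j)."""
    n = len(p)
    mean = sum(i * p[i][j] for i in range(n) for j in range(n))
    return sum((i - mean) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 2x2 patches: a flat one and one with horizontal contrast.
flat_patch = [[1, 1], [1, 1]]
striped_patch = [[0, 1], [0, 1]]

print(glcm_variance(glcm(flat_patch, levels=2)))
print(glcm_variance(glcm(striped_patch, levels=2)))
```

A flat patch has no gray level spread, so its variance is zero, while the striped patch yields a positive value; in a pixel-based workflow, such a value would be computed per moving window and stacked with the spectral bands as an extra classifier input.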


2020 ◽  
Vol 1 (2) ◽  
pp. 1-4
Author(s):  
Priyam Guha ◽  
Abhishek Mukherjee ◽  
Abhishek Verma

This research paper deals with using supervised machine learning algorithms to detect the authenticity of bank notes. We achieved very high accuracy (of the order of 99%) by applying some data preprocessing steps and then running the processed data through supervised learning algorithms such as SVM, Decision Trees, Logistic Regression, and KNN. We then analyze the misclassified points, examining the confusion matrix to find out which algorithms produced more false positives and which produced more false negatives.
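The confusion-matrix comparison mentioned in this abstract can be sketched in a few lines of Python. The labels, the choice of 1 as the positive class, and the toy predictions are hypothetical; the abstract does not say which class the authors treated as positive or how their data were encoded.

```python
from typing import Dict, Sequence


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

result = confusion_counts(y_true, y_pred)
accuracy = (result["tp"] + result["tn"]) / len(y_true)
print(result, f"accuracy={accuracy:.3f}")
```

Running the same function on each algorithm's predictions gives directly comparable false positive and false negative counts, which is the comparison the abstract describes.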


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
André F. M. Batista ◽  
Carmen S. G. Diniz ◽  
Eliana A. Bonilha ◽  
Ichiro Kawachi ◽  
Alexandre D. P. Chiavegatto Filho

Background: Recent decreases in neonatal mortality have been slower than expected for most countries. This study aims to predict the risk of neonatal mortality using only data routinely available from birth records in the largest city of the Americas. Methods: A probabilistic linkage of every birth record occurring in the municipality of São Paulo, Brazil, between 2012 and 2017 was performed with the death records from 2012 to 2018 (1,202,843 births and 447,687 deaths), and a total of 7282 neonatal deaths were identified (a neonatal mortality rate of 6.46 per 1000 live births). Births from 2012 to 2016 (N = 941,308; or 83.44% of the total) were used to train five different machine learning algorithms, while births occurring in 2017 (N = 186,854; or 16.56% of the total) were used to test their predictive performance on new unseen data. Results: The best performance was obtained by the extreme gradient boosting trees (XGBoost) algorithm, with a very high AUC of 0.97 and an F1-score of 0.55. The 5% of births with the highest predicted risk of neonatal death included more than 90% of the actual neonatal deaths. On the other hand, there were no deaths among the 5% of births with the lowest predicted risk. There were no significant differences in predictive performance for vulnerable subgroups. The use of a smaller number of variables (WHO’s five minimum perinatal indicators) decreased overall performance, but the results still remained high (AUC of 0.91). With the addition of only three more variables, we achieved the same predictive performance (AUC of 0.97) as using all 23 variables originally available from the Brazilian birth records. Conclusion: Machine learning algorithms were able to identify with very high predictive performance the neonatal mortality risk of newborns using only routinely collected data.
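The statement that the highest-risk 5% of births contained more than 90% of the neonatal deaths is a capture-rate calculation: rank births by predicted risk, take the top fraction, and divide the deaths found there by all deaths. A minimal Python sketch follows; the function name, the rounding rule for the group size, and the toy risks and outcomes are hypothetical and do not reproduce the study's data.

```python
from typing import Sequence


def capture_rate(
    risks: Sequence[float], outcomes: Sequence[int], top_fraction: float = 0.05
) -> float:
    """Share of all positive outcomes found among the highest-risk records."""
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no positive outcomes to capture")
    # Size of the high-risk group; always include at least one record.
    n_top = max(1, round(len(risks) * top_fraction))
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    captured = sum(outcomes[i] for i in ranked[:n_top])
    return captured / total_events


# Hypothetical toy data: 40 births with increasing predicted risk and
# three deaths, two of which fall in the highest-risk 5% (2 births).
risks = [i / 40 for i in range(40)]
outcomes = [0] * 40
for death_index in (39, 38, 10):
    outcomes[death_index] = 1

rate = capture_rate(risks, outcomes, top_fraction=0.05)
print(f"capture rate in top 5%: {rate:.3f}")
```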


Author(s):  
Lagerstrand Kerstin ◽  
Hebelka Hanna ◽  
Brisby Helerna

Purpose: It is suggested that non-specific low back pain (LBP) can be related to nerve ingrowth along granulation tissue in disc fissures, extending into the outer layers of the annulus fibrosus. The present study aimed to investigate whether machine-learning modelling of magnetic resonance imaging (MRI) data can classify such fissures, as well as pain provoked by discography, with plausible accuracy and precision. Methods: The study was based on previously collected data from 30 LBP patients (age = 26–64 years, 11 males). Pressure-controlled discography was performed in 86 discs with pain-positive discograms, categorized as concordant pain response at a pressure ≤ 50 psi, and for each patient one negative control disc. The CT-discograms were used for categorization of fissures. MRI values and standard deviations were extracted from the midsagittal part and from 5 different sub-regions of the discs. Machine-learning algorithms were trained on the extracted MRI markers to classify discs with fissures extending into the outer annulus or not, as well as to classify discs as painful or non-painful. Results: Discs with outer annular fissures were classified in MRI with very high precision (mean of 10 repeated tests: 99%) and accuracy (mean: 97%) using machine-learning modelling, but the pain model demonstrated only moderate diagnostic accuracy (mean accuracy: 69%; precision: 71%). Conclusion: The present study showed that machine-learning modelling based on MRI can classify outer annular fissures with very high diagnostic accuracy and, hence, enable individualized diagnostics. However, the model demonstrated only moderate diagnostic accuracy regarding pain, which could be attributed to either an insufficient model or the pain reference used.

