The Active Segmentation Platform for Microscopic Image Classification and Segmentation

2021 ◽  
Vol 11 (12) ◽  
pp. 1645
Author(s):  
Sumit K. Vohra ◽  
Dimiter Prodanov

Image segmentation remains an active area of research because no universal solution has been identified. Traditional image segmentation algorithms are problem-specific and limited in scope. Machine learning, on the other hand, offers an alternative paradigm in which predefined features are combined into different classifiers, providing pixel-level classification and segmentation. However, machine learning alone cannot answer the question of which features are appropriate for a given classification problem. The article presents an automated image segmentation and classification platform, called Active Segmentation, which is based on ImageJ. The platform integrates expert domain knowledge, which provides partial ground truth, with geometrical feature extraction based on multi-scale signal processing combined with machine learning. The approach to image segmentation is exemplified on the ISBI 2012 image segmentation challenge data set. As a second application, we demonstrate whole-image classification functionality based on the same principles, exemplified using the HeLa and HEp-2 data sets. The results indicate that feature space enrichment, properly balanced with feature selection functionality, can achieve performance comparable to deep learning approaches. In summary, differential geometry can substantially improve the outcome of machine learning, since it can enrich the underlying feature space with new geometrically invariant objects.


2013 ◽  
pp. 937-947
Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

The vast majority of successful drugs or inhibitors achieve their activity by binding to a protein and modifying its activity, which leads to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Kinase inhibitors therefore need to be studied to understand how specifically a kinase inhibitor selects a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also carry out a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in the future. The identification of druggable kinases is treated as a classification problem in which druggable kinases form the positive data set and non-druggable kinases form the negative data set. The classification problem is addressed using machine learning techniques such as the support vector machine (SVM) and decision tree (DT), together with sequence-specific features. One of the challenges of this classification problem is the unbalanced data: only 48 druggable kinases are available against 509 non-druggable kinases present in UniProt. The accuracy obtained with the decision tree classifier is 57.65, which is not satisfactory. A two-tier architecture of decision trees is carefully designed so that recognition on the non-druggable data set also improves. The overall model is thus shown to achieve a final accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
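The class imbalance mentioned in this abstract (48 druggable against 509 non-druggable kinases) affects how any accuracy figure should be read. The following is a minimal sketch, not taken from the paper, of the accuracy and balanced accuracy of a trivial classifier that always predicts the majority class. The helper `majority_baseline` is hypothetical, and the calculation assumes evaluation over the full 48/509 split, which the abstract does not state.

```python
# Class counts quoted in the abstract; everything else is illustrative.
N_DRUGGABLE = 48        # positive class
N_NON_DRUGGABLE = 509   # negative class


def majority_baseline(n_pos: int, n_neg: int) -> dict:
    """Metrics for a classifier that labels every sample as the larger class.

    Hypothetical helper for illustration; it is not part of the paper's method.
    """
    total = n_pos + n_neg
    majority_is_negative = n_neg >= n_pos
    # Recall is 1.0 on the class that is always predicted and 0.0 on the other.
    recall_pos = 0.0 if majority_is_negative else 1.0
    recall_neg = 1.0 if majority_is_negative else 0.0
    return {
        "accuracy": max(n_pos, n_neg) / total,
        "recall_pos": recall_pos,
        "recall_neg": recall_neg,
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
    }


baseline = majority_baseline(N_DRUGGABLE, N_NON_DRUGGABLE)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

Under that assumption, plain accuracy rewards ignoring the minority class entirely, which is why per-class recall or balanced accuracy is usually reported alongside it for data sets of this shape.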


2020 ◽  
Author(s):  
Adam Pond ◽  
Seongwon Hwang ◽  
Berta Verd ◽  
Benjamin Steventon

Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part due to our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small information-rich data sets. Our results show that two and three dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.

Author summary: The application of machine learning approaches currently hinges on the availability of large data sets to train the models with. However, recent research has shown that large data sets might not always be required. In this work we set out to see whether we could use small confocal microscopy image data sets to train a convolutional neural network (CNN) to stage zebrafish tail buds at four different stages in their development. We found that high test accuracies can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a CNN. This work also shows that we can robustly stage the embryonic development of isolated structures, without the need to refer back to landmarks in the tail bud. This constitutes an important methodological advance for staging organoids and other 3D culture in vitro systems. This work proves that prohibitively large data sets are not always required to train CNNs, and we hope will encourage others to apply the power of machine learning to their areas of study even if data are scarce.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Maria El Abbassi ◽  
Jan Overbeck ◽  
Oliver Braun ◽  
Michel Calame ◽  
Herre S. J. van der Zant ◽  
...  

Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful for applications without a priori knowledge of the data structure. Here, we introduce an approach for unsupervised data classification of any dataset consisting of a series of univariate measurements. It is therefore ideally suited for a wide range of measurement types. We apply it to the field of nanoelectronics and spectroscopy to identify meaningful structures in data sets. We also provide guidelines for the estimation of the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences. Careful selection of the feature space construction method and clustering algorithms for a specific measurement type can therefore greatly improve classification accuracies.
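Estimating the number of clusters, as mentioned in this abstract, is commonly done with an internal validation index. The sketch below uses the mean silhouette score on one-dimensional toy data as one such index. This is an assumption chosen for illustration and not necessarily the criterion the authors recommend; the data, the candidate labelings, and the function `silhouette_score` are all hypothetical.

```python
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette over 1-D points, using absolute difference as distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(abs(point - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean([abs(point - q) for q in other])
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data with three visually obvious groups (illustrative only).
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
candidates = {
    2: [0, 0, 0, 0, 1, 1],   # hypothetical 2-cluster labeling
    3: [0, 0, 1, 1, 2, 2],   # hypothetical 3-cluster labeling
}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

In practice the candidate labelings would come from running a clustering algorithm for each value of k, and other indices may rank the candidates differently.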


2014 ◽  
Vol 2014 ◽  
pp. 1-17 ◽  
Author(s):  
Carlos López-Franco ◽  
Luis Villavicencio ◽  
Nancy Arana-Daniel ◽  
Alma Y. Alanis

Image classification is a process that depends on the descriptor used to represent an object. To create such descriptors we use object models that carry rich information about the distribution of points. The object modeling stage is improved with an optimization process that spreads the points making up the mesh. In this paper, particle swarm optimization (PSO) is used to improve model generation, while a support vector machine (SVM) is used for the classification problem. To measure the performance of the proposed method, a group of objects from a public RGB-D object data set has been used. Experimental results show that our approach improves the distribution of the model in the feature space, which reduces the number of support vectors obtained in the training process.


2021 ◽  
Vol 9 (2) ◽  
pp. 157
Author(s):  
Xi Yu ◽  
Bing Ouyang ◽  
Jose C. Principe

Deep neural networks deliver remarkable performance on supervised learning tasks when extensive collections of labeled data are available. However, creating such large, well-annotated data sets requires a considerable amount of resources, time and effort, especially for underwater image data sets such as those of corals and marine animals. The overreliance on labels is therefore one of the main obstacles to the widespread application of deep learning methods. To overcome this need for large annotated data sets, this paper proposes a label-efficient deep learning framework for image segmentation using only very sparse point supervision. Our approach employs latent Dirichlet allocation (LDA) with spatial coherence on the feature space to iteratively generate pseudo labels. The method requires, as an initial condition, a Wide Residual Network (WRN) trained with sparse labels and mutual information constraints. The proposed method is evaluated on the sparsely labeled coral image data set collected from the Pulley Ridge region in the Gulf of Mexico. Experiments show that our method can improve image segmentation performance when only sparsely labeled samples are available and achieves better results than other semi-supervised approaches.


2019 ◽  
Vol 7 (3) ◽  
pp. SE175-SE187 ◽  
Author(s):  
Yazeed Alaudah ◽  
Patrycja Michałowicz ◽  
Motaz Alfarraj ◽  
Ghassan AlRegib

The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely, the absence of large publicly available annotated data sets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes or use different train and test splits. In addition, it is common for papers that apply machine learning to facies classification to contain no quantitative results and to rely solely on visual inspection of the results. All of these practices have led to subjective results and have greatly hindered our ability to compare different machine-learning models against each other and understand the advantages and disadvantages of each approach. To address these issues, we open source a fully annotated 3D geologic model of the Netherlands F3 block. This model is based on a study of the 3D seismic data in addition to 26 well logs, and it is grounded in a careful study of the geology of the region. Furthermore, we have developed two baseline models for facies classification based on a deconvolution network architecture and make their code publicly available. Finally, we have developed a scheme for evaluating different models on this data set, and we have evaluated the results of our baseline models. In addition to making the data set and the code publicly available, our work helps advance research in this area by creating an objective benchmark for comparing the results of different machine-learning approaches for facies classification.


2016 ◽  
Vol 23 (4) ◽  
pp. 507-534 ◽  
Author(s):  
ROBERTO ZANOLI ◽  
SILVIA COLOMBO

Textual Entailment is a directional relation between two text fragments. The relation holds whenever the truth of one text fragment, called Hypothesis (H), follows from another text fragment, called Text (T). Up until now, using machine learning approaches for recognizing textual entailment has been hampered by the limited availability of data. We present an approach based on syntactic transformations and machine learning techniques which is designed to fit well with a new type of available data sets that are larger but less complex than data sets used in the past. The transformations are not predefined, but calculated from the data sets, and then used as features in a supervised learning classifier. The method has been evaluated using two data sets: the SICK data set and the EXCITEMENT English data set. While both data sets are of a larger order of magnitude than data sets such as RTE-3, they are also of lower levels of complexity, each in its own way. SICK consists of pairs created by applying a predefined set of syntactic and lexical rules to its T and H pairs, which can be accurately captured by our transformations. The EXCITEMENT English data contains short pieces of text that do not require a high degree of text understanding to be annotated. The resulting AdArte system is simple to understand and implement, but also effective when compared with other existing systems. AdArte has been made freely available with the EXCITEMENT Open Platform, an open source platform for textual inference.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
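This abstract reports both F1 score and accuracy, and the two do not always move together. The following minimal sketch shows how both are derived from the same confusion-matrix counts. The counts are invented for illustration and are not the paper's data, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall, f1) from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs do not raise.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, precision, recall, f1


# Hypothetical counts: the positive class is rare, so the many true
# negatives lift accuracy while F1 depends only on tp, fp and fn.
acc, prec, rec, f1 = classification_metrics(tp=30, fp=10, fn=20, tn=940)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Because F1 ignores true negatives, a model can score high on accuracy and noticeably lower on F1 whenever the positive class is small, which is one reason to report both.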

