The Active Segmentation Platform for Microscopic Image Classification and Segmentation

Sumit K. Vohra; Dimiter Prodanov

doi:10.3390/brainsci11121645

Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning Techniques

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch009 ◽

2012 ◽

pp. 155-165

Author(s):

S. Prasanthi ◽

S. Durga Bhavani ◽

T. Sobha Rani ◽

Raju S. Bapi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Classification Problem ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Data Set ◽

Learning Techniques

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, which leads to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also carry out a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in the future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as the positive data set and non-druggable kinases as the negative data set. The classification problem is addressed using machine learning techniques such as support vector machines (SVM) and decision trees (DT), together with sequence-specific features. One of the challenges of this classification problem is the unbalanced data, with only 48 druggable kinases available against 509 non-druggable kinases present in UniProt. The accuracy obtained by the decision tree classifier is 57.65, which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable data set is also improved. The overall model is thus shown to achieve a final accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
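The class imbalance mentioned in this abstract (48 druggable against 509 non-druggable kinases) can be made concrete with a little arithmetic. The sketch below is an illustration only and not the authors' evaluation code: it shows that a hypothetical baseline labelling every kinase as non-druggable would already reach roughly 91% plain accuracy while recognising no druggable kinase at all. The function names and the use of balanced accuracy as the contrasting metric are assumptions made for this example.

```python
# Illustrative only: the two class counts come from the abstract;
# the baseline classifier and the metric choice are assumptions.
N_DRUGGABLE = 48        # positive class
N_NON_DRUGGABLE = 509   # negative class


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of the per-class recalls (sensitivity and specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical baseline that predicts "non-druggable" for every kinase.
tp, fn = 0, N_DRUGGABLE
tn, fp = N_NON_DRUGGABLE, 0

baseline_accuracy = accuracy(tp, tn, fp, fn)
baseline_balanced = balanced_accuracy(tp, tn, fp, fn)

print(f"plain accuracy:    {baseline_accuracy:.4f}")
print(f"balanced accuracy: {baseline_balanced:.4f}")
```

A metric that averages over classes exposes the failure on the minority class, which is one reason a per-class view is useful when reading accuracy figures on unbalanced data.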

Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning Techniques

Bioinformatics ◽

10.4018/978-1-4666-3604-0.ch050 ◽

2013 ◽

pp. 937-947

Author(s):

S. Prasanthi ◽

S. Durga Bhavani ◽

T. Sobha Rani ◽

Raju S. Bapi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Classification Problem ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Data Set ◽

Learning Techniques

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, which leads to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also carry out a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in the future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as the positive data set and non-druggable kinases as the negative data set. The classification problem is addressed using machine learning techniques such as support vector machines (SVM) and decision trees (DT), together with sequence-specific features. One of the challenges of this classification problem is the unbalanced data, with only 48 druggable kinases available against 509 non-druggable kinases present in UniProt. The accuracy obtained by the decision tree classifier is 57.65, which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable data set is also improved. The overall model is thus shown to achieve a final accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.

A deep learning approach for staging embryonic tissue isolates with small data

10.1101/2020.07.15.204735 ◽

2020 ◽

Author(s):

Adam Pond ◽

Seongwon Hwang ◽

Berta Verd ◽

Benjamin Steventon

Keyword(s):

Machine Learning ◽

3D Culture ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Learning Approaches ◽

Data Set ◽

Set Size ◽

In Vitro Systems

Abstract: Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part by our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce, such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small information-rich data sets. Our results show that two- and three-dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.

Author summary: The application of machine learning approaches currently hinges on the availability of large data sets to train the models with. However, recent research has shown that large data sets might not always be required. In this work we set out to see whether we could use small confocal microscopy image data sets to train a convolutional neural network (CNN) to stage zebrafish tail buds at four different stages in their development. We found that high test accuracies can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a CNN. This work also shows that we can robustly stage the embryonic development of isolated structures, without the need to refer back to landmarks in the tail bud. This constitutes an important methodological advance for staging organoids and other 3D culture in vitro systems. This work proves that prohibitively large data sets are not always required to train CNNs, and we hope it will encourage others to apply the power of machine learning to their areas of study even if data are scarce.

Benchmark and application of unsupervised classification approaches for univariate data

Communications Physics ◽

10.1038/s42005-021-00549-9 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Maria El Abbassi ◽

Jan Overbeck ◽

Oliver Braun ◽

Michel Calame ◽

Herre S. J. van der Zant ◽

...

Keyword(s):

Machine Learning ◽

A Priori ◽

Clustering Algorithms ◽

Feature Space ◽

Unsupervised Classification ◽

Optimum Number ◽

Data Sets ◽

Learning Approaches ◽

Wide Range ◽

Characteristic Features

Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful for applications without a priori knowledge of the data structure. Here, we introduce an approach for unsupervised data classification of any dataset consisting of a series of univariate measurements. It is therefore ideally suited for a wide range of measurement types. We apply it to the field of nanoelectronics and spectroscopy to identify meaningful structures in data sets. We also provide guidelines for the estimation of the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences. Careful selection of the feature space construction method and clustering algorithms for a specific measurement type can therefore greatly improve classification accuracies.
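The two ingredients named in this abstract, feature space construction and a clustering algorithm, can be illustrated with a toy pipeline. The sketch below is not the authors' method: it assumes a normalised histogram as the feature vector for each univariate trace and plain k-means as the clustering step, and every function name, parameter and data value is hypothetical.

```python
from typing import List, Sequence


def histogram_features(trace: Sequence[float], bins: int,
                       lo: float, hi: float) -> List[float]:
    """Normalised histogram of one univariate trace over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in trace:
        if lo <= x < hi:
            # min() guards against floating-point rounding at the upper edge
            counts[min(bins - 1, int((x - lo) / width))] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k points seed the centroids (an assumption
    made here so that the result is deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical traces: two with low values, two with high values.
traces = [
    [0.10, 0.20, 0.10],
    [0.90, 0.80, 0.90],
    [0.15, 0.10, 0.20],
    [0.85, 0.90, 0.80],
]
features = [histogram_features(t, bins=2, lo=0.0, hi=1.0) for t in traces]
labels = kmeans(features, k=2)
print(labels)
```

Swapping the histogram for another feature construction, or k-means for another clustering algorithm, changes only one function each, which mirrors the abstract's point that both choices matter for a given measurement type.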

Image Classification Using PSO-SVM and an RGB-D Sensor

Mathematical Problems in Engineering ◽

10.1155/2014/695910 ◽

2014 ◽

Vol 2014 ◽

pp. 1-17 ◽

Cited By ~ 4

Author(s):

Carlos López-Franco ◽

Luis Villavicencio ◽

Nancy Arana-Daniel ◽

Alma Y. Alanis

Keyword(s):

Image Classification ◽

Feature Space ◽

Classification Problem ◽

Support Vector ◽

Object Model ◽

Training Process ◽

Data Set ◽

Distribution Of Points ◽

Rich Information ◽

Object Models

Image classification is a process that depends on the descriptor used to represent an object. To create such descriptors we use object models that carry rich information about the distribution of points. The object model stage is improved with an optimization process that spreads the points making up the mesh. In this paper, particle swarm optimization (PSO) is used to improve the model generation, while a support vector machine (SVM) is used for the classification problem. To measure the performance of the proposed method, a group of objects from a public RGB-D object data set has been used. Experimental results show that our approach improves the distribution of the model in the feature space, which reduces the number of support vectors obtained in the training process.
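For readers unfamiliar with particle swarm optimization, the following is a generic sketch of the technique on a toy objective. It does not reproduce the paper's mesh-point objective or its parameter settings: the sphere function, the inertia and acceleration coefficients, the swarm size and the fixed random seed are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[Sequence[float]], float],
                 dim: int,
                 bounds: Tuple[float, float],
                 n_particles: int = 20,
                 iterations: int = 100,
                 inertia: float = 0.7,
                 cognitive: float = 1.5,
                 social: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO with positions clamped to the given bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_val.__getitem__)
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val


def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_position, best_value)
```

In an application like the one described, the objective would score how well the points are distributed rather than a sphere function; only the `objective` argument would need to change.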

Coral Image Segmentation with Point-Supervision via Latent Dirichlet Allocation with Spatial Coherence

Journal of Marine Science and Engineering ◽

10.3390/jmse9020157 ◽

2021 ◽

Vol 9 (2) ◽

pp. 157

Author(s):

Xi Yu ◽

Bing Ouyang ◽

Jose C. Principe

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Latent Dirichlet Allocation ◽

Spatial Coherence ◽

Image Data ◽

Feature Space ◽

Data Sets ◽

Marine Animals ◽

Data Set ◽

Dirichlet Allocation

Deep neural networks deliver remarkable performance on supervised learning tasks with extensive collections of labeled data. However, creating such large, well-annotated data sets requires a considerable amount of resources, time and effort, especially for underwater image data sets such as those of corals and marine animals. The overreliance on labels is therefore one of the main obstacles to widespread application of deep learning methods. To overcome this need for large annotated data sets, this paper proposes a label-efficient deep learning framework for image segmentation using only very sparse point-supervision. Our approach employs latent Dirichlet allocation (LDA) with spatial coherence on the feature space to iteratively generate pseudo labels. The method requires, as an initial condition, a Wide Residual Network (WRN) trained with sparse labels and mutual information constraints. The proposed method is evaluated on the sparsely labeled coral image data set collected from the Pulley Ridge region in the Gulf of Mexico. Experiments show that our method can improve image segmentation performance over training on sparsely labeled samples alone and achieves better results than other semi-supervised approaches.

A machine-learning benchmark for facies classification

Interpretation ◽

10.1190/int-2018-0249.1 ◽

2019 ◽

Vol 7 (3) ◽

pp. SE175-SE187 ◽

Cited By ~ 11

Author(s):

Yazeed Alaudah ◽

Patrycja Michałowicz ◽

Motaz Alfarraj ◽

Ghassan AlRegib

Keyword(s):

Machine Learning ◽

Network Architecture ◽

Data Sets ◽

Careful Study ◽

Learning Approaches ◽

Data Set ◽

Advantages And Disadvantages ◽

Facies Classification ◽

Advance Research ◽

Quantitative Results

The recent interest in using deep learning for seismic interpretation tasks, such as facies classification, has been facing a significant obstacle, namely, the absence of large publicly available annotated data sets for training and testing models. As a result, researchers have often resorted to annotating their own training and testing data. However, different researchers may annotate different classes or use different train and test splits. In addition, it is common for papers that apply machine learning to facies classification to contain no quantitative results and to rely solely on visual inspection of the results. All of these practices have led to subjective results and have greatly hindered our ability to compare different machine-learning models against each other and understand the advantages and disadvantages of each approach. To address these issues, we open-source a fully annotated 3D geologic model of the Netherlands F3 block. This model is based on a study of the 3D seismic data in addition to 26 well logs, and it is grounded in the careful study of the geology of the region. Furthermore, we have developed two baseline models for facies classification based on a deconvolution network architecture and make their code publicly available. Finally, we have developed a scheme for evaluating different models on this data set, and we have evaluated the results of our baseline models. In addition to making the data set and the code publicly available, our work helps advance research in this area by creating an objective benchmark for comparing the results of different machine-learning approaches for facies classification.

A transformation-driven approach for recognizing textual entailment

Natural Language Engineering ◽

10.1017/s1351324916000176 ◽

2016 ◽

Vol 23 (4) ◽

pp. 507-534 ◽

Cited By ~ 2

Author(s):

ROBERTO ZANOLI ◽

SILVIA COLOMBO

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Data Sets ◽

Learning Approaches ◽

Data Set ◽

Open Platform ◽

Text Fragment ◽

Textual Entailment ◽

Order Of Magnitude ◽

Recognizing Textual Entailment

Textual Entailment is a directional relation between two text fragments. The relation holds whenever the truth of one text fragment, called Hypothesis (H), follows from another text fragment, called Text (T). Up until now, using machine learning approaches for recognizing textual entailment has been hampered by the limited availability of data. We present an approach based on syntactic transformations and machine learning techniques which is designed to fit well with a new type of available data sets that are larger but less complex than data sets used in the past. The transformations are not predefined, but calculated from the data sets, and then used as features in a supervised learning classifier. The method has been evaluated using two data sets: the SICK data set and the EXCITEMENT English data set. While both data sets are of a larger order of magnitude than data sets such as RTE-3, they are also of lower levels of complexity, each in its own way. SICK consists of pairs created by applying a predefined set of syntactic and lexical rules to its T and H pairs, which can be accurately captured by our transformations. The EXCITEMENT English data contains short pieces of text that do not require a high degree of text understanding to be annotated. The resulting AdArte system is simple to understand and implement, but also effective when compared with other existing systems. AdArte has been made freely available with the EXCITEMENT Open Platform, an open source platform for textual inference.

Machine Learning Approaches for the Analysis of Non-Metallic Inclusion Data Sets

AISTech2019 Proceedings of the Iron and Steel Technology Conference ◽

10.33313/377/275 ◽

2019 ◽

Author(s):

M. Webler ◽

B. Abdulsalam

Keyword(s):

Machine Learning ◽

Data Sets ◽

Learning Approaches ◽

Metallic Inclusion

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
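Because this abstract reports both F1 scores and accuracies, it may help to recall how the two are computed from the same confusion counts and why they can differ. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall and F1 for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion counts with a rare positive class.
tp, fp, fn, tn = 30, 10, 10, 950

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
acc = accuracy(tp, tn, fp, fn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={acc:.3f}")
```

F1 ignores true negatives while accuracy counts them, so the two metrics can rank models differently when the classes are unevenly represented.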
