Machine Learning Approaches for the Analysis of Non-Metallic Inclusion Data Sets

A large volume of user-generated data is now available from large platforms, blogs, websites, and review sites. These data are usually unstructured, and analyzing their sentiment automatically is considered an important challenge. Several machine learning algorithms have been applied to extract opinions from large data sets, and a lot of research has been undertaken to understand machine learning approaches to sentiment analysis. Machine learning depends heavily on the data used for model building; hence, suitable feature extraction techniques also need to be applied. In this chapter, several deep learning approaches, their challenges, and future issues are addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze deep learning techniques for predicting sentiment and to understand the importance of several approaches for mining opinions and determining sentiment polarity.

Cyber Security Tool Kit (CyberSecTK): A Python Library for Machine Learning and Cyber Security

Information ◽

10.3390/info11020100 ◽

2020 ◽

Vol 11 (2) ◽

pp. 100

Author(s):

Ricardo A. Calix ◽

Sumendra B. Singh ◽

Tingyu Chen ◽

Dingkai Zhang ◽

Michael Tu

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Cyber Security ◽

Research Work ◽

Data Sets ◽

Learning Approaches ◽

Related Data ◽

Research And Teaching ◽

Survey Results ◽

Program Modules

The cyber security toolkit, CyberSecTK, is a simple Python library for preprocessing and feature extraction of cyber-security-related data. As the digital universe expands, more and more data need to be processed using automated approaches. In recent years, cyber security professionals have seen opportunities to use machine learning approaches to help process and analyze their data. The challenge is that cyber security experts often lack the training needed to apply machine learning to their problems. The goal of this library is to help bridge this gap. In particular, we propose the development of a toolkit in Python that can process the most common types of cyber security data. This will help cyber experts implement a basic machine learning pipeline from beginning to end. This proposed research work is our first attempt to achieve this goal. The proposed toolkit is a suite of program modules, data sets, and tutorials supporting research and teaching in cyber security and defense. An example use case is presented and discussed. Survey results from students using some of the modules in the library are also presented.

Human Activity Recognition of Exoskeleton Robot with Supervised Learning Techniques

10.21203/rs.3.rs-1161576/v1 ◽

2021 ◽

Author(s):

Jiacheng Mai ◽

Zhiyuan Chen ◽

Chunzhi Yi ◽

Zhen Ding

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Activity Recognition ◽

Human Activity ◽

Human Activity Recognition ◽

The Body ◽

Lower Limbs ◽

Data Sets ◽

Learning Approaches ◽

Learning Method

Abstract: Lower-limb exoskeleton robots improve human motor ability and can facilitate superior rehabilitative training. By training on large datasets, many currently available mobile and wearable signal devices can employ machine learning approaches to forecast and classify people's movement characteristics. This approach could help exoskeleton robots improve their ability to predict human activities. Two popular data sets are PAMAP2, which was obtained by measuring people's movement with inertial sensors, and WISDM, which collected people's activity information through mobile phones. With a focus on human activity recognition, this paper applied traditional machine learning and deep learning methods to train and test on these data sets. The decision tree model achieved the highest prediction performance on the two data sets, 99% and 72% respectively, with the lowest time consumption. In addition, a comparison of the signals collected from different parts of the human body showed that signals from the hands gave the best performance in recognizing human movement types.

Comparative Characterization of Crofelemer Samples Using Data Mining and Machine Learning Approaches With Analytical Stability Data Sets

Journal of Pharmaceutical Sciences ◽

10.1016/j.xphs.2017.07.013 ◽

2017 ◽

Vol 106 (11) ◽

pp. 3270-3279 ◽

Cited By ~ 5

Author(s):

Maulik K. Nariya ◽

Jae Hyun Kim ◽

Jian Xiong ◽

Peter A. Kleindl ◽

Asha Hewarathna ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Data Sets ◽

Learning Approaches ◽

Comparative Characterization ◽

Analytical Stability ◽

Stability Data ◽

Using Data

A deep learning approach for staging embryonic tissue isolates with small data

10.1101/2020.07.15.204735 ◽

2020 ◽

Author(s):

Adam Pond ◽

Seongwon Hwang ◽

Berta Verd ◽

Benjamin Steventon

Keyword(s):

Machine Learning ◽

3D Culture ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Learning Approaches ◽

Data Set ◽

Set Size ◽

In Vitro Systems

Abstract: Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part by our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce, such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small, information-rich data sets. Our results show that two- and three-dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.

Author summary: The application of machine learning approaches currently hinges on the availability of large data sets with which to train the models. However, recent research has shown that large data sets might not always be required. In this work we set out to see whether we could use small confocal microscopy image data sets to train a convolutional neural network (CNN) to stage zebrafish tail buds at four different stages in their development. We found that high test accuracies can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a CNN. This work also shows that we can robustly stage the embryonic development of isolated structures, without the need to refer back to landmarks in the tail bud. This constitutes an important methodological advance for staging organoids and other 3D culture in vitro systems. This work proves that prohibitively large data sets are not always required to train CNNs, and we hope it will encourage others to apply the power of machine learning to their areas of study even if data are scarce.

The Active Segmentation Platform for Microscopic Image Classification and Segmentation

Brain Sciences ◽

10.3390/brainsci11121645 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1645

Author(s):

Sumit K. Vohra ◽

Dimiter Prodanov

Keyword(s):

Machine Learning ◽

Image Segmentation ◽

Image Classification ◽

Domain Knowledge ◽

Feature Space ◽

Ground Truth ◽

Classification Problem ◽

Data Sets ◽

Learning Approaches ◽

Data Set

Image segmentation still represents an active area of research since no universal solution can be identified. Traditional image segmentation algorithms are problem-specific and limited in scope. On the other hand, machine learning offers an alternative paradigm where predefined features are combined into different classifiers, providing pixel-level classification and segmentation. However, machine learning alone cannot address the question of which features are appropriate for a given classification problem. The article presents an automated image segmentation and classification platform, called Active Segmentation, which is based on ImageJ. The platform integrates expert domain knowledge, providing partial ground truth, with geometrical feature extraction based on multi-scale signal processing combined with machine learning. The approach to image segmentation is exemplified on the ISBI 2012 image segmentation challenge data set. As a second application, we demonstrate whole-image classification functionality based on the same principles. The approach is exemplified using the HeLa and HEp-2 data sets. The results obtained indicate that feature space enrichment, properly balanced with feature selection functionality, can achieve performance comparable to deep learning approaches. In summary, differential geometry can substantially improve the outcome of machine learning since it can enrich the underlying feature space with new geometrical invariant objects.

Benchmark and application of unsupervised classification approaches for univariate data

Communications Physics ◽

10.1038/s42005-021-00549-9 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Maria El Abbassi ◽

Jan Overbeck ◽

Oliver Braun ◽

Michel Calame ◽

Herre S. J. van der Zant ◽

...

Keyword(s):

Machine Learning ◽

A Priori ◽

Clustering Algorithms ◽

Feature Space ◽

Unsupervised Classification ◽

Optimum Number ◽

Data Sets ◽

Learning Approaches ◽

Wide Range ◽

Characteristic Features

Abstract: Unsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful for applications without a priori knowledge of the data structure. Here, we introduce an approach for unsupervised data classification of any dataset consisting of a series of univariate measurements. It is therefore ideally suited for a wide range of measurement types. We apply it to the field of nanoelectronics and spectroscopy to identify meaningful structures in data sets. We also provide guidelines for the estimation of the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences. Careful selection of the feature space construction method and clustering algorithms for a specific measurement type can therefore greatly improve classification accuracies.
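
As a rough illustration of clustering univariate measurements and estimating the number of clusters, the sketch below runs a plain one-dimensional k-means for several values of k and reports the within-cluster sum of squares (WCSS), so that an "elbow" can be read off. This is not the method benchmarked in the paper: the algorithm choice, the function name `kmeans_1d`, the quantile-based initialization, and the toy measurements are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain 1D k-means (illustrative only). Returns (centers, wcss)."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialization: pick k points spread evenly over the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster sum of squares, using the nearest final center for each value.
    wcss = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, wcss


# Hypothetical toy measurements forming three well-separated groups.
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]

wcss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.2f}")
```

On this toy input the WCSS drops sharply up to k = 3 and only marginally afterwards, which is the usual elbow heuristic for choosing the number of clusters; real measurement data would rarely separate this cleanly.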

SLIMP: Supervised learning of metabolite-protein interactions from co-fractionation mass spectrometry data

10.1101/2021.06.16.448636 ◽

2021 ◽

Author(s):

Boris M. Zühlke ◽

Ewelina M. Sokolowska ◽

Marcin Luzarowski ◽

Denis Schlossarek ◽

Monika Chodasiewicz ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Protein Interactions ◽

Mass Spectrometry Data ◽

Supervised Machine Learning ◽

Growth Stages ◽

Data Sets ◽

Learning Approaches ◽

Proteomic Data ◽

A Genome

Abstract: Metabolite-protein interactions affect and shape diverse cellular processes. Yet, despite advances, approaches for identifying metabolite-protein interactions at a genome-wide scale are lacking. Here we present an approach termed SLIMP that predicts metabolite-protein interactions using supervised machine learning on features engineered from metabolic and proteomic profiles from a co-fractionation mass spectrometry-based technique. By applying SLIMP with gold standards, assembled from public databases, along with metabolic and proteomic data sets from multiple conditions and growth stages, we predicted over 9,000 and 20,000 metabolite-protein interactions for Saccharomyces cerevisiae and Arabidopsis thaliana, respectively. Extensive comparative analyses corroborated the quality of the predictions from SLIMP with respect to widely used performance measures (e.g. F1-score exceeding 0.8). SLIMP predicted novel targets of 2′,3′-cyclic nucleotides and dipeptides, which we analysed comparatively between the two organisms. Finally, predicted interactions for the dipeptide Tyr-Asp in Arabidopsis and the dipeptide Ser-Leu in yeast were independently validated, opening the possibility for future applications of supervised machine learning approaches in this area of systems biology.

AImmune: a new blood-based machine learning approach to improving immune profiling analysis on COVID-19 patients

10.1101/2021.11.26.21266883 ◽

2021 ◽

Author(s):

Xi Tom Zhang ◽

Runpeng Harris Han

Keyword(s):

Machine Learning ◽

High Performance ◽

Mononuclear Cells ◽

Data Sets ◽

Learning Approaches ◽

RNA Seq ◽

Real World Data ◽

Novel Approach ◽

Massive Number ◽

Immune Profiling

A massive number of transcriptomic profiles of blood samples from COVID-19 patients have been produced since the COVID-19 pandemic began; however, these big data from primary studies have not been well integrated by machine learning approaches. Taking advantage of modern machine learning algorithms, we collected and integrated single-cell RNA-seq (scRNA-seq) data from three independent studies, identified genes potentially useful for interpreting severity, and developed a high-performance deep learning-based deconvolution model, AImmune, that can predict the proportion of seven different immune cells from the bulk RNA-seq results of human peripheral mononuclear cells. This novel approach can be used for clinical blood testing of COVID-19, on the grounds that previous research shows that mRNA alterations in blood-derived PBMCs may serve as a severity indicator. Assessed on real-world data sets, the AImmune model outperformed the most recognized immune profiling model, CIBERSORTx. The presented study showed that the results obtained by the true scRNA-seq route can be consistently reproduced through the new approach AImmune, indicating its potential to replace the costly scRNA-seq technique for the analysis of circulating blood cells for both clinical and research purposes.

Predicting novel microRNA: a comprehensive comparison of machine learning approaches

Briefings in Bioinformatics ◽

10.1093/bib/bby037 ◽

2018 ◽

Vol 20 (5) ◽

pp. 1607-1620 ◽

Cited By ~ 6

Author(s):

Georgina Stegmayer ◽

Leandro E Di Persia ◽

Mariano Rubiolo ◽

Matias Gerard ◽

Milton Pividori ◽

...

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Computational Prediction ◽

Data Sets ◽

Learning Approaches ◽

miRNA Prediction ◽

A Genome ◽

Supervised Methods ◽

Comparative Results ◽

Almost All

Abstract

Motivation: The importance of microRNAs (miRNAs) is now widely recognized in the community because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier to identify the sequences with the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if it is not properly addressed in the model and the experiments, not only can the reported performance be completely unrealistic, but the classifier will also not work properly for pre-miRNA prediction. Another important issue is that most of the machine learning (ML) approaches already used (supervised methods) require both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not contain a pre-miRNA.

Results: This review provides a comprehensive study and comparative assessment of methods from two ML approaches to the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared in the literature during the past 10 years. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with the same features and data sets, instead of just a revision of published software tools. The results and the discussion can help the community select the most adequate bioinformatics approach for the prediction task at hand. The comparative results obtained suggest that from low to mid imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real-case scenarios, models including unsupervised and deep learning can provide better performance.
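
To make the class-imbalance point concrete, the following minimal Python sketch shows how a classifier that never predicts the positive class can still report high accuracy while its precision, recall, and F1 are all zero. It is not taken from the review: the toy labels, the 10-in-1000 positive ratio, and the function names are assumptions chosen purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 10 true positives among 1000 candidate sequences.
y_true = [1] * 10 + [0] * 990
# A degenerate "classifier" that labels every candidate as negative.
always_negative = [0] * len(y_true)

acc = accuracy(y_true, always_negative)
precision, recall, f1 = precision_recall_f1(y_true, always_negative)
print(f"accuracy={acc:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the accuracy is 0.99 even though no positive example is ever found, which is why results on imbalanced data are usually better judged with per-class measures such as precision, recall, and F1 than with accuracy alone.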
