Machine Learning Improves the Precision and Robustness of High-Content Screens

Imaging-based high-content screens often rely on single-cell evaluation of phenotypes in large data sets of microscopic images. Traditionally, these screens are analyzed by extracting a few image-related parameters and using their ratios (linear single or multiparametric separation) to classify the cells into various phenotypic classes. In this study, the authors show how machine learning–based classification of individual cells outperforms those classical ratio-based techniques. Using fluorescence intensity, morphological, and texture features, they evaluated how the performance of data analysis improves as the number of features increases. Their findings are based on a case study involving an siRNA screen monitoring nucleoplasmic and nucleolar accumulation of a fluorescently tagged reporter protein. For the analysis, they developed a complete workflow incorporating image segmentation, feature extraction, cell classification, hit detection, and visualization of the results. For the classification task, the authors established a new graphical framework, the Advanced Cell Classifier, which provides accurate high-content screen analysis with minimal user interaction and offers access to a variety of advanced machine learning methods.
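The contrast between ratio-based and feature-based classification can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three feature values per cell, the class names, the threshold, and the nearest-centroid classifier, which is a simple stand-in and not one of the methods offered by the Advanced Cell Classifier.

```python
import math

# Hypothetical per-cell features: [nucleolar intensity, nucleoplasmic intensity, texture score].
# All values below are made up for illustration.

def ratio_classifier(features, threshold=1.0):
    """Classical approach: classify from a single intensity ratio."""
    ratio = features[0] / features[1]
    return "nucleolar" if ratio > threshold else "nucleoplasmic"

def train_centroids(samples):
    """Compute the mean feature vector of each labeled class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

training = [
    ([0.9, 0.3, 0.8], "nucleolar"),
    ([0.8, 0.4, 0.7], "nucleolar"),
    ([0.3, 0.9, 0.2], "nucleoplasmic"),
    ([0.4, 0.8, 0.3], "nucleoplasmic"),
]
centroids = train_centroids(training)

cell = [0.55, 0.6, 0.85]  # ambiguous intensity ratio, strongly textured
print(ratio_classifier(cell))      # decision from the ratio alone
print(predict(centroids, cell))    # decision using all three features
```

On this made-up cell the two approaches disagree, because the texture feature carries information that the intensity ratio ignores. Which answer is correct depends on the real data, so the sketch shows only how extra features can change a decision.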

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽ 10.1017/dsj.2020.31 ◽ 2020 ◽ Vol 6

Author(s): Jaime de Miguel Rodríguez ◽ Maria Eugenia Villafañe ◽ Luka Piškorec ◽ Fernando Sancho Caparrini

Keyword(s): Machine Learning ◽ Neural Networks ◽ Large Data ◽ Learning Model ◽ Large Data Sets ◽ Data Sets ◽ Connectivity Map ◽ Data Set ◽ 3D Objects ◽ Machine Learning Model

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150k input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
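As a rough illustration of what "interpolated locations" in a latent space can mean, the sketch below computes evenly spaced points on the straight line between two latent vectors. The abstract does not state which interpolation scheme was used, so linear interpolation is an assumption, and the function name and the two-dimensional example vectors are hypothetical. In a real pipeline each interpolated point would be passed to the trained VAE decoder to reconstruct a geometry; that step is omitted here.

```python
def interpolate_latents(z_a, z_b, steps):
    """Return `steps` evenly spaced points on the line from z_a to z_b (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (z_a) to 1.0 (z_b)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path

# Hypothetical latent codes of two building types (made-up values).
z_type_a = [0.0, 2.0]
z_type_b = [4.0, 0.0]

for point in interpolate_latents(z_type_a, z_type_b, steps=5):
    # A trained decoder would map each point to a wireframe here.
    print(point)
```

The middle points of the path correspond to the hybrid geometries the abstract refers to; whether they decode to coherent objects depends on the learnt distribution.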

A User Interaction Model for Manipulation of Large Data Sets

Computer Science and Statistics: Proceedings of the 14th Symposium on the Interface ◽ 10.1007/978-1-4612-5545-1_21 ◽ 1983 ◽ pp. 118-128

Author(s): James J. Thomas

Keyword(s): Interaction Model ◽ User Interaction ◽ Large Data ◽ Large Data Sets ◽ Data Sets

Deep Learning Approaches for Sentiment Analysis Challenges and Future Issues

10.4018/978-1-7998-8161-2.ch003 ◽ 2022 ◽ pp. 27-50

Author(s): Rajalaxmi Prabhu B. ◽ Seema S.

Keyword(s): Machine Learning ◽ Deep Learning ◽ Model Building ◽ Large Data ◽ Machine Learning Algorithms ◽ Large Data Sets ◽ Data Sets ◽ Learning Approaches ◽ Learning Techniques ◽ Important Challenge

A large amount of user-generated data is now available from large platforms, blogs, websites, and review sites. These data are usually unstructured. Automatically analyzing sentiment in these data is considered an important challenge. Several machine learning algorithms have been implemented to assess opinions in large data sets, and much research has been devoted to understanding machine learning approaches to sentiment analysis. Machine learning depends mainly on the data used for model building; hence, suitable feature extraction techniques also need to be applied. This chapter addresses several deep learning approaches, their challenges, and future issues. Deep learning techniques are considered important in predicting the sentiments of users. The chapter aims to analyze deep learning techniques for predicting sentiment and to explain the importance of several approaches for mining opinions and determining sentiment polarity.

A Framework for the Forensic Analysis of User Interaction with Social Media

International Journal of Digital Crime and Forensics ◽ 10.4018/jdcf.2012100102 ◽ 2012 ◽ Vol 4 (4) ◽ pp. 15-30 ◽ Cited By ~ 2

Author(s): John Haggerty ◽ Mark C. Casson ◽ Sheryllynne Haggerty ◽ Mark J. Taylor

Keyword(s): Social Media ◽ User Interaction ◽ Large Data ◽ Forensic Analysis ◽ Data Sets ◽ User Engagement ◽ Online Data ◽ Use Of Social Media ◽ Temporal Dimensions ◽ Media Applications

The increasing use of social media (applications or platforms that allow users to interact online) ensures that this environment will provide a useful source of evidence for the forensics examiner. Current tools for the examination of digital evidence find this data problematic, as they are not designed for the collection and analysis of online data. Therefore, this paper presents a framework for the forensic analysis of user interaction with social media. In particular, it presents an interdisciplinary approach for the quantitative analysis of user engagement to identify relational and temporal dimensions of evidence relevant to an investigation. This framework enables the analysis of large data sets from which a (much smaller) group of individuals of interest can be identified. In this way, it may be used to support the identification of individuals who might be ‘instigators’ of a criminal event orchestrated via social media, or as a means of potentially identifying those who might be involved in the ‘peaks’ of activity. To demonstrate the applicability of the framework, this paper applies it to a case study of actors posting to a social media Web site.

Machine learning-based fracture-hit detection algorithm using LFDAS signal

The Leading Edge ◽ 10.1190/tle38070520.1 ◽ 2019 ◽ Vol 38 (7) ◽ pp. 520-524 ◽ Cited By ~ 5

Author(s): Ge Jin ◽ Kevin Mendoza ◽ Baishali Roy ◽ Darryl G. Buswell

Keyword(s): Machine Learning ◽ Low Frequency ◽ Detection Algorithm ◽ Data Sets ◽ Fracture Zones ◽ Learning Techniques ◽ Distributed Acoustic Sensing ◽ Probability Of Fracture ◽ Simple Neural Network

Low-frequency distributed acoustic sensing (LFDAS) signals have been used to detect fracture hits at offset monitor wells during hydraulic fracturing operations. Typically, fracture hits are identified manually, which can be subjective and inefficient. We implemented machine learning-based models using supervised learning techniques to automatically identify fracture zones that show a high probability of fracture hits. Several features are designed and calculated from LFDAS data to highlight fracture-hit characteristics. A simple neural network model is trained to fit the manually picked fracture hits. The fracture-hit probability predicted by the model agrees well with the manual picks in the training, validation, and test data sets. The algorithm was used in a case study of an unconventional reservoir. The results indicate that a smaller cluster spacing design creates denser fractures.

Interactive exploration and modeling of large data sets: a case study with Venus light scattering data

Proceedings of Seventh Annual IEEE Visualization '96 ◽ 10.1109/visual.1996.568150 ◽ 1996 ◽ Cited By ~ 1

Author(s): J.J. van Wijk ◽ H.J.W. Spoelder ◽ W.-J. Knibbe ◽ K.E. Shahroudi

Keyword(s): Light Scattering ◽ Large Data ◽ Large Data Sets ◽ Scattering Data ◽ Data Sets ◽ Interactive Exploration

Precision-Recall versus Accuracy and the Role of Large Data Sets

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33014039 ◽ 2019 ◽ Vol 33 ◽ pp. 4039-4048 ◽ Cited By ~ 8

Author(s): Brendan Juba ◽ Hai S. Le

Keyword(s): Machine Learning ◽ Class Imbalance ◽ Imbalanced Data ◽ Large Data ◽ Constant Factor ◽ Data Sets ◽ Data Set ◽ Small Constant ◽ Classifier Performance ◽ Necessary And Sufficient

Practitioners of data mining and machine learning have long observed that the imbalance of classes in a data set negatively impacts the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on the imbalance of classes at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC-dimension) and the confidence desired; the class imbalance does not enter these formulas anywhere. In this work, we consider measuring classifier performance in terms of precision and recall, measures that are widely suggested as more appropriate to the classification of imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that a larger number of examples is necessary and sufficient to address class imbalance, a finding we also illustrate empirically.
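A small worked example may help show why accuracy alone can hide the effect of class imbalance. The sketch below uses made-up labels (1,000 examples with a 1% positive rate) and two hypothetical classifiers; it illustrates only the standard definitions of accuracy, precision, and recall, not the paper's theoretical bound.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall); empty denominators yield 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Made-up imbalanced data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Classifier A never predicts the positive class.
always_negative = [0] * 1000

# Classifier B finds 5 of the 10 positives and raises 5 false alarms.
partial_detector = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 985

print(metrics(y_true, always_negative))   # accuracy 0.99, precision 0.0, recall 0.0
print(metrics(y_true, partial_detector))  # accuracy 0.99, precision 0.5, recall 0.5
```

Both classifiers reach 99% accuracy, yet only the second one detects any positives, which is the gap that precision and recall expose.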
