Highly accurate differentiation of bone marrow cell morphologies using deep neural networks on a large image data set

Abstract Biomedical applications of deep learning algorithms rely on large expert annotated data sets. The classification of bone marrow (BM) cell cytomorphology, an important cornerstone of hematological diagnosis, is still done manually thousands of times every day because of a lack of data sets and trained models. We applied convolutional neural networks (CNNs) to a large data set of 171 374 microscopic cytological images taken from BM smears from 945 patients diagnosed with a variety of hematological diseases. The data set is the largest expert-annotated pool of BM cytology images available in the literature. It allows us to train high-quality classifiers of leukocyte cytomorphology that identify a wide range of diagnostically relevant cell species with high precision and recall. Our CNNs outcompete previous feature-based approaches and provide a proof-of-concept for the classification problem of single BM cells. This study is a step toward automated evaluation of BM cell morphology using state-of-the-art image-classification algorithms. The underlying data set represents an educational resource, as well as a reference for future artificial intelligence–based approaches to BM cytomorphology.

Download Full-text

Generation of geometric interpolations of building types with deep variational autoencoders

Design Science ◽

10.1017/dsj.2020.31 ◽

2020 ◽

Vol 6 ◽

Author(s):

Jaime de Miguel Rodríguez ◽

Maria Eugenia Villafañe ◽

Luka Piškorec ◽

Fernando Sancho Caparrini

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Large Data ◽

Learning Model ◽

Large Data Sets ◽

Data Sets ◽

Connectivity Map ◽

Data Set ◽

3D Objects ◽

Machine Learning Model

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.

Download Full-text

Feed Forward Backpropagation Neural Networks and their Use in Predicting the Acute Toxicity of Chemicals to the Fathead Minnow

Water Quality Research Journal ◽

10.2166/wqrj.1997.037 ◽

1997 ◽

Vol 32 (3) ◽

pp. 637-658 ◽

Cited By ~ 19

Author(s):

Klaus L.E. Kaiser ◽

Stefan P. Niculescu ◽

Gerrit Schüürmann

Keyword(s):

Neural Networks ◽

Fathead Minnow ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Explicit Equation ◽

Data Set ◽

Feed Forward ◽

Backpropagation Neural Networks ◽

Water Partition Coefficient

Abstract Various aspects connected to the use of feed forward backpropagation neural networks to build multivariate QSARs based on large data sets containing considerable amounts of important information are investigated. Based on such a model and a 419 compound data set, the explicit equation of one of the resulting multivariate QSARs for the computation of toxicity to the fathead minnow is presented as function of measured Microtox, logarithms of molecular weight and octanol/water partition coefficient, and 48 other functional group and discrete descriptors.

Download Full-text

Hedging and machine learning driven crude oil data analysis using a refined Barndorff–Nielsen and Shephard model

International Journal of Financial Engineering ◽

10.1142/s2424786321500158 ◽

2021 ◽

pp. 2150015

Author(s):

Humayra Shoshi ◽

Indranil SenGupta

Keyword(s):

Crude Oil ◽

Classification Problem ◽

Commodity Markets ◽

Data Sets ◽

Refined Model ◽

Hedging Strategy ◽

Data Set ◽

Wide Range ◽

S Model ◽

Better Than

In this paper, a refined Barndorff-Nielsen and Shephard (BN-S) model is implemented to find an optimal hedging strategy for commodity markets. The refinement of the BN-S model is obtained with various machine and deep learning algorithms. The refinement leads to the extraction of a deterministic parameter from the empirical data set. The problem is transformed to an appropriate classification problem with a couple of different approaches — the volatility approach and the duration approach. The analysis is implemented to the Bakken crude oil data and the aforementioned deterministic parameter is obtained for a wide range of data sets. With the implementation of this parameter in the refined model, the resulting model performs much better than the classical BN-S model.

Download Full-text

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

AbstractThis paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifier we trained is comparable to the deep learning methods on the first dataset, but significantly worse on the second dataset.

Download Full-text

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$ , identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$ .

Download Full-text

A Visual and VAE Based Hierarchical Indoor Localization Method

Sensors ◽

10.3390/s21103406 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3406

Author(s):

Jie Jiang ◽

Yin Zou ◽

Lidong Chen ◽

Yujie Fang

Keyword(s):

Image Retrieval ◽

Indoor Localization ◽

Data Sets ◽

Indoor Environments ◽

Global Features ◽

Data Set ◽

Data Annotation ◽

Wide Range ◽

Annotation Costs ◽

Global And Local

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in searching space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increase data annotation costs and complicate the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network variational autoencoder (VAE) with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for the image retrieval at the level of the scene map in order to obtain candidate images, and are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. RGB images only are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.

Download Full-text

A batch-wise non-linear fitting and analysis tool for treating large X-ray diffraction data sets

Journal of Applied Crystallography ◽

10.1107/s0021889805035351 ◽

2006 ◽

Vol 39 (2) ◽

pp. 262-266 ◽

Cited By ~ 7

Author(s):

R. J. Davies

Keyword(s):

Diffraction Data ◽

Operation Mode ◽

Large Data ◽

Scattering Data ◽

Data Sets ◽

Analysis Tool ◽

Data Set ◽

X Ray ◽

Linear Fitting ◽

Non Linear

Synchrotron sources offer high-brilliance X-ray beams which are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To compliment the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. Furthermore, as well as non-linear fitting, the software can calculate integrated intensities and selected orientation parameters.

Download Full-text

Effects of genotype and lactation number on health and reproductive problems in dairy cows

Proceedings of the British Society of Animal Science ◽

10.1017/s1752756200595842 ◽

1997 ◽

Vol 1997 ◽

pp. 143-143

Author(s):

B.L. Nielsen ◽

R.F. Veerkamp ◽

J.E. Pryce ◽

G. Simm ◽

J.D. Oldham

Keyword(s):

Dairy Cows ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Variation Analysis ◽

Genetic Line ◽

Data Set ◽

Health Events ◽

Use Of Data ◽

Low Incidence

High producing dairy cows have been found to be more susceptible to disease (Jones et al., 1994; Göhn et al., 1995) raising concerns about the welfare of the modern dairy cow. Genotype and number of lactations may affect various health problems differently, and their relative importance may vary. The categorical nature and low incidence of health events necessitates large data-sets, but the use of data collected across herds may introduce unwanted variation. Analysis of a comprehensive data-set from a single herd was carried out to investigate the effects of genetic line and lactation number on the incidence of various health and reproductive problems.

Download Full-text

Empirical modeling of very large data sets using neural networks

Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium ◽

10.1109/ijcnn.2000.859413 ◽

2000 ◽

Cited By ~ 5

Author(s):

A.J. Owens

Keyword(s):

Neural Networks ◽

Large Data ◽

Empirical Modeling ◽

Large Data Sets ◽

Data Sets

Download Full-text

Automated Analysis of SEM X-Ray Spectral Images: A Powerful New Microanalysis Tool

Microscopy and Microanalysis ◽

10.1017/s1431927603030058 ◽

2003 ◽

Vol 9 (1) ◽

pp. 1-17 ◽

Cited By ~ 217

Author(s):

Paul G. Kotula ◽

Michael R. Keenan ◽

Joseph R. Michael

Keyword(s):

Image Data ◽

Large Data ◽

Automated Analysis ◽

Phase Identification ◽

Data Sets ◽

Multivariate Statistical ◽

X Ray ◽

Analysis Technique ◽

Statistical Analysis Technique ◽

Multi Phase

Spectral imaging in the scanning electron microscope (SEM) equipped with an energy-dispersive X-ray (EDX) analyzer has the potential to be a powerful tool for chemical phase identification, but the large data sets have, in the past, proved too large to efficiently analyze. In the present work, we describe the application of a new automated, unbiased, multivariate statistical analysis technique to very large X-ray spectral image data sets. The method, based in part on principal components analysis, returns physically accurate (all positive) component spectra and images in a few minutes on a standard personal computer. The efficacy of the technique for microanalysis is illustrated by the analysis of complex multi-phase materials, particulates, a diffusion couple, and a single-pixel-detection problem.

Download Full-text