Machine learning for improved data analysis of biological aerosol using the WIBS

Simon Ruske; David O. Topping; Virginia E. Foot; Andrew P. Morse; Martin W. Gallagher

doi:10.5194/amt-11-6203-2018

Atmospheric Measurement Techniques ◽

2018 ◽

Vol 11 (11) ◽

pp. 6203-6230 ◽

Cited By ~ 8

Keyword(s):

Ultraviolet Light ◽

Fungal Spores ◽

Laboratory Data ◽

Misclassification Rate ◽

Gradient Boosting ◽

Classification Error ◽

Data Sets ◽

Data Preparation ◽

Different Types ◽

Selection Of

Abstract. Primary biological aerosol, including bacteria, fungal spores and pollen, has important implications for public health and the environment. Such particles may contain different concentrations of chemical fluorophores and will respond differently in the presence of ultraviolet light, potentially allowing different types of biological aerosol to be discriminated. The development of ultraviolet light induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has allowed size, morphology and fluorescence measurements to be collected in real time. However, without studying instrument responses in the laboratory, it is unclear to what extent different types of particles can be discriminated. Collection of laboratory data is vital to validate any approach used to analyse data and to ensure that the available data are used as effectively as possible. In this paper a variety of methodologies are tested on a range of particles collected in the laboratory. Hierarchical agglomerative clustering (HAC) has been previously applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between narrow size distribution polystyrene latex (PSL) reference particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, there is a large uncertainty in this approach in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory generated aerosol tested. The lowest classification errors were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %.
The largest contribution to the error, in the case of the higher misclassification rate, came from the pollen samples, 28.5 % of which were incorrectly classified as fungal spores. The technique was robust to changes in data preparation provided a fluorescence threshold was applied to the data. In the event that laboratory training data are unavailable, DBSCAN was found to be a potential alternative to HAC. For one of the data sets, where 22.9 % of the data were left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated for the other data set, where 26.8 % of the data were not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring a different selection of parameters depending on the preparation used. Further analysis will also be required to confirm our selection of the parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with a greater resolution are likely to improve on current discrimination between pollen, bacteria and fungal spores and even between different species; however, the need for extensive laboratory data sets will grow as a result.
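The DBSCAN results above are reported as a share of data left unclassified plus an error computed only on the classified data. One simple way to obtain both numbers from cluster labels is sketched below. It assumes each cluster is mapped to the most common true class among its members (the abstract does not state the mapping actually used), follows the usual DBSCAN convention of labelling noise as -1, and uses hypothetical toy labels rather than data from the paper.

```python
from collections import Counter

NOISE = -1  # assumed label for points a density-based method leaves unclassified


def clustering_error(true_labels, cluster_labels):
    """Score a clustering against known particle classes.

    Each cluster is mapped to the most common true class among its members
    (majority vote), and noise points are excluded from the error.
    Returns (fraction_unclassified, error_on_classified).
    """
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = [(t, c) for t, c in zip(true_labels, cluster_labels) if c != NOISE]
    unclassified = 1 - len(pairs) / len(true_labels)
    if not pairs:
        return unclassified, float("nan")
    counts = {}
    for t, c in pairs:
        counts.setdefault(c, Counter())[t] += 1
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    wrong = sum(1 for t, c in pairs if majority[c] != t)
    return unclassified, wrong / len(pairs)


# Hypothetical toy labels, not data from the paper.
truth = ["pollen"] * 4 + ["fungal"] * 4 + ["bacteria"] * 2
clusters = [0, 0, 0, 1, 1, 1, 1, NOISE, 2, NOISE]
print(clustering_error(truth, clusters))
```

Majority mapping lets two clusters map to the same class; a stricter one-to-one assignment between clusters and classes can never give a lower error than this.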


Machine learning for improved data analysis of biological aerosol using the WIBS

10.5194/amt-2018-126 ◽

2018 ◽

Cited By ~ 1

Author(s):

Simon Ruske ◽

David O. Topping ◽

Virginia E. Foot ◽

Andrew P. Morse ◽

Martin W. Gallagher

Keyword(s):

Ultraviolet Light ◽

Fungal Spores ◽

Training Data ◽

Gradient Boosting ◽

Classification Error ◽

Data Sets ◽

Data Preparation ◽

Different Types ◽

Laboratory Training ◽

Selection Of

Abstract. Primary biological aerosol, including bacteria, fungal spores and pollen, has important implications for public health and the environment. Such particles may contain different concentrations of chemical fluorophores and will respond differently in the presence of ultraviolet light, which could potentially be used to discriminate between different types of biological aerosol. The development of ultraviolet light induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has made it possible to collect size, morphology and fluorescence measurements in real time. However, without studying responses from the instrument in the laboratory, it is unclear to what extent we can discriminate between different types of particles. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the available data are utilised as effectively as possible. In this manuscript we test a variety of methodologies on traditional reference particles and a range of laboratory generated aerosols. Hierarchical Agglomerative Clustering (HAC) has been previously applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between the reference particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, there is a worryingly large uncertainty in this approach in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory generated aerosol tested.
The best results were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %. The largest contribution to this error came from the pollen samples, 28.5 % of which were misclassified as fungal spores. The technique was also robust to changes in data preparation provided a fluorescence threshold was applied to the data. Where laboratory training data are unavailable, DBSCAN was found to be a potential alternative to HAC. For one of the data sets, where 22.9 % of the data were left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not, however, be replicated for the other data set, where 26.8 % of the data were not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring a different selection of parameters depending on the preparation used. Further analysis will also be required to confirm our selection of parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with a greater resolution are likely to improve on current discrimination between pollen, bacteria and fungal spores and even between their different types; however, the need for extensive laboratory training data sets will grow as a result.
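Gradient boosting was reported as robust only when a fluorescence threshold was applied. A common convention in the WIBS literature, assumed here rather than taken from this abstract, sets each channel's threshold at the mean plus a multiple (often 3) of the standard deviation of the instrument's forced-trigger (background) readings, and treats a particle as fluorescent if any channel exceeds its threshold. The sketch below illustrates that convention. The channel names FL1 to FL3 follow the usual WIBS naming, while the dictionary layout, `n_sigma` default and readings are hypothetical.

```python
from statistics import mean, stdev

CHANNELS = ("FL1", "FL2", "FL3")  # the three WIBS fluorescence channels


def channel_thresholds(forced_trigger, n_sigma=3.0):
    """Per-channel threshold: background mean + n_sigma * background std."""
    thresholds = {}
    for ch in CHANNELS:
        values = [record[ch] for record in forced_trigger]
        thresholds[ch] = mean(values) + n_sigma * stdev(values)
    return thresholds


def apply_fluorescence_threshold(particles, forced_trigger, n_sigma=3.0):
    """Keep only particles that exceed the threshold in at least one channel."""
    thr = channel_thresholds(forced_trigger, n_sigma)
    return [p for p in particles if any(p[ch] > thr[ch] for ch in CHANNELS)]


# Hypothetical readings (arbitrary units), not data from the paper.
background = [
    {"FL1": 10, "FL2": 20, "FL3": 5},
    {"FL1": 12, "FL2": 22, "FL3": 7},
    {"FL1": 11, "FL2": 21, "FL3": 6},
]
particles = [
    {"FL1": 13, "FL2": 23, "FL3": 8},  # below all thresholds
    {"FL1": 15, "FL2": 23, "FL3": 8},  # fluorescent in FL1
    {"FL1": 0, "FL2": 0, "FL3": 10},   # fluorescent in FL3
]
print(len(apply_fluorescence_threshold(particles, background)))  # prints 2
```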


Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer

10.5194/amt-2016-214 ◽

2016 ◽

Cited By ~ 1

Author(s):

Simon Ruske ◽

David O. Topping ◽

Virginia E. Foot ◽

Paul H. Kaye ◽

Warren R. Stanley ◽

...

Keyword(s):

Supervised Learning ◽

Fungal Spores ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Data Sets ◽

Agglomerative Clustering ◽

Real World Data ◽

Linear Discriminant ◽

Accuracy Of Measurements

Abstract. Characterisation of bio-aerosols has important implications within the environment and public health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we test Hierarchical Agglomerative Clustering with various linkages. For supervised learning, ten methods were tested: decision trees; the ensemble methods Random Forests, Gradient Boosting and AdaBoost; two implementations of support vector machines, libsvm and liblinear; the Gaussian methods Gaussian naïve Bayes, quadratic discriminant analysis and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying at best only 72.7 and 91.1 percent for the two data sets, respectively.
For supervised learning, the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data across the two data sets, respectively.
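To make the idea behind the best-performing method concrete, the sketch below is a minimal binary gradient boosting classifier built from decision stumps with logistic loss. It is a from-scratch illustration, not the authors' implementation. The binary setting, the stump base learner, the mean-residual leaf values (rather than the Newton-step values some libraries use), the learning rate, the number of rounds and the toy data are all assumptions.

```python
import math


def _fit_stump(X, residuals):
    """Least-squares decision stump: one feature, one threshold, a constant per side."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv = sum(left) / len(left)  # never empty: thr is taken from the data
            rv = sum(right) / len(right) if right else 0.0
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    _, j, thr, lv, rv = best
    return lambda row: lv if row[j] <= thr else rv


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Binary gradient boosting with logistic loss; y holds 0/1 labels."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # start from the log-odds of the positive class
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of the log loss with respect to F is y - sigmoid(F).
        residuals = [yi - _sigmoid(fi) for yi, fi in zip(y, F)]
        stump = _fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump(row) for fi, row in zip(F, X)]

    def predict(row):
        score = f0 + learning_rate * sum(s(row) for s in stumps)
        return int(_sigmoid(score) >= 0.5)

    return predict


# Hypothetical one-feature toy data, e.g. a scaled fluorescence intensity.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(X, y)
print([model(row) for row in X])
```

Each round fits a weak learner to the current residuals and adds a damped copy of it to the ensemble, which is why a small learning rate with more rounds tends to generalise better than a few large steps.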


Automated Sentiment Analysis in Tourism: Comparison of Approaches

Journal of Travel Research ◽

10.1177/0047287517729757 ◽

2017 ◽

Vol 57 (8) ◽

pp. 1012-1025 ◽

Cited By ~ 35

Author(s):

Andrei P. Kirilenko ◽

Svetlana O. Stepchenkova ◽

Hany Kim ◽

Xiang (Robert) Li

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Automated Analysis ◽

Data Sets ◽

Performance Indices ◽

Analysis Software ◽

Machine Learning Methods ◽

Different Types ◽

Manual Processing ◽

Selection Of

Interest in applying Big Data to tourism is increasing, and automated sentiment analysis has been used to extract public opinion from various sources. This article evaluates the suitability of different types of automated classifiers for applications typical in tourism, hospitality, and marketing studies by comparing their performance to that of human raters. While the commonly used performance indices suggest that, on easier-to-classify data sets, machine learning methods perform comparably to human raters, other performance measures such as Cohen’s kappa show that the results of machine learning are still inferior to manual processing. On more difficult and noisy data sets, automated analysis performs worse than human raters. The article discusses issues pertinent to the selection of appropriate sentiment analysis software and offers a word of caution against using automated classifiers uncritically.
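The contrast between common performance indices and Cohen's kappa comes from kappa's correction for chance agreement. The sketch below computes kappa with the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected if both raters labelled independently with their own label frequencies. The labels are hypothetical and not from the article.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = positive sentiment, 0 = negative.
human = [1, 1, 0, 0]
machine = [1, 0, 0, 0]
accuracy = sum(h == m for h, m in zip(human, machine)) / len(human)
print(accuracy, cohens_kappa(human, machine))  # prints 0.75 0.5
```

Here the classifier agrees with the human rater on 75 % of items, but half of that agreement would be expected by chance, so kappa is only 0.5.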


Selection of one-dimensional sedimentation: models for on-line use

Water Science & Technology ◽

10.2166/wst.1995.0100 ◽

1995 ◽

Vol 31 (2) ◽

pp. 193-204 ◽

Cited By ~ 7

Author(s):

Koen Grijspeerdt ◽

Peter Vanrolleghem ◽

Willy Verstraete

Keyword(s):

Steady State ◽

Selection Criteria ◽

Data Sets ◽

Concentration Profiles ◽

A Posteriori ◽

One Dimensional ◽

On Line ◽

Dynamic Concentration ◽

Selection Of ◽

Modelling Task

A comparative study of several recently proposed one-dimensional sedimentation models has been made. This has been achieved by fitting these models to steady-state and dynamic concentration profiles obtained in a down-scaled secondary decanter. The models were evaluated with several a posteriori model selection criteria. Since the purpose of the modelling task is to do on-line simulations, the calculation time was used as one of the selection criteria. Finally, the practical identifiability of the models for the available data sets was also investigated. It could be concluded that the model of Takács et al. (1991) gave the most reliable results.
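The abstract concludes that the model of Takács et al. (1991) gave the most reliable results. At its core is a double-exponential settling velocity function, sketched below as it is commonly stated. The default parameter values are illustrative assumptions, not the values fitted in this study. The layered solids-flux computation of the full one-dimensional model is not shown.

```python
import math


def takacs_settling_velocity(X, X_in, v0=474.0, v0_max=250.0,
                             r_h=0.000576, r_p=0.00286, f_ns=0.00228):
    """Double-exponential settling velocity (m/d) for a layer at concentration X (g/m3).

    X_in is the feed concentration; f_ns * X_in is the non-settleable fraction.
    Default parameter values are illustrative only.
    """
    x_star = X - f_ns * X_in
    # Hindered-settling term minus a correction for poorly flocculating particles.
    v = v0 * (math.exp(-r_h * x_star) - math.exp(-r_p * x_star))
    # Clip to the physical range [0, v0_max].
    return max(0.0, min(v0_max, v))


for X in (0.0, 700.0, 5000.0):
    print(X, takacs_settling_velocity(X, X_in=3000.0))
```

The two exponentials give a velocity that rises from zero at low concentrations, is capped at `v0_max`, and decays again in the hindered-settling regime at high concentrations.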


Multi-Criteria Selection of Additives in Porous Asphalt Mixtures Using Mechanical, Hydraulic, Economic, and Environmental Indicators

Sustainability ◽

10.3390/su13042146 ◽

2021 ◽

Vol 13 (4) ◽

pp. 2146

Author(s):

Anik Gupta ◽

Carlos J. Slebi-Acevedo ◽

Esther Lizasoain-Arteaga ◽

Jorge Rodriguez-Hernandez ◽

Daniel Castro-Fresno

Keyword(s):

Cellulose Fiber ◽

Environmental Indicators ◽

Asphalt Mixtures ◽

Environmental Benefits ◽

Modified Bitumen ◽

Porous Asphalt ◽

Different Types ◽

Product Assessment ◽

Polymer Modified Bitumen ◽

Selection Of

Porous asphalt (PA) mixtures are more environmentally friendly but less durable than dense-graded mixtures. Additives can be incorporated into PA mixtures to enhance their mechanical strength; however, they may compromise the hydraulic characteristics, increase the total cost of the pavement, and negatively affect the environment. In this paper, PA mixtures were produced with five different additives: four fibers and one filler. Their performance was compared with that of reference mixtures containing virgin bitumen and polymer-modified bitumen. The performance of all mixes was assessed using mechanical, hydraulic, economic, and environmental indicators. The Delphi method was then applied to compute the relative weights of the parameters for the multi-criteria decision-making methods. Evaluation based on distance from average solution (EDAS), the technique for order of preference by similarity to ideal solution (TOPSIS), and weighted aggregated sum product assessment (WASPAS) were employed to rank the additives. According to the results, aramid pulp displayed comparable and, for some parameters such as abrasion resistance, even better performance than polymer-modified bitumen, whereas cellulose fiber demonstrated the best performance regarding sustainability, owing to its economic and environmental benefits.
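Of the three ranking methods named above, TOPSIS is the easiest to show compactly. The sketch below follows the standard textbook formulation, assuming vector normalisation of each criterion, weighted Euclidean distances to the ideal and anti-ideal solutions, and relative closeness as the score. The weights and scores are hypothetical, not the Delphi weights or test results of the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) on criteria (columns) with TOPSIS.

    benefit[j] is True if larger values of criterion j are better, False if smaller
    values are better. Returns closeness coefficients in [0, 1]; higher is better.
    """
    n_crit = len(weights)
    # 1. Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal and anti-ideal value of each weighted criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Distances to both reference points and relative closeness to the ideal.
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti)
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical additives scored on strength (higher is better) and cost (lower is better).
scores = topsis([[10, 5], [8, 2], [6, 6]], weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)  # prints [1, 0, 2]
```

The second alternative wins despite lower strength because it is far cheaper. The sketch assumes no criterion column is all zeros and that not all alternatives are identical, either of which would cause a division by zero.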


Real-time Approximation of Photometric Polygonal Lights

Proceedings of the ACM on Computer Graphics and Interactive Techniques ◽

10.1145/3384537 ◽

2020 ◽

Vol 3 (1) ◽

pp. 1-18

Author(s):

Christian Luksch ◽

Lukas Prost ◽

Michael Wimmer

Keyword(s):

Real Time ◽

Specular Reflection ◽

Near Field ◽

Measurement Data ◽

Data Sets ◽

Photometric Measurement ◽

Integration Technique ◽

Time Approximation ◽

Light Emitter ◽

Selection Of

We present a real-time rendering technique for photometric polygonal lights. Our method uses a numerical integration technique based on a triangulation to calculate noise-free diffuse shading. We include a dynamic point in the triangulation that provides a continuous near-field illumination resembling the shape of the light emitter and its characteristics. We evaluate the accuracy of our approach with a diverse selection of photometric measurement data sets in a comprehensive benchmark framework. Furthermore, we provide an extension for specular reflection on surfaces with arbitrary roughness that facilitates the use of existing real-time shading techniques. Our technique is easy to integrate into real-time rendering systems and extends the range of possible applications with photometric area lights.
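The method above integrates measured photometric emission over a triangulated polygon numerically. As a reference point for the simplest case only, the sketch below evaluates the classic closed-form diffuse form factor from a point to a polygon with Lambertian emission (Lambert's formula). It is not the authors' technique. It assumes the polygon lies entirely above the receiver's tangent plane (no horizon clipping) and that vertices wind counter-clockwise about the receiver normal.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def polygon_form_factor(point, normal, vertices):
    """Point-to-polygon diffuse form factor for a Lambertian polygonal emitter.

    Vertices must wind counter-clockwise about the receiver normal (right-hand
    rule) and lie above the receiver's tangent plane; no horizon clipping is done.
    """
    n = _normalize(normal)
    dirs = [_normalize(_sub(v, point)) for v in vertices]
    total = 0.0
    for i, a in enumerate(dirs):
        b = dirs[(i + 1) % len(dirs)]
        theta = math.acos(max(-1.0, min(1.0, _dot(a, b))))  # angle subtended by the edge
        total += theta * _dot(_normalize(_cross(a, b)), n)
    return total / (2.0 * math.pi)


# A 2 x 2 square light one unit above a receiver facing straight up.
square = [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)]
print(round(polygon_form_factor((0, 0, 0), (0, 0, 1), square), 4))  # prints 0.5541
```

Reversing the vertex order flips the sign of the result, which is a quick check on winding.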


Possible Pitfalls in the Analysis of Minerals and Loose Materials by Portable XRF, and How to Overcome Them

Minerals ◽

10.3390/min11010033 ◽

2020 ◽

Vol 11 (1) ◽

pp. 33

Author(s):

Valérie Laperche ◽

Bruno Lemière

Keyword(s):

Fluorescence Spectroscopy ◽

Laboratory Data ◽

Data Sets ◽

Light Elements ◽

Portable Xrf ◽

X Ray ◽

Protective Films ◽

The Matrix ◽

Valuable Complement ◽

Physical Effects

Portable X-ray fluorescence spectroscopy is now widely used in almost every field of geoscience. Handheld XRF analysers are easy to use, and results are available almost in real time anywhere. However, the results do not always match laboratory analyses, and this may deter users. Rather than analytical issues, the bias often results from differences in sample preparation. Instrument setup and analysis conditions need to be fully understood to avoid reporting erroneous results, and the technique’s limitations must be kept in mind. We describe a number of issues and potential pitfalls observed in our experience and described in the literature. These include the analytical mode and parameters; protective films; sample geometry and density, especially for light elements; analytical interferences between elements; physical effects of the matrix and sample condition; and more. Nevertheless, portable X-ray fluorescence spectroscopy (pXRF) results gathered with sufficient care by experienced users are both precise and reliable, if not fully accurate, and they can constitute robust data sets. Rather than being a substitute for laboratory analyses, pXRF measurements are a valuable complement to them. pXRF improves the quality and relevance of laboratory data sets.


Methods for Preventing Visual Attacks in Convolutional Neural Networks Based on Data Discard and Dimensionality Reduction

Applied Sciences ◽

10.3390/app11115235 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5235

Author(s):

Nikita Andriyanov

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Inference ◽

Recognition Accuracy ◽

Network Architectures ◽

Reduced Dimensions ◽

Different Types ◽

Rectangular Area ◽

Noise Impulse ◽

Selection Of

The article studies convolutional neural network inference in image processing tasks under visual attacks. Four types of attack were considered: simple attacks, the addition of white Gaussian noise, impulse action on a single pixel of an image, and attacks that change brightness values within a rectangular area. The MNIST and Kaggle dogs vs. cats datasets were chosen. Recognition accuracy was measured as a function of the number of images subjected to attacks and the types of attacks used in training. The study was based on well-known convolutional neural network architectures used in pattern recognition tasks, such as VGG-16 and Inception_v3. The dependence of recognition accuracy on the parameters of visual attacks was obtained. Original methods were proposed to prevent visual attacks. These methods are based on selecting “incomprehensible” classes for the recognizer and subsequently correcting them using neural network inference on reduced-size images. Applying these methods improved the accuracy metric by a factor of 1.3 after an iteration that discards incomprehensible images, and reduced the amount of uncertainty by 4–5% after an iteration that integrates the results of image analyses at reduced dimensions.
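Three of the four attack types named above are concrete enough to sketch: white Gaussian noise, a single-pixel impulse, and a brightness change inside a rectangle. The "simple" attack is not described in enough detail to reproduce. The sketch below works on a plain grayscale image stored as a list of rows. The image representation, function and parameter names, and clipping to 0..255 are assumptions for illustration.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255 (assumed representation)


def _clip(value: float) -> int:
    return max(0, min(255, round(value)))


def gaussian_noise_attack(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add white Gaussian noise with standard deviation sigma to every pixel."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def one_pixel_attack(img: Image, row: int, col: int, value: int) -> Image:
    """Impulse attack: overwrite a single pixel."""
    out = [list(r) for r in img]
    out[row][col] = _clip(value)
    return out


def rectangle_attack(img: Image, top: int, left: int, height: int, width: int,
                     delta: int) -> Image:
    """Shift brightness by delta inside a rectangle (clipped to the image bounds)."""
    out = [list(r) for r in img]
    for i in range(top, min(top + height, len(img))):
        for j in range(left, min(left + width, len(img[0]))):
            out[i][j] = _clip(out[i][j] + delta)
    return out


img = [[100] * 4 for _ in range(4)]
noisy = gaussian_noise_attack(img, sigma=10.0, rng=random.Random(0))
pixel = one_pixel_attack(img, 0, 0, 255)
patch = rectangle_attack(img, 1, 1, 2, 2, delta=200)
print(sum(v == 255 for r in patch for v in r))  # prints 4
```

Each attack returns a new image and leaves the input untouched, so the same clean image can be attacked several ways when building a training set.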


An algebraic approach to N-soft sets with application in decision-making using TOPSIS

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202717 ◽

2021 ◽

pp. 1-21

Author(s):

Muhammad Shabir ◽

Rimsha Mushtaq ◽

Munazza Naz

Keyword(s):

Decision Making ◽

Mathematical Models ◽

Real World ◽

Algebraic Approach ◽

Multi Criteria Decision Making ◽

Commutative Monoids ◽

Soft Sets ◽

Algebraic Properties ◽

Different Types ◽

Selection Of

In this paper, we focus on two main objectives. First, we define some binary and unary operations on N-soft sets and study their algebraic properties. For unary operations, three different types of complements are studied. We prove De Morgan’s laws for top complements and for bottom complements of N-soft sets where N is fixed, and provide a counterexample to show that De Morgan’s laws do not hold if N differs. We then study different collections of N-soft sets that form idempotent commutative monoids and consequently show that these monoids give rise to hemirings of N-soft sets. Some of these hemirings turn out to be lattices. Finally, we show that the collection of all N-soft sets with full parameter set E and the collection of all N-soft sets with parameter subset A are Stone algebras. The second objective is to integrate the well-known TOPSIS technique with N-soft set-based mathematical models of real-world problems. We discuss a hybrid multi-criteria decision-making model combining TOPSIS and N-soft sets, and present an algorithm, implemented on the selection of the best laptop model.


Outlier Detection at the Parcel-Level in Wheat and Rapeseed Crops Using Multispectral and SAR Time Series

Remote Sensing ◽

10.3390/rs13050956 ◽

2021 ◽

Vol 13 (5) ◽

pp. 956

Author(s):

Florian Mouret ◽

Mohanad Albughdadi ◽

Sylvie Duthoit ◽

Denis Kouamé ◽

Guillaume Rieu ◽

...

Keyword(s):

Outlier Detection ◽

Experimental Validation ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Multispectral Images ◽

Vegetation Indexes ◽

Crop Development ◽

Different Types ◽

Selection Of ◽

Sentinel 2

This paper studies the detection of anomalous crop development at the parcel level based on an unsupervised outlier detection technique. The experimental validation is conducted on rapeseed and wheat parcels located in Beauce (France). The proposed methodology consists of four sequential steps: (1) preprocessing of synthetic aperture radar (SAR) and multispectral images acquired by the Sentinel-1 and Sentinel-2 satellites, (2) extraction of SAR and multispectral pixel-level features, (3) computation of parcel-level features using zonal statistics and (4) outlier detection. The different types of anomalies that can affect the studied crops are analyzed and described. The different factors that can influence the outlier detection results are investigated, with particular attention devoted to the synergy between Sentinel-1 and Sentinel-2 data. Overall, the best performance is obtained when jointly using a selection of Sentinel-1 and Sentinel-2 features with the isolation forest algorithm. The selected features are the co-polarized (VV) and cross-polarized (VH) backscattering coefficients for Sentinel-1 and five vegetation indexes for Sentinel-2 (among them, the Normalized Difference Vegetation Index and two variants of the Normalized Difference Water Index). When using these features with an outlier ratio of 10%, the percentage of detected true positives (i.e., crop anomalies) is 94.1% for rapeseed parcels and 95.5% for wheat parcels.
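The best results above come from the isolation forest algorithm with an outlier ratio of 10%. The sketch below is a compact from-scratch version of the standard isolation forest idea: random axis-aligned splits isolate each point, and points isolated in fewer splits score as more anomalous. The outlier ratio is then used to flag the top-scoring fraction. The feature values, tree count, subsample size and seed are hypothetical. In practice a library implementation such as scikit-learn's `IsolationForest`, whose `contamination` parameter plays the role of the outlier ratio, would normally be used.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree with random features and random split values."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop rather than retry another feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + _c(node[1])  # credit for the unexpanded subtree
    _, dim, split, left, right = node
    return _path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; values near 1 indicate easy-to-isolate points."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [2.0 ** (-sum(_path_length(x, t) for t in trees) / n_trees / _c(psi))
            for x in data]


def flag_outliers(data, outlier_ratio=0.10, **kwargs):
    """Indices of the highest-scoring fraction of points."""
    scores = isolation_scores(data, **kwargs)
    k = max(1, round(outlier_ratio * len(data)))
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


# Hypothetical parcel features: a tight cluster plus one anomalous parcel.
parcels = [(i * 0.1, j * 0.1) for i in range(5) for j in range(4)] + [(10.0, 10.0)]
print(flag_outliers(parcels))
```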
