A Machine Learning Approach to Cloud Masking in Sentinel-3 SLSTR Data

Author(s):  
Samuel Jackson ◽  
Jeyarajan Thiyagalingam ◽  
Caroline Cox

<p><span>Clouds appear ubiquitously in the Earth's atmosphere, and thus present a persistent problem for the accurate retrieval of remotely sensed information. The task of identifying which pixels are cloud, and which are not, is what we refer to as the cloud masking problem. Cloud masking essentially boils down to assigning a binary label, representing either "cloud" or "clear", to each pixel. </span></p><p><span>Although this problem appears trivial, it is often complicated by a diverse range of issues that affect the imagery obtained from remote sensing instruments. For instance, snow, sea ice, dust, smoke, and sun glint can easily challenge the robustness and consistency of any cloud masking algorithm. The cloud masking problem is further complicated by geographic and seasonal variation in acquired scenes. </span></p><p><span>In this work, we present a machine learning approach to handle the problem of cloud masking for the Sea and Land Surface Temperature Radiometer (SLSTR) on board the Sentinel-3 satellites. Our model uses Gradient Boosting Decision Trees (GBDTs) to perform pixel-wise segmentation of satellite images. The model is trained using a hand-labelled dataset of ~12,000 individual pixels covering both the spatial and temporal domains of the SLSTR instrument and utilises the combined channels of the dual-view swaths. Pixel-level annotations, while lacking spatial context, are cheaper to obtain than fully labelled images, whose cost is a major problem in applying machine learning to remote sensing imagery.</span></p><p><span>We validate the performance of our mask using cross-validation and compare its performance with two baseline models provided in the SLSTR level 1 product. We show up to 10% improvement in binary classification accuracy compared with the baseline methods. Additionally, we show that our model has the ability to distinguish between different classes of cloud with reasonable accuracy.</span></p>
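To make the pixel-wise idea concrete, the following is a minimal sketch of gradient boosting for binary "cloud" versus "clear" labels. It is not the authors' model: it uses depth-1 trees (stumps) fitted to the log-loss gradient, and the two features (a reflectance and a brightness temperature), the toy values, and all function names are assumptions made purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]

def fit_gbdt(X, y, n_rounds=20, learning_rate=0.5):
    """Boost stumps on the negative log-loss gradient, y - sigmoid(score)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lmean, rmean = fit_stump(X, residuals)
        # Store leaf values already scaled by the learning rate.
        stump = (j, thr, learning_rate * lmean, learning_rate * rmean)
        stumps.append(stump)
        scores = [s + (stump[2] if row[j] <= thr else stump[3])
                  for s, row in zip(scores, X)]
    return stumps

def predict(stumps, row):
    """Return 1 ("cloud") or 0 ("clear") for a single pixel."""
    score = sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical labelled pixels: [reflectance, brightness temperature in K].
X = [[0.80, 250.0], [0.70, 255.0], [0.90, 245.0], [0.75, 260.0],
     [0.10, 290.0], [0.20, 295.0], [0.15, 285.0], [0.05, 300.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cloud, 0 = clear

stumps = fit_gbdt(X, y)
train_accuracy = sum(predict(stumps, r) == t for r, t in zip(X, y)) / len(y)
```

Because every pixel is classified independently from its own channel values, a model of this kind needs only individually labelled pixels rather than fully annotated scenes, which matches the trade-off described in the abstract.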

2020 ◽  
Vol 49 (1) ◽  
pp. 76-90
Author(s):  
Richard T. Wang ◽  
Patrick D. Tucker

We investigate the influence of partisanship on congressional communication by analyzing 180,000 press releases issued by members of Congress (MCs) between 2005 and 2019. Specifically, we examine whether partisan factors such as party control of the White House and/or Congress influence the tone used by MCs and whether MCs are more likely to focus on issues that their respective party owns. Our analyses include the use of multiple OLS models, the machine learning approach gradient boosting, and Grimmer’s topical modeling software “expAgenda.” We find that (1) partisanship influences the tone MCs use when communicating online; and (2) MCs are unable to prioritize discussing issues that their respective party owns but devote slightly greater attention to their party’s issues than MCs from the opposite party. Our study ultimately finds strong evidence of partisan influence in the way MCs design their press releases and has important implications for online congressional communication.


Author(s):  
Amy Marie Campbell ◽  
Marie-Fanny Racault ◽  
Stephen Goult ◽  
Angus Laurenson

Oceanic and coastal ecosystems have undergone complex environmental changes in recent years, in the context of climate change. These changes are also reflected in the dynamics of water-borne diseases, as some of the causative agents of these illnesses are ubiquitous in the aquatic environment and their survival rates are affected by changes in climatic conditions. Previous studies have established strong relationships between essential climate variables and the coastal distribution and seasonal dynamics of the bacterium Vibrio cholerae, pathogenic types of which are responsible for human cholera disease. In this study we provide a novel exploration of the potential of a machine learning approach to forecast environmental cholera risk in coastal India, home to more than 200 million inhabitants, utilising atmospheric, terrestrial and oceanic satellite-derived essential climate variables. A Random Forest classifier model is developed, trained and tested on a cholera outbreak dataset over the period 2010–2018 for districts along coastal India. The random forest classifier model has an accuracy of 0.99, an F1 score of 0.942 and a sensitivity of 0.895, meaning that 89.5% of outbreaks are correctly identified. Spatio-temporal patterns emerged in the model’s performance across seasons and coastal locations. Further analysis of the specific contribution of each essential climate variable to the model outputs shows that chlorophyll-a concentration, sea surface salinity and land surface temperature are the strongest predictors of the cholera outbreaks in the dataset used. The study reveals the promising potential of random forest classifiers and remotely sensed essential climate variables for the development of environmental cholera-risk applications.
Further exploration of the present random forest model and associated essential climate variables is encouraged on cholera surveillance datasets in other coastal areas affected by the disease to determine the model’s transferability potential and applicative value for cholera forecasting systems.
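The three scores quoted above (accuracy, F1 score and sensitivity) all derive from the same confusion matrix. The sketch below shows how they relate; the counts are hypothetical and are not taken from the study, and the function name is an assumption for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity (recall), precision and F1 from counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # share of real outbreaks that were detected
    precision = tp / (tp + fp)     # share of predicted outbreaks that were real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for an imbalanced dataset where outbreaks are rare.
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=895)
```

With rare positive cases, accuracy is dominated by the many correctly rejected negatives, so it can sit well above sensitivity; this is why reporting sensitivity and F1 alongside accuracy is informative for outbreak detection.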


2021 ◽  
Author(s):  
Md. Zahangir Alam ◽  
Albino Simonetti ◽  
Rafaelle Billantino ◽  
Nick Tayler ◽  
Chris Grainge ◽  
...  

Self-monitoring can play a vital role in asthma control by supporting proper, timely treatment. Existing methods (such as peak flow meters and smart spirometers) require special equipment and are not always used by patients. Voice recordings can serve as surrogate measures of lung function to assess asthma; this has good potential for self-monitoring and could be integrated into telehealth platforms. This study aims to apply a machine learning approach to predict lung function from recorded voice for asthma patients. A threshold-based mechanism was designed to separate speech and breathing from recordings (323 recordings from 26 participants), and features extracted from these were combined with biological attributes and lung function (percentage predicted forced expiratory volume in 1 second, FEV1%). Three types of predictive model were developed: (a) regression models to predict lung function, (b) multi-class classification models to predict the severity, and (c) binary classification models to predict abnormality. Random Forest (RF), Support Vector Machine (SVM), and Linear Regression (LR) algorithms were implemented to develop these predictive models. Training and test samples were separated (70%:30% using balanced portioning). Features were normalised and 10-fold cross-validation was used to measure the models' training performance on the training samples. Models were then run on the test samples to measure the final performance. The RF-based regression model performed best, with the lowest root mean square error = 10.86, and mean absolute score = 11.47, as compared to other models. In predicting the severity of lung function, the SVM-based model performed best, with 73.20% accuracy. The RF-based model performed best among the binary classification models for predicting abnormality of lung function (accuracy = 0.85, F1-score = 0.84, and area under the receiver operating characteristic curve = 0.88).
The proposed machine learning approach can predict lung function (in terms of FEV1%), from the recorded voice files, better than other published approaches. These models can be extended to predict both the severity and abnormality of lung function with reasonable accuracies. This technique could be used to develop future telehealth solutions including smartphone-based applications which have potential to aid decision making and self-monitoring in asthma.
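The evaluation protocol described above (a 70%:30% train/test split followed by 10-fold cross-validation on the training samples) can be sketched as follows. This assumes that "balanced portioning" means keeping class proportions equal in both partitions, which the abstract does not confirm; the labels, counts and function names are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def k_fold(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Hypothetical labels for 100 recordings (not the study's data).
labels = ["normal"] * 60 + ["abnormal"] * 40
train_idx, test_idx = stratified_split(labels)
folds = list(k_fold(train_idx, k=10))
```

The test indices are held out until the end: cross-validation runs only over `train_idx`, so the final scores on `test_idx` come from samples the models never saw during tuning.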


Author(s):  
T. Stomberg ◽  
I. Weber ◽  
M. Schmitt ◽  
R. Roscher

Abstract. Explainable machine learning has recently gained attention due to its contribution to understanding how a model works and why certain decisions are made. A goal that has so far received less attention, especially in remote sensing, is the derivation of new knowledge and scientific insights from observational data. In our paper, we propose an explainable machine learning approach to address the challenge that certain land cover classes such as wilderness are not well-defined in satellite imagery and can only be used with vague labels for mapping. Our approach consists of a combined U-Net and ResNet-18 that can perform scene classification while at the same time providing interpretable information with which we can derive new insights about classes. We show that our methodology allows us to deepen our understanding of what makes nature wild by automatically identifying simple concepts such as wasteland that semantically describe wilderness. It further quantifies a class’s sensitivity with respect to a concept and uses it as an indicator of how well a concept describes the class.

