Leveraging spatial textures, through machine learning, to identify aerosols and distinct cloud types from multispectral observations

2020
Vol 13 (10)
pp. 5459-5480
Author(s):
Willem J. Marais
Robert E. Holz
Jeffrey S. Reid
Rebecca M. Willett

Abstract. Current cloud and aerosol identification methods for multispectral radiometers, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS), employ multichannel spectral tests on individual pixels (i.e., fields of view). Spatial information has entered cloud and aerosol algorithms primarily through statistical parameters, such as nonuniformity tests of surrounding pixels, with cloud classification provided by multispectral microphysical retrievals such as phase and cloud top height. With these methodologies there is uncertainty in identifying optically thick aerosols, since aerosols and clouds have similar spectral properties in coarse-spectral-resolution measurements. Furthermore, identifying cloud regimes (e.g., stratiform, cumuliform) from spectral measurements alone is difficult, since low-altitude cloud regimes have similar spectral properties. Recent advances in computer vision using deep neural networks provide a new opportunity to better leverage the coherent spatial information in multispectral imagery. Using machine learning techniques combined with a new methodology to create the necessary training data, we demonstrate improvements in the discrimination between clouds and severe aerosols and an expanded capability to classify cloud types. The labeled training dataset was created from an adapted NASA Worldview platform that provides an efficient user interface for assembling a human-labeled database of cloud and aerosol types. The convolutional neural network (CNN) labeling accuracy for aerosols and cloud types was quantified using independent Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and MODIS cloud and aerosol products. By harnessing CNNs with a unique labeled dataset, we demonstrate improved identification of aerosols and distinct cloud types from MODIS and VIIRS images compared to a per-pixel spectral and standard deviation thresholding method. The paper concludes with case studies that compare the CNN methodology results with the MODIS cloud and aerosol products.
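
To make the patch-based classification concrete, here is a minimal sketch in Python with PyTorch. The band count, patch size, layer widths, and class count are illustrative assumptions, not the authors' architecture:

    # Minimal sketch of a patch-based CNN classifier for multispectral imagery.
    # Channel count, patch size, and layer sizes are assumptions for illustration.
    import torch
    import torch.nn as nn

    NUM_BANDS = 8      # assumed subset of MODIS/VIIRS channels
    NUM_CLASSES = 5    # assumed cloud/aerosol classes

    class PatchCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(NUM_BANDS, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # collapse spatial texture to one vector
            )
            self.classifier = nn.Linear(64, NUM_CLASSES)

        def forward(self, x):  # x: (batch, bands, height, width)
            return self.classifier(self.features(x).flatten(1))

    # One forward pass over a batch of 16 synthetic 32 x 32 pixel patches.
    logits = PatchCNN()(torch.randn(16, NUM_BANDS, 32, 32))
    print(logits.shape)  # torch.Size([16, 5])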


2019
Vol 11 (19)
pp. 2211
Author(s):
Helen Petliak
Corina Cerovski-Darriau
Vadim Zaliva
Jonathan Stock

While machine learning techniques have been increasingly applied to land cover classification problems, these techniques have not focused on separating exposed bare rock from soil-covered areas. Therefore, we built a convolutional neural network (CNN) to differentiate exposed bare rock (rock) from soil cover (other). We made a training dataset by mapping exposed rock at eight test sites across the Sierra Nevada Mountains (California, USA) using USDA's 0.6 m National Agriculture Imagery Program (NAIP) orthoimagery. These areas were then used to train and test the CNN. The resulting machine learning approach classifies bare rock in NAIP orthoimagery with a 0.95 F1 score; comparatively, the classical object-based image analysis (OBIA) approach gives only a 0.84 F1 score. This is an improvement over existing land cover maps, which underestimate rock by almost 90%. The resulting CNN approach is likely scalable but depends on high-quality imagery and high-performance algorithms using representative training sets informed by expert mapping. As image quality and quantity continue to increase globally, machine learning models that incorporate high-quality training data informed by geologic, topographic, or other topical maps may be applied to more effectively identify exposed rock in large image collections.
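
For reference, the F1 score reported above is the harmonic mean of precision and recall, F1 = 2PR/(P + R). A minimal sketch of its computation in Python with scikit-learn, using toy rock/other labels rather than the study's data:

    # Toy illustration of a binary rock/other F1 score; labels are invented.
    from sklearn.metrics import f1_score

    y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # 1 = rock, 0 = other (toy labels)
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    # Here precision = recall = 0.75, so F1 = 2*0.75*0.75 / (0.75 + 0.75) = 0.75.
    print(f1_score(y_true, y_pred))     # 0.75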


Author(s):  
Ritu Khandelwal
Hemlata Goyal
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between business and data science. With the involvement of data science, the business goal focuses on findings that yield valuable insights from available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a blockbuster, superhit, hit, average, or flop, by applying machine learning classification and prediction techniques. Building a classifier or prediction model begins with a learning stage: the model is trained on a training dataset using a chosen technique or algorithm, and the rules generated during training form the model used to predict future trends in different types of organizations. Methods: Classification and prediction techniques such as support vector machine (SVM), random forest, decision tree, naïve Bayes, logistic regression, AdaBoost, and k-nearest neighbors (KNN) are applied in search of the most efficient and effective results. All these functionalities are available through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The rules generated during the learning stage are used to build models that predict the success category of upcoming movies. Conclusion: A comparative analysis is performed on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. With this prediction, production houses can plan advertisement propaganda and time a movie's release according to the predicted success rate to gain higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets, and the relationships it uncovers help solve business problems and predict forthcoming trends. Such predictions can help production houses plan advertising campaigns and budget their costs, making a movie more profitable.
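
As a sketch of the comparative workflow just described (Python with scikit-learn; the synthetic features stand in for real movie attributes, and the five classes stand in for blockbuster/superhit/hit/average/flop):

    # Train the listed classifiers on one split and compare accuracy and
    # confusion matrices. Features and labels are synthetic placeholders.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    # Five classes stand in for Blockbuster/Superhit/Hit/Average/Flop.
    X, y = make_classification(n_samples=1000, n_features=12, n_informative=8,
                               n_classes=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    models = {"SVM": SVC(), "RandomForest": RandomForestClassifier(),
              "DecisionTree": DecisionTreeClassifier(), "NaiveBayes": GaussianNB(),
              "LogReg": LogisticRegression(max_iter=1000),
              "AdaBoost": AdaBoostClassifier(), "KNN": KNeighborsClassifier()}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        print(name, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))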


2021
Vol 14 (3)
pp. 1-21
Author(s):
Roy Abitbol
Ilan Shimshoni
Jonathan Ben-Dov

The task of assembling fragments in a puzzle-like manner into a composite picture plays a significant role in the field of archaeology, as it supports researchers in their attempts to reconstruct historic artifacts. In this article, we propose a method for matching and assembling pairs of ancient papyrus fragments containing mostly unknown scriptures. Papyrus paper is manufactured from papyrus plants and therefore displays the typical thread patterns of the plant's stems. The proposed algorithm is founded on the hypothesis that these thread patterns contain unique local attributes, such that nearby fragments show similar patterns reflecting the continuations of the threads. We posit that these patterns can be exploited using image processing and machine learning techniques to identify matching fragments. The algorithm and system we present support the quick, automated classification of matching pairs of papyrus fragments as well as the geometric alignment of the pairs against each other. The algorithm consists of a series of steps based on deep learning and machine learning methods. The first step deconstructs the problem of matching fragments into the smaller problem of finding thread-continuation matches in local edge areas (squares) between pairs of fragments. This phase is solved with a convolutional neural network that ingests raw images of the edge areas and produces local matching scores. This stage yields very high recall but low precision. We therefore use these scores to decide on matches between entire fragment pairs by means of an elaborate voting mechanism, enhanced with geometric alignment techniques from which we extract additional spatial information. Finally, we feed all the data collected from these steps into a Random Forest classifier to produce a higher-order classifier capable of predicting whether a pair of fragments is a match. Our algorithm was trained on a batch of fragments excavated from the Dead Sea caves and dated to approximately the 1st century BCE. The algorithm shows excellent results on a validation set of similar origin and condition. We then ran the algorithm against a real-life set of fragments for which we had no prior knowledge or labeling of matches. This test batch is considered extremely challenging due to its poor condition and the small size of its fragments; numerous researchers have sought matches within it with very little success. Our algorithm's performance on this batch was suboptimal, returning a relatively large ratio of false positives. However, the algorithm proved quite useful by eliminating 98% of the possible matches, thus reducing the amount of work needed for manual inspection. Indeed, experts who reviewed the results identified some positive matches as potentially true and referred them for further investigation.
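
The final fusion step lends itself to a short sketch. Assuming hypothetical per-pair features aggregated from the local CNN scores and the geometric alignment (vote count, mean and maximum CNN score, alignment residual), a Random Forest can rank candidate pairs for expert review; the arrays below are synthetic placeholders, not the excavated data:

    # Sketch of the higher-order classifier: per-pair summary features in,
    # match probability out. Feature names and data are assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    # Hypothetical features per candidate pair:
    # [vote_count, mean_cnn_score, max_cnn_score, alignment_residual]
    X = rng.random((200, 4))
    y = (X[:, 0] > 0.8).astype(int)   # toy labels: high vote count => match

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    # Rank candidate pairs by match probability; experts review the top hits.
    probs = clf.predict_proba(X)[:, 1]
    print(np.argsort(probs)[::-1][:5])   # indices of the 5 most likely matches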


Author(s):  
Yanxiang Yu
Chicheng Xu
Siddharth Misra
Weichang Li
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthetization or pseudo-log generation based on multivariate regression or rock physics relations. The SPWLA PDDA SIG hosted a contest, which started on March 1, 2020, and concluded on May 7, 2020, aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, consisting of the conventional “easy-to-acquire” logs: caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density, as well as two sonic logs (DTC and DTS) as the target. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of each model was evaluated using the root mean square error (RMSE) metric:

RMSE = \sqrt{\frac{1}{2m}\sum_{i=1}^{m}\left[\left(\mathrm{DTC}_{\mathrm{pred}}^{i}-\mathrm{DTC}_{\mathrm{true}}^{i}\right)^{2}+\left(\mathrm{DTS}_{\mathrm{pred}}^{i}-\mathrm{DTS}_{\mathrm{true}}^{i}\right)^{2}\right]}

In the benchmark model (Yu et al., 2020), we used a Random Forest regressor with minimal preprocessing of the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In this paper, we review these five solutions, including their preprocessing techniques and machine-learning models, among them neural networks, long short-term memory (LSTM) networks, and ensemble trees. We found that data cleaning and clustering were critical for improving performance in all models.
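
A direct transcription of this metric in Python with NumPy, using synthetic arrays in place of the real log predictions:

    # The contest's joint RMSE over DTC and DTS; arrays are toy stand-ins.
    import numpy as np

    def contest_rmse(dtc_pred, dtc_true, dts_pred, dts_true):
        m = len(dtc_true)
        return np.sqrt(((dtc_pred - dtc_true) ** 2 +
                        (dts_pred - dts_true) ** 2).sum() / (2 * m))

    rng = np.random.default_rng(0)
    dtc_true = rng.normal(70, 10, 1000)    # toy DTC values
    dts_true = rng.normal(120, 20, 1000)   # toy DTS values
    # A constant +/-1 error in both logs gives an RMSE of exactly 1.0.
    print(contest_rmse(dtc_true + 1.0, dtc_true, dts_true - 1.0, dts_true))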


2018
Vol 34 (3)
pp. 569-581
Author(s):
Sujata Rani
Parteek Kumar

Abstract In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in order to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes, i.e., positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert the string data into numerical matrices and applies three machine learning techniques, i.e., Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM performed best in comparison to the other classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can provide other cultural and social benefits, such as predicting or helping to prevent riots.
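
The paper's pipeline runs in WEKA; as a rough Python analogue of the same string-to-matrix-then-classify idea (the three Romanized-Hindi reviews and their labels are invented placeholders), one could write:

    # Convert labeled review strings into numeric vectors, then train NB and
    # an SVM. This is a Python sketch, not the paper's WEKA workflow.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC

    texts = ["movie bahut achhi thi", "total time waste", "theek thaak film"]
    labels = ["positive", "negative", "neutral"]   # toy manual labels

    X = CountVectorizer().fit_transform(texts)     # string data -> numeric matrix
    for clf in (MultinomialNB(), LinearSVC()):
        clf.fit(X, labels)
        print(type(clf).__name__, clf.predict(X))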


Life
2021
Vol 11 (10)
pp. 1092
Author(s):
Sikandar Ali
Ali Hussain
Satyabrata Aich
Moo Suk Park
Man Pyo Chung
...  

Idiopathic pulmonary fibrosis (IPF) is a rare but fatal lung disease. The disease is progressive, and assessing its severity is time-consuming and tedious. With the advent of effective machine learning techniques, it has become possible to detect many lung diseases. In this paper, we therefore propose a model that can detect the severity of IPF at an early stage so that fatal outcomes can be averted. To develop this model, we used the IPF dataset of the Korean interstitial lung disease cohort. First, we preprocessed the data with several preprocessing techniques and selected 26 highly relevant features from a total of 502 features for 2,424 subjects. Second, we split the data into 80% training and 20% testing sets and applied oversampling to the training set. Third, we trained three state-of-the-art machine learning models and combined their results to develop a new soft-voting ensemble model for predicting the severity of IPF in patients with this chronic lung disease; hyperparameter tuning was also performed to obtain the model's optimal performance. Fourth, the performance of the proposed model was evaluated by calculating the accuracy, AUC, confusion matrix, precision, recall, and F1-score. Our proposed soft-voting ensemble model achieved an accuracy of 0.7100, precision of 0.6400, recall of 0.7100, and F1-score of 0.6600. This model will help doctors, IPF patients, and physicians to diagnose the severity of IPF in its early stages and assist them in taking proactive measures, enabling timely decisions about treatment.
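
A minimal sketch of the soft-voting ensemble idea in Python with scikit-learn. The three base models, the synthetic imbalanced data, and the omission of the paper's oversampling and hyperparameter-tuning steps are all simplifying assumptions:

    # Soft voting: average the predicted class probabilities of three models.
    # Data and base-model choices are illustrative, not the paper's setup.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    X, y = make_classification(n_samples=2424, n_features=26, weights=[0.8],
                               random_state=0)   # imbalanced, like clinical data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    ensemble = VotingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("lr", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier())],
        voting="soft")                           # average class probabilities
    ensemble.fit(X_tr, y_tr)
    print(classification_report(y_te, ensemble.predict(X_te)))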


2021
Author(s):
Sophie Goliber
Taryn Black
Ginny Catania
James M. Lea
Helene Olsen
...  

Abstract. Marine-terminating outlet glacier terminus traces, mapped from satellite and aerial imagery, have been used extensively to understand how outlet glaciers adjust to climate variability over a range of time scales. Numerous studies have digitized termini manually, but this process is labor-intensive, and no consistent approach exists. A lack of coordination leads to duplication of effort, particularly for Greenland, which is a major focus of scientific research. At the same time, machine learning techniques are rapidly improving in their ability to automate the accurate extraction of glacier termini, with promising developments across a number of optical and SAR satellite sensors. These techniques rely on high-quality, manually digitized terminus traces as training data for robust automatic tracing. Here we present a database of manually digitized terminus traces for machine learning and scientific applications. These data have been collected, cleaned, assigned appropriate metadata including image scenes, and compiled so they can be easily accessed by scientists. The TermPicks data set includes 39,060 individual terminus traces for 278 glaciers, with a mean and median number of traces per glacier of 136 ± 190 and 93, respectively. Across all glaciers, 32,567 dates have been picked, of which 4,467 have traces from more than one author (14 % duplication). We find a median error of ∼100 m among manually traced termini. Most traces were obtained after 1999, when Landsat 7 was launched. We also provide an overview of an updated version of the Google Earth Engine Digitization Tool (GEEDiT), which has been developed specifically for future manual picking of termini on the Greenland Ice Sheet.


Machine learning is empowering many aspects of day-to-day life, from filtering content on social networks to suggesting products we may be looking for. This technology takes objects as image input to find new observations or show items based on user interest. The major discussion here is of machine learning techniques, specifically supervised learning, in which the computer learns from input/training data and predicts results based on experience. We also discuss the machine learning algorithms Naïve Bayes classifier, K-Nearest Neighbor, Random Forest, Decision Trees, Boosted Trees, and Support Vector Machine, and apply these classifiers to Malgenome and Drebin, which are Android malware datasets. Android is an operating system that is gaining popularity, and with the rise in demand for these devices comes a rise in Android malware. Traditional techniques used to detect malware were unable to detect unknown applications. We ran these datasets through the different machine learning classifiers and recorded the results; the experimental results provide a comparative analysis based on performance, accuracy, and cost.
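
As a sketch of such an evaluation loop (Python with scikit-learn), each listed classifier is scored on the same feature matrix with cross-validation; the random binary matrix is a toy stand-in for Malgenome/Drebin feature vectors, with GradientBoostingClassifier standing in for boosted trees and LinearSVC for the SVM:

    # Score several classifiers on one dataset with 5-fold cross-validation.
    # The binary feature matrix and labels are synthetic placeholders.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 100))   # toy binary app features
    y = rng.integers(0, 2, size=500)          # 1 = malware, 0 = benign (toy)

    for clf in (BernoulliNB(), KNeighborsClassifier(), RandomForestClassifier(),
                DecisionTreeClassifier(), GradientBoostingClassifier(),
                LinearSVC()):
        scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        print(type(clf).__name__, scores.mean())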

