MAPPING GLACIER CHANGES USING CLUSTERING TECHNIQUES ON CLOUD COMPUTING INFRASTRUCTURE

Abstract. Climate change and its effects are receiving growing attention, and glaciers are among the most affected ecosystems, since the energy of the Earth's surface and its temperature may be directly related to temporal changes in glaciers. Glacier behaviour, in particular critical retreat or melting conditions, can be understood through the analysis of Remote Sensing data; however, given the unprecedented volumes of information currently provided by satellite sensors, this analysis can be regarded as a big data problem. Machine learning techniques have the potential to improve the analysis of this type of data, but most current machine learning algorithms are unable to properly process such huge volumes of data. To overcome the computational limitations of Remote Sensing Big Data analysis, we implemented the K-Means and Expectation Maximization algorithms as distributed clustering solutions, exploiting the capabilities of cloud computing infrastructure for processing very large datasets. The solution was developed on top of the InterCloud Data Mining Package, a suite of distributed classification methods previously employed in hyperspectral image analysis. In this work we extended that package so that it can process multispectral images using the aforementioned clustering algorithms. To validate our proposal, we analysed the Ausangate glacier, located in the Andes Mountains in Peru, mapping the changes in that environment through a multi-temporal Remote Sensing analysis. Our results and conclusions focus on the thematic accuracy and the computational performance achieved by the proposed solution. Thematic accuracy was assessed by comparing the glacier areas automatically detected by the clustering approaches against manually selected ground truth data. We compared the computational load of executing the clustering processes sequentially and in a distributed fashion, using a local mode and a cluster configuration on a cloud computing infrastructure.
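
For readers unfamiliar with the clustering step mentioned in the abstract, the following is a minimal single-machine sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the InterCloud Data Mining Package implementation and does not show the distributed version; the function name `kmeans`, the two-band toy pixel values, and the choice of initial centroids are all assumptions made for illustration.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples (hypothetical helper)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(band) / len(members) for band in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Toy two-band "pixels": three bright (ice-like) and three dark values (made up).
pixels = [(0.90, 0.85), (0.88, 0.80), (0.92, 0.90),
          (0.20, 0.15), (0.25, 0.10), (0.18, 0.20)]
labels, centroids = kmeans(pixels, [pixels[0], pixels[3]])
print(labels)
```

In a multi-temporal setting one would presumably cluster each acquisition date separately and compare the area of the glacier-like cluster over time; that workflow is an assumption here, not something the abstract spells out.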

Aflatoxin detection on direction of the 4.0 age at 3.0 costs

International Journal for Innovation Education and Research ◽

10.31686/ijier.vol7.iss7.1615 ◽

2019 ◽

Vol 7 (7) ◽

pp. 338-346 ◽

Cited By ~ 1

Author(s):

Mariana Matulovic ◽

Flávio José de Oliveira Morais ◽

Angela Vacaro de Souza ◽

Cleber Alexandre de Amorim ◽

Luiz Fernando Sommaggio Coletta

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Cloud Computing ◽

Environmental Variables ◽

Low Cost ◽

Machine Learning Algorithms ◽

Toxic Metabolite ◽

Monitoring And Control ◽

Carcinogenic Effects ◽

And Control

Articulating the most diverse and sophisticated technologies, such as Remote Sensing, Big Data, Cloud Computing, Internet of Things and 3D Printing, among others, is part of the 4.0 universe, whether industrial or agricultural. Focusing on the agricultural context, this paper proposes a low-cost 4.0 device to monitor and control certain environmental variables for the detection of aflatoxins in peanut crops. Aflatoxins are toxic metabolites of fungi of the genus Aspergillus that can cause toxic and carcinogenic effects in humans and animals. The device developed was able to monitor temperature and humidity variations, helping to identify aflatoxins. The portability of the equipment, encapsulated via Additive Manufacturing, allows its use in silos, in addition to aflatoxin prediction with Machine Learning algorithms.

THE FUTURE OF SMART INDUSTRY: TECHNOLOGIES, MACRO-TRENDS AND APPLICATION AREAS

DYNA INGENIERIA E INDUSTRIA ◽

10.6036/10342 ◽

2021 ◽

Vol 96 (6) ◽

pp. 561-562

Author(s):

MIKEL NIÑO

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Big Data ◽

Data Analysis ◽

Internet Of Things ◽

Machine Learning Algorithms ◽

The Internet ◽

Big Data Technologies ◽

Smart Industry ◽

The Internet Of Things

The Smart Industry has been developing at an accelerated pace since the beginning of the last decade, driven by the emergence of technologies such as the Internet of Things, Cloud Computing and Big Data technologies, as well as their connection with machine learning algorithms for predictive data analysis [1].

Machine Learning Algorithms for Short-Term Load Forecast in Residential Buildings Using Smart Meters, Sensors and Big Data Solutions

IEEE Access ◽

10.1109/access.2019.2958383 ◽

2019 ◽

Vol 7 ◽

pp. 177874-177889 ◽

Cited By ~ 10

Author(s):

Simona-Vasilica Oprea ◽

Adela Bara

Keyword(s):

Machine Learning ◽

Big Data ◽

Residential Buildings ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Short Term ◽

Smart Meters ◽

Load Forecast

Accelerating organic solar cell material's discovery: high-throughput screening and big data

Energy & Environmental Science ◽

10.1039/d1ee00559f ◽

2021 ◽

Author(s):

Xabier Rodríguez-Martínez ◽

Enrique Pascual-San-José ◽

Mariano Campoy-Quiles

Keyword(s):

Machine Learning ◽

Big Data ◽

High Throughput ◽

Organic Solar Cells ◽

High Throughput Screening ◽

Organic Solar Cell ◽

State Of The Art ◽

Review Article ◽

Machine Learning Algorithms ◽

Device Optimization

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.

Semantic segmentation of PolSAR image data using advanced deep learning model

Scientific Reports ◽

10.1038/s41598-021-94422-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Rajat Garg ◽

Anil Kumar ◽

Nikunj Bansal ◽

Manish Prateek ◽

Shashi Kumar

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Deep Learning ◽

Urban Area ◽

Urban Areas ◽

Learning Algorithms ◽

Semantic Segmentation ◽

Learning Model ◽

Machine Learning Algorithms ◽

Deep Learning Model

Abstract. Urban area mapping is an important application of remote sensing which aims at both estimation and change in land cover under the urban area. A major challenge in analyzing Synthetic Aperture Radar (SAR) based remote sensing data is the strong similarity between highly vegetated urban areas and oriented urban targets on the one hand and actual vegetation on the other. This similarity leads to misclassification of urban areas as forest cover. The present work is a precursor study for the dual-frequency L and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), have been implemented along with a deep learning model, DeepLabv3+, for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model, but in the field of SAR based remote sensing a major issue is the unavailability of a large benchmark labeled dataset for training deep learning algorithms from scratch. In the current work, it has been shown that a pre-trained deep learning model, DeepLabv3+, outperforms the machine learning algorithms for the land use and land cover (LULC) classification task even with a small dataset using transfer learning. The highest pixel accuracy of 87.78% and overall pixel accuracy of 85.65% have been achieved with DeepLabv3+, and Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47% respectively. The highest precision of 0.9228 is recorded for the urban class for the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF gave comparable results with a precision of 0.8977 and 0.8958 respectively.
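
The overall pixel accuracy and per-class precision figures quoted above are conventionally derived from a confusion matrix. The sketch below shows the usual definitions in plain Python; the function names and the 2x2 toy matrix are hypothetical, and the abstract does not state exactly how the authors computed their metrics.

```python
def overall_accuracy(cm):
    """cm[i][j] = number of pixels of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def precision(cm, k):
    """Share of pixels predicted as class k that truly belong to class k."""
    predicted_k = sum(row[k] for row in cm)
    return cm[k][k] / predicted_k if predicted_k else 0.0

# Made-up counts for two classes (index 0 = urban, index 1 = vegetation).
toy_cm = [[80, 20],
          [10, 90]]
print(overall_accuracy(toy_cm))  # 170 correct out of 200 pixels
print(precision(toy_cm, 0))      # 80 of the 90 pixels predicted as urban
```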

Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020058 ◽

2021 ◽

Vol 10 (2) ◽

pp. 58

Author(s):

Muhammad Fawad Akbar Khan ◽

Khan Muhammad ◽

Shahid Bashir ◽

Shahab Ud Din ◽

Muhammad Hanif

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Kappa Coefficient ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

Sensing Data ◽

Fossiliferous Limestone

Low-resolution Geological Survey of Pakistan (GSP) maps surrounding the region of interest show oolitic and fossiliferous limestone occurrences correspondingly in Samanasuk, Lockhart, and Margalla hill formations in the Hazara division, Pakistan. Machine-learning algorithms (MLAs) have rarely been applied to multispectral remote sensing data for differentiating between limestone formations formed in different depositional environments, such as oolitic or fossiliferous. Unlike previous studies, which mostly report lithological classification by MLAs of rock types having different chemical compositions, this paper aimed to investigate the potential of MLAs for mapping subclasses within the same lithology, i.e., limestone. Additionally, the selection of appropriate data labels, training algorithms, hyperparameters, and remote sensing data sources was also investigated while applying these MLAs. In this paper, first, oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) limestone-bearing formations, along with the adjoining Hazara formation, were mapped using random forest (RF), support vector machine (SVM), classification and regression tree (CART), and naïve Bayes (NB) MLAs. The RF algorithm reported the best accuracy of 83.28% and a Kappa coefficient of 0.78. To further improve the targeted allochemical limestone formation map, annotation labels were generated by the fusion of maps obtained from principal component analysis (PCA), decorrelation stretching (DS), and X-means clustering applied to ASTER-L1T, Landsat-8, and Sentinel-2 datasets. These labels were used to train and validate SVM, CART, NB, and RF MLAs to obtain a binary classification map of limestone occurrences in the Hazara division, Pakistan, using the Google Earth Engine (GEE) platform. The classification of Landsat-8 data by CART reported 99.63% accuracy, with a Kappa coefficient of 0.99, and was in good agreement with the field validation. This binary limestone map was further classified into oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) formations by all four MLAs; in this case, RF surpassed all the other algorithms with an improved accuracy of 96.36%. This improvement can be attributed to better annotation, resulting in a binary limestone classification map, which formed a mask for improved classification of oolitic and fossiliferous limestone in the area.
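
The Kappa coefficient reported alongside accuracy above is normally Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. A minimal plain-Python sketch follows; the function name `cohens_kappa` and the toy confusion matrix are assumptions for illustration, not values from the paper.

```python
def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = reference, cols = predicted)."""
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(cm[i][i] for i in range(len(cm))) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up counts: observed agreement 0.85, chance agreement 0.50.
toy_cm = [[45, 5],
          [10, 40]]
print(round(cohens_kappa(toy_cm), 2))
```

Because kappa discounts chance agreement, it is usually lower than the raw accuracy computed from the same matrix, which matches the pattern in the figures quoted in the abstract.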

Remote sensing inversion of water quality in coastal sea area based on machine learning: a case study of Shenzhen bay, China

10.5194/egusphere-egu21-1972 ◽

2021 ◽

Author(s):

Xiaotong Zhu ◽

Jinhui Jeanne Huang

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Water Quality ◽

Predictive Accuracy ◽

Water Environment ◽

Quality Parameters ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Support Vector ◽

Seawater Quality

Remote sensing monitoring offers wide coverage, speed, and low cost for long-term dynamic monitoring of the water environment. With the rise of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, because of the differing physicochemical properties of water quality parameters, the performance of algorithms varies considerably. To improve the predictive accuracy for seawater quality parameters, we propose a technical framework to identify the optimal machine learning algorithms using Sentinel-2 satellite data and in-situ seawater sample data. In the study, we select three algorithms, i.e. support vector regression (SVR), XGBoost and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity (TUR) and chlorophyll-a (Chla). The results show that SVR is the more precise algorithm for inverting DO (R2 = 0.81). XGBoost has the best accuracy for Chla and TUR inversion (R2 = 0.75 and 0.78 respectively), while DL performs better for TDS (R2 = 0.789). Overall, this research provides theoretical support for high-precision remote sensing inversion of offshore seawater quality parameters based on machine learning.
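
The R2 values above measure how well the predicted parameter values match the in-situ samples. The abstract does not say which definition was used; the sketch below assumes the common coefficient of determination, 1 - SS_res / SS_tot, and uses made-up numbers and a hypothetical function name.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Made-up in-situ measurements and model predictions (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 2))
```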

Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems

IEEE Transactions on Systems Man and Cybernetics Systems ◽

10.1109/tsmc.2017.2741558 ◽

2017 ◽

Vol 47 (10) ◽

pp. 2625-2626 ◽

Cited By ~ 16

Author(s):

Fuchun Sun ◽

Guang-Bin Huang ◽

Q. M. Jonathan Wu ◽

Shiji Song ◽

Donald C. Wunsch II

Keyword(s):

Machine Learning ◽

Big Data ◽

Learning Algorithms ◽

Machine Learning Algorithms

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, when merging two support vector machines, the “special points” other than support vectors are considered, that is, the nonsupport vectors in one subset that violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
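
One possible reading of the "special points" idea is sketched below: given a linear SVM trained on one subset, find the points of the other subset that fall inside or on the wrong side of its margin. This is only an interpretation under stated assumptions (a linear decision function, labels in {-1, +1}, and the hypothetical helper `margin_violators`); it is not the paper's merging algorithm and does not involve Spark.

```python
def margin_violators(w, b, samples):
    """Return the (x, y) pairs with y * (w . x + b) < 1, i.e. points that a
    linear SVM with weights w and bias b does not classify with full margin."""
    violators = []
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score < 1:
            violators.append((x, y))
    return violators

# Hypothetical model trained on subset A, and made-up labelled points from subset B.
w_a, b_a = (1.0, 0.0), 0.0
subset_b = [((2.0, 0.0), 1),    # margin 2.0: satisfied
            ((0.5, 1.0), 1),    # margin 0.5: inside the margin
            ((-3.0, 0.0), -1),  # margin 3.0: satisfied
            ((0.2, 0.0), -1)]   # margin -0.2: misclassified
print(margin_violators(w_a, b_a, subset_b))
```

Under this reading, the violating points would be carried forward together with the support vectors when two sub-models are merged, so that information missed by one subset is not lost.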
