Gap-Filling of NDVI Satellite Data Using Tucker Decomposition: Exploiting Spatio-Temporal Patterns

2021 ◽  
Vol 13 (19) ◽  
pp. 4007
Author(s):  
Andri Freyr Þórðarson ◽  
Andreas Baum ◽  
Mónica García ◽  
Sergio M. Vicente-Serrano ◽  
Anders Stockmarr

Remote sensing satellite images in the optical domain often contain missing or misleading data due to overcast conditions or sensor malfunction, concealing potentially important information. In this paper, we apply expectation maximization (EM) Tucker to NDVI satellite data from the Iberian Peninsula in order to gap-fill missing information. EM Tucker belongs to a family of tensor decomposition methods that offer a number of attractive properties, including the ability to analyze data stored directly in multidimensional arrays and to explicitly exploit their multiway structure, which is lost when traditional spatial-, temporal- and spectral-based methods are used. To evaluate the gap-filling accuracy of EM Tucker for NDVI images, we used three data sets based on Advanced Very High Resolution Radiometer (AVHRR) imagery over the Iberian Peninsula with artificially added missing data, as well as a data set from the same region with naturally missing data. The performance of EM Tucker was compared to simple mean imputation, a spatio-temporal hybrid method, and an iterative method based on principal component analysis (PCA). Across the three simulated data sets, with levels of missing data ranging from 10% to 90%, imputation using EM Tucker consistently yielded the most accurate results.
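As a concrete illustration of the EM-Tucker imputation loop described above, here is a minimal sketch in Python using the tensorly library, assuming an NDVI cube arranged as (lat, lon, time) with NaNs marking gaps; the ranks, tolerance, and iteration cap are illustrative choices, not the paper's settings.

```python
# A minimal sketch of EM-style gap filling with a Tucker model (assumptions:
# NaN-coded gaps, illustrative ranks/tolerance, tensorly installed).
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

def em_tucker_fill(ndvi, ranks=(10, 10, 5), max_iter=50, tol=1e-5):
    mask = np.isnan(ndvi)                             # True where data are missing
    filled = np.where(mask, np.nanmean(ndvi), ndvi)   # E-step init: global mean
    for _ in range(max_iter):
        core, factors = tucker(tl.tensor(filled), rank=list(ranks))  # M-step
        recon = tl.to_numpy(tl.tucker_to_tensor((core, factors)))
        prev = filled[mask]
        filled[mask] = recon[mask]                    # E-step: update gaps only
        if np.linalg.norm(filled[mask] - prev) < tol * (np.linalg.norm(prev) + 1e-12):
            break
    return filled
```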

2018 ◽  
Vol 12 (7) ◽  
pp. 2349-2370 ◽  
Author(s):  
Christine Kroisleitner ◽  
Annett Bartsch ◽  
Helena Bergstedt

Abstract. Gap filling is required for temporally and spatially consistent records of land surface temperature from satellite data due to clouds or snow cover. The land surface state (frozen versus unfrozen conditions), however, can be captured globally with satellite data obtained by microwave sensors. The number of frozen days per year has previously been proposed for determining permafrost extent. This suggests an underlying relationship between the number of frozen days and mean annual ground temperature (MAGT). We tested this hypothesis for the Northern Hemisphere north of 50° N using coarse-spatial-resolution microwave satellite data (Metop Advanced SCATterometer, ASCAT, and Special Sensor Microwave Imager, SSM/I; 12.5 and 25 km nominal resolution; 2007–2012), which provide the necessary temporal sampling. The MAGT from GTN-P (Global Terrestrial Network for Permafrost) borehole records at the coldest sensor depth was tested for validity in order to build a comprehensive in situ data set for calibration and validation, and was eventually applied. Results are discussed with respect to snow water equivalent, soil properties, land cover and permafrost type. The obtained temperature maps were classified for permafrost extent and compared to alternative approaches. An R² of 0.99 was found for the correlation between MAGT at zero annual amplitude, as provided in the GTN-P metadata, and MAGT at the coldest sensor depth. The latter could be obtained with an RMSE of 2.2 °C from ASCAT and 2.5 °C from SSM/I surface state records using a linear model. In the case of ASCAT, the average deviation within the validation period is less than 1 °C at locations without glaciers or coastlines within the resolution cell. Excluding snow melt days (available for ASCAT) led to better results, which suggests that soil warming under wet snow cover needs to be accounted for in this context; Scandinavia and western Russia are specifically affected. In addition, MAGT at the coldest sensor depth was overestimated in areas with a certain amount of organic material and in areas of cold permafrost. The derived permafrost extent differed between the data sets and methods used; deviations are high in central Siberia, for example. We show that microwave-satellite-derived surface state records can provide an estimate not only of permafrost extent but also of MAGT, without the need for gap filling. This applies specifically to ASCAT. The deviations among the tested data sets, their spatial patterns, and their relation to environmental conditions reveal areas that need special attention when modelling MAGT.
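The calibration step described above amounts to a simple linear regression from the per-year count of frozen days to borehole MAGT. A hedged sketch, with toy numbers standing in for the ASCAT/SSM/I surface-state and GTN-P borehole records:

```python
# A sketch of the linear-model calibration (toy data, not the paper's records).
import numpy as np
from sklearn.linear_model import LinearRegression

frozen_days = np.array([[120], [180], [240], [300]])   # frozen days/yr per site (toy)
magt = np.array([1.5, -1.0, -4.0, -8.0])               # borehole MAGT in °C (toy)

model = LinearRegression().fit(frozen_days, magt)
pred = model.predict(frozen_days)
rmse = float(np.sqrt(np.mean((magt - pred) ** 2)))     # paper reports ~2.2 °C for ASCAT
print(model.coef_[0], model.intercept_, rmse)
```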


2007 ◽  
Vol 56 (6) ◽  
pp. 75-83 ◽  
Author(s):  
X. Flores ◽  
J. Comas ◽  
I.R. Roda ◽  
L. Jiménez ◽  
K.V. Gernaey

The main objective of this paper is to present the application of selected multivariate statistical techniques to the analysis of plant-wide wastewater treatment plant (WWTP) control strategies. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulating several control strategies on the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relationships in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariate statistical techniques for both the analysis and interpretation of complex multicriteria data sets and enables an improved use of information for the effective evaluation of control strategies.
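A minimal sketch of how such a CA/PCA-FA/DA pipeline can be chained, assuming an evaluation matrix with one row per simulated control strategy and one column per evaluation criterion; the random data and cluster count are stand-ins, not BSM2 results:

```python
# Sketch: cluster analysis + PCA + discriminant analysis on an evaluation matrix.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))                  # 30 strategies x 12 criteria (toy)

Z = StandardScaler().fit_transform(X)
scores = PCA(n_components=3).fit_transform(Z)                        # PCA/FA step
labels = AgglomerativeClustering(n_clusters=4).fit_predict(scores)   # CA step
lda = LinearDiscriminantAnalysis().fit(Z, labels)                    # DA step
# magnitude of the LDA scalings ranks criteria by how strongly they discriminate
print(np.abs(lda.scalings_).sum(axis=1))
```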


2021 ◽  
pp. gr.273631.120
Author(s):  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa, and with more taxa through a divide-and-conquer approach. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method has accuracy comparable to or better than previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation it is flexible enough to enable future implementations of all kinds of population models.
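The HMM machinery underneath a coalHMM can be illustrated with a generic forward pass, where the hidden states are local genealogies and the transition matrix would, in the authors' method, be derived from coalescent simulation. This is not their implementation, and every number below is a placeholder:

```python
# Generic HMM forward algorithm over local genealogies (toy parameters).
import numpy as np

def forward_loglik(trans, log_emit, init):
    """log P(alignment) given transition matrix `trans` (KxK), per-site
    log-emissions `log_emit` (sites x K), and initial distribution `init`."""
    alpha = np.log(init) + log_emit[0]
    for t in range(1, log_emit.shape[0]):
        # logsumexp over previous states, written out to stay self-contained
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ trans) + log_emit[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

K, sites = 3, 1000                                   # 3 local genealogies (toy)
trans = np.full((K, K), 0.005); np.fill_diagonal(trans, 0.99)
log_emit = np.log(np.random.default_rng(1).dirichlet(np.ones(K), size=sites))
print(forward_loglik(trans, log_emit, init=np.full(K, 1 / K)))
```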


Author(s):  
M. McDermott ◽  
S. K. Prasad ◽  
S. Shekhar ◽  
X. Zhou

Discovery of interesting paths and regions in spatio-temporal data sets is important to many fields, such as the earth and atmospheric sciences, GIS, public safety and public health, both as a goal in itself and as a preliminary step in a larger series of computations. This discovery is usually an exhaustive procedure that quickly becomes extremely time consuming using traditional paradigms and hardware, and given the rapidly growing sizes of today's data sets it is quickly outpacing the speed at which computational capacity is growing. In our previous work (Prasad et al., 2013a) we achieved a 50-fold speedup over a sequential implementation using a single GPU. We were able to achieve near-linear speedup over this result on interesting path discovery by using Apache Hadoop to distribute the workload across multiple GPU nodes. Leveraging the parallel architecture of GPUs, we were able to drastically reduce the computation time of a 3-dimensional spatio-temporal interest region search on a single tile of normalized difference vegetation index (NDVI) data for Saudi Arabia. We further observed an almost linear speedup in compute performance when distributing this workload across several GPUs with a simple MapReduce model. This increases the processing speed 10-fold over the comparable sequential implementation while simultaneously increasing the amount of data processed 384-fold, allowing us to process the entire selected data set instead of a constrained window.
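The map/reduce pattern described above can be sketched in miniature: score tiles independently (the per-GPU map step) and merge the candidates (the reduce step). Here Python's multiprocessing stands in for the Hadoop-distributed GPU cluster, and the interest measure is a placeholder:

```python
# Toy MapReduce-style interest-region search over NDVI tiles.
import numpy as np
from multiprocessing import Pool

def map_tile(args):
    tile_id, tile = args
    score = float(tile.mean())           # placeholder interest measure
    return tile_id, score

def reduce_results(results, top_k=3):
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]

if __name__ == "__main__":
    grid = np.random.default_rng(2).random((1024, 1024))   # toy NDVI raster
    tiles = [((i, j), grid[i:i + 256, j:j + 256])
             for i in range(0, 1024, 256) for j in range(0, 1024, 256)]
    with Pool(4) as pool:                 # "map" step, one worker per tile batch
        scored = pool.map(map_tile, tiles)
    print(reduce_results(scored))         # "reduce" step: merge candidates
```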


Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
...  

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the complexity as well as the size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques related to concepts discussed in the treatment of Gaussian distributions, density estimation, and information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques that address some of the weaknesses of PCA.
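A short sketch of the standard PCA workflow the chapter introduces: standardize the measurements, project onto principal components, and keep enough components to explain a chosen fraction of the variance. The data here are synthetic stand-ins for an astronomical measurement table:

```python
# Sketch: choosing the number of PCs by cumulative explained variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 50))  # correlated features

pca = PCA().fit(StandardScaler().fit_transform(X))
cum = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum, 0.95) + 1)   # components for 95% of variance
print(n_keep, cum[:n_keep])
```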


2020 ◽  
Vol 21 (6) ◽  
pp. 1263-1290
Author(s):  
Gerald Blasch ◽  
Zhenhai Li ◽  
James A. Taylor

Abstract. Easy-to-use tools based on modern data analysis techniques are needed to handle spatio-temporal agri-data. This research proposes a novel pattern-recognition-based method, Multi-temporal Yield Pattern Analysis (MYPA), to reveal long-term (>10 years) spatio-temporal variations in multi-temporal yield data. The specific objectives are: i) to synthesize the information in multiple yield maps into a single understandable and interpretable layer that is indicative of the variability and stability of yield over a 10+ year period, and ii) to evaluate the hypothesis that MYPA enhances multi-temporal yield interpretation compared to commonly used statistical approaches. The MYPA method automatically identifies potentially erroneous yield maps; detects yield patterns using principal component analysis; evaluates temporal yield-pattern stability using a per-pixel analysis; and generates productivity-stability units based on k-means clustering and zonal statistics. The method was applied to two commercial cereal fields in Australian dryland systems and two commercial fields in a UK cool-climate system. To evaluate the MYPA, its output was compared to results from a classic statistical yield analysis on the same data sets. The MYPA explained more of the variance in the yield data and generated larger and more coherent yield zones that are more amenable to site-specific management. Detected yield patterns were associated with varying production conditions, such as soil properties, precipitation patterns and management decisions. The MYPA was demonstrated to be a robust approach that can be encoded into an easy-to-use tool to produce information layers from a time series of yield data to support management.
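The four MYPA stages can be compressed into a small sketch, assuming yield maps stacked as a (years, pixels) array; the screening rule, thresholds and cluster count below are illustrative placeholders, not the published settings:

```python
# Sketch of the MYPA stages: screen maps, PCA patterns, per-pixel stability,
# k-means productivity-stability units (all settings illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
yields = rng.gamma(4.0, 1.5, size=(12, 5000))   # 12 seasons x 5000 pixels (toy)

# 1) flag potentially erroneous maps, e.g. seasons poorly correlated with the rest
corr = np.corrcoef(yields)
keep = corr.mean(axis=1) > 0.0                  # placeholder screening rule
# 2) yield patterns via PCA across the retained seasons
pc1 = PCA(n_components=1).fit_transform(yields[keep].T).ravel()
# 3) per-pixel temporal stability (coefficient of variation)
cv = yields[keep].std(axis=0) / yields[keep].mean(axis=0)
# 4) productivity-stability units via k-means on (pattern, stability)
units = KMeans(n_clusters=4, n_init=10).fit_predict(np.column_stack([pc1, cv]))
print(np.bincount(units))
```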


2018 ◽  
Vol 17 ◽  
pp. 117693511877108 ◽  
Author(s):  
Min Wang ◽  
Steven M Kornblau ◽  
Kevin R Coombes

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises two challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally expensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed, and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using six PCs. By clustering the proteins in PC space, we were able to replace the PCs with six "biological components," three of which could be immediately interpreted from the current literature. We expect this approach, combining PCA with clustering, to be widely applicable.
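For contrast with the package's graphical Bayesian approach, the following sketches Horn's parallel analysis, a common permutation-style baseline for choosing the number of significant PCs; it is not the PCDimension algorithm:

```python
# Horn's parallel analysis: keep PCs whose variance beats a permutation null.
import numpy as np

def parallel_analysis(X, n_shuffles=50, quantile=0.95, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    obs = np.linalg.svd(Xc, compute_uv=False) ** 2     # observed PC variances
    null = np.empty((n_shuffles, len(obs)))
    for b in range(n_shuffles):
        perm = np.column_stack([rng.permutation(col) for col in Xc.T])
        null[b] = np.linalg.svd(perm, compute_uv=False) ** 2
    thresh = np.quantile(null, quantile, axis=0)
    above = obs > thresh
    return int(above.argmin()) if not above.all() else len(above)

X = np.random.default_rng(5).normal(size=(40, 200))
X[:, :10] += np.outer(np.linspace(-2, 2, 40), np.ones(10))  # planted signal
print(parallel_analysis(X))
```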


2019 ◽  
Vol 34 (9) ◽  
pp. 1369-1383 ◽  
Author(s):  
Dirk Diederen ◽  
Ye Liu

Abstract. With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets, in combination with the increase in computational power, accommodates the development of new methodology for generating such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness into the original data set by replacing its observed precipitation fields with the synthetic ones. The output is a continuous, gridded, hourly precipitation data set of much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.
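The generator's core resampling step can be sketched as fitting a Gaussian mixture to the descriptors of each SOM class and sampling synthetic descriptors from it. In this hedged sketch the SOM classification is stubbed out with random labels and the event descriptors (e.g. intensity, area, velocity, duration) are placeholders:

```python
# Sketch: per-class multivariate mixture fitting and synthetic sampling.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
descriptors = rng.normal(size=(2000, 4))       # tracked events x descriptors (toy)
som_class = rng.integers(0, 6, size=2000)      # stand-in for SOM classification

synthetic = {}
for c in np.unique(som_class):
    gm = GaussianMixture(n_components=3, random_state=0)
    gm.fit(descriptors[som_class == c])        # mixture model per SOM class
    synthetic[c], _ = gm.sample(500)           # synthetic descriptors per class
print({c: s.shape for c, s in synthetic.items()})
```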


Author(s):  
Suryaefiza Karjanto ◽  
Norazan Mohamed Ramli ◽  
Nor Azura Md Ghani

The relationship between genes in gene set analysis of microarray data is analyzed using Hotelling's T², but the test cannot be applied when the number of variables is larger than the number of samples, which is common in microarray data. Thus, in this study, we propose shrinkage approaches to estimating the covariance matrix in Hotelling's T², particularly to cater for the high-dimensionality problem in microarray data. Three shrinkage covariance methods are proposed, referred to as Shrink A, Shrink B and Shrink C. The three proposed shrinkage methods were compared with the Regularized Covariance Matrix Approach (RCMAT) and Kong's Principal Component Analysis (KPCA). The performance of the proposed methods was assessed using several simulated data sets. In many cases, the Shrink A method performed best, followed by the Shrink C and RCMAT methods. In contrast, both the Shrink B and KPCA methods showed relatively poor results. The study contributes to the establishment of a modified multivariate approach to differential gene expression analysis, which is expected to be applicable in other areas with similar data characteristics.
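The general idea, stabilizing the covariance inside a two-sample Hotelling's T² so it stays usable when variables outnumber samples, can be sketched with a standard Ledoit-Wolf shrinkage estimator. This is a stand-in estimator, not the paper's Shrink A/B/C constructions:

```python
# Sketch: two-sample Hotelling's T^2 with a shrinkage covariance estimate.
import numpy as np
from sklearn.covariance import LedoitWolf

def shrinkage_hotelling_t2(X, Y):
    nx, ny = len(X), len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    pooled = np.vstack([X - X.mean(axis=0), Y - Y.mean(axis=0)])
    S = LedoitWolf().fit(pooled).covariance_   # shrunk, invertible even if p > n
    return float(nx * ny / (nx + ny) * diff @ np.linalg.solve(S, diff))

rng = np.random.default_rng(7)
X = rng.normal(size=(10, 50))                  # p = 50 variables, n = 10 samples
Y = rng.normal(0.5, 1.0, size=(12, 50))
print(shrinkage_hotelling_t2(X, Y))
```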


2016 ◽  
Vol 2016 ◽  
pp. 1-7
Author(s):  
Zhizheng Liang

Feature scaling has attracted considerable attention over the past several decades because of its important role in feature selection. In this paper, a novel algorithm for learning the scaling factors of features is proposed. It first assigns a nonnegative scaling factor to each feature of the data and then adopts a generalized performance measure to learn the optimal scaling factors. Notably, the proposed model can be transformed into a convex optimization problem, namely second-order cone programming (SOCP), so the learned scaling factors are globally optimal in this sense. Several experiments on simulated data, UCI data sets, and a gene data set demonstrate that the proposed method is more effective than previous methods.
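One plausible way to cast feature-scale learning as an SOCP is sketched below with cvxpy: nonnegative per-feature scales enter linearly through squared feature-wise gaps, a margin constraint with slacks encodes the performance measure, and an l2-norm objective makes the problem second-order cone representable. The objective and constraints are illustrative, not the paper's generalized performance measure:

```python
# Sketch: nonnegative feature scales via a second-order cone program (cvxpy).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n, d = 60, 10
X = rng.normal(size=(n, d)); y = rng.choice([-1, 1], size=n)
X[y == 1] += 0.8                                   # make classes separable-ish

m_pos, m_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
# squared per-feature gaps to own/other class mean: linear in the scales w
gap_own = (X - np.where(y[:, None] == 1, m_pos, m_neg)) ** 2
gap_other = (X - np.where(y[:, None] == 1, m_neg, m_pos)) ** 2

w = cp.Variable(d, nonneg=True)                    # per-feature scaling factors
xi = cp.Variable(n, nonneg=True)                   # slack variables
cons = [gap_other @ w - gap_own @ w >= 1 - xi]     # scaled nearest-mean margin
prob = cp.Problem(cp.Minimize(cp.norm(w, 2) + 1.0 * cp.sum(xi)), cons)
prob.solve()                                       # handled as an SOCP by cvxpy
print(np.round(w.value, 3))
```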

