A topological approach to inferring the intrinsic dimension of convex sensing data

Author(s):  
Min-Chun Wu ◽  
Vladimir Itskov

Abstract We consider a common measurement paradigm, where an unknown subset of an affine space is measured by unknown continuous quasi-convex functions. Given the measurement data, can one determine the dimension of this space? In this paper, we develop a method for inferring the intrinsic dimension of the data from measurements by quasi-convex functions, under natural assumptions. The dimension inference problem depends only on discrete data: the ordering of the measured points of the space induced by the sensor functions. We construct a filtration of Dowker complexes associated to measurements by quasi-convex functions. Topological features of these complexes are then used to infer the intrinsic dimension. We prove convergence theorems that guarantee obtaining the correct intrinsic dimension in the limit of large data, under natural assumptions. We also illustrate the usability of this method in simulations.
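
For readers unfamiliar with the construction, the following is the standard definition of the Dowker complex of a relation (Dowker, 1952). This is the textbook object; the paper's filtration of such complexes, built from the sensor-induced ordering, refines it in ways not reproduced here:

```latex
% Dowker complex of a relation R \subseteq X \times Y:
% a finite subset of X spans a simplex iff it has a common witness in Y.
\[
\mathrm{Dow}(R) \;=\; \bigl\{\, \sigma \subseteq X \ \text{finite} \;:\;
\exists\, y \in Y \ \text{with}\ (x,y) \in R \ \text{for all}\ x \in \sigma \,\bigr\}
\]
```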

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Michele Allegra ◽  
Elena Facco ◽  
Francesco Denti ◽  
Alessandro Laio ◽  
Antonietta Mira

Abstract One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be proficiently used even on large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded versus unfolded configurations in a protein molecular dynamics trajectory, active versus non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
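
The abstract does not name the estimator, but a natural choice for the ID is the TWO-NN estimator of Facco et al. (co-authors here), which uses only the ratio of each point's first two neighbor distances. A minimal sketch, assuming Euclidean data and the pooled maximum-likelihood form of the estimator; segmenting by local ID then amounts to running this on a neighborhood of each point and grouping the resulting values:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(points):
    """TWO-NN intrinsic-dimension estimate: for each point take
    mu = r2 / r1 (distances to its 1st and 2nd nearest neighbors);
    the maximum-likelihood ID is N / sum(log mu)."""
    dist, _ = NearestNeighbors(n_neighbors=3).fit(points).kneighbors(points)
    mu = dist[:, 2] / dist[:, 1]          # dist[:, 0] is the point itself
    return len(points) / np.log(mu).sum()

# Toy check: a 2-D plane embedded in 10-D should give an ID close to 2.
rng = np.random.default_rng(0)
print(two_nn_id(rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))))
```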


2021 ◽  
Vol 5 (3) ◽  
pp. 271
Author(s):  
Agnes S Payani ◽  
Siti D Wahyuningsih ◽  
Gusti D Yudha ◽  
Nico Cendiana ◽  
Hanna Afida ◽  
...  

SPACeMAP is a remote-sensing data portal system owned by LAPAN and used to distribute medium- to very-high-resolution mosaic data to provincial governments. A frequently arising problem is that the mosaic images are very large, especially the SPOT-6/7 mosaics. The growing number of data sets and users may slow the data loading process on the portal, so compression of the mosaic data is worth considering. SPACeMAP provides an Image Compressor feature using the Tile and Line algorithms with a compression ratio (target rate) recommended for optical data (15 to 20). This study aims to determine the best algorithm and target rate for compressing SPOT-6/7 mosaic imagery. The comparison was carried out qualitatively, through visual inspection, and quantitatively, using the compression ratio (CR), bits per pixel (BPP), and peak signal-to-noise ratio (PSNR). The experimental results show that, quantitatively, the Tile and Line algorithms perform differently depending on the zoom level and land-cover characteristics. Qualitatively, the Tile algorithm gives better overall results than the Line algorithm. Quantitatively, both algorithms perform well in homogeneous areas. The target-rate differences within the tested range do not affect processing duration; however, the Line algorithm takes longer to run than the Tile algorithm. Compression of mosaics built from lower- or higher-resolution remote sensing data may give different results, which should be addressed in further studies.
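
For reference, the three quantitative metrics are standard and straightforward to compute; a minimal sketch for an 8-bit image (array names and the bit-depth assumption are illustrative, not from the paper):

```python
import numpy as np

def compression_metrics(original, decoded, compressed_nbytes):
    """CR, BPP, and PSNR (dB) for an 8-bit image and its decoded version."""
    cr = original.nbytes / compressed_nbytes            # compression ratio
    bpp = 8.0 * compressed_nbytes / original.size       # bits per pixel
    err = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(err ** 2)
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
    return cr, bpp, psnr
```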


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Nikolay Makarenko ◽  
Maksat Kalimoldayev ◽  
Ivan Pak ◽  
Ainur Yessenaliyeva

Abstract High spatial resolution satellite images deviate from Gaussian count statistics, so texture recognition methods based on variances become ineffective. The aim of this paper is to study the possibilities of a completely different, topological approach to the classification of structures. Persistent Betti numbers serve as the features for texture recognition. They are not associated with a metric and are obtained directly from the data in the form of a so-called persistence diagram (PD). Different structures built on the PD are used to obtain convenient numerical statistics. At present, three such objects are known: topological landscapes, persistence images, and rank functions. They were introduced recently as attempts to vectorize PDs, and each of the proposed structures was typically illustrated by its authors with simple examples. However, the practical application of these approaches to large data sets requires evaluating their efficiency, within the frame of a selected task, on the same standard database. In our case, the task is to recognize different textures in Remote Sensing Data (RSD). In this work we test the efficiency of the structure called persistence images. We compute PDs for a database containing 800 high-resolution images representing 20 texture classes. We found that the average efficiency of recognizing an individual image within its class is 84%, and in 11 classes it is not less than 90%. By comparison, topological landscapes provide an average efficiency of 68%, with only 3 classes at not less than 90%. These conclusions are of interest for new methods of texture recognition in RSD.
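
As a sketch of the vectorization being tested: a persistence image (in the sense of Adams et al.) maps each diagram point (birth, death) to (birth, persistence), spreads a persistence-weighted Gaussian around it, and samples the result on a fixed grid. The resolution, bandwidth, and extent below are illustrative, not the paper's settings:

```python
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1, extent=(0.0, 1.0)):
    """Rasterize a persistence diagram [(birth, death), ...] into a
    fixed-size feature image usable by any standard classifier."""
    grid = np.linspace(extent[0], extent[1], resolution)
    gx, gy = np.meshgrid(grid, grid)
    img = np.zeros_like(gx)
    for birth, death in diagram:
        pers = death - birth                   # lifetime, also the weight
        img += pers * np.exp(-((gx - birth) ** 2 + (gy - pers) ** 2)
                             / (2.0 * sigma ** 2))
    return img.ravel()                         # flatten to a feature vector
```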


2020 ◽  
Vol 12 (20) ◽  
pp. 3411
Author(s):  
Kamil Szewczak ◽  
Helena Łoś ◽  
Rafał Pudełko ◽  
Andrzej Doroszewski ◽  
Łukasz Gluba ◽  
...  

The current Polish Agricultural Drought Monitoring System (ADMS) adopted the Climatic Water Balance (CWB) as the main indicator of crop losses caused by drought conditions. All meteorological data needed for CWB assessment are provided by the ground meteorological station network. In 2018 the network consisted of 665 stations, of which only 58 registered the full set of weather parameters; therefore, only these stations allow exact estimation of potential evapotranspiration, a component of the CWB algorithm. This limitation affects the quality of the CWB raster maps interpolated from the station network for the entire country. The interpolation process itself may also introduce errors; therefore, the adoption of satellite data, which are spatially continuous, should be considered, even though data gaps due to cloudiness remain a serious problem. In this paper we used remote sensing data from the MODIS instrument and examined the ability to integrate those data with values determined from ground measurements. The paper presents comparisons of the CWB index assessed using ground station data with the index obtained from potential evapotranspiration as a product of the Moderate Resolution Imaging Spectroradiometer (MODIS). The comparisons were performed at specific points (the locations of the ground stations) and expressed as differences in mean values. Pearson's correlation coefficient (r), the Mann–Kendall trend test (Z-index), the mean absolute error (MAE), and the root mean square error (RMSE) were evaluated for the ten-year series and are presented. In addition, a basic spatial interpretation of the results is proposed. The correlation test revealed r coefficients in the range from 0.06 to 0.68. The results show good agreement in the temporal trends of the two types of CWB, with consistently higher values for the index estimated from ground measurement data. For 34 of the 43 analyzed stations the Mann–Kendall test indicated a consistent trend; only nine trends were inconsistent. The analyses revealed that the disagreement between the two indices (determined in different ways) increased significantly in the warmer period, with a significant break point between R7 and R8 that falls at the end of May in each examined year. The MAE varied from 80 mm to 135 mm.
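
The CWB itself is conventionally precipitation minus potential evapotranspiration, CWB = P − PET. The station-level comparison statistics reported above can be sketched as follows (variable names are illustrative, and Kendall's tau against the time index stands in for the full Mann–Kendall test):

```python
import numpy as np
from scipy import stats

def compare_cwb(cwb_ground, cwb_modis):
    """Point-wise agreement between station-based and MODIS-based CWB."""
    r, _ = stats.pearsonr(cwb_ground, cwb_modis)          # Pearson r
    mae = np.mean(np.abs(cwb_ground - cwb_modis))         # MAE
    rmse = np.sqrt(np.mean((cwb_ground - cwb_modis) ** 2))
    tau, _ = stats.kendalltau(np.arange(len(cwb_modis)), cwb_modis)  # trend proxy
    return r, mae, rmse, tau
```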


Author(s):  
Uyioghosa Igie ◽  
Pablo Diez-Gonzalez ◽  
Antoine Giraud ◽  
Orlando Minervino

Gas turbine (GT) operators are often met with the challenge of utilizing, and making sense of, the vast measurement data collected from machine sensors during operation. This can easily amount to 576 × 10⁶ data points of gas-path measurements for one machine in a year of base-load operation, if the width of the data is 20 columns of measured and calculated parameters. This study focuses on the use of large data sets to quantify the degradation mostly related to compressor fouling, in addition to investigating the impact of offline and online compressor washing. To achieve this, four GT engines operating for about 3.5 years, with 51 offline washes and 1184 occasions of online washing, were examined. The investigation covers different wash frequencies, liquid concentrations, and one engine operated without online washing (offline only). The measurement data were corrected not just with compressor inlet temperatures (CITs) and pressures but also with relative humidity (RH). TURBOMATCH, an in-house GT performance simulation code, was used to obtain the nondimensional factors for the corrections. All data visualization and analysis were conducted with the Tableau analytics software, which facilitates the investigation of global and local events within an operation. The concept of using handles and filters is proposed in this study; it demonstrates the level of insight into the data and forms the basis of the outcomes obtained. This work shows that during operation the engine performance mostly deteriorates, though to varying degrees. Online washing reduced the average hourly degradation rate by half compared with the engine operating with offline washing only. Marginal hourly improvements were also observed when the average wash frequency was increased to every nine hours, and a similar outcome was obtained when the washing solution was 2.3 times more concentrated. Clear benefits of offline washes are also presented, alongside the typically obtainable increases in power output after a wash, in relation to the number of operating hours before the wash.
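
The textbook standard-day corrections give a flavor of the data conditioning described above; a minimal sketch (the paper obtains its correction factors, including the relative-humidity effect, from TURBOMATCH rather than from these closed forms):

```python
T_REF = 288.15    # K, ISA reference temperature
P_REF = 101.325   # kPa, ISA reference pressure

def corrected_power(power, cit_k, p_inlet_kpa):
    """Refer measured power to ISA conditions: P_corr = P / (delta * sqrt(theta))."""
    theta = cit_k / T_REF          # inlet temperature ratio
    delta = p_inlet_kpa / P_REF    # inlet pressure ratio
    return power / (delta * theta ** 0.5)
```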


2016 ◽  
pp. 81-86
Author(s):  
Péter Ragán

Long-term experiments are required to evaluate the impact of irrigation, nutrient utilization, and the year factor, as well as to assess the potential consequences of climate change. In a long-term experiment it may be necessary to display spatial data for each parcel, either to investigate soil heterogeneity or for presentation. This article aims to help researchers working on long-term experiments to store and display spatial data. After the outlines of each experimental site were measured with GPS, a spatial database was created in Quantum GIS. Then a filter script was written in the R statistical environment using the RStudio graphical interface. The script bypasses the QGIS data-input interface so that large data sets can be attached to each parcel directly; as a result, no separate data entry is needed beyond the basic statistical database. The resulting GIS database can be used in many ways: it can be exported to the KML file format, which can be displayed in Google Earth. Exported KML files can be viewed in Google Drive by importing them into the Google My Maps application, so that a browser can display the map. With Google Drive, the maps can be shared within the research group; additionally, the outlines can be edited, and measurement data can be uploaded into existing empty columns of the attribute table. The map created in Quantum GIS can also be used for presentation purposes.
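
The article's R script is not reproduced here, but the join-and-export step it describes looks roughly like the following Python/GeoPandas sketch (file names and the parcel_id key are hypothetical):

```python
import fiona
import geopandas as gpd
import pandas as pd

fiona.supported_drivers["KML"] = "rw"                  # enable KML output

parcels = gpd.read_file("parcel_outlines.gpkg")        # GPS-measured outlines
data = pd.read_csv("parcel_measurements.csv")          # per-parcel statistics
merged = parcels.merge(data, on="parcel_id")           # attach data directly
merged.to_file("parcels.kml", driver="KML")            # view in Google Earth
```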


2021 ◽  
Author(s):  
Riccardo Fellegara ◽  
Markus Flatken ◽  
Francesco De Zan ◽  
Andreas Gerndt

Over the last few years, the amount of large and complex data in the public domain has increased enormously, and new challenges have arisen in the representation, analysis, and visualization of such data. Considering the number of space missions that have provided and will provide remote sensing data, there is still the need for a system that can be dispatched across several remote repositories while being accessible from a single client running on commodity hardware.

To tackle this challenge, at the DLR Institute for Software Technology we have defined a dual backend/frontend system enabling the interactive analysis and visualization of large-scale remote sensing data. The basis for all visualization and interaction approaches is CosmoScout VR, a visualization tool developed internally at DLR and publicly available on GitHub, which allows the visualization of complex planetary data and large simulation data in real time. The backend of this system is based on an MPI framework called Viracocha, which enables the remote analysis of large data and uses the network efficiently by sending compact, partial results to CosmoScout for interactive visualization as soon as they are computed.

A node-based interface is defined within the visualization tool, which lets a domain expert easily define customized pipelines for processing and visualizing the remote data. Each "node" of this interface is linked either to a feature extraction module defined in Viracocha or to a rendering module defined directly in CosmoScout. Since this interface is completely customizable by the user, multiple pipelines can be defined over the same dataset to further enhance the visual feedback for analysis purposes.

As this is an ongoing project, on top of these tools we plan to define and implement strategies based on Topological Data Analysis (TDA) as a novel approach to EO data processing and visualization. TDA is an emerging set of techniques for processing data according to its topological features. These include both the geometric information associated with a point and the non-geometric scalar values, such as temperature and pressure, that can be captured during a monitoring mission. One of the major theories behind TDA is discrete Morse theory, which, given a scalar function, is used to define a gradient on that function, extract its critical points, identify the region of influence of each critical point, and so on. This strategy is parameter-free and enables a domain scientist to process large datasets without prior knowledge of the data.

An interesting research question to be investigated during this project is the correlation of changes in critical points at different time steps, and the identification of deformations (or changes) across time in the original dataset.
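
As a concrete entry point to the TDA machinery mentioned above, persistence of a gridded scalar field can be computed with a cubical complex; the sketch below uses GUDHI on a stand-in raster and is not the Viracocha/CosmoScout implementation:

```python
import numpy as np
import gudhi

field = np.random.default_rng(0).normal(size=(128, 128))  # stand-in scalar field
cc = gudhi.CubicalComplex(top_dimensional_cells=field)
pairs = cc.persistence()              # list of (dimension, (birth, death))

# Long-lived pairs correspond to prominent critical points of the field;
# tracking how they move between time steps is the change-detection idea.
prominent = [p for p in pairs if p[1][1] - p[1][0] > 1.0]
```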


1980 ◽  
Vol 23 (3) ◽  
pp. 317-320
Author(s):  
R. M. Mathsen

In a recent paper [1] I. B. Lazarevic announced an extension of results of L. Tornheim [2; Theorems 2 & 3] concerning points of contact between two distinct members of an n-parameter family, and between a member of an n-parameter family and a corresponding convex function. The proofs of these extensions [1; Theorems 3.1 & 3.2] make use of Tornheim's Convergence Theorem [2; Theorem 5]; however, this theorem is not correctly applied in [1], since it requires distinct limiting nodes, and that hypothesis necessarily fails in the approach used in [1]. In this note, proofs of results more general than those in [1] are given independently of convergence theorems.


Author(s):  
Steven Paquette ◽  
J. David Brantley ◽  
Brian D. Corner ◽  
Peng Li ◽  
Thomas Oliver

The use of 3D scanning systems to capture and measure human body dimensions is becoming commonplace. While the ability of available scanning systems to record the surface anatomy of the human body is generally regarded as acceptable for most applications, effective use of the images to obtain anthropometric data requires specially developed data extraction software, and for large data sets the extraction of useful information can be quite time consuming. A major benefit, therefore, is an automated software program that quickly extracts reliable anthropometric data from 3D scanned images. In this paper the accuracy and variability of two fully automated data extraction systems (a Cyberware WB-4 scanner with Natick-Scan software and a Hamamatsu BL scanner with its accompanying software) are examined and compared with measurements obtained by traditional anthropometry. To remove many of the confounding variables that living humans introduce during scanning, a set of clothing dressforms was chosen as the focus of study. Analysis of the measurement data generally indicates that automated extraction compares favorably with standard anthropometry for some measurements but requires additional refinement for others.
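
The accuracy-and-variability comparison reduces, per measurement site, to the bias and spread of the automated-minus-manual differences; a minimal sketch (units and names illustrative, not from the paper):

```python
import numpy as np

def bias_and_spread(auto_mm, manual_mm):
    """Mean difference (accuracy) and its standard deviation (variability)
    between automated extraction and traditional anthropometry."""
    diff = np.asarray(auto_mm, dtype=float) - np.asarray(manual_mm, dtype=float)
    return diff.mean(), diff.std(ddof=1)
```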

