A parallel discord discovery algorithm for time series on many-core accelerators

Author(s):  
М.Л. Цымблер

Discord is a refinement of the concept of an anomalous subsequence of a time series, i.e., a subsequence that is essentially dissimilar to all the other subsequences. The discord discovery problem occurs in a wide range of application areas related to time series: medicine, economics, climate modeling, etc. In this paper, we propose a new parallel discord discovery algorithm for many-core accelerators for the case when the input data fit in main memory. The algorithm exploits the fact that the Euclidean distances between subsequences of the time series can be computed independently. The algorithm consists of two stages, namely precomputation and discovery. At the precomputation stage, we construct auxiliary matrix data structures that enable efficient parallelization and vectorization of computations on an accelerator. At the discovery stage, the algorithm searches for the discord using the constructed structures. The algorithm is implemented for accelerators of the Intel MIC (Many Integrated Core) and NVIDIA GPU architectures; computations are parallelized using OpenMP and OpenACC, respectively. Results of numerical experiments confirm the high scalability of the proposed algorithm.
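The independence of pairwise distance computations that the paper exploits can be illustrated with a minimal brute-force sketch (not the authors' parallel algorithm): the top-1 discord is the subsequence whose nearest non-self-matching neighbor is farthest away.

```python
# Brute-force discord discovery sketch: each row of the inner loop's
# distance computation is independent, which is what a many-core
# implementation would parallelize and vectorize.
import numpy as np

def find_discord(ts, m):
    """Return (index, distance) of the top-1 discord of length m."""
    n = len(ts) - m + 1
    # z-normalized subsequences as rows of a matrix
    subs = np.array([ts[i:i + m] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        # Euclidean distance from subsequence i to every other subsequence
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        # exclude trivial (overlapping) self-matches
        lo, hi = max(0, i - m + 1), min(n, i + m)
        d[lo:hi] = np.inf
        nn = d.min()                    # nearest non-self-match
        if nn > best_dist:              # the discord maximizes this
            best_idx, best_dist = i, nn
    return best_idx, best_dist
```

This naive version is O(n²·m); the precomputed matrix structures described in the abstract serve to make the inner distance computation cache- and SIMD-friendly on an accelerator.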

Author(s):  
Т.В. Речкалов ◽  
М.Л. Цымблер

PAM (Partitioning Around Medoids) is a partitioning clustering algorithm in which each cluster is represented by an object from the input dataset (called a medoid). Medoid-based clustering is used in a wide range of applications: segmentation of medical and satellite images, analysis of DNA microarrays and texts, etc. Currently, there are parallel implementations of PAM for GPU and FPGA systems, but none for Intel Many Integrated Core (MIC) accelerators. In this paper, we propose PhiPAM, a novel parallel clustering algorithm for Intel MIC systems. Computations are parallelized using OpenMP. The algorithm exploits a specialized memory data layout and a loop-tiling technique, which allow computations to be vectorized efficiently on Intel MIC systems. Experiments performed on real datasets show a good scalability of the algorithm.
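The medoid idea can be sketched with a compact, sequential k-medoids iteration (a Voronoi-style simplification of full PAM BUILD+SWAP, not the paper's PhiPAM): a cluster's representative is always an actual input object, namely the member minimizing total intra-cluster distance.

```python
# Simplified k-medoids clustering sketch: alternate between assigning
# points to the nearest medoid and re-electing each cluster's medoid.
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    """Cluster rows of X around k medoids; returns (medoid_indices, labels)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # pairwise Euclidean distance matrix, computed once
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)       # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # new medoid: the member minimizing total within-cluster distance
            costs = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, D[:, medoids].argmin(axis=1)
```

The distance-matrix and cost computations are the hot loops that a MIC implementation would restructure with a tailored memory layout and tiling.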


Author(s):  
Я.А. Краева ◽  
М.Л. Цымблер

Nowadays, subsequence similarity search is required in a wide range of time series mining applications: climate modeling, financial forecasting, medical research, etc. Most of these applications use the Dynamic Time Warping (DTW) similarity measure, since DTW is empirically confirmed as one of the best similarity measures for the majority of subject domains. Since DTW has a quadratic computational complexity with respect to the length of the query subsequence, a number of parallel algorithms have been developed for various many-core architectures, namely FPGA, GPU, and Intel MIC. In this paper, we propose a new parallel algorithm for subsequence similarity search in very large time series on computer cluster systems whose nodes are based on Intel Xeon Phi Knights Landing (KNL) many-core processors. Computations are parallelized at two levels: across all cluster nodes with MPI, and within a single cluster node with OpenMP. The algorithm employs additional data structures and redundant computations that make it possible to exploit the vector capabilities of Phi KNL efficiently. Experimental evaluation of the algorithm on real-world and synthetic datasets shows that the proposed algorithm is highly scalable.
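The quadratic cost that motivates the parallelization is visible in the textbook dynamic-programming formulation of DTW, sketched below (this is the standard algorithm, not the paper's optimized KNL implementation):

```python
# Classic O(n*m) DTW: fill a cumulative-cost table where each cell extends
# the cheapest of the three admissible warping steps.
import numpy as np

def dtw(a, b):
    """Return the DTW distance between 1-D sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # match (diagonal), insertion, or deletion step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

Every query-vs-subsequence comparison fills such a table, so a similarity scan over a very large time series multiplies this quadratic kernel by the number of candidate positions, which is what the two-level MPI/OpenMP scheme distributes.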


2021 ◽  
Vol 13 (16) ◽  
pp. 3069
Author(s):  
Yadong Liu ◽  
Junhwan Kim ◽  
David H. Fleisher ◽  
Kwang Soo Kim

Seasonal forecasts of crop yield are important components for agricultural policy decisions and farmer planning. A wide range of input data are often needed to forecast crop yield in a region where sophisticated approaches such as machine learning and process-based models are used. This requires considerable effort for data preparation in addition to identifying data sources. Here, we propose a simpler approach called the Analogy Based Crop-yield (ABC) forecast scheme to make timely and accurate predictions of regional crop yield using a minimum set of inputs. In the ABC method, a growing season from a prior long-term period, e.g., 10 years, is first identified as analogous to the current season by the use of a similarity index based on the time series leaf area index (LAI) patterns. Crop yield in the given growing season is then forecasted using the weighted yield average reported in the analogous seasons for the area of interest. The ABC approach was used to predict corn and soybean yields in the Midwestern U.S. at the county level for the period of 2017–2019. The MOD15A2H, which is a satellite data product for LAI, was used to compile inputs. The mean absolute percentage error (MAPE) of crop yield forecasts was <10% for corn and soybean in each growing season when the time series of LAI from the day of year 89 to 209 was used as inputs to the ABC approach. The prediction error for the ABC approach was comparable to results from a deep neural network model that relied on soil and weather data as well as satellite data in a previous study. These results indicate that the ABC approach allows for crop yield forecasts with a lead time of at least two months before harvest. In particular, the ABC scheme would be useful for regions where crop yield forecasts are limited by availability of reliable environmental data.
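The analogy idea can be sketched in a few lines. Note that the RMSE-based similarity and inverse-distance weights below are illustrative assumptions, not the paper's exact similarity index:

```python
# Hypothetical sketch of an analogy-based yield forecast: score past
# growing seasons by similarity of their LAI trajectories to the current
# one, then average the yields of the k most analogous seasons.
import numpy as np

def abc_forecast(lai_now, past_lai, past_yield, k=3):
    """past_lai: (years, t) LAI series; past_yield: (years,) yields."""
    # distance of the current LAI trajectory to each past season
    d = np.sqrt(((past_lai - lai_now) ** 2).mean(axis=1))
    analogs = np.argsort(d)[:k]            # k most analogous seasons
    w = 1.0 / (d[analogs] + 1e-9)          # closer seasons weigh more
    return float(np.average(past_yield[analogs], weights=w))
```

The appeal of the scheme is exactly what this sketch shows: the only inputs are an LAI time series per season and the historical yields, with no soil or weather covariates.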


Author(s):  
Sriram Vangal ◽  
Somnath Paul ◽  
Steven Hsu ◽  
Amit Agarwal ◽  
Saurabh Kumar ◽  
...  

2019 ◽  
Vol 12 (11) ◽  
pp. 4661-4679 ◽  
Author(s):  
Bin Cao ◽  
Xiaojing Quan ◽  
Nicholas Brown ◽  
Emilie Stewart-Jones ◽  
Stephan Gruber

Abstract. Simulations of land-surface processes and phenomena often require driving time series of meteorological variables. Corresponding observations, however, are unavailable in most locations, even more so, when considering the duration, continuity and data quality required. Atmospheric reanalyses provide global coverage of relevant meteorological variables, but their use is largely restricted to grid-based studies. This is because technical challenges limit the ease with which reanalysis data can be applied to models at the site scale. We present the software toolkit GlobSim, which automates the downloading, interpolation and scaling of different reanalyses – currently ERA5, ERA-Interim, JRA-55 and MERRA-2 – to produce meteorological time series for user-defined point locations. The resulting data have consistent structure and units to efficiently support ensemble simulation. The utility of GlobSim is demonstrated using an application in permafrost research. We perform ensemble simulations of ground-surface temperature for 10 terrain types in a remote tundra area in northern Canada and compare the results with observations. Simulation results reproduced seasonal cycles and variation between terrain types well, demonstrating that GlobSim can support efficient land-surface simulations. Ensemble means often yielded better accuracy than individual simulations and ensemble ranges additionally provide indications of uncertainty arising from uncertain input. By improving the usability of reanalyses for research requiring time series of climate variables for point locations, GlobSim can enable a wide range of simulation studies and model evaluations that previously were impeded by technical hurdles in obtaining suitable data.


2021 ◽  
Vol 13 (9) ◽  
pp. 1743
Author(s):  
Daniel Paluba ◽  
Josef Laštovička ◽  
Antonios Mouratidis ◽  
Přemysl Štych

This study deals with a local incidence angle correction method, i.e., the land cover-specific local incidence angle correction (LC-SLIAC), based on the linear relationship between the backscatter values and the local incidence angle (LIA) for a given land cover type in the monitored area. Using the combination of CORINE Land Cover and Hansen et al.’s Global Forest Change databases, a wide range of different LIAs for a specific forest type can be generated for each scene. The algorithm was developed and tested in the cloud-based platform Google Earth Engine (GEE) using Sentinel-1 open access data, Shuttle Radar Topography Mission (SRTM) digital elevation model, and CORINE Land Cover and Hansen et al.’s Global Forest Change databases. The developed method was created primarily for time-series analyses of forests in mountainous areas. LC-SLIAC was tested in 16 study areas over several protected areas in Central Europe. The results after correction by LC-SLIAC showed a reduction of variance and range of backscatter values. Statistically significant reduction in variance (of more than 40%) was achieved in areas with LIA range >50° and LIA interquartile range (IQR) >12°, while in areas with low LIA range and LIA IQR, the decrease in variance was very low and statistically not significant. Six case studies with different LIA ranges were further analyzed in pre- and post-correction time series. Time-series after the correction showed a reduced fluctuation of backscatter values caused by different LIAs in each acquisition path. This reduction was statistically significant (with up to 95% reduction of variance) in areas with a difference in LIA greater than or equal to 27°. LC-SLIAC is freely available on GitHub and GEE, making the method accessible to the wide remote sensing community.
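The core of a land cover-specific correction like LC-SLIAC is the per-class linear backscatter-vs-LIA relation; a simplified sketch follows, in which the reference angle and the plain least-squares fit are assumptions for illustration rather than the published procedure:

```python
# Simplified incidence-angle normalization: fit sigma0(dB) ~ slope*LIA + b
# over pixels of one land-cover class, then remove the LIA-dependent part
# so all pixels are expressed at a common reference angle.
import numpy as np

def lia_correct(sigma0_db, lia_deg, ref_deg=40.0):
    """Normalize backscatter (dB) to ref_deg using a per-class linear fit."""
    slope, intercept = np.polyfit(lia_deg, sigma0_db, 1)
    # subtracting slope*(LIA - ref) keeps the value expected at ref_deg
    return sigma0_db - slope * (lia_deg - ref_deg)
```

After such a correction, time-series variance driven by differing acquisition geometries shrinks, which is the effect the study quantifies per LIA range.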


2018 ◽  
Vol 22 (2) ◽  
pp. 1175-1192 ◽  
Author(s):  
Qian Zhang ◽  
Ciaran J. Harman ◽  
James W. Kirchner

Abstract. River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling – in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) – are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β  =  0) to Brown noise (β  =  2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb–Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. 
Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of prescribed β values and gap distributions. The aliasing method, however, does not itself account for sampling irregularity, and this introduces some bias in the result. Nonetheless, the wavelet method is recommended for estimating β in irregular time series until improved methods are developed. Finally, all methods' performances depend strongly on the sampling irregularity, highlighting that the accuracy and precision of each method are data specific. Accurately quantifying the strength of fractal scaling in irregular water-quality time series remains an unresolved challenge for the hydrologic community and for other disciplines that must grapple with irregular sampling.
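For a *regularly* sampled series, the spectral slope β can be estimated by an ordinary least-squares fit of log-periodogram against log-frequency, as sketched below; the paper's whole difficulty is that this simple recipe breaks down under irregular sampling, where Lomb–Scargle or wavelet methods must be used instead.

```python
# Minimal spectral-slope estimator for a regularly sampled series:
# beta is minus the slope of log P(f) vs log f, where P(f) ~ f^(-beta).
import numpy as np

def spectral_slope(x):
    """Estimate beta from a regularly sampled series via the periodogram."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # one-sided periodogram, excluding the zero frequency
    p = np.abs(np.fft.rfft(x - x.mean()))[1:] ** 2
    f = np.fft.rfftfreq(n)[1:]
    slope, _ = np.polyfit(np.log(f), np.log(p), 1)
    return -slope
```

White noise should give β near 0 and Brown noise (a cumulative sum of white noise) β near 2, matching the endpoints of the range of prescribed β values in the synthetic experiments.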


2021 ◽  
Vol 7 ◽  
pp. e744
Author(s):  
Si Thu Aung ◽  
Yodchanan Wongsawat

Epilepsy is a common neurological disease that affects a wide range of the world population and is not limited by age. Moreover, seizures can occur anytime and anywhere because of the sudden abnormal discharge of brain neurons, leading to malfunction. The seizures of approximately 30% of epilepsy patients cannot be treated with medicines or surgery; hence these patients would benefit from a seizure prediction system to live normal lives. Thus, a system that can predict a seizure before its onset could improve not only these patients’ social lives but also their safety. Numerous seizure prediction methods have already been proposed, but the performance measures of these methods are still inadequate for a complete prediction system. Here, a seizure prediction system is proposed by exploring the advantages of multivariate entropy, which can reflect the complexity of multivariate time series over multiple scales (frequencies), called multivariate multiscale modified-distribution entropy (MM-mDistEn), with an artificial neural network (ANN). The phase-space reconstruction and estimation of the probability density between vectors provide hidden complex information. The multivariate time series property of MM-mDistEn provides more understandable information within the multichannel data and makes it possible to predict epileptic seizures. Moreover, the proposed method was tested with two different analyses: simulation data analysis proves that the proposed method has strong consistency over the different parameter selections, and the results from experimental data analysis showed that the proposed entropy combined with an ANN obtains performance measures of 98.66% accuracy, 91.82% sensitivity, 99.11% specificity, and 0.84 area under the curve (AUC) value.
In addition, the seizure alarm system was applied as a postprocessing step for prediction purposes, and a false alarm rate of 0.014 per hour and an average prediction time of 26.73 min before seizure onset were achieved by the proposed method. Thus, the proposed entropy as a feature extraction method combined with an ANN can predict the ictal state of epilepsy, and the results show great potential for all epilepsy patients.


Author(s):  
Viacheslav S. Okunev

The main purpose of this work is to determine the possibility of cluster decays of superheavy atomic nuclei. The universality of the similarity principle makes it possible to apply it to the analysis of physical processes that have not yet been studied. Analogies are observed between forced and spontaneous decays of atomic nuclei. It is shown that processes initiated by external influence are realized in two stages: fragmentation reactions, forced fission of stable nuclei, and impact radioactivity. Nuclear reactions of fragmentation and forced fission of stable isotopes of lead and bismuth are realized under the action of particles (hadrons) and light atomic nuclei with kinetic energies above 10⁸ eV. Impact radioactivity is observed in collisions of macro-objects with a crystalline structure at speeds of at least ∼1 km/s. Some radioactive decays of atomic nuclei, including extremely rare cluster decays, are also realized in two stages. Based on the analogies among the processes considered, some cautious predictions are made about the possibility of cluster decays of atomic nuclei over a wide range of atomic masses.


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, yielding an increase in classification accuracy of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

