A parallel discord discovery algorithm for time series on many-core accelerators

Author(s):  
М.Л. Цымблер

Discord is a refinement of the concept of an anomalous subsequence of a time series, i.e., a subsequence that is essentially dissimilar to all the other subsequences. The discord discovery problem occurs in a wide range of application areas related to time series: medicine, economics, climate modeling, etc. In this paper, we propose a new parallel discord discovery algorithm for many-core accelerators for the case when the input data fit in main memory. The algorithm exploits the fact that the Euclidean distances between subsequences of the time series can be computed independently. The algorithm consists of two stages, namely precomputation and discovery. At the precomputation stage, we construct auxiliary matrix data structures that enable efficient parallelization and vectorization of computations on an accelerator. At the discovery stage, the algorithm searches for the discord using the constructed structures. The algorithm is implemented for accelerators of the Intel MIC (Many Integrated Core) and NVIDIA GPU architectures; computations are parallelized using OpenMP and OpenACC, respectively. Results of numerical experiments confirm the high scalability of the proposed algorithm.
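The independence of pairwise distance computations that the paper exploits can be illustrated with a minimal brute-force sketch (not the authors' parallel algorithm): the top-1 discord is the subsequence whose nearest non-self-matching neighbor is farthest away.

```python
# Brute-force discord discovery sketch: each row of the inner loop's
# distance computation is independent, which is what a many-core
# implementation would parallelize and vectorize.
import numpy as np

def find_discord(ts, m):
    """Return (index, distance) of the top-1 discord of length m."""
    n = len(ts) - m + 1
    # z-normalized subsequences as rows of a matrix
    subs = np.array([ts[i:i + m] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        # Euclidean distance from subsequence i to every other subsequence
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        # exclude trivial (overlapping) self-matches
        lo, hi = max(0, i - m + 1), min(n, i + m)
        d[lo:hi] = np.inf
        nn = d.min()                    # nearest non-self-match
        if nn > best_dist:              # the discord maximizes this
            best_idx, best_dist = i, nn
    return best_idx, best_dist
```

This naive version is O(n²·m); the precomputed matrix structures described in the abstract serve to make the inner distance computation cache- and SIMD-friendly on an accelerator.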

Author(s):  
Т.В. Речкалов ◽  
М.Л. Цымблер

PAM (Partitioning Around Medoids) is a partitioning clustering algorithm in which each cluster is represented by an object from the input dataset (called a medoid). Medoid-based clustering is used in a wide range of applications: segmentation of medical and satellite images, analysis of DNA microarrays and texts, etc. Currently, there are parallel implementations of PAM for GPU and FPGA systems, but none for Intel Many Integrated Core (MIC) accelerators. In this paper, we propose PhiPAM, a novel parallel clustering algorithm for Intel MIC systems. Computations are parallelized using OpenMP. The algorithm exploits a specialized memory data layout and a loop-tiling technique, which allow computations to be vectorized efficiently on Intel MIC systems. Experiments performed on real datasets show a good scalability of the algorithm.
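The medoid idea can be sketched with a compact, sequential k-medoids iteration (a Voronoi-style simplification of full PAM BUILD+SWAP, not the paper's PhiPAM): a cluster's representative is always an actual input object, namely the member minimizing total intra-cluster distance.

```python
# Simplified k-medoids clustering sketch: alternate between assigning
# points to the nearest medoid and re-electing each cluster's medoid.
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    """Cluster rows of X around k medoids; returns (medoid_indices, labels)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # pairwise Euclidean distance matrix, computed once
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)       # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # new medoid: the member minimizing total within-cluster distance
            costs = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, D[:, medoids].argmin(axis=1)
```

The distance-matrix and cost computations are the hot loops that a MIC implementation would restructure with a tailored memory layout and tiling.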


Author(s):  
Я.А. Краева ◽  
М.Л. Цымблер

Nowadays, subsequence similarity search is required in a wide range of time series mining applications: climate modeling, financial forecasting, medical research, etc. Most of these applications use the Dynamic Time Warping (DTW) similarity measure, since DTW is empirically confirmed as one of the best similarity measures for the majority of subject domains. Since DTW has a quadratic computational complexity with respect to the length of the query subsequence, a number of parallel algorithms have been developed for various many-core architectures, namely FPGA, GPU, and Intel MIC. In this paper, we propose a new parallel algorithm for subsequence similarity search in very large time series on computer cluster systems whose nodes are based on Intel Xeon Phi Knights Landing (KNL) many-core processors. Computations are parallelized at two levels: across all cluster nodes with MPI, and within a single cluster node with OpenMP. The algorithm employs additional data structures and redundant computations that make it possible to exploit the vector capabilities of Phi KNL efficiently. Experimental evaluation of the algorithm on real-world and synthetic datasets shows that the proposed algorithm is highly scalable.
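The quadratic cost that motivates the parallelization is visible in the textbook dynamic-programming formulation of DTW, sketched below (this is the standard algorithm, not the paper's optimized KNL implementation):

```python
# Classic O(n*m) DTW: fill a cumulative-cost table where each cell extends
# the cheapest of the three admissible warping steps.
import numpy as np

def dtw(a, b):
    """Return the DTW distance between 1-D sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # match (diagonal), insertion, or deletion step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

Every query-vs-subsequence comparison fills such a table, so a similarity scan over a very large time series multiplies this quadratic kernel by the number of candidate positions, which is what the two-level MPI/OpenMP scheme distributes.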


2021 ◽  
Vol 13 (16) ◽  
pp. 3069
Author(s):  
Yadong Liu ◽  
Junhwan Kim ◽  
David H. Fleisher ◽  
Kwang Soo Kim

Seasonal forecasts of crop yield are important components for agricultural policy decisions and farmer planning. A wide range of input data are often needed to forecast crop yield in a region where sophisticated approaches such as machine learning and process-based models are used. This requires considerable effort for data preparation in addition to identifying data sources. Here, we propose a simpler approach called the Analogy Based Crop-yield (ABC) forecast scheme to make timely and accurate predictions of regional crop yield using a minimum set of inputs. In the ABC method, a growing season from a prior long-term period, e.g., 10 years, is first identified as analogous to the current season by the use of a similarity index based on the time series leaf area index (LAI) patterns. Crop yield in the given growing season is then forecasted using the weighted yield average reported in the analogous seasons for the area of interest. The ABC approach was used to predict corn and soybean yields in the Midwestern U.S. at the county level for the period of 2017–2019. The MOD15A2H, which is a satellite data product for LAI, was used to compile inputs. The mean absolute percentage error (MAPE) of crop yield forecasts was <10% for corn and soybean in each growing season when the time series of LAI from the day of year 89 to 209 was used as inputs to the ABC approach. The prediction error for the ABC approach was comparable to results from a deep neural network model that relied on soil and weather data as well as satellite data in a previous study. These results indicate that the ABC approach allows for crop yield forecasts with a lead time of at least two months before harvest. In particular, the ABC scheme would be useful for regions where crop yield forecasts are limited by availability of reliable environmental data.
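The analogy idea can be sketched in a few lines. Note that the RMSE-based similarity and inverse-distance weights below are illustrative assumptions, not the paper's exact similarity index:

```python
# Hypothetical sketch of an analogy-based yield forecast: score past
# growing seasons by similarity of their LAI trajectories to the current
# one, then average the yields of the k most analogous seasons.
import numpy as np

def abc_forecast(lai_now, past_lai, past_yield, k=3):
    """past_lai: (years, t) LAI series; past_yield: (years,) yields."""
    # distance of the current LAI trajectory to each past season
    d = np.sqrt(((past_lai - lai_now) ** 2).mean(axis=1))
    analogs = np.argsort(d)[:k]            # k most analogous seasons
    w = 1.0 / (d[analogs] + 1e-9)          # closer seasons weigh more
    return float(np.average(past_yield[analogs], weights=w))
```

The appeal of the scheme is exactly what this sketch shows: the only inputs are an LAI time series per season and the historical yields, with no soil or weather covariates.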


Author(s):  
Sriram Vangal ◽  
Somnath Paul ◽  
Steven Hsu ◽  
Amit Agarwal ◽  
Saurabh Kumar ◽  
...  

2019 ◽  
Vol 12 (11) ◽  
pp. 4661-4679 ◽  
Author(s):  
Bin Cao ◽  
Xiaojing Quan ◽  
Nicholas Brown ◽  
Emilie Stewart-Jones ◽  
Stephan Gruber

Abstract. Simulations of land-surface processes and phenomena often require driving time series of meteorological variables. Corresponding observations, however, are unavailable in most locations, even more so, when considering the duration, continuity and data quality required. Atmospheric reanalyses provide global coverage of relevant meteorological variables, but their use is largely restricted to grid-based studies. This is because technical challenges limit the ease with which reanalysis data can be applied to models at the site scale. We present the software toolkit GlobSim, which automates the downloading, interpolation and scaling of different reanalyses – currently ERA5, ERA-Interim, JRA-55 and MERRA-2 – to produce meteorological time series for user-defined point locations. The resulting data have consistent structure and units to efficiently support ensemble simulation. The utility of GlobSim is demonstrated using an application in permafrost research. We perform ensemble simulations of ground-surface temperature for 10 terrain types in a remote tundra area in northern Canada and compare the results with observations. Simulation results reproduced seasonal cycles and variation between terrain types well, demonstrating that GlobSim can support efficient land-surface simulations. Ensemble means often yielded better accuracy than individual simulations and ensemble ranges additionally provide indications of uncertainty arising from uncertain input. By improving the usability of reanalyses for research requiring time series of climate variables for point locations, GlobSim can enable a wide range of simulation studies and model evaluations that previously were impeded by technical hurdles in obtaining suitable data.


2021 ◽  
Vol 13 (9) ◽  
pp. 1743
Author(s):  
Daniel Paluba ◽  
Josef Laštovička ◽  
Antonios Mouratidis ◽  
Přemysl Štych

This study deals with a local incidence angle correction method, i.e., the land cover-specific local incidence angle correction (LC-SLIAC), based on the linear relationship between the backscatter values and the local incidence angle (LIA) for a given land cover type in the monitored area. Using the combination of CORINE Land Cover and Hansen et al.’s Global Forest Change databases, a wide range of different LIAs for a specific forest type can be generated for each scene. The algorithm was developed and tested in the cloud-based platform Google Earth Engine (GEE) using Sentinel-1 open access data, Shuttle Radar Topography Mission (SRTM) digital elevation model, and CORINE Land Cover and Hansen et al.’s Global Forest Change databases. The developed method was created primarily for time-series analyses of forests in mountainous areas. LC-SLIAC was tested in 16 study areas over several protected areas in Central Europe. The results after correction by LC-SLIAC showed a reduction of variance and range of backscatter values. Statistically significant reduction in variance (of more than 40%) was achieved in areas with LIA range >50° and LIA interquartile range (IQR) >12°, while in areas with low LIA range and LIA IQR, the decrease in variance was very low and statistically not significant. Six case studies with different LIA ranges were further analyzed in pre- and post-correction time series. Time-series after the correction showed a reduced fluctuation of backscatter values caused by different LIAs in each acquisition path. This reduction was statistically significant (with up to 95% reduction of variance) in areas with a difference in LIA greater than or equal to 27°. LC-SLIAC is freely available on GitHub and GEE, making the method accessible to the wide remote sensing community.
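The core of a land cover-specific correction like LC-SLIAC is the per-class linear backscatter-vs-LIA relation; a simplified sketch follows, in which the reference angle and the plain least-squares fit are assumptions for illustration rather than the published procedure:

```python
# Simplified incidence-angle normalization: fit sigma0(dB) ~ slope*LIA + b
# over pixels of one land-cover class, then remove the LIA-dependent part
# so all pixels are expressed at a common reference angle.
import numpy as np

def lia_correct(sigma0_db, lia_deg, ref_deg=40.0):
    """Normalize backscatter (dB) to ref_deg using a per-class linear fit."""
    slope, intercept = np.polyfit(lia_deg, sigma0_db, 1)
    # subtracting slope*(LIA - ref) keeps the value expected at ref_deg
    return sigma0_db - slope * (lia_deg - ref_deg)
```

After such a correction, time-series variance driven by differing acquisition geometries shrinks, which is the effect the study quantifies per LIA range.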


2018 ◽  
Vol 22 (2) ◽  
pp. 1175-1192 ◽  
Author(s):  
Qian Zhang ◽  
Ciaran J. Harman ◽  
James W. Kirchner

Abstract. River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling – in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) – are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β  =  0) to Brown noise (β  =  2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb–Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. 
Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of prescribed β values and gap distributions. The aliasing method, however, does not itself account for sampling irregularity, and this introduces some bias in the result. Nonetheless, the wavelet method is recommended for estimating β in irregular time series until improved methods are developed. Finally, all methods' performances depend strongly on the sampling irregularity, highlighting that the accuracy and precision of each method are data specific. Accurately quantifying the strength of fractal scaling in irregular water-quality time series remains an unresolved challenge for the hydrologic community and for other disciplines that must grapple with irregular sampling.
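For a *regularly* sampled series, the spectral slope β can be estimated by an ordinary least-squares fit of log-periodogram against log-frequency, as sketched below; the paper's whole difficulty is that this simple recipe breaks down under irregular sampling, where Lomb–Scargle or wavelet methods must be used instead.

```python
# Minimal spectral-slope estimator for a regularly sampled series:
# beta is minus the slope of log P(f) vs log f, where P(f) ~ f^(-beta).
import numpy as np

def spectral_slope(x):
    """Estimate beta from a regularly sampled series via the periodogram."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # one-sided periodogram, excluding the zero frequency
    p = np.abs(np.fft.rfft(x - x.mean()))[1:] ** 2
    f = np.fft.rfftfreq(n)[1:]
    slope, _ = np.polyfit(np.log(f), np.log(p), 1)
    return -slope
```

White noise should give β near 0 and Brown noise (a cumulative sum of white noise) β near 2, matching the endpoints of the range of prescribed β values in the synthetic experiments.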


2021 ◽  
Vol 7 ◽  
pp. e744
Author(s):  
Si Thu Aung ◽  
Yodchanan Wongsawat

Epilepsy is a common neurological disease that affects a wide range of the world population and is not limited by age. Moreover, seizures can occur anytime and anywhere because of the sudden abnormal discharge of brain neurons, leading to malfunction. The seizures of approximately 30% of epilepsy patients cannot be treated with medicines or surgery; hence these patients would benefit from a seizure prediction system to live normal lives. Thus, a system that can predict a seizure before its onset could improve not only these patients’ social lives but also their safety. Numerous seizure prediction methods have already been proposed, but the performance measures of these methods are still inadequate for a complete prediction system. Here, a seizure prediction system is proposed by exploring the advantages of multivariate entropy, which can reflect the complexity of multivariate time series over multiple scales (frequencies), called multivariate multiscale modified-distribution entropy (MM-mDistEn), with an artificial neural network (ANN). The phase-space reconstruction and estimation of the probability density between vectors provide hidden complex information. The multivariate time series property of MM-mDistEn provides more understandable information within the multichannel data and makes it possible to predict epileptic seizures. Moreover, the proposed method was tested with two different analyses: simulation data analysis proves that the proposed method has strong consistency over the different parameter selections, and the results from experimental data analysis showed that the proposed entropy combined with an ANN obtains performance measures of 98.66% accuracy, 91.82% sensitivity, 99.11% specificity, and 0.84 area under the curve (AUC) value.
In addition, the seizure alarm system was applied as a postprocessing step for prediction purposes, and a false alarm rate of 0.014 per hour and an average prediction time of 26.73 min before seizure onset were achieved by the proposed method. Thus, the proposed entropy as a feature extraction method combined with an ANN can predict the ictal state of epilepsy, and the results show great potential for all epilepsy patients.


Author(s):  
Viacheslav S. Okunev

The main purpose of this work is to determine the possibility of cluster decays of superheavy atomic nuclei. The universality of the similarity principle makes it possible to apply it to the analysis of physical processes that have not yet been studied. Analogies are observed between forced and spontaneous decays of atomic nuclei. It is shown that processes initiated by external influence are realized in two stages: fragmentation reactions, forced fission of stable nuclei, and impact radioactivity. Nuclear reactions of fragmentation and forced fission of stable isotopes of lead and bismuth are realized under the action of particles (hadrons) and light atomic nuclei with kinetic energies above 10⁸ eV. Impact radioactivity is observed in collisions of macro-objects with a crystalline structure at speeds of at least ∼1 km/s. Some radioactive decays of atomic nuclei, including extremely rare cluster decays, are also realized in two stages. Based on the analogies among the processes considered, some cautious predictions are made about the possibility of cluster decays of atomic nuclei over a wide range of atomic masses.


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, yielding an increase in classification accuracy of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

