Product formalisms for measures on spaces with binary tree structures: representation, visualization, and multiscale noise

2020 ◽  
Vol 8 ◽  
Author(s):  
Devasis Bassu ◽  
Peter W. Jones ◽  
Linda Ness ◽  
David Shallcross

Abstract In this paper, we present a theoretical foundation for representing a data set as a measure in a very large, hierarchically parametrized family of positive measures, whose parameters can be computed explicitly (rather than estimated by optimization), and illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures, and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure-theoretic concepts and theories. Because the dyadic tree is defined on the universe of a data set, and the parameters are simply and explicitly computable, easily interpretable, and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
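
As an illustration of how explicitly computable such parameters are, here is a minimal Python sketch, assuming one-dimensional data rescaled to [0, 1): each dyadic interval I receives a parameter a_I = (μ(I_left) − μ(I_right)) / μ(I) recording how its mass splits between its two children. The function name and the choice of empirical measure are ours, not the paper's.

```python
import numpy as np

def dyadic_parameters(points, depth):
    """Compute explicit dyadic-tree parameters for an empirical measure.

    For each dyadic interval I of [0, 1) down to the given depth, the
    parameter a_I = (mu(I_left) - mu(I_right)) / mu(I) lies in [-1, 1]
    and records how the mass of I splits between its two children.
    Returns {(level, index): a_I} for intervals with positive mass.
    """
    points = np.asarray(points, dtype=float)
    params = {}
    for level in range(depth):
        n_bins = 2 ** (level + 1)
        counts, _ = np.histogram(points, bins=n_bins, range=(0.0, 1.0))
        for k in range(0, n_bins, 2):          # (k, k+1) are sibling bins
            mass = counts[k] + counts[k + 1]
            if mass > 0:
                params[(level, k // 2)] = (counts[k] - counts[k + 1]) / mass
    return params

# Example: parameters of a sample concentrated near 0.25
rng = np.random.default_rng(0)
sample = np.clip(rng.normal(0.25, 0.1, size=1000), 0.0, 1.0 - 1e-9)
print(dyadic_parameters(sample, depth=3))
```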

Author(s):  
Belén Rubio Ballester ◽  
Fabrizio Antenucci ◽  
Martina Maier ◽  
Anthony C. C. Coolen ◽  
Paul F. M. J. Verschure

Abstract Introduction After a stroke, a wide range of deficits can occur with varying onset latencies. As a result, assessing impairment and recovery is an enormous challenge in neurorehabilitation. Although several clinical scales are generally accepted, they are time-consuming, show high inter-rater variability, have low ecological validity, and are vulnerable to biases introduced by compensatory movements and action modifications. Alternative methods need to be developed for efficient and objective assessment. In this study, we explore the potential of computer-based body tracking systems and classification tools to estimate the motor impairment of the more affected arm in stroke patients. Methods We present a method for estimating clinical scores from movement parameters that are extracted from kinematic data recorded during unsupervised computer-based rehabilitation sessions. We identify a number of kinematic descriptors that characterise the patients' hemiparesis (e.g., movement smoothness, work area), implement a double-noise model, and perform a multivariate regression using clinical data from 98 stroke patients who completed a total of 191 sessions with RGS. Results Our results reveal a new digital biomarker of arm function, the Total Goal-Directed Movement (TGDM), which relates to the patient's work area during the execution of goal-oriented reaching movements. The model estimates FM-UE scores with an accuracy of R^2 = 0.38 and an error of σ = 12.8. Next, we evaluate its reliability (r = 0.89 for test-retest), longitudinal external validity (95% true positive rate), sensitivity, and generalisation to other tasks that involve planar reaching movements (R^2 = 0.39). The model achieves comparable accuracy for the Chedoke Arm and Hand Activity Inventory (R^2 = 0.40) and the Barthel Index (R^2 = 0.35). Conclusions Our results highlight the clinical value of kinematic data collected during unsupervised goal-oriented motor training with the RGS combined with data science techniques, and provide new insight into factors underlying recovery and its biomarkers.
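
As a rough illustration of the regression step only (not the authors' exact pipeline, double-noise model, or feature set), a cross-validated multivariate regression from kinematic descriptors to a clinical score could be sketched with synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical kinematic descriptors per session (names are illustrative):
# movement smoothness, work area, and a TGDM-like quantity.
rng = np.random.default_rng(1)
n_sessions = 191
X = rng.normal(size=(n_sessions, 3))               # smoothness, work_area, tgdm
fm_ue = 30 + 8 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 12, n_sessions)

# Cross-validated R^2 of a linear model mapping descriptors to the score.
model = LinearRegression()
scores = cross_val_score(model, X, fm_ue, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```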


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Luca Pappalardo ◽  
Paolo Cintia ◽  
Alessio Rossi ◽  
Emanuele Massucco ◽  
Paolo Ferragina ◽  
...  

Abstract Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player, and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, at both the individual and the collective level, and the determinants of success and failure.
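
A minimal sketch of how such event logs might be consumed, assuming each JSON record carries at least an event type and a player identifier (the field names and file name here are illustrative; the released collection documents its own schema):

```python
import json
from collections import Counter

# Load one competition's event log and count passes per player.
with open("events.json") as f:
    events = json.load(f)

passes_per_player = Counter(
    e["playerId"] for e in events if e.get("eventName") == "Pass"
)
print(passes_per_player.most_common(5))
```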


2021 ◽  
Vol 2 (2(58)) ◽  
pp. 16-19
Author(s):  
Ihor Polovynko ◽  
Lubomyr Kniazevich

The object of research is low-quality digital images. The presented work is devoted to the problem of digital processing of low-quality images, which is one of the most important tasks of data science in the field of extracting useful information from a large data set. It is proposed to carry out the process of image enhancement by means of tonal processing of their Fourier images. The basis for this approach is the fact that Fourier images are described by brightness values over a wide range, which can be significantly reduced by gradation transformations. In this work, the Fourier transform of the image was carried out with separation of the amplitude and phase. The important role of the phase in forming the image obtained after the inverse Fourier transform is shown: although information about the signal amplitude is lost during phase-only analysis, all the main details still correspond accurately to the initial image. This suggests that, when modifying the Fourier spectra of images, the effect on both the amplitude and the phase of the object under study must be taken into account. The effectiveness of the proposed method is demonstrated on satellite images of the Earth's surface. It is shown that applying a logarithmic gradation transform to the Fourier image and then taking the inverse Fourier transform yields an image that is more contrasting than the original one, which will certainly facilitate work with it in the process of visual analysis. To explain the results obtained, the gradation transformation was expanded into a Mercator series. It is shown that the resulting image consists of two parts: the first corresponds to the reproduction of the original image obtained by the inverse Fourier transform, and the second smooths its brightness, similar to the action of the combined method of spatial image enhancement. When using the proposed method, preprocessing is also necessary, which, as a rule, includes the operations needed to center the Fourier image, as well as conversion of the original data into floating point format.
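
A minimal sketch of the described procedure, assuming a grayscale image held as a 2D float array: the spectrum is centered, a logarithmic gradation transform s = c·log(1 + r) is applied to the amplitude, the phase is preserved, and the inverse transform is taken (the constant c and the final rescaling step are our choices for illustration):

```python
import numpy as np

def enhance_via_log_fourier(image, c=1.0):
    """Enhance contrast by tonal (logarithmic) processing of the Fourier image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))   # center the Fourier image
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)                       # phase is kept untouched
    log_amplitude = c * np.log1p(amplitude)          # s = c * log(1 + r)
    modified = log_amplitude * np.exp(1j * phase)    # recombine amplitude and phase
    result = np.fft.ifft2(np.fft.ifftshift(modified)).real
    # Rescale to [0, 1] for display.
    return (result - result.min()) / (result.max() - result.min())
```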


Author(s):  
Nilesh Kumar Sahu ◽  
Manorama Patnaik ◽  
Itu Snigdh

The precision of any machine learning algorithm depends on the data set, its suitability, and its volume. Data and its characteristics have therefore become the predominant components of any predictive or precision-based domain such as machine learning. Feature engineering refers to the process of transforming and preparing this input data so that it is ready for training machine learning models. Several feature types, such as categorical, numerical, mixed, date, and time, are to be considered for feature extraction in feature engineering. Dataset characteristics such as cardinality, missing data, and rare labels for categorical features, and distribution, outliers, and magnitude for numerical features, are likewise treated in feature engineering. This chapter discusses the various data types and the techniques for applying them in feature engineering, and focuses on the implementation of these techniques for feature extraction.
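
A minimal sketch of such techniques with pandas, covering missing data, rare labels, outlier magnitude, and date decomposition (column names and thresholds are illustrative, not the chapter's examples):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", None, "Pune", "Delhi", "Agra"],
    "income": [52_000, None, 61_000, 58_000, 1_000_000, 49_000],
    "signup": pd.to_datetime(
        ["2021-01-05", "2021-02-17", "2021-02-18", None, "2021-03-02", "2021-03-09"]),
})

# Missing data: impute the median for numeric, a flag label for categorical.
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna("missing")

# Rare labels: group infrequent categories to control cardinality.
freq = df["city"].value_counts(normalize=True)
df["city"] = df["city"].where(df["city"].map(freq) >= 0.2, "rare")

# Outliers / magnitude: cap extreme values at the 95th percentile.
df["income"] = df["income"].clip(upper=df["income"].quantile(0.95))

# Date and time: decompose into model-ready numeric parts.
df["signup_month"] = df["signup"].dt.month
df["signup_dow"] = df["signup"].dt.dayofweek
print(df)
```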


2019 ◽  
Vol 16 (7) ◽  
pp. 808-817 ◽  
Author(s):  
Laxmi Banjare ◽  
Sant Kumar Verma ◽  
Akhlesh Kumar Jain ◽  
Suresh Thareja

Background: In spite of the availability of various treatment approaches, including surgery, radiotherapy, and hormonal therapy, steroidal aromatase inhibitors (SAIs) play a significant role as chemotherapeutic agents for the treatment of estrogen-dependent breast cancer, with the benefit of a reduced risk of recurrence. However, due to the greater toxicity and side effects associated with currently available anti-breast cancer agents, there is an urgent need to develop target-specific AIs with a safer anti-breast cancer profile. Methods: Designing target-specific and less toxic SAIs is a challenging task, although molecular modeling tools, viz. molecular docking simulations and QSAR, have been used for more than two decades for the fast and efficient design of novel, selective, potent, and safe molecules against various biological targets implicated in a number of dreaded diseases and disorders. In order to design novel and selective SAIs, structure-guided, molecular docking assisted, alignment-dependent 3D-QSAR studies were performed on a data set comprising 22 molecules bearing a steroidal scaffold with a wide range of aromatase inhibitory activity. Results: The 3D-QSAR model developed using the molecular weighted (MW) extent alignment approach showed good statistical quality and predictive ability compared to the model developed using the moments of inertia (MI) alignment approach. Conclusion: The explored binding interactions and the generated pharmacophoric features (steric and electrostatic) of the steroidal molecules could be exploited for the further design, direct synthesis, and development of new, potentially safer SAIs that can be effective in reducing the mortality and morbidity associated with breast cancer.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings that yield valuable insights from the available data. A large part of Indian cinema is Bollywood, a multi-million dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop by applying machine learning techniques for classification and prediction. To build a classifier or prediction model, the first step is the learning stage, in which the training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model and are used to predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied to find the most efficient and effective results, as sketched below. All these functionalities can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: A comparative analysis of the trained models is performed based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Conclusion: Using the best-performing model, production houses can plan advertisement propaganda and the best time to release a movie according to the predicted success rate, gaining higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets, and the relationships discovered help to solve business problems and to predict forthcoming trends. This prediction can help production houses with advertisement propaganda and cost planning; by assuring these factors, they can make a movie more profitable.
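
A minimal sketch of such a comparative run with scikit-learn, using synthetic stand-in features (the real feature matrix would come from the collected Bollywood data; class labels 0..4 stand for Flop..Blockbuster):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per movie (budget, star power, screens, ...).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 5, size=300)

models = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:20s} accuracy: {acc:.2f}")
```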


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 348
Author(s):  
Choongsang Cho ◽  
Young Han Lee ◽  
Jongyoul Park ◽  
Sangkeun Lee

Semantic image segmentation has a wide range of applications. When it comes to medical image segmentation, accuracy is even more important than in other areas, because the performance gives useful information directly applicable to disease diagnosis, surgical planning, and history monitoring. The state-of-the-art models in medical image segmentation are variants of the encoder-decoder architecture called U-Net. To effectively reflect the spatial features in the feature maps of an encoder-decoder architecture, we propose a spatially adaptive weighting scheme for medical image segmentation. Specifically, the spatial feature is estimated from the feature maps, and the learned weighting parameters are obtained from the computed map, since segmentation results are predicted from the feature map through a convolutional layer. In the proposed networks, the convolutional block for extracting the feature map is replaced with widely used convolutional frameworks: VGG, ResNet, and Bottleneck ResNet structures. In addition, a bilinear up-sampling method replaces the up-convolutional layer to increase the resolution of the feature map. For the performance evaluation of the proposed architecture, we used three data sets covering different medical imaging modalities. Experimental results show that the network with the proposed self-spatial adaptive weighting block based on the ResNet framework gave the highest IoU and DICE scores on the three tasks compared to other methods. In particular, the segmentation network combining the proposed self-spatially adaptive block and the ResNet framework recorded the largest improvements, 3.01% and 2.89% in IoU and DICE scores, respectively, on the Nerve data set. Therefore, we believe that the proposed scheme can be a useful tool for image segmentation tasks based on the encoder-decoder architecture.
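
A minimal sketch of one way such a spatially adaptive weighting block could be realised in PyTorch, learning a per-pixel weight map from the feature map itself and rescaling the features with it; this is our illustration of the general idea, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SpatialWeightingBlock(nn.Module):
    """Learn a per-pixel weight map from the features and rescale them."""

    def __init__(self, channels):
        super().__init__()
        self.weight_map = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),                      # per-pixel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.weight_map(x)          # broadcast over channels

# Bilinear up-sampling in place of an up-convolutional layer, as described above.
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
features = torch.randn(1, 64, 32, 32)
out = upsample(SpatialWeightingBlock(64)(features))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```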


Author(s):  
Gary Sutlieff ◽  
Lucy Berthoud ◽  
Mark Stinchcombe

Abstract CBRN (Chemical, Biological, Radiological, and Nuclear) threats are becoming more prevalent, as more entities gain access to modern weapons and industrial technologies and chemicals. This has produced a need for improvements to modelling, detection, and monitoring of these events. While there are currently no dedicated satellites for CBRN purposes, there are a wide range of possibilities for satellite data to contribute to this field, from atmospheric composition and chemical detection to cloud cover, land mapping, and surface property measurements. This study looks at currently available satellite data, including meteorological data such as wind and cloud profiles, surface properties like temperature and humidity, chemical detection, and sounding. Results of this survey revealed several gaps in the available data, particularly concerning biological and radiological detection. The results also suggest that publicly available satellite data largely does not meet the requirements of spatial resolution, coverage, and latency that CBRN detection requires, outside of providing terrain use and building height data for constructing models. Lastly, the study evaluates upcoming instruments, platforms, and satellite technologies to gauge the impact these developments will have in the near future. Improvements in spatial and temporal resolution as well as latency are already becoming possible, and new instruments will fill in the gaps in detection by imaging a wider range of chemicals and other agents and by collecting new data types. This study shows that with developments coming within the next decade, satellites should begin to provide valuable augmentations to CBRN event detection and monitoring.
Article Highlights:
- There is a wide range of existing satellite data in fields that are of interest to CBRN detection and monitoring.
- The data is mostly of insufficient quality (resolution or latency) for the demanding requirements of CBRN modelling for incident control.
- Future technologies and platforms will improve resolution and latency, making satellite data more viable in the CBRN management field.


2021 ◽  
Vol 11 (4) ◽  
pp. 1431
Author(s):  
Sungsik Wang ◽  
Tae Heung Lim ◽  
Kyoungsoo Oh ◽  
Chulhun Seo ◽  
Hosung Choo

This article proposes a method for the prediction of wide-range two-dimensional refractivity for synthetic aperture radar (SAR) applications, using an inverse distance weighted (IDW) interpolation of high-altitude radio refractivity data from multiple meteorological observatories. The radio refractivity is extracted from an atmospheric data set of twenty meteorological observatories around the Korean Peninsula along a given altitude. Then, from the sparse refractive data, the two-dimensional regional radio refractivity of the entire Korean Peninsula is derived using the IDW interpolation, in consideration of the curvature of the Earth. The refractivities of the four seasons in 2019 are derived at the locations of seven meteorological observatories within the Korean Peninsula, using the refractivity data from the other nineteen observatories. The atmospheric refractivities on 15 February 2019 are then evaluated across the entire Korean Peninsula, using the atmospheric data collected from the twenty meteorological observatories. We found that the proposed IDW interpolation yields the lowest average root-mean-square error (RMSE) of ∇M (the gradient of M) and more continuous results than other methods. To compare the resulting IDW refractivity interpolation for airborne SAR applications, all the propagation path losses across Pohang and Heuksando are obtained using the standard atmospheric condition of ∇M = 118 and the observation-based interpolated atmospheric conditions on 15 February 2019. On the terrain surface ranging from 90 km to 190 km, the average path losses in the standard and derived conditions are 179.7 dB and 182.1 dB, respectively. Finally, based on the air-to-ground scenario in the SAR application, two-dimensional illuminated field intensities on the terrain surface are illustrated.
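
A minimal sketch of IDW interpolation over sparse observatory sites, using a great-circle distance as a stand-in for the curvature-aware distance mentioned above (coordinates and refractivity values are illustrative, not the actual observatory data):

```python
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km, accounting for the curvature of the Earth."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    cos_angle = np.sin(p1) * np.sin(p2) + np.cos(p1) * np.cos(p2) * np.cos(dlon)
    return radius_km * np.arccos(np.clip(cos_angle, -1.0, 1.0))

def idw(site_lats, site_lons, site_values, qlat, qlon, power=2.0):
    """Inverse distance weighted estimate at a query point from sparse sites."""
    d = great_circle_km(site_lats, site_lons, qlat, qlon)
    if np.any(d < 1e-9):                     # query coincides with a site
        return site_values[np.argmin(d)]
    w = 1.0 / d ** power
    return np.sum(w * site_values) / np.sum(w)

# Illustrative sites and a query point on the Korean Peninsula.
lats = np.array([37.57, 35.10, 36.03])
lons = np.array([126.97, 129.03, 129.36])
refractivity = np.array([320.0, 335.0, 330.0])   # M-units at one altitude
print(idw(lats, lons, refractivity, 36.5, 127.5))
```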

