Probabilistic Sequence Mining – Evaluation and Extension of ProMFS Algorithm for Real-Time Problems

2012 ◽  
Vol 58 (4) ◽  
pp. 323-326
Author(s):  
Krzysztof Hryniów ◽  
Andrzej Dzieliński

Sequential pattern mining is an extensively studied data mining method. One of the newer and less documented approaches is to estimate statistical characteristics of a sequence in order to create model sequences, which can be used to speed up the sequence mining process. This paper proposes extensive modifications to one such algorithm, ProMFS (probabilistic algorithm for mining frequent sequences), which notably increase its processing speed by significantly reducing its computational complexity. The new version of the algorithm is evaluated on real-life and artificial data sets and shown to be useful in real-time applications and problems.
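As context for the model-sequence idea, the sketch below illustrates one simple way to estimate occurrence and transition statistics of a long sequence and generate a much shorter model sequence to mine instead. It is only a minimal illustration of the general approach, not the ProMFS algorithm or the authors' modifications, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def estimate_statistics(sequence):
    """Estimate element occurrence probabilities and first-order
    transition probabilities from a long input sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    occurrence = {e: c / total for e, c in counts.items()}

    transitions = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        transitions[a][b] += 1
    transition_prob = {
        a: {b: c / sum(nexts.values()) for b, c in nexts.items()}
        for a, nexts in transitions.items()
    }
    return occurrence, transition_prob

def build_model_sequence(occurrence, transition_prob, length, seed=0):
    """Generate a short model sequence mimicking the estimated statistics;
    frequent patterns are then mined on this short sequence instead of
    the full data."""
    rng = random.Random(seed)
    elements = list(occurrence)
    weights = [occurrence[e] for e in elements]
    current = rng.choices(elements, weights=weights)[0]
    model = [current]
    for _ in range(length - 1):
        nexts = transition_prob.get(current)
        if nexts:
            current = rng.choices(list(nexts), weights=list(nexts.values()))[0]
        else:
            current = rng.choices(elements, weights=weights)[0]
        model.append(current)
    return model

# Example: a 10,000-symbol sequence is summarized by a 100-symbol model.
data = [random.choice("ABCD") for _ in range(10_000)]
occ, trans = estimate_statistics(data)
model_seq = build_model_sequence(occ, trans, length=100)
```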

Author(s):  
Pierre-Alexandre Murena ◽  
Jérémie Sublime ◽  
Basarab Matei ◽  
Antoine Cornuéjols

Clustering is a compression task that consists of grouping similar objects into clusters. In real-life applications, the system may have access to several views of the same data, and each view may be processed by a specific clustering algorithm: this framework is called multi-view clustering and can benefit from algorithms capable of exchanging information between the different views. In this paper, we consider this type of unsupervised ensemble learning as a compression problem and develop a theoretical framework, based on algorithmic information theory, suitable for multi-view clustering and collaborative clustering applications. Using this approach, we propose a new algorithm with a solid theoretical basis and test it on several real and artificial data sets.


Author(s):  
O. R. Uwaeme ◽  
N. P. Akpan ◽  
U. C. Orumie

In this study, we propose a generalization of the Pranav distribution of Shukla (2018). This new distribution, called the extended Pranav distribution, is obtained using the exponentiation method. Its statistical characteristics, such as the moments, moment generating function, reliability function, hazard function, Rényi entropy, and order statistics, are derived. Graphical illustrations of the shapes of the probability density function, cumulative distribution function, and hazard rate function are provided. The maximum likelihood estimates of the parameters are obtained, and finally we examine the performance of the new distribution on some real-life data sets to show its flexibility and better goodness of fit compared with other distributions.
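For orientation, the exponentiation method referred to above works as follows: given a baseline cumulative distribution function F with density f, a new shape parameter α > 0 is introduced by raising F to the power α. A sketch in LaTeX, with the Pranav baseline density written as we recall it from Shukla (2018):

```latex
% Exponentiation (Lehmann type I) construction: alpha > 0 is the added
% shape parameter, F and f are the baseline cdf and pdf.
\[
  G(x;\alpha,\theta) = \bigl[F(x;\theta)\bigr]^{\alpha},
  \qquad
  g(x;\alpha,\theta) = \alpha \bigl[F(x;\theta)\bigr]^{\alpha-1} f(x;\theta),
  \qquad x > 0,\ \alpha > 0 .
\]
% With the Pranav baseline density of Shukla (2018),
\[
  f(x;\theta) = \frac{\theta^{4}}{\theta^{4}+6}\,(\theta + x^{3})\,e^{-\theta x},
  \qquad x > 0,\ \theta > 0,
\]
% the case \alpha = 1 recovers the original Pranav distribution.
```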


Author(s):  
Carson K. Leung

Big data analytics and mining aim to discover implicit, previously unknown, and potentially useful information and knowledge from big data sets that contain huge volumes of valuable, veracious data collected or generated at high velocity from a wide variety of rich data sources. Among the different big data analytic and mining tasks, this chapter focuses on frequent pattern mining. By relying on the MapReduce programming model, researchers only need to specify the “map” and “reduce” functions to discover (organizational) knowledge from (i) big data sets of precise data, in a breadth-first or depth-first manner, and/or (ii) big data sets of uncertain data. Such a big data analytics process can be sped up by focusing the mining according to user-specified constraints that express the user's interests. The resulting (constrained or unconstrained) frequent patterns mined from big data sets provide users with new insights and a sound understanding of user patterns. Such (organizational) knowledge is useful in many real-life information science and technology applications.
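As an illustration of the "only specify map and reduce" point, the sketch below simulates, in plain Python rather than an actual Hadoop or Spark job, the first counting pass of frequent pattern mining with a user-specified minimum support threshold; the driver that mimics the shuffle step and all names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain

def map_phase(transaction):
    """Map: emit (item, 1) for every distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]

def reduce_phase(item, counts):
    """Reduce: sum the partial counts for one item."""
    return item, sum(counts)

def frequent_singletons(transactions, min_support):
    """Driver that simulates the shuffle/sort between map and reduce."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_phase(t) for t in transactions):
        grouped[key].append(value)                      # shuffle/sort step
    counted = dict(reduce_phase(k, v) for k, v in grouped.items())
    return {item: c for item, c in counted.items() if c >= min_support}

db = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
print(frequent_singletons(db, min_support=2))   # items a, b, c pass support 2
```

Longer patterns are then grown from these frequent items in further map/reduce rounds, breadth-first or depth-first, exactly as the chapter describes.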


Author(s):  
Céline Fiot

The explosive growth of collected and stored data has generated a need for new techniques that transform these large amounts of data into useful, comprehensible knowledge. Among these techniques, referred to as data mining, sequential pattern approaches handle sequence databases, extracting frequently occurring patterns related to time. Since most real-world databases consist of historical and quantitative data, some work has been done on mining the quantitative information stored within such sequence databases, uncovering fuzzy sequential patterns. In this chapter, we first introduce the various fuzzy sequential pattern approaches and the general principles they are based on. Then, we focus on a complete framework for mining fuzzy sequential patterns that handles different levels of consideration of quantitative information. This framework is then applied to two real-life data sets: Web access logs and a textual database. We conclude with a discussion of future trends in fuzzy pattern mining.
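To make the fuzzification step concrete, the sketch below shows one common way to turn a quantitative sequence into fuzzy items via triangular membership functions before mining; the fuzzy sets, thresholds, and names are illustrative assumptions, not those of the chapter.

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to peak b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative fuzzy sets for a quantity such as "pages viewed per visit".
FUZZY_SETS = {
    "low":    lambda q: triangular(q, -1, 0, 5),
    "medium": lambda q: triangular(q, 3, 7, 12),
    "high":   lambda q: triangular(q, 10, 20, 30),
}

def fuzzify(item, quantity):
    """Replace an (item, quantity) pair by fuzzy items with membership degrees."""
    return {(item, label): fn(quantity)
            for label, fn in FUZZY_SETS.items() if fn(quantity) > 0}

# A quantitative sequence becomes a sequence of fuzzy itemsets whose
# degrees are later aggregated (e.g. min or product) into a fuzzy support.
sequence = [("page_A", 2), ("page_B", 8), ("page_A", 15)]
fuzzy_sequence = [fuzzify(i, q) for i, q in sequence]
```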


Author(s):  
Yi Sun ◽  
Iván Ramírez Díaz ◽  
Alfredo Cuesta Infante ◽  
Kalyan Veeramachaneni

In many real-life situations, including job and loan applications, gatekeepers must make justified and fair real-time decisions about a person's fitness for a particular opportunity. In this paper, we aim to achieve approximate group fairness in an online stochastic decision-making process, where the fairness metric we consider is equalized odds. Our work follows the classical learning-from-experts scheme, assuming a finite set of classifiers (human experts, rules, options, etc.) that cannot be modified. We run a separate instance of the algorithm for each label class and each sensitive group, where the probability of choosing each instance is optimized for both fairness and regret. Our theoretical results show that approximately equalized odds can be achieved without sacrificing much regret. We also demonstrate the performance of the algorithm on real data sets commonly used by the fairness community.
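For intuition, the sketch below implements the classical multiplicative-weights (Hedge) expert scheme with one instance per (label class, sensitive group) pair, as the abstract describes; the loss definition, learning rate, and routing are simplified placeholders rather than the paper's fairness-regret optimization.

```python
import math
import random

class HedgeInstance:
    """One multiplicative-weights learner over a fixed set of experts."""
    def __init__(self, n_experts, eta=0.1, seed=0):
        self.weights = [1.0] * n_experts
        self.eta = eta
        self.rng = random.Random(seed)

    def pick_expert(self):
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        return self.rng.choices(range(len(self.weights)), weights=probs)[0]

    def update(self, losses):
        """losses[i] in [0, 1] for expert i on the current example."""
        self.weights = [w * math.exp(-self.eta * l)
                        for w, l in zip(self.weights, losses)]

# One instance per (label class, sensitive group); equalized odds concerns
# error rates conditioned exactly on these pairs.
instances = {(y, g): HedgeInstance(n_experts=5)
             for y in (0, 1) for g in ("group_A", "group_B")}

def decide(example_group, true_label, expert_predictions):
    """Route the example to its (label, group) instance, follow its chosen
    expert, then update that instance with the observed 0/1 losses."""
    inst = instances[(true_label, example_group)]
    chosen = inst.pick_expert()
    losses = [float(p != true_label) for p in expert_predictions]
    inst.update(losses)
    return expert_predictions[chosen]
```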


2009 ◽  
Vol 14 (2) ◽  
pp. 109-119 ◽  
Author(s):  
Ulrich W. Ebner-Priemer ◽  
Timothy J. Trull

Convergent experimental data, autobiographical studies, and investigations of daily life have all demonstrated that gathering information retrospectively is a highly dubious methodology. Retrospection is subject to multiple systematic distortions (i.e., the affective valence effect, mood-congruent memory effect, duration neglect, and the peak-end rule) as it is based on (often biased) storage and recollection of memories of the original experience or behavior of interest. The method of choice to circumvent these biases is the use of electronic diaries to collect self-reported symptoms, behaviors, or physiological processes in real time. Different terms have been used for this kind of methodology: ambulatory assessment, ecological momentary assessment, experience sampling method, and real-time data capture. Even though the terms differ, they have in common the use of computer-assisted methodology to assess self-reported symptoms, behaviors, or physiological processes while the participant undergoes normal daily activities. In this review we discuss the main features and advantages of ambulatory assessment for clinical psychology and psychiatry: (a) the use of real-time assessment to circumvent biased recollection, (b) assessment in real life to enhance generalizability, (c) repeated assessment to investigate within-person processes, (d) multimodal assessment, including psychological, physiological, and behavioral data, (e) the opportunity to assess and investigate context-specific relationships, and (f) the possibility of giving feedback in real time. Using prototypic examples from the clinical psychology and psychiatry literature, we demonstrate that ambulatory assessment can answer specific research questions better than laboratory or questionnaire studies.


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bio-information big data have been collected by sensor-based IoT devices. The collected data are also classified into different types of health big data using various techniques. A personalized analysis technique is the basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart condition classification that combines a fast and effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model learns the input data, develops an approximation function, and helps users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied in the preprocessing step. Data reduction is performed using the frequency-by-frequency ratios of the extracted power spectrum. To analyze the meaning of the preprocessed data, a neural network algorithm is applied. In particular, a deep neural network is used to analyze and evaluate the linear data. A deep neural network stacks multiple layers and establishes an operational model of nodes trained with gradient descent. The completed model was trained by classifying ECG signals collected in advance into normal, control, and noise groups. Thereafter, ECG signals input in real time through the trained deep neural network system were classified into normal, control, and noise. To evaluate the performance of the proposed model, this study used the data operation cost reduction ratio and the F-measure. As a result, with the use of the fast Fourier transform and cumulative frequency percentage, the size of the ECG data was reduced by a ratio of 1:32. According to the F-measure analysis of the deep neural network, the model achieved 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
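A minimal NumPy sketch of the described preprocessing chain (FFT, power spectrum, frequency-ratio reduction) is shown below; the window length, band count, and function names are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def spectrum_ratio_features(ecg_window, n_bands=16):
    """FFT -> power spectrum -> per-band energy ratios.
    Reduces a raw ECG window to n_bands values (one feature per band)."""
    spectrum = np.fft.rfft(ecg_window)
    power = np.abs(spectrum) ** 2
    bands = np.array_split(power, n_bands)
    band_energy = np.array([b.sum() for b in bands])
    return band_energy / band_energy.sum()        # frequency-ratio features

# Example: a 512-sample window shrinks to 16 ratio features (32:1 reduction),
# which are then fed to a small dense (deep neural network) classifier.
window = np.random.randn(512)
features = spectrum_ratio_features(window)        # shape (16,)
```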


2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field of research that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, some limitations to real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as time information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into a multilayer perceptron, which has relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging action recognition benchmark UCF-101 showed higher accuracy than a HAR model using long short-term memory with a small number of video frames and confirmed the possibility of real-time operation through fast data processing speed. In addition, the performance of the proposed weighted-mean-based HAR model was verified by testing it on a Jetson Nano to confirm the possibility of using it in low-cost GPU-based embedded systems.
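The weighted-mean idea can be sketched as follows: per-frame spatial feature maps are averaged with weights proportional to how much each frame differs from its predecessor. The CNN backbone, feature shapes, and the change measure below are placeholders, not the paper's implementation.

```python
import numpy as np

def frame_change_rates(frames):
    """Mean absolute pixel difference between consecutive frames."""
    diffs = [np.abs(frames[i] - frames[i - 1]).mean()
             for i in range(1, len(frames))]
    return np.array([diffs[0]] + diffs)           # reuse first diff for frame 0

def weighted_spatiotemporal_feature(feature_maps, change_rates):
    """Weighted average of per-frame CNN feature maps; weights follow
    the frame change rate so frames with more motion contribute more."""
    weights = change_rates / change_rates.sum()
    return np.tensordot(weights, feature_maps, axes=1)  # shape (C, H, W)

# Placeholder inputs: 30 frames and their (already extracted) feature maps.
frames = np.random.rand(30, 112, 112, 3)
feature_maps = np.random.rand(30, 256, 7, 7)      # stand-in for CNN outputs
rates = frame_change_rates(frames)
spatiotemporal = weighted_spatiotemporal_feature(feature_maps, rates)
# `spatiotemporal` is then flattened and fed to a multilayer perceptron.
```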


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Julius Žilinskas ◽  
Algirdas Lančinskas ◽  
Mario R. Guarracino

During the COVID-19 pandemic it is essential to test as many people as possible in order to detect early outbreaks of the infection. Present testing solutions are based on the extraction of RNA from patients using oropharyngeal and nasopharyngeal swabs, followed by real-time PCR testing for the presence of specific RNA filaments identifying the virus. This approach is limited by the availability of reagents, trained technicians, and laboratories. One way to speed up the testing procedure is group testing, in which the swabs of multiple patients are pooled together and tested. In this paper we propose to use the group testing technique in conjunction with an advanced replication scheme in which each patient is allocated to two or more groups, to reduce the total number of tests and to allow testing of even larger numbers of people. Under mild assumptions, a 13× average reduction in tests can be achieved compared to individual testing, without any delay in time.
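As a concrete example of replication-based group testing, the sketch below places every sample in two pools (a row pool and a column pool of a square arrangement) and retests only the intersections of positive rows and columns; the exact design and the 13× figure above are the authors', while this code is only an illustrative baseline.

```python
import math

def two_way_pooling(samples):
    """Arrange samples in a k x k grid; test the k row pools and k column
    pools, then retest only samples at positive-row/positive-column
    intersections. Returns (suspects, number_of_pool_tests)."""
    n = len(samples)
    k = math.isqrt(n - 1) + 1                      # grid side, k*k >= n
    row_pos = {r for r in range(k)
               if any(samples[i] for i in range(r * k, min((r + 1) * k, n)))}
    col_pos = {c for c in range(k)
               if any(samples[i] for i in range(c, n, k))}
    suspects = [i for i in range(n)
                if (i // k) in row_pos and (i % k) in col_pos]
    return suspects, 2 * k

# 100 samples with 2 positives: 20 pool tests plus a handful of retests,
# instead of 100 individual tests.
samples = [False] * 100
samples[7] = samples[42] = True
suspects, pool_tests = two_way_pooling(samples)
```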


Author(s):  
Christian Luksch ◽  
Lukas Prost ◽  
Michael Wimmer

We present a real-time rendering technique for photometric polygonal lights. Our method uses a numerical integration technique based on a triangulation to calculate noise-free diffuse shading. We include a dynamic point in the triangulation that provides a continuous near-field illumination resembling the shape of the light emitter and its characteristics. We evaluate the accuracy of our approach with a diverse selection of photometric measurement data sets in a comprehensive benchmark framework. Furthermore, we provide an extension for specular reflection on surfaces with arbitrary roughness that facilitates the use of existing real-time shading techniques. Our technique is easy to integrate into real-time rendering systems and extends the range of possible applications with photometric area lights.
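For reference, the quantity being computed can be sketched as a brute-force sampled integrator: diffuse irradiance at a shading point from triangular emitters whose intensity varies with emission direction (a stand-in for photometric data). This naive Monte Carlo version is only for orientation and is not the authors' noise-free triangulation technique.

```python
import numpy as np

def sample_triangle(v0, v1, v2, rng):
    """Uniform sample on a triangle via a barycentric warp."""
    u, v = rng.random(2)
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    return v0 + u * (v1 - v0) + v * (v2 - v0)

def diffuse_irradiance(point, normal, triangles, intensity, rng, spp=256):
    """Monte Carlo estimate of irradiance at `point` from triangular
    emitters; `intensity(direction)` models the photometric profile."""
    total = 0.0
    for v0, v1, v2 in triangles:
        cross = np.cross(v1 - v0, v2 - v0)
        area = 0.5 * np.linalg.norm(cross)
        light_n = cross / np.linalg.norm(cross)
        acc = 0.0
        for _ in range(spp):
            p = sample_triangle(v0, v1, v2, rng)
            d = p - point
            dist2 = float(d @ d)
            w = d / np.sqrt(dist2)
            cos_s = max(0.0, float(normal @ w))         # surface cosine
            cos_l = max(0.0, float(light_n @ -w))       # emitter cosine
            acc += intensity(-w) * cos_s * cos_l / dist2
        total += area * acc / spp
    return total

# A single emitter triangle above the shading point, constant intensity.
rng = np.random.default_rng(0)
tri = [(np.array([0., 2., -1.]), np.array([1., 2., -1.]), np.array([0., 2., 1.]))]
E = diffuse_irradiance(np.zeros(3), np.array([0., 1., 0.]),
                       tri, lambda w: 100.0, rng)
```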

