Utility Mining Across Multi-Dimensional Sequences

Knowledge extraction from databases is a fundamental task in the database and data mining communities, and it has been applied to a wide range of real-world applications. Unlike support-based mining models, the utility-oriented mining framework integrates utility theory to provide more informative and useful patterns. Time-dependent sequence data are common in real life and have been widely used in applications such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all existing algorithms overlook the fact that the processed data not only contain rich features (e.g., occurrence quantity, risk, and profit) but may also be associated with multi-dimensional auxiliary information; for example, a transaction sequence can be associated with purchaser profile information. In this article, we first formulate the problem of utility mining across multi-dimensional sequences and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. To the best of our knowledge, this is the first study that incorporates time-dependent sequence order, quantitative information, the utility factor, and auxiliary dimensions. Two algorithms, named MDUS EM and MDUS SD, are presented to address the formulated problem. The former is based on database transformation, and the latter uses pattern joins and a search method to identify the desired patterns across multi-dimensional sequences. Extensive experiments on six real-life datasets and one synthetic dataset show that the proposed algorithms can effectively and efficiently discover useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework provides better insight and is more adaptable to real-life situations than existing models.
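
As a rough illustration of the data described above, the following sketch pairs each quantitative sequence with a purchaser profile and sums quantity times unit profit over the sequences whose profile matches a dimension pattern. The record layout, the `*` wildcard, and all names and values are assumptions made for this example; this is not the MDUS EM or MDUS SD algorithm.

```python
# Hypothetical unit profits per item (assumed values).
unit_profit = {"a": 2, "b": 5, "c": 1}

# Hypothetical database: each record has auxiliary dimensions (a purchaser
# profile) and a sequence of itemsets holding (item, quantity) pairs.
database = [
    {"profile": ("female", "20-30"), "sequence": [[("a", 2)], [("b", 1), ("c", 3)]]},
    {"profile": ("male", "20-30"), "sequence": [[("b", 2)], [("a", 1)]]},
    {"profile": ("female", "30-40"), "sequence": [[("c", 4)]]},
]


def sequence_utility(sequence):
    """Utility of one quantitative sequence: sum of quantity * unit profit."""
    return sum(qty * unit_profit[item] for itemset in sequence for item, qty in itemset)


def matches(profile, pattern):
    """A dimension pattern matches when every field is equal or a '*' wildcard."""
    return all(p == "*" or p == value for value, p in zip(profile, pattern))


def total_utility(pattern):
    """Total utility of all sequences whose profile matches the pattern."""
    return sum(
        sequence_utility(record["sequence"])
        for record in database
        if matches(record["profile"], pattern)
    )


print(total_utility(("*", "20-30")))   # sequences from purchasers aged 20-30
print(total_utility(("female", "*")))  # sequences from female purchasers
```

Under these toy values, the pattern `("*", "20-30")` collects a utility of 24 and `("female", "*")` collects 16, which shows how the auxiliary dimensions partition the utility carried by the sequences.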


On-Shelf Utility Mining of Sequence Data

ACM Transactions on Knowledge Discovery from Data ◽ 10.1145/3457570 ◽ 2021 ◽ Vol 16 (2) ◽ pp. 1-31

Author(s): Chunkai Zhang ◽ Zilin Du ◽ Yuting Yang ◽ Wensheng Gan ◽ Philip S. Yu

Keyword(s): High Efficiency ◽ Sequence Data ◽ Real Life ◽ Search Space ◽ Upper Bounds ◽ Utility Mining ◽ Limited Memory ◽ Time Periods ◽ High Utility ◽ Synthetic Datasets

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating a high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation, using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Substantial experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.
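
The on-shelf bias mentioned above can be illustrated with a small sketch. It compares an item's raw utility with a relative measure that divides by the total utility of the periods in which the item was on shelf. That ratio is one normalization commonly used in the on-shelf utility mining literature and is an assumption here; the abstract does not define it, and the sketch works on single items, not on the sequential patterns, TPEU, or TRSU of OSUMS. All data and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (time period, item, quantity).
sales = [
    (1, "bread", 5), (2, "bread", 5), (3, "bread", 5),
    (3, "cake", 4),
]
unit_profit = {"bread": 1, "cake": 3}           # assumed unit utilities
on_shelf = {"bread": {1, 2, 3}, "cake": {3}}    # periods each item was sold in


def raw_utility(item):
    """Utility accumulated by an item over the whole database."""
    return sum(qty * unit_profit[i] for _, i, qty in sales if i == item)


def period_totals():
    """Total utility of all sales in each time period."""
    totals = defaultdict(int)
    for period, item, qty in sales:
        totals[period] += qty * unit_profit[item]
    return totals


def on_shelf_ratio(item):
    """Item utility relative to the utility of its on-shelf periods only."""
    totals = period_totals()
    denominator = sum(totals[p] for p in on_shelf[item])
    return raw_utility(item) / denominator


for item in ("bread", "cake"):
    print(item, raw_utility(item), round(on_shelf_ratio(item), 3))
```

With these toy numbers, bread has the higher raw utility (15 versus 12) only because it was on shelf for three periods, while cake has the higher ratio once each item is judged against its own on-shelf periods.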


Time Series Chaos Detection and Assessment via Scale Dependent Lyapunov Exponent

International Journal of Statistics and Probability ◽ 10.5539/ijsp.v5n6p1 ◽ 2016 ◽ Vol 5 (6) ◽ pp. 1

Author(s): Livio Fenga

Keyword(s): Time Series ◽ Lyapunov Exponent ◽ Market Price ◽ Real Life ◽ Quantitative Information ◽ Small Sample ◽ Experimental Conditions ◽ Practical Applications ◽ Chaos Detection ◽ Wide Range

Many dynamical systems in a wide range of disciplines, such as engineering, economics, and biology, exhibit complex behaviors generated by nonlinear components, which might result in deterministic chaos. While in lab-controlled setups the detection of chaos and the estimation of its level are generally feasible, the same usually does not hold for many practical applications. This is because experimental conditions often involve low signal-to-noise ratios, small sample sizes, and non-repeatability of the experiment, so the performance of the tools commonly employed for chaos detection can be seriously affected. To tackle this problem, a combined approach based on wavelet and chaos theory is proposed. The procedure is designed to provide the analyst with qualitative and quantitative information, hopefully conducive to a better understanding of the dynamical system that generated the time series under investigation. The chaos detector considered is the well-known Lyapunov exponent. A real-life application, using the Italian Electric Market price index, is employed to corroborate the validity of the proposed approach.
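
To make the role of the Lyapunov exponent concrete, here is a minimal sketch that estimates it for the logistic map, a textbook system chosen for this example and not taken from the abstract. It averages the log of the absolute derivative along an orbit: a positive value signals sensitive dependence on initial conditions, and a negative value signals a stable regime. This is the plain exponent for a known map, not the wavelet-based, scale-dependent procedure the paper proposes; the function name and parameters are assumptions.

```python
import math


def logistic_lyapunov(r, x0=0.2, n_transient=1000, n_iter=20_000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    The exponent is the orbit average of log|f'(x)|, with f'(x) = r * (1 - 2x).
    """
    x = x0
    # Discard a transient so the orbit settles onto its attractor.
    for _ in range(n_transient):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1 - x)
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
    return total / n_iter


print(logistic_lyapunov(2.5))  # stable fixed point: negative exponent
print(logistic_lyapunov(3.9))  # chaotic regime: positive exponent
```

For r = 2.5 the orbit converges to the fixed point x = 0.6, where the derivative is -0.5, so the estimate approaches ln(0.5). The estimate relies on knowing the map and having a long, noise-free orbit, which is exactly what the practical settings described above tend to lack.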


Time-Dependent Graphs: Definitions, Applications, and Algorithms

Data Science and Engineering ◽ 10.1007/s41019-019-00105-0 ◽ 2019 ◽ Vol 4 (4) ◽ pp. 352-366 ◽ Cited By ~ 12

Author(s): Yishu Wang ◽ Ye Yuan ◽ Yuliang Ma ◽ Guoren Wang

Keyword(s): Dynamic Systems ◽ Topological Structure ◽ Real Life ◽ Route Planning ◽ Time Dependent ◽ Time Varying ◽ Graph Structure ◽ Evolving Graphs ◽ Broad Concept ◽ Temporal Graphs

A time-dependent graph is, informally speaking, a graph structure that changes dynamically with time. In such graphs, the weights associated with edges change over time; that is, the edges are activated by sequences of time-dependent elements. Many real-life scenarios can be better modeled by time-dependent graphs, such as bioinformatics networks, transportation networks, and social networks. The time-dependent graph is a very broad concept, which is reflected in the many names used in related research, including temporal graphs, evolving graphs, time-varying graphs, historical graphs, and so on. Though static graphs have been extensively studied, we are still far from a complete and mature theory of models and algorithms for their time-dependent generalizations. In this paper, we discuss the definition and topological structure of time-dependent graphs, as well as models for their relationship to dynamic systems. In addition, we review some classic problems on time-dependent graphs, e.g., route planning, social analysis, and subgraph problems (including matching and mining). We also introduce existing time-dependent systems and summarize their advantages and limitations. We try to keep the descriptions as consistent as possible, and we hope the survey can help practitioners understand existing time-dependent techniques.
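
The idea of edge weights that change over time, and its effect on route planning, can be sketched as follows. Each edge carries a travel-time function of the departure time, and an earliest-arrival search in the style of Dijkstra's algorithm evaluates that function when the edge is relaxed. The toy network, the function names, and the assumption that travel times never make a later departure arrive earlier (the FIFO property, which this greedy search relies on) are all illustrative choices, not material from the survey.

```python
import heapq

# Hypothetical network: graph[u][v] maps a departure time to a travel time.
# Every function is non-decreasing, so the FIFO property holds.
graph = {
    "A": {"B": lambda t: 2 if t < 5 else 6, "C": lambda t: 4},
    "B": {"D": lambda t: 3 if t < 5 else 9},
    "C": {"D": lambda t: 2},
    "D": {},
}


def earliest_arrival(graph, source, target, depart):
    """Earliest arrival time at target when leaving source at time depart."""
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, travel_time in graph[u].items():
            arrival = t + travel_time(t)  # weight depends on departure time
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None  # target unreachable


print(earliest_arrival(graph, "A", "D", depart=0))  # 5, via B
print(earliest_arrival(graph, "A", "D", depart=5))  # 11, via C
```

The same pair of endpoints has a different best route depending on when the trip starts, which is the behavior a static graph cannot express.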


Home Learning in Times of COVID: Experiences of Parents

Journal of Education and Educational Development ◽ 10.22555/joeed.v7i1.3260 ◽ 2020 ◽ Vol 7 (1) ◽ pp. 9 ◽ Cited By ~ 3

Author(s): Shelina Bhamani ◽ Areeba Zainab Makhdoom ◽ Vardah Bharuchi ◽ Nasreen Ali ◽ Sidra Kaleem ◽ ...

Keyword(s): Real Life ◽ Sampling Technique ◽ Online Classes ◽ Google Docs ◽ Learning Gap ◽ Wide Range ◽ Home Learning ◽ Collection Data ◽ Learning At Home ◽ At Home

The widespread prevalence of the COVID-19 pandemic has affected academia and parents alike. Due to the sudden closure of schools, students are missing the social interaction that is vital for better learning and grooming, while most schools have started online classes. This has become a tough routine for parents working online at home, since they have to ensure their children’s education. The study presented was designed to explore the experiences of home learning in times of COVID-19. A descriptive qualitative study was planned to explore the experiences of parents regarding home learning and management during COVID-19 and to gain insight into real-life experiences. A purposive sampling technique was used for data collection. Data were collected from 19 parents who met the inclusion criteria. Considering the lockdown, the data were collected via a Google Docs form with open-ended questions related to COVID-19 and home learning. Three major themes emerged from the data analysis: the impact of COVID on children’s learning; support given by schools; and strategies used by caregivers at home to support learning. The analysis indicated that the entire nation and academicians around the world have come forward to support learning at home, offering a wide range of free online avenues to help parents facilitate home learning. Furthermore, parents too have adapted quickly to address the learning gap that has emerged in their children’s learning in these challenging times. Measures should be adopted to provide essential learning skills to children at home. Centralized data dashboards and educational technology may be used to keep students, parents, and schools updated.


mtDNAcombine: tools to combine sequences from multiple studies

BMC Bioinformatics ◽ 10.1186/s12859-021-04048-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Eleanor F. Miller ◽ Andrea Manica

Keyword(s): Sequence Data ◽ Data Extraction ◽ Bayesian Skyline Plot ◽ Model Organisms ◽ Data Sets ◽ Data Handling ◽ Online Database ◽ Genetic Studies ◽ Wide Range ◽ Existing Data

Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.


A Systematic Review and Qualitative Synthesis Resulting in a Typology of Elementary Classroom Movement Integration Interventions

Sports Medicine - Open ◽ 10.1186/s40798-019-0218-8 ◽ 2020 ◽ Vol 6 (1) ◽ Cited By ~ 2

Author(s): Spyridoula Vazou ◽ Collin A. Webster ◽ Gregory Stewart ◽ Priscila Candal ◽ Cate A. Egan ◽ ...

Keyword(s): Physical Activity ◽ Teacher Collaboration ◽ A Priori ◽ Real Life ◽ Dose Intensity ◽ Routine Practice ◽ Qualitative Synthesis ◽ Wide Range ◽ Meta Analyses ◽ Movement Integration

Background/Objective: Movement integration (MI) involves infusing physical activity into normal classroom time. A wide range of MI interventions have succeeded in increasing children’s participation in physical activity. However, no previous research has attempted to unpack the various MI intervention approaches. Therefore, this study aimed to systematically review, qualitatively analyze, and develop a typology of MI interventions conducted in primary/elementary school settings. Subjects/Methods: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to identify published MI interventions. Irrelevant records were removed first by title, then by abstract, and finally by full text, resulting in 72 studies being retained for qualitative analysis. A deductive approach, using previous MI research as an a priori analytic framework, was used alongside inductive techniques to analyze the data. Results: Four types of MI interventions were identified and labeled based on their design: student-driven, teacher-driven, researcher-teacher collaboration, and researcher-driven. Each type was further refined based on the MI strategies (movement breaks, active lessons, other: opening activity, transitions, reward, awareness), the level of intrapersonal and institutional support (training, resources), and the delivery (dose, intensity, type, fidelity). Nearly half of the interventions were researcher-driven, which may undermine the sustainability of MI as a routine practice by teachers in schools. An imbalance is evident in the MI strategies, with transitions, opening and awareness activities, and rewards having received limited study. Delivery should be further examined with a strong focus on reporting fidelity. Conclusions: There are distinct approaches that are most often employed to promote the use of MI, and these approaches may often lack a minimum standard for reporting MI intervention details. This typology may be useful to effectively translate the evidence into practice in real-life settings to better understand and study MI interventions.


Online Route Planning over Time-Dependent Road Networks

2021 IEEE 37th International Conference on Data Engineering (ICDE) ◽ 10.1109/icde51399.2021.00035 ◽ 2021

Author(s): Di Chen ◽ Ye Yuan ◽ Wenjin Du ◽ Yurong Cheng ◽ Guoren Wang

Keyword(s): Road Networks ◽ Route Planning ◽ Time Dependent ◽ Over Time


Quantitative linear dichroism imaging of molecular processes in living cells made simple by open software tools

Communications Biology ◽ 10.1038/s42003-021-01694-1 ◽ 2021 ◽ Vol 4 (1)

Author(s): Alexey Bondar ◽ Olga Rybakova ◽ Josef Melcr ◽ Jan Dohnálek ◽ Petro Khoroshyy ◽ ...

Keyword(s): Biological Systems ◽ Linear Dichroism ◽ Quantitative Information ◽ Living Cells ◽ Software Tools ◽ Model Systems ◽ Membrane Properties ◽ Polarization Microscopy ◽ Molecular Processes ◽ Wide Range

Fluorescence-detected linear dichroism microscopy allows observing various molecular processes in living cells, as well as obtaining quantitative information on orientation of fluorescent molecules associated with cellular features. Such information can provide insights into protein structure, aid in development of genetically encoded probes, and allow determinations of lipid membrane properties. However, quantitating and interpreting linear dichroism in biological systems has been laborious and unreliable. Here we present a set of open-source ImageJ-based software tools that allow fast and easy linear dichroism visualization and quantitation, as well as extraction of quantitative information on molecular orientations, even in living systems. The tools were tested on model synthetic lipid vesicles and applied to a variety of biological systems, including observations of conformational changes during G-protein signaling in living cells, using fluorescent proteins. Our results show that our tools and model systems are applicable to a wide range of molecules and polarization-resolved microscopy techniques, and represent a significant step towards making polarization microscopy a mainstream tool of biological imaging.


Sequence data from isolated lichen-associated melanized fungi enhance delimitation of two new lineages within Chaetothyriomycetidae

Mycological Progress ◽ 10.1007/s11557-021-01706-8 ◽ 2021 ◽ Vol 20 (7) ◽ pp. 911-927

Author(s): Lucia Muggia ◽ Yu Quan ◽ Cécile Gueidan ◽ Abdullah M. S. Al-Hatmi ◽ Martin Grube ◽ ...

Keyword(s): Sequence Data ◽ Single Species ◽ Sister Group ◽ Asexual Propagation ◽ Dna Sequence Data ◽ Wide Range ◽ The Family ◽ Rock Inhabiting Fungi ◽ Stable Habitat

Lichen thalli provide a long-lived and stable habitat for colonization by a wide range of microorganisms. Increased interest in these lichen-associated microbial communities has revealed an impressive diversity of fungi, including several novel lineages which still await formal taxonomic recognition. Among these, members of the Eurotiomycetes and Dothideomycetes usually occur asymptomatically in the lichen thalli, even if they share ancestry with fungi that may be parasitic on their host. Mycelia of the isolates are characterized by melanized cell walls and the fungi display exclusively asexual propagation. Their taxonomic placement requires, therefore, the use of DNA sequence data. Here, we consider recently published sequence data from lichen-associated fungi and characterize and formally describe two new, individually monophyletic lineages at family, genus, and species levels. The Pleostigmataceae fam. nov. and Melanina gen. nov. both comprise rock-inhabiting fungi that associate with epilithic, crust-forming lichens in subalpine habitats. The phylogenetic placement and the monophyly of Pleostigmataceae lack statistical support, but the family was resolved as sister to the order Verrucariales. This family comprises the species Pleostigma alpinum sp. nov., P. frigidum sp. nov., P. jungermannicola, and P. lichenophilum sp. nov. The placement of the genus Melanina is supported as a lineage within the Chaetothyriales. To date, this genus comprises the single species M. gunde-cimermaniae sp. nov. and forms a sister group to a large lineage including Herpotrichiellaceae, Chaetothyriaceae, Cyphellophoraceae, and Trichomeriaceae. The new phylogenetic analysis of the subclass Chaetothyriomycetidae provides new insight into genus and family level delimitation and classification of this ecologically diverse group of fungi.


Estimating the number of usability problems affecting medical devices: modelling the discovery matrix

BMC Medical Research Methodology ◽ 10.1186/s12874-020-01091-y ◽ 2020 ◽ Vol 20 (1)

Author(s): Vincent Vandewalle ◽ Alexandre Caron ◽ Coralie Delettrez ◽ Renaud Périchon ◽ Sylvia Pelayo ◽ ...

Keyword(s): Medical Devices ◽ Market Access ◽ Real Life ◽ Usability Testing ◽ Probability Of Detection ◽ Usability Problems ◽ Usability Problem ◽ Wide Range ◽ The Matrix ◽ Problem Detection

Background: Usability testing of medical devices is mandatory for market access. The goal of testing is to identify usability problems that could cause harm to the user or limit the device’s effectiveness. In practice, human factor engineers study participants under actual conditions of use and list the problems encountered. This results in a binary discovery matrix in which each row corresponds to a participant, and each column corresponds to a usability problem. One of the main challenges in usability testing is estimating the total number of problems, in order to assess the completeness of the discovery process. Today’s margin-based methods fit the column sums to a binomial model of problem detection. However, the discovery matrix actually observed is truncated because of undiscovered problems, which corresponds to fitting the marginal sums without the zeros. Margin-based methods fail to overcome the bias related to truncation of the matrix. The objective of the present study was to develop and test a matrix-based method for estimating the total number of usability problems. Methods: The matrix-based model was based on the full discovery matrix (including unobserved columns) and not solely on a summary of the data (e.g. the margins). This model also circumvents a drawback of margin-based methods by simultaneously estimating the model’s parameters and the total number of problems. Furthermore, the matrix-based method takes account of a heterogeneous probability of detection, which reflects a real-life setting. As suggested in the usability literature, we assumed that the probability of detection had a logit-normal distribution. Results: We assessed the matrix-based method’s performance in a range of settings reflecting real-life usability testing and with heterogeneous probabilities of problem detection. In our simulations, the matrix-based method improved the estimation of the number of problems (in terms of bias, consistency, and coverage probability) in a wide range of settings. We also applied our method to five real datasets from usability testing. Conclusions: Estimation models (and particularly matrix-based models) are of value in estimating and monitoring the detection process during usability testing. Matrix-based models have a solid mathematical grounding and, with a view to facilitating the decision-making process for both regulators and device manufacturers, should be incorporated into current standards.
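
The discovery matrix and the margin-based estimate criticized above can be sketched in a few lines. The code builds a small binary matrix, takes the column sums, and applies a naive binomial estimate with a single detection probability p: the expected fraction of problems found by n participants is 1 - (1 - p)^n, so the total is estimated as the observed count divided by that fraction. This is only an illustration of the margin-based idea under assumed toy data; it is not the matrix-based logit-normal model developed in the paper.

```python
# Hypothetical discovery matrix: rows are participants, columns are the
# usability problems observed at least once (1 = participant hit the problem).
discovery = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]

n_participants = len(discovery)
n_observed = len(discovery[0])

# Margins: how many participants encountered each observed problem.
col_sums = [sum(column) for column in zip(*discovery)]

# Naive detection probability from the observed columns only. Problems that
# nobody found (all-zero columns) are missing, so this value is biased upward.
p_hat = sum(col_sums) / (n_participants * n_observed)

# Expected fraction of all problems found by n participants under one shared p.
discovered_fraction = 1 - (1 - p_hat) ** n_participants

# Naive estimate of the total number of problems, found and not yet found.
n_total_hat = n_observed / discovered_fraction

print(col_sums, p_hat, round(n_total_hat, 2))
```

Because the all-zero columns never enter `p_hat`, the estimated fraction discovered is too optimistic and the total is underestimated, which is the truncation bias that motivates modelling the full matrix.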
