Enhancing investigative pattern detection via inexact matching and graph databases

Abstract

Background: Most studies on children evaluate longitudinal growth as an important health indicator. Different methods have been used to detect growth patterns across childhood, but with no comparison between them to evaluate result consistency. We explored the variation in growth patterns as detected by different clustering and latent class modelling techniques. Moreover, we investigated how the characteristics/features (e.g. slope, tempo, velocity) of longitudinal growth influence pattern detection.

Methods: We studied 1134 children from The Applied Research Group for Kids cohort with longitudinal-growth measurements [height, weight, body mass index (BMI)] available from birth until 12 years of age. Growth patterns were identified by latent class mixed models (LCMM) and time-series clustering (TSC) using various algorithms and distance measures. Time-invariant features were extracted from all growth measures. A random forest classifier was used to predict the identified growth patterns for each growth measure using the extracted features.

Results: Overall, 72 TSC configurations were tested. For BMI, we identified three growth patterns by both TSC and LCMM. The clustering agreement was 58% between LCMM and TS clusters, whereas it varied between 30.8% and 93.3% within the TSC configurations. The extracted features (n = 67) predicted the identified patterns for each growth measure with accuracy of 82%–89%. Specific feature categories were identified as the most important predictors for patterns of all tested growth measures.

Conclusion: Growth-pattern detection is affected by the method employed. This can impact on comparisons across different populations or associations between growth patterns and health outcomes. Growth features can be reliably used as predictors of growth patterns.
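The abstract reports an agreement percentage between clusterings but does not say how it was computed. As a minimal sketch, assuming agreement means the share of children placed in the same pattern after the best one-to-one relabelling of clusters, it could be computed as follows. The function name `clustering_agreement` and the toy labels are hypothetical and not taken from the study.

```python
from itertools import permutations


def clustering_agreement(labels_a, labels_b):
    """Share of items that fall in matching clusters under the best relabelling.

    Assumption: agreement is the maximum fraction of matching assignments over
    all one-to-one mappings from the cluster names in labels_b to those in
    labels_a. Exhaustive search is fine for a handful of clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    if not labels_a:
        raise ValueError("labelings must not be empty")

    # Unique cluster names, in order of first appearance.
    names_a = list(dict.fromkeys(labels_a))
    names_b = list(dict.fromkeys(labels_b))

    # Pad with placeholders so every name in labels_b can be mapped somewhere,
    # even when labels_b has more clusters than labels_a.
    padding = [object() for _ in range(max(0, len(names_b) - len(names_a)))]
    targets = names_a + padding

    best_matches = 0
    for perm in permutations(targets, len(names_b)):
        mapping = dict(zip(names_b, perm))
        matches = sum(1 for a, b in zip(labels_a, labels_b) if mapping[b] == a)
        best_matches = max(best_matches, matches)
    return best_matches / len(labels_a)


# Toy example: six children, three patterns, two methods with different names.
lcmm_labels = [0, 0, 1, 1, 2, 2]
tsc_labels = ["x", "x", "y", "y", "z", "x"]
agreement = clustering_agreement(lcmm_labels, tsc_labels)
```

Cluster names carry no meaning across methods, so the sketch searches over relabellings instead of comparing the raw labels. Other measures, such as the adjusted Rand index, would give different numbers for the same pair of clusterings.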

A real-time abnormal operation pattern detection method for building energy systems based on association rule bases

Building Simulation (2021). DOI: 10.1007/s12273-021-0791-x

Author(s): Chaobo Zhang, Yang Zhao, Yangze Zhou, Xuejun Zhang, Tingting Li

Keyword(s): Real Time, Association Rule, Detection Method, Energy Systems, Building Energy, Pattern Detection, Operation Pattern, Building Energy Systems, Rule Bases, Abnormal Operation

Learning-free pattern detection for manuscript research:

International Journal on Document Analysis and Recognition (IJDAR) (2021). DOI: 10.1007/s10032-021-00371-7

Author(s): Hussein Mohammed, Volker Märgner, Giovanni Ciotti

Keyword(s): South Asian, State Of The Art, Research Question, Software Tool, Pattern Detection, Asian Studies, Medieval Manuscripts, South Asian Studies, Training Samples, Nearest Neighbour Classifier

Abstract

Automatic pattern detection has become increasingly important for scholars in the humanities as the number of manuscripts that have been digitised has grown. Most of the state-of-the-art methods used for pattern detection depend on the availability of a large number of training samples, which are typically not available in the humanities as they involve tedious manual annotation by researchers (e.g. marking the location and size of words, drawings, seals and so on). This makes the applicability of such methods very limited within the field of manuscript research. We propose a learning-free approach based on a state-of-the-art Naïve Bayes Nearest-Neighbour classifier for the task of pattern detection in manuscript images. The method has already been successfully applied to an actual research question from South Asian studies about palm-leaf manuscripts. Furthermore, state-of-the-art results have been achieved on two extremely challenging datasets, namely the AMADI_LontarSet dataset of handwriting on palm leaves for word-spotting and the DocExplore dataset of medieval manuscripts for pattern detection. A performance analysis is provided as well in order to facilitate later comparisons by other researchers. Finally, an easy-to-use implementation of the proposed method is developed as a software tool and made freely available.
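The abstract names a Naïve Bayes Nearest-Neighbour (NBNN) classifier as the basis of the approach. As a rough illustration of why such a classifier needs no training step, the sketch below applies the generic NBNN decision rule: each query descriptor is matched to its nearest descriptor in every class, and the class with the smallest summed squared distance is chosen. This is not the authors' detection pipeline; the function names, the 2-D toy descriptors, and the class labels are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nbnn_classify(query_descriptors, class_descriptors):
    """Generic NBNN decision rule (a sketch, not the paper's implementation).

    query_descriptors: list of feature vectors extracted from the query image.
    class_descriptors: dict mapping a class label to the list of feature
        vectors extracted from that class's example images.
    Returns the winning label and the per-class distance totals.
    """
    totals = {}
    for label, descriptors in class_descriptors.items():
        # For every query descriptor, take the distance to its nearest
        # neighbour in this class, then sum those distances over the query.
        totals[label] = sum(
            min(squared_distance(q, d) for d in descriptors)
            for q in query_descriptors
        )
    best_label = min(totals, key=totals.get)
    return best_label, totals


# Toy 2-D descriptors standing in for real local image features.
examples = {
    "seal": [(0.0, 0.0), (1.0, 0.0)],
    "word": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.2, 0.1), (0.9, 0.1)]
label, totals = nbnn_classify(query, examples)
```

The example descriptors are used directly at query time, so adding a new pattern only requires extracting descriptors from a few examples, with no model to retrain.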

DeepIaC: deep learning-based linguistic anti-pattern detection in IaC

Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation (2020). DOI: 10.1145/3416505.3423564

Author(s): Nemania Borovits, Indika Kumara, Parvathy Krishnan, Stefano Dalla Palma, Dario Di Nucci, ...

Keyword(s): Deep Learning, Pattern Detection

Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics (2021), Vol 22 (S2). DOI: 10.1186/s12859-020-03937-0

Author(s): Daniele D’Agostino, Pietro Liò, Marco Aldinucci, Ivan Merelli

Keyword(s): Web Application, High Throughput Sequencing, Cell Types, Graph Database, Graph Databases, Sources Of Information, Chromosome Conformation, Wide Scale, User Friendly, Different Cell Types

Abstract

Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology adopted by the DNA in the nucleus of eukaryotic cells.

Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, when working with a graph-based representation, the consequent need to manage a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. Currently available graph visualisation tools and libraries fall short with Hi-C data at this scale. Graph databases, instead, support both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. Moreover, describing graphs through statistical indicators and, even more, correlating them through statistical distributions makes it possible to highlight similarities and differences among Hi-C experiments in different cell conditions or different cell types.

Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5).

Conclusion: With the accumulation of more experiments, the tool will provide invaluable support for comparing the neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
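To make the matrix-versus-graph distinction concrete, the sketch below converts a small symmetric contact matrix into an adjacency structure (genes as nodes, contacts as edges) and compares the neighbours of one gene across two experiments. It is an in-memory illustration only and does not use Neo4j or the NeoHiC code; the function names, gene names, contact counts, and threshold are all hypothetical.

```python
def contacts_to_graph(names, matrix, threshold=1):
    """Build an undirected graph from a symmetric contact matrix.

    names: node labels, one per row/column (here, hypothetical gene names).
    matrix: square list of lists with contact counts.
    threshold: minimum count for a contact to become an edge (an assumption).
    """
    graph = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle is enough
            if matrix[i][j] >= threshold:
                graph[names[i]].add(names[j])
                graph[names[j]].add(names[i])
    return graph


def neighbour_changes(graph_a, graph_b, node):
    """Compare the neighbours of one node between two experiments."""
    a = graph_a.get(node, set())
    b = graph_b.get(node, set())
    return {"shared": a & b, "lost": a - b, "gained": b - a}


genes = ["g1", "g2", "g3", "g4"]
experiment_a = [
    [0, 3, 0, 1],
    [3, 0, 2, 0],
    [0, 2, 0, 0],
    [1, 0, 0, 0],
]
experiment_b = [
    [0, 3, 4, 0],
    [3, 0, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 0, 0],
]
graph_a = contacts_to_graph(genes, experiment_a)
graph_b = contacts_to_graph(genes, experiment_b)
changes = neighbour_changes(graph_a, graph_b, "g1")
degrees_a = {gene: len(neighbours) for gene, neighbours in graph_a.items()}
```

Node degree is one simple example of the statistical indicators mentioned in the abstract. At genome scale an in-memory dictionary like this would not be practical, which is the motivation the abstract gives for a graph database.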

Transient Pattern Detection from Streaming Nature Data

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (2020). DOI: 10.1109/candarw51189.2020.00089

Author(s): Thanapol Phungtua-eng, Yoshitaka Yamamoto, Shigeyuki Sako

Keyword(s): Pattern Detection
