Enhancing investigative pattern detection via inexact matching and graph databases

Author(s):  
Shashika R Muramudalige ◽  
Benjamin W. K. Hung ◽  
Anura P Jayasumana ◽  
Indrakshi Ray ◽  
Jytte Klausen
Author(s):  
Paraskevi Massara ◽  
Charles D G Keown-Stoneman ◽  
Lauren Erdman ◽  
Eric O Ohuma ◽  
Celine Bourdon ◽  
...  

Abstract Background: Most studies on children evaluate longitudinal growth as an important health indicator. Different methods have been used to detect growth patterns across childhood, but they have not been compared to evaluate the consistency of their results. We explored the variation in growth patterns as detected by different clustering and latent class modelling techniques. Moreover, we investigated how the characteristics/features (e.g. slope, tempo, velocity) of longitudinal growth influence pattern detection. Methods: We studied 1134 children from The Applied Research Group for Kids cohort with longitudinal-growth measurements [height, weight, body mass index (BMI)] available from birth until 12 years of age. Growth patterns were identified by latent class mixed models (LCMM) and time-series clustering (TSC) using various algorithms and distance measures. Time-invariant features were extracted from all growth measures. A random forest classifier was used to predict the identified growth patterns for each growth measure using the extracted features. Results: Overall, 72 TSC configurations were tested. For BMI, we identified three growth patterns by both TSC and LCMM. The clustering agreement was 58% between LCMM and TSC clusters, whereas it varied between 30.8% and 93.3% within the TSC configurations. The extracted features (n = 67) predicted the identified patterns for each growth measure with an accuracy of 82%–89%. Specific feature categories were identified as the most important predictors for patterns of all tested growth measures. Conclusion: Growth-pattern detection is affected by the method employed. This can affect comparisons across different populations or associations between growth patterns and health outcomes. Growth features can be reliably used as predictors of growth patterns.
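
The abstract reports a clustering agreement between methods but does not say how it was computed. As a minimal sketch, assuming agreement means the share of children assigned to matching clusters once the arbitrary cluster labels of one method are relabelled to best fit the other, it could be calculated as follows. The function name `clustering_agreement` and the toy labels are hypothetical and are not taken from the study.

```python
from itertools import permutations


def clustering_agreement(labels_a, labels_b):
    """Fraction of items placed in matching clusters.

    Cluster ids are arbitrary, so the score is maximised over all
    ways of renaming the clusters of labels_b to clusters of labels_a.
    Assumption: this is one plausible agreement measure, not
    necessarily the one used in the study.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")

    ids_a = sorted(set(labels_a))
    ids_b = sorted(set(labels_b))
    # Pad with placeholders when labels_b has more clusters than labels_a,
    # so that every cluster in labels_b can still be mapped somewhere.
    extra = max(0, len(ids_b) - len(ids_a))
    targets = ids_a + [object() for _ in range(extra)]

    best = 0
    for perm in permutations(targets, len(ids_b)):
        mapping = dict(zip(ids_b, perm))
        matches = sum(
            1 for a, b in zip(labels_a, labels_b) if mapping[b] == a
        )
        best = max(best, matches)
    return best / len(labels_a)


# Toy example: two methods assign six children to three clusters.
lcmm = [0, 0, 1, 1, 2, 2]
tsc = [2, 2, 0, 0, 1, 0]
print(clustering_agreement(lcmm, tsc))
```

Exhaustive relabelling is only practical for a small number of clusters, such as the three patterns reported for BMI. With many clusters, an assignment solver or a chance-corrected index would be a more suitable choice.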


Author(s):  
Hussein Mohammed ◽  
Volker Märgner ◽  
Giovanni Ciotti

Abstract Automatic pattern detection has become increasingly important for scholars in the humanities as the number of manuscripts that have been digitised has grown. Most of the state-of-the-art methods used for pattern detection depend on the availability of a large number of training samples, which are typically not available in the humanities as they involve tedious manual annotation by researchers (e.g. marking the location and size of words, drawings, seals and so on). This makes the applicability of such methods very limited within the field of manuscript research. We propose a learning-free approach based on a state-of-the-art Naïve Bayes Nearest-Neighbour classifier for the task of pattern detection in manuscript images. The method has already been successfully applied to an actual research question from South Asian studies about palm-leaf manuscripts. Furthermore, state-of-the-art results have been achieved on two extremely challenging datasets, namely the AMADI_LontarSet dataset of handwriting on palm leaves for word-spotting and the DocExplore dataset of medieval manuscripts for pattern detection. A performance analysis is provided as well in order to facilitate later comparisons by other researchers. Finally, an easy-to-use implementation of the proposed method is developed as a software tool and made freely available.
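
For readers unfamiliar with the Naïve Bayes Nearest-Neighbour idea, the generic decision rule can be sketched as follows: each local descriptor of the query is matched to its nearest descriptor in every class, the squared distances are summed per class, and the class with the smallest total wins. No training step is involved, which is why the approach is learning-free. This is a minimal illustration of the general rule only, not the authors' implementation. The function names, the two-dimensional descriptors and the class labels are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nbnn_classify(query_descriptors, class_descriptors):
    """Generic NBNN decision rule.

    query_descriptors: list of descriptor vectors from the query image.
    class_descriptors: dict mapping a class label to the list of
        descriptor vectors collected from that class's example images.
    Returns the winning label and the per-class distance totals.
    """
    totals = {}
    for label, descriptors in class_descriptors.items():
        # For every query descriptor, take the distance to its nearest
        # neighbour within this class, then sum over the whole query.
        totals[label] = sum(
            min(squared_distance(q, d) for d in descriptors)
            for q in query_descriptors
        )
    best_label = min(totals, key=totals.get)
    return best_label, totals


# Toy example with made-up 2-D descriptors for two pattern classes.
examples = {
    "seal": [(0.0, 0.0), (1.0, 0.0)],
    "word": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.1, 0.0), (0.9, 0.1)]
label, totals = nbnn_classify(query, examples)
print(label)
```

Real descriptors would come from a local feature extractor and would be matched with an approximate nearest-neighbour index rather than the brute-force search used here.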


Author(s):  
Nemania Borovits ◽  
Indika Kumara ◽  
Parvathy Krishnan ◽  
Stefano Dalla Palma ◽  
Dario Di Nucci ◽  
...  

2021 ◽  
Vol 22 (S2) ◽  
Author(s):  
Daniele D’Agostino ◽  
Pietro Liò ◽  
Marco Aldinucci ◽  
Ivan Merelli

Abstract Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology adopted by the DNA in the nucleus of eukaryotic cells. Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. For this purpose, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. Moreover, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions make it possible to highlight similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion: With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
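
The move from a contact matrix to a graph can be illustrated with a short sketch. Assuming a small symmetric matrix of contact counts between named genes, the code below turns it into a list of weighted edges and derives one simple statistical indicator, the number of contacts per node. This is a sketch under stated assumptions and not NeoHiC code: the function names, the threshold parameter and the toy matrix are hypothetical, and a graph database would store these edges as relationships between nodes rather than as an in-memory list.

```python
from collections import Counter


def contacts_to_edges(matrix, names, threshold=1):
    """Convert a symmetric contact matrix into weighted edges.

    Only the upper triangle is read, so each contact appears once.
    Pairs whose contact count is below `threshold` are skipped.
    """
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                edges.append((names[i], names[j], matrix[i][j]))
    return edges


def degree_counts(edges):
    """Number of contacts per node, a simple graph-level indicator."""
    degree = Counter()
    for a, b, _weight in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Toy example: three genes and made-up contact counts.
names = ["geneA", "geneB", "geneC"]
matrix = [
    [0, 3, 0],
    [3, 0, 1],
    [0, 1, 0],
]
edges = contacts_to_edges(matrix, names)
degrees = degree_counts(edges)
print(edges)
print(degrees["geneB"])
```

Comparing the degree distributions obtained from two experiments is one simple way to look for differences of the kind the abstract describes.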

