Document Similarity Approach Using Grammatical Linkages with Graph Databases

A document similarity approach using grammatical linkages with graph databases

International Journal of Enterprise Network Management ◽

10.1504/ijenm.2019.10024721 ◽

2019 ◽

Vol 10 (3/4) ◽

pp. 211

Author(s):

K. Umamaheswari ◽

V. Priya

Keyword(s):

Graph Databases ◽

Document Similarity


A document similarity approach using grammatical linkages with graph databases

International Journal of Enterprise Network Management ◽

10.1504/ijenm.2019.103143 ◽

2019 ◽

Vol 10 (3/4) ◽

pp. 211

Author(s):

V. Priya ◽

K. Umamaheswari

Keyword(s):

Graph Databases ◽

Document Similarity


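Pembobotan Berdasarkan Tingkat Kesamaan Semantik pada Metode Fuzzy Semi-Supervised Co-Clustering untuk Pengelompokkan Dokumen Teks (Weighting Based on the Level of Semantic Similarity in the Fuzzy Semi-Supervised Co-Clustering Method for Text Document Clustering)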

Jurnal ULTIMATICS ◽

10.31937/ti.v6i2.333 ◽

2014 ◽

Vol 6 (2) ◽

pp. 46-51

Author(s):

Galang Amanda Dwi P. ◽

Gregorius Edwadr ◽

Agus Zainal Arifin

Keyword(s):

Supervised Learning ◽

Semantic Similarity ◽

The Other ◽

Classification Result ◽

Document Similarity ◽

The Matrix ◽

Index Terms ◽

Membership Value ◽

Degree Of Similarity

Nowadays, a large amount of information cannot be reached by readers because text-based documents are misclassified. Misclassified data can also lead readers to the wrong information. The method proposed in this paper aims to classify documents into the correct group. Each document has a membership value in several different classes. The degree of similarity between two documents is measured by their semantic similarity. In fact, no document is entirely unrelated to the others, although the relationship may be close to 0. The method calculates the similarity between two documents by taking into account the level of similarity of words and their synonyms. After all inter-document similarity values are obtained, a matrix is created. The matrix is then used as a semi-supervised factor. The output of the method is the membership value of each document in each class; the greatest membership value for a document indicates the group to which it is assigned. The classification result computed by the method is good, at 90%. Index Terms: Fuzzy co-clustering, Heuristic, Semantic Similarity, Semi-supervised learning.
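The similarity step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synonym table, the 1.0/0.5 word weights, and the best-match averaging are all assumptions made for the example, since the abstract does not specify them.

```python
# Hypothetical toy synonym table; the lexical resource used in the paper
# is not named in the abstract.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "fast": {"quick"},
    "quick": {"fast"},
}


def word_similarity(a, b):
    """Assumed weights: 1.0 for identical words, 0.5 for listed synonyms, else 0.0."""
    if a == b:
        return 1.0
    if b in SYNONYMS.get(a, set()):
        return 0.5
    return 0.0


def document_similarity(tokens_a, tokens_b):
    """Symmetrised average of each word's best match in the other document."""
    def directed(x, y):
        if not x or not y:
            return 0.0
        return sum(max(word_similarity(w, v) for v in y) for w in x) / len(x)

    return 0.5 * (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a))


def similarity_matrix(docs):
    """Build the inter-document similarity matrix from whitespace-tokenised text."""
    tokens = [d.lower().split() for d in docs]
    n = len(tokens)
    return [[document_similarity(tokens[i], tokens[j]) for j in range(n)]
            for i in range(n)]


matrix = similarity_matrix(["fast car", "quick automobile", "green tree"])
```

In this sketch the first two documents share no words but still receive a non-zero similarity through their synonyms, while the third is unrelated to both. A matrix of this kind is what the abstract describes feeding into the co-clustering step as the semi-supervised factor.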


Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies

Proceedings of the 10th International Conference on Knowledge Capture - K-CAP '19 ◽

10.1145/3360901.3364444 ◽

2019 ◽

Cited By ~ 1

Author(s):

Carlos Badenes-Olmedo ◽

José Luis Redondo-García ◽

Oscar Corcho

Keyword(s):

Document Similarity ◽

Specific Concept ◽

Cross Lingual ◽

Concept Hierarchies


Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽

10.1186/s12859-020-03937-0 ◽

2021 ◽

Vol 22 (S2) ◽

Author(s):

Daniele D’Agostino ◽

Pietro Liò ◽

Marco Aldinucci ◽

Ivan Merelli

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Cell Types ◽

Graph Database ◽

Graph Databases ◽

Sources Of Information ◽

Chromosome Conformation ◽

Wide Scale ◽

User Friendly ◽

Different Cell Types

Abstract. Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods: Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion: With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
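The move from a contact matrix to a graph of nodes and edges, as described in this abstract, can be illustrated with a short sketch. It is not NeoHiC code: the function name, the region labels, and the `min_count` threshold are hypothetical, and the result is a plain edge list of the kind one might later load into a graph database.

```python
def contacts_to_edges(matrix, labels, min_count=1):
    """Convert a symmetric contact matrix into a weighted edge list.

    Only the upper triangle is read, so each contact appears once.
    `min_count` is an assumed filter for dropping weak contacts.
    """
    edges = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= min_count:
                edges.append((labels[i], labels[j], matrix[i][j]))
    return edges


# Toy 3x3 contact matrix for three hypothetical regions.
contacts = [
    [0, 5, 0],
    [5, 0, 2],
    [0, 2, 0],
]
edges = contacts_to_edges(contacts, ["r1", "r2", "r3"])
```

Storing only the non-zero contacts as edges is one reason a graph representation can be more compact than a dense matrix when most region pairs never interact.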


Applying graph database technology for analyzing perturbed co-expression networks in cancer

Database ◽

10.1093/database/baaa110 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Claire M Simpson ◽

Florian Gnad

Keyword(s):

Relational Databases ◽

Molecular Mechanisms ◽

Biological Data ◽

Database Management System ◽

Graph Database ◽

Graph Databases ◽

Graph Representations ◽

Rnaseq Data ◽

Database Technology ◽

Speed Accuracy

Abstract: Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.
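The idea of comparing node centrality between healthy and tumor co-expression networks can be sketched in plain Python. The study itself used Neo4j and its graph algorithms; the in-memory adjacency sets, the function names, and the toy gene labels below are assumptions made only to illustrate how a central node can be lost or created.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected co-expression graph stored as adjacency sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def centrality_shift(healthy_edges, tumor_edges):
    """Tumor centrality minus healthy centrality for every gene seen in either graph."""
    healthy = degree_centrality(build_graph(healthy_edges))
    tumor = degree_centrality(build_graph(tumor_edges))
    genes = set(healthy) | set(tumor)
    return {g: tumor.get(g, 0.0) - healthy.get(g, 0.0) for g in genes}


# Hypothetical toy networks: gene A is a hub in healthy tissue only.
healthy = [("A", "B"), ("A", "C"), ("A", "D")]
tumor = [("A", "B"), ("B", "C"), ("C", "D")]
shift = centrality_shift(healthy, tumor)
```

A strongly negative shift (gene A here) corresponds to what the abstract calls the destruction of a central node, while positive shifts mark genes that gained connections in the tumor network.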


Understanding the graph databases and power grid systems

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/439/3/032110 ◽

2018 ◽

Vol 439 ◽

pp. 032110 ◽

Cited By ~ 1

Author(s):

Xuming Lv ◽

Shuo Chen ◽

Shanqi Zheng ◽

Jingzhao Luan ◽

Yonghui Guo

Keyword(s):

Power Grid ◽

Graph Databases ◽

Grid Systems


Efficient Access Methods for Very Large Distributed Graph Databases

Information Sciences ◽

10.1016/j.ins.2021.05.047 ◽

2021 ◽

Author(s):

David Luaces ◽

José R.R. Viqueira ◽

José M. Cotos ◽

Julián C. Flores

Keyword(s):

Graph Databases ◽

Access Methods ◽

Efficient Access


Data Profiling in Property Graph Databases

Journal of Data and Information Quality ◽

10.1145/3409473 ◽

2020 ◽

Vol 12 (4) ◽

pp. 1-27

Author(s):

Sofía Maiolo ◽

Lorena Etcheverry ◽

Adriana Marotta

Keyword(s):

Graph Databases ◽

Data Profiling


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Database ◽

10.1093/database/baz085 ◽

2019 ◽

Vol 2019 ◽

Author(s):

Peter Brown ◽

Aik-Choon Tan ◽

Mohamed A El-Esawi ◽

Thomas Liehr ◽

Oliver Blanck ◽

...

Keyword(s):

Literature Search ◽

Relevant Literature ◽

Biomedical Literature ◽

Medical Subject Headings ◽

Document Similarity ◽

Inverse Document Frequency ◽

Research Fields ◽

Experience Levels ◽

Document Frequency ◽

Systematic Biases

Abstract: Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
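One of the baseline methods named in this abstract, Okapi Best Matching 25 (BM25), can be sketched as below. This is a generic textbook form, not the benchmark's configuration: the whitespace tokenisation, the parameter values k1 = 1.2 and b = 0.75, and the smoothed IDF variant (with 1 added inside the logarithm so that it stays positive) are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a basic BM25 formula.

    k1 and b are commonly used default values, assumed here for illustration.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


titles = ["graph database query", "document similarity search", "graph similarity"]
scores = bm25_scores("graph database", titles)
```

In this toy run the title containing both query terms ranks first, the title sharing one term ranks second, and the title sharing none scores zero. Because such term-matching scores depend on exact vocabulary overlap, it is plausible that different baselines surface different article sets, which is consistent with the abstract's suggestion that a hybrid method may be needed.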
