entity resolution Latest Research Papers

Improving Wildlife Population Inference Using Aerial Imagery and Entity Resolution

Journal of Agricultural Biological and Environmental Statistics ◽

10.1007/s13253-021-00484-w ◽

2022 ◽

Author(s):

Xinyi Lu ◽

Mevin B. Hooten ◽

Andee Kaplan ◽

Jamie N. Womble ◽

Michael R. Bower

Keyword(s):

Entity Resolution ◽

Aerial Imagery ◽

Wildlife Population ◽

Population Inference

Get full-text (via PubEx)

MultiBlock: A Scalable Iterative Approach for Progressive Entity Resolution

10.1109/bigdata52589.2021.9671540 ◽

2021 ◽

Author(s):

Dimitrios Karapiperis ◽

Aris Gkoulalas-Divanis ◽

Vassilios S. Verykios

Keyword(s):

Entity Resolution ◽

Iterative Approach

Get full-text (via PubEx)

Active deep learning on entity resolution by risk sampling

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.107729 ◽

2021 ◽

pp. 107729

Author(s):

Youcef Nafa ◽

Qun Chen ◽

Zhaoqiang Chen ◽

Xingyu Lu ◽

Haiyang He ◽

...

Keyword(s):

Deep Learning ◽

Entity Resolution

Get full-text (via PubEx)

Towards Deep Entity Resolution via Soft Schema Matching

Neurocomputing ◽

10.1016/j.neucom.2021.10.106 ◽

2021 ◽

Author(s):

Chenchen Sun ◽

Derong Shen

Keyword(s):

Entity Resolution ◽

Schema Matching

Get full-text (via PubEx)

The role of transitive closure in evaluating blocking methods for dirty entity resolution

Journal of Intelligent Information Systems ◽

10.1007/s10844-021-00676-3 ◽

2021 ◽

Author(s):

Mahdi Niknam ◽

Behrouz Minaei-Bidgoli ◽

Rouhollah Dianat

Keyword(s):

Transitive Closure ◽

Entity Resolution

Get full-text (via PubEx)

Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations

10.1145/3487075.3487142 ◽

2021 ◽

Author(s):

Liang Zhu ◽

Jiapeng Yang ◽

Xin Song ◽

Yu Wang ◽

Yonggang Wei

Keyword(s):

Real Time ◽

Entity Resolution ◽

Database Systems

Get full-text (via PubEx)

ACERPI: An approach for ordinances collection, information extraction and entity resolution

10.5753/sbbd.2021.17869 ◽

2021 ◽

Author(s):

Christian Schmitz ◽

Serigne K. Mbaye ◽

Edimar Manica ◽

Renata Galante

Keyword(s):

Information Extraction ◽

Named Entity Recognition ◽

Entity Resolution ◽

Entity Recognition ◽

Recognition Model ◽

Named Entity ◽

The People ◽

Advanced Search ◽

Public Repositories ◽

Extract Information

Ordinances are documents issued by federal institutions that contain, among others, information regarding their staff. These documents are accessible through public repositories that usually do not allow any filter or advanced search on documents’ contents. This paper presents ACERPI, an approach which identifies the people mentioned in the ordinances to help the user find the documents of interest. ACERPI combines techniques to discover, obtain, convert and structure documents, extract information, and link employees entities. Experiments were performed on two real datasets and demonstrated a recall of 72.7% for our named entity recognition model trained with only 534 samples and F1 measure of 90% in the efficacy of the entity resolution technique.

Get full-text (via PubEx)

An Effective Entity Resolution Approach for Big Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k9503.09101121 ◽

2021 ◽

Vol 10 (11) ◽

pp. 100-112

Author(s):

Randa Mohamed Abd El-ghafar ◽

◽

Ali H. El-Bastawissy ◽

Eman S. Nasr ◽

Mervat H. Gheith ◽

...

Keyword(s):

Big Data ◽

Phase 1 ◽

Entity Resolution ◽

Locality Sensitive Hashing ◽

Performance Time ◽

Efficiency And Effectiveness ◽

Similar Cluster ◽

Computational Resources ◽

Five Phases ◽

F Measure

Entity Resolution (ER) is defined as the process 0f identifying records/ objects that correspond to real-world objects/ entities. To define a good ER approach, the schema of the data should be well-known. In addition, schema alignment of multiple datasets is not an easy task and may require either domain expert or ML algorithm to select which attributes to match. Schema agnostic meta-blocking tries to solve such a problem by considering each token as a blocking key regardless of the attributes it appears in. It may also be coupled with meta-blocking to reduce the number of false negatives. However, it requires the exact match of tokens which is very hard to occur in the actual datasets and it results in very low precision. To overcome such issues, we propose a novel and efficient ER approach for big data implemented in Apache Spark. The proposed approach is employed to avoid schema alignment as it treats the attributes as a bag of words and generates a set of n-grams which is transformed to vectors. The generated vectors are compared using a chosen similarity measure. The proposed approach is a generic one as it can accept all types of datasets. It consists of five consecutive sub-modules: 1) Dataset acquisition, 2) Dataset pre-processing, 3) Setting selection criteria, where all settings of the proposed approach are selected such as the used blocking key, the significant attributes, NLP techniques, ER threshold, and the used scenario of ER, 4) ER pipeline construction, and 5) Clustering where the similar records are grouped into the similar cluster. The ER pipeline could accept two types of attributes; the Weighted Attributes (WA) or the Compound Attributes (CA). In addition, it accepts all the settings selected in the fourth module. The pipeline consists of five phases. Phase 1) Generating the tokens composing the attributes. Phase 2) Generating n-grams of length n. Phase 3) Applying the hashing Text Frequency (TF) to convert each n-grams to a fixed-length feature vector. Phase 4) Applying Locality Sensitive Hashing (LSH), which maps similar input items to the same buckets with a higher probability than dissimilar input items. Phase 5) Classification of the objects to duplicates or not according to the calculated similarity between them. We introduced seven different scenarios as an input to the ER pipeline. To minimize the number of comparisons, we proposed the length filter which greatly contributes to improving the effectiveness of the proposed approach as it achieves the highest F-measure between the existing computational resources and scales well with the available working nodes. Three results have been revealed: 1) Using the CA in the different scenarios achieves better results than the single WA in terms of efficiency and effectiveness. 2) Scenario 3 and 4 Achieve the best performance time because using Soundex and Stemming contribute to reducing the performance time of the proposed approach. 3) Scenario 7 achieves the highest F-measure because by utilizing the length filter, we only compare records that are nearly within a pre-determined percentage of increase or decrease of string length. LSH is used to map the same inputs items to the buckets with a higher probability than dis-similar ones. It takes numHashTables as a parameter. Increasing the number of candidate pairs with the same numHashTables will reduce the accuracy of the model. Utilizing the length filter helps to minimize the number of candidates which in turn increases the accuracy of the approach.

Get full-text (via PubEx)

SeMBlock: A semantic-aware meta-blocking approach for entity resolution

Intelligent Decision Technologies ◽

10.3233/idt-200207 ◽

2021 ◽

pp. 1-8

Author(s):

Delaram Javdani ◽

Hossein Rahmani ◽

Gerhard Weiss

Keyword(s):

Large Scale ◽

Weighted Graph ◽

Entity Resolution ◽

Quality Measure ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Real World Data ◽

Data Set ◽

Comprehensive Comparison ◽

F Measure

Entity resolution refers to the process of identifying, matching, and integrating records belonging to unique entities in a data set. However, a comprehensive comparison across all pairs of records leads to quadratic matching complexity. Therefore, blocking methods are used to group similar entities into small blocks before the matching. Available blocking methods typically do not consider semantic relationships among records. In this paper, we propose a Semantic-aware Meta-Blocking approach called SeMBlock. SeMBlock considers the semantic similarity of records by applying locality-sensitive hashing (LSH) based on word embedding to achieve fast and reliable blocking in a large-scale data environment. To improve the quality of the blocks created, SeMBlock builds a weighted graph of semantically similar records and prunes the graph edges. We extensively compare SeMBlock with 16 existing blocking methods, using three real-world data sets. The experimental results show that SeMBlock significantly outperforms all 16 methods with respect to two relevant measures, F-measure and pair-quality measure. F-measure and pair-quality measure of SeMBlock are approximately 7% and 27%, respectively, higher than recently released blocking methods.

Get full-text (via PubEx)

Entity Resolution of Japanese Apartment Property Information Using Neural Networks

10.1109/mipr51284.2021.00052 ◽

2021 ◽

Author(s):

Youiti Kado ◽

Takashi Hirokata ◽

Koji Matsumura ◽

Xueting Wang ◽

Toshihiko Yamasaki

Keyword(s):

Neural Networks ◽

Entity Resolution

Get full-text (via PubEx)

entity resolution
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Improving Wildlife Population Inference Using Aerial Imagery and Entity Resolution

MultiBlock: A Scalable Iterative Approach for Progressive Entity Resolution

Active deep learning on entity resolution by risk sampling

Towards Deep Entity Resolution via Soft Schema Matching

The role of transitive closure in evaluating blocking methods for dirty entity resolution

Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations

ACERPI: An approach for ordinances collection, information extraction and entity resolution

An Effective Entity Resolution Approach for Big Data

SeMBlock: A semantic-aware meta-blocking approach for entity resolution

Entity Resolution of Japanese Apartment Property Information Using Neural Networks

Export Citation Format

entity resolutionRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Improving Wildlife Population Inference Using Aerial Imagery and Entity Resolution

MultiBlock: A Scalable Iterative Approach for Progressive Entity Resolution

Active deep learning on entity resolution by risk sampling

Towards Deep Entity Resolution via Soft Schema Matching

The role of transitive closure in evaluating blocking methods for dirty entity resolution

Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations

ACERPI: An approach for ordinances collection, information extraction and entity resolution

An Effective Entity Resolution Approach for Big Data

SeMBlock: A semantic-aware meta-blocking approach for entity resolution

Entity Resolution of Japanese Apartment Property Information Using Neural Networks

entity resolution
Recently Published Documents