Decomposing Semantic Inferences

2014 ◽  
Vol 9 ◽  
Author(s):  
Elena Cabrio ◽  
Bernardo Magnini

Besides formal approaches to semantic inference that rely on logical representations of meaning, the notion of Textual Entailment (TE) has been proposed as an applied framework to capture major semantic inference needs across applications in Computational Linguistics. Although several approaches have been tried and evaluation campaigns have shown improvements in TE, renewed interest is rising in the research community towards a deeper and better understanding of the core phenomena involved in textual inference. Pursuing this direction, we are convinced that crucial progress will derive from focusing on decomposing the complexity of the TE task into basic phenomena and on their combination. In this paper, we carry out a deep analysis of TE data sets, investigating the relations between two relevant aspects of semantic inferences: the logical dimension, i.e. the capacity of the inference to prove the conclusion from its premises, and the linguistic dimension, i.e. the linguistic devices used to accomplish the goal of the inference. We propose a decomposition approach over TE pairs, where single linguistic phenomena are isolated in what we have called atomic inference pairs, and we show that at this level of granularity the actual correlation between the linguistic and the logical dimensions of semantic inferences emerges and can be empirically observed.

Author(s):  
Sebastião Pais ◽  
Gaël Dias

In this work we present a new unsupervised and language-independent methodology to detect relations of textual generality. To this end, we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of text, the Text (T), entails the meaning of another text, the Hypothesis (H). The novel approaches and improvements in TE technologies demonstrated in the RTE Challenges signal renewed interest in a deeper and better understanding of the core phenomena involved in TE. In line with this direction, we focus on a particular case of entailment, entailment by generality. In text, there are different kinds of entailment, yielded by different types of implicative reasoning (lexical, syntactic, common-sense based), but here we focus only on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T→GH whenever the premise T entails the hypothesis H, with H also being more general than T. We propose an unsupervised and language-independent method to recognize TEGs from a pair ⟨T,H⟩ having an entailment relation. To this end, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAMs). In this work, we hypothesize the existence of this particular mode of TE, and the main contribution of our study is to highlight the importance of this inference mechanism. Consequently, the new annotated data should be a valuable resource for the community.
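A minimal sketch of the flavor of asymmetric scoring involved, under stated assumptions: the weighted-coverage measure below is a hypothetical stand-in for the paper's Simplified Asymmetric InfoSimba (AISs), and the IDF weighting is illustrative only.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Toy inverse-document-frequency weights from a tokenized corpus."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return {w: math.log(len(corpus) / df[w]) for w in df}

def coverage(src, tgt, w):
    """Asymmetric measure: how much of src's weighted content tgt covers."""
    den = sum(w.get(t, 0.0) for t in src) or 1.0
    return sum(w.get(t, 0.0) for t in src if t in tgt) / den

def is_teg(t_tokens, h_tokens, w):
    """Given that T entails H, flag TEG when H is the more general side:
    T covers H's (less specific) content better than H covers T's."""
    return coverage(h_tokens, set(t_tokens), w) > coverage(t_tokens, set(h_tokens), w)
```

The asymmetry is the point: swapping the arguments of `coverage` changes the score, which is what lets a direction of generality be read off an entailment pair.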


2021 ◽  
Author(s):  
Annakaisa Korja ◽  
Kuvvet Atakan ◽  
Peter H. Voss ◽  
Michael Roth ◽  
Kristin Vogfjord ◽  
...  

Nordic EPOS - A FAIR Nordic EPOS Data Hub - is a consortium of the Nordic geophysical observatories financed by NordForsk. It delivers online data to the European Plate Observing System's Thematic Core Services (EPOS's TCSs). The Nordic EPOS consortium comprises the Universities of Helsinki, Bergen, Uppsala, and Oulu, GEUS, and the Icelandic Meteorological Office. Nordic EPOS enhances and stimulates the ongoing active Nordic interactions related to Solid Earth Research Infrastructures (RIs) in general and EPOS in particular. It develops expertise and tools designed to integrate Nordic RI data and to enhance their accessibility and usefulness to the Nordic research community. Together we can address global challenges in Norden and with Nordic data.

The main tasks of Nordic EPOS are to advance the use of multi-disciplinary Solid Earth data sets in scientific and societal problem solving, to increase the amount of open, shared, homogenized data sets, and to increase the scientific expertise needed to create sustainable societies in the Nordic countries, especially in the Arctic region. In addition to developing services better suited to Nordic interests within EPOS, Nordic EPOS will also promote Nordic research interests, such as research on Arctic areas, in the TCSs and in EPOS-ERIC governance and scientific boards.

Nordic EPOS is organized into Tasks and Activities. The project has six main infrastructure Tasks: I - Training in the usage of EPOS-RI data and services; II - Nordic data integration and FAIRness; III - Nordic station management of seismological networks; IV - Induced seismicity, safe society; V - Ash and gas monitoring; and VI - Geomagnetic hazards. In addition, the project has one transversal Task, VII, on communication and dissemination. The activities within the Tasks are workshops, tutorials, demos, and training sessions (virtual and on-site), and the communication and dissemination of EPOS data and metadata information at local, national, and international workshops, meetings, and conferences.


Author(s):  
Cuong V. Nguyen ◽  
Khiem H. Le ◽  
Anh M. Tran ◽  
Binh T. Nguyen

With the booming development of E-commerce platforms in many countries, there is a massive amount of customer review data on different products and services. Understanding customer feedback on both current and new products can give online retailers the ability to improve product quality, meet customer expectations, and increase the corresponding revenue. In this paper, we investigate the Vietnamese sentiment classification problem on two datasets containing Vietnamese customer reviews. We propose eight different approaches, including Bi-LSTM, Bi-LSTM + Attention, Bi-GRU, Bi-GRU + Attention, Recurrent CNN, Residual CNN, Transformer, and PhoBERT, and conduct all experiments on two datasets: AIVIVN 2019 and our own dataset, self-collected from multiple Vietnamese e-commerce websites. The experimental results show that all our proposed methods outperform the winning solution of the "AIVIVN 2019 Sentiment Champion" competition by a significant margin. In particular, Recurrent CNN has the best performance compared with the other algorithms in terms of both AUC (98.48%) and F1-score (93.42%) on the competition dataset, and it also surpasses the other techniques on our collected dataset. Finally, we aim to publish our code and these two datasets later to contribute to the current research community in the field of sentiment analysis.
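As a concrete illustration of one of the architectures listed above, here is a minimal Bi-GRU with attention for binary sentiment classification; the vocabulary size, dimensions, and pooling choices are assumptions for the sketch, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, 1)    # binary sentiment logit

    def forward(self, token_ids):
        h, _ = self.gru(self.embed(token_ids))        # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)            # pooled sentence vector
        return self.out(context).squeeze(-1)

# Dummy forward pass: a batch of 4 tokenized reviews of length 50.
logits = BiGRUAttention()(torch.randint(1, 30000, (4, 50)))
```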


Author(s):  
Avinash Navlani ◽  
V. B. Gupta

In the last couple of decades, clustering has become a crucial research problem in the data mining community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters of similar characteristics. Clustering is an unsupervised learning task; because of this unsupervised nature, there is no unique solution for all problems. Complex data sets often admit multiple meaningful clusterings, yet traditional clustering approaches generate only a single clustering. A dataset may contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find multiple distinct groupings of the data set such that each grouping is of high quality and differs from the others. This chapter gives an overall view of alternative clustering: its various approaches, related work, comparison with easily confused related notions such as subspace, multi-view, and ensemble clustering, applications, issues, and challenges.
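One simple and well-known strategy for producing an alternative clustering, sketched below under stated assumptions: cluster once with k-means, project the data onto the subspace orthogonal to the fitted centers, and cluster again, so the second grouping is pushed away from the first. This illustrates the general idea rather than any single method surveyed in the chapter.

```python
import numpy as np
from sklearn.cluster import KMeans

def alternative_clusterings(X, k, rounds=2, seed=0):
    """Return `rounds` labelings, each decorrelated from the previous."""
    X = X.astype(float)
    labelings = []
    for _ in range(rounds):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        labelings.append(km.labels_)
        # Remove the structure already explained: project out the
        # subspace spanned by the current cluster centers.
        Q, _ = np.linalg.qr(km.cluster_centers_.T)
        X = X - X @ Q @ Q.T
    return labelings

# Example: two distinct 3-cluster views of random data.
first, second = alternative_clusterings(np.random.rand(200, 10), k=3)
```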


2019 ◽  
Vol 7 (2) ◽  
pp. 161-175 ◽  
Author(s):  
Shreyas Sardesai

This article attempts to empirically test the claims made by several commentators that religious polarization was at the core of the 2019 Lok Sabha election verdict. Relying heavily on the National Election Study (NES) data sets, it finds that the election result was in large measure an outcome of massive vote consolidation on religious lines, with the majority Hindu community preferring the Bharatiya Janata Party (BJP)-led National Democratic Alliance (NDA) in unprecedented proportion and the main religious minorities largely staying away from it, although there were some exceptions. It shows that, for two national elections in a row, the Narendra Modi- and Amit Shah-led BJP has been able to overcome the caste hierarchies among Hindus and systematically construct a Hindu category of voters versus others. This chasm between Hindus and the minorities is also seen with respect to their attitudes regarding the government, its leadership and contentious issues like the Ayodhya dispute. This article, however, does not find sufficient evidence with regard to the claims that a large part of the Hindu support for the BJP-led alliance may have been on account of anti-minority sentiments.


Author(s):  
Yushi Li ◽  
George Baciu ◽  
Yu Han ◽  
Chenhui Li

This article describes a novel 3D image-based indoor localization system integrated with an improved SfM (structure from motion) approach and an obstacle removal component. In contrast with existing state-of-the-art localization techniques, which focus on static outdoor or indoor environments, this work considers the adverse effects generated by moving obstacles in busy indoor spaces. In particular, the problem of occlusion removal is converted into a problem of separating the moving foreground from the static background. A low-rank and sparse matrix decomposition approach is used to solve this problem efficiently. Moreover, SfM with RT (re-triangulation) is adopted in order to handle the drifting problem of the incremental SfM method in indoor scene reconstruction. To evaluate the performance of the system, three data sets and the corresponding query sets are established to simulate different states of the indoor environment. Quantitative experimental results demonstrate that both the query registration rate and the localization accuracy increase significantly after integrating the authors' improvements.
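The low-rank and sparse decomposition mentioned above is commonly solved by principal component pursuit; a compact sketch follows, with the column convention and parameter defaults as assumptions (the paper's exact solver is not specified in this abstract).

```python
import numpy as np

def shrink(M, tau):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(D, max_iter=500, tol=1e-7):
    """Split D into low-rank L (static background) + sparse S (movers)."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(D).sum()
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank update via singular value thresholding.
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt
        # Sparse update via entrywise soft-thresholding.
        S = shrink(D - L + Y / mu, lam / mu)
        Y += mu * (D - L - S)   # dual variable ascent
        if np.linalg.norm(D - L - S) <= tol * np.linalg.norm(D):
            break
    return L, S

# Each column of D would be a vectorized video frame: L recovers the
# static scene, S the moving obstacles to be removed before SfM.
```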


Metabolites ◽  
2019 ◽  
Vol 9 (4) ◽  
pp. 76 ◽  
Author(s):  
Farhana R. Pinu ◽  
David J. Beale ◽  
Amy M. Paten ◽  
Konstantinos Kouremenos ◽  
Sanjay Swarup ◽  
...  

The use of multiple omics techniques (i.e., genomics, transcriptomics, proteomics, and metabolomics) is becoming increasingly popular in all facets of life science. Omics techniques provide a more holistic molecular perspective of studied biological systems compared to traditional approaches. However, due to their inherent data differences, integrating multiple omics platforms remains an ongoing challenge for many researchers. As metabolites represent the downstream products of multiple interactions between genes, transcripts, and proteins, the tools and approaches routinely used in metabolomics could assist with the integration of these complex multi-omics data sets. The question is, how? Here we provide some answers (in terms of methods, software tools, and databases), along with a variety of recommendations and a list of continuing challenges, as identified during a peer session on multi-omics integration held at the recent ‘Australian and New Zealand Metabolomics Conference’ (ANZMET 2018) in Auckland, New Zealand (Sept. 2018). We envisage that this document will serve as a guide for metabolomics researchers and other members of the community wishing to perform multi-omics studies. We also believe that these ideas may allow the full promise of integrated multi-omics research and, ultimately, of systems biology to be realized.


Author(s):  
Connor D Harris ◽  
Ellis L Torrance ◽  
Kasie Raymann ◽  
Louis-Marie Bobay

The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on the comparison of all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons; instead, it uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
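The identity-score heuristic is described only at a high level here, so the following is a hypothetical sketch of that kind of classification rule, not CoreCruncher's published criterion: hits whose identity to the reference falls far below the bulk of the distribution are flagged as likely paralogs/xenologs.

```python
import statistics

def classify_hits(identity_scores, k=1.0):
    """identity_scores: {gene_id: % identity of best hit to the reference}.
    The mean-minus-k-standard-deviations cutoff is an assumption."""
    values = list(identity_scores.values())
    cutoff = statistics.mean(values) - k * statistics.pstdev(values)
    return {gene: ("ortholog" if score >= cutoff else "paralog/xenolog")
            for gene, score in identity_scores.items()}

print(classify_hits({"geneA": 98.2, "geneB": 97.5,
                     "geneC": 96.9, "geneD": 61.0}))
```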


Several different methods of using multi-wavelength anomalous scattering data are described and illustrated by application to the solution of the known protein structure, core streptavidin, for which data at three wavelengths were available. Three of the methods depend on the calculation of Patterson-like functions whose Fourier coefficients involve combinations of the anomalous structure amplitudes from either two or three wavelengths. Each of these maps should show either vectors between anomalous scatterers or vectors between anomalous scatterers and non-anomalous scatterers. While they do so when ideal data are used, with real data they give little information; it is concluded that these methods are far too sensitive to errors in the data and to the scaling of the data sets to each other. Another Patterson-type function, the Ps function, which uses only single-wavelength data, can be made more effective by combining the information from several wavelengths. Two analytical methods are described, called AGREE and ROTATE, both of which were applied very successfully to the core streptavidin data. Both are made more effective by preprocessing the data with a procedure called REVISE, which brings a measure of mutual consistency to the data from different wavelengths. The best phases obtained from AGREE lead to a map with a conventional correlation coefficient of 0.549, which should readily be interpretable in terms of a structural model.
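For reference, a Patterson-type synthesis is a Fourier summation over squared structure-factor amplitudes, and the classical anomalous-difference variant replaces these with squared Bijvoet differences so that its peaks mark vectors between anomalous scatterers. These are textbook forms given for orientation; the specific two- and three-wavelength coefficient combinations tested in this study are not reproduced here.

```latex
% Textbook Patterson synthesis over unit-cell volume V:
P(\mathbf{u}) = \frac{1}{V} \sum_{\mathbf{h}} \lvert F(\mathbf{h})\rvert^{2}
                \, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{u}}
% Anomalous-difference Patterson, built from Bijvoet differences:
\Delta F_{\mathrm{ano}}(\mathbf{h}) = \lvert F(\mathbf{h})\rvert
                                    - \lvert F(-\mathbf{h})\rvert ,
\qquad
P_{\mathrm{ano}}(\mathbf{u}) = \frac{1}{V} \sum_{\mathbf{h}}
    \bigl(\Delta F_{\mathrm{ano}}(\mathbf{h})\bigr)^{2}
    \, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{u}}
```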


2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of the rough set method in data mining, focusing on the application of rough-set-based attribute reduction algorithms in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for the large data sets encountered in data mining.
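A textbook illustration of rough-set attribute reduction, sketched below: greedily drop any condition attribute whose removal leaves the positive region (the objects whose decision is fully determined by their indiscernibility class) unchanged. This shows the classical dependency-based idea, not the improved algorithm proposed in the paper.

```python
from collections import defaultdict

def positive_region(rows, attrs, decision):
    """Objects whose indiscernibility class has a single decision value."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return {i for members in classes.values()
            if len({rows[j][decision] for j in members}) == 1
            for i in members}

def reduct(rows, attrs, decision):
    """Greedy reduct: keep only attributes needed to preserve the region."""
    full = positive_region(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if trial and positive_region(rows, trial, decision) == full:
            kept = trial   # attribute a is dispensable
    return kept

rows = [{"a": 1, "b": 0, "d": "yes"},
        {"a": 1, "b": 1, "d": "yes"},
        {"a": 0, "b": 1, "d": "no"}]
print(reduct(rows, ["a", "b"], "d"))   # -> ['a']
```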

