Text Matching and Categorization: Mining Implicit Semantic Knowledge from Tree-Shape Structures

2015 · Vol 2015 · pp. 1-9
Author(s): Lin Guo, Wanli Zuo, Tao Peng, Lin Yue

The diversity of large-scale semistructured data makes extracting implicit semantic information extremely difficult. This paper proposes an automatic, unsupervised text categorization method in which tree-shape structures represent semantic knowledge and expose implicit information through the mining of hidden structures, without cumbersome lexical analysis. Mining implicit frequent structures in trees discovers both direct and indirect semantic relations, which substantially improves the accuracy of matching and classifying texts. The experimental results show that the proposed algorithm markedly reduces the time and effort spent in training and classification, outperforming established competitors in correctness and effectiveness.
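
A minimal sketch of the idea, assuming a toy tree encoding: mining relations from trees surfaces both direct (parent-child) and indirect (ancestor-descendant) links, and documents are matched on the frequent ones. This is illustrative, not the authors' exact algorithm:

```python
from collections import Counter

# Toy trees standing in for parsed semistructured documents: (label, children).
doc_a = ("finance", [("market", [("stock", [])]), ("bank", [])])
doc_b = ("finance", [("market", [("bond", [])]), ("stock", [])])

def relations(tree, ancestors=()):
    """Yield ancestor-descendant label pairs: direct (parent-child) and
    indirect (grandparent and beyond) semantic relations alike."""
    label, children = tree
    for anc in ancestors:
        yield (anc, label)
    for child in children:
        yield from relations(child, ancestors + (label,))

def frequent_structures(docs, min_support=2):
    """Keep relations occurring in at least min_support documents."""
    counts = Counter(rel for doc in docs for rel in set(relations(doc)))
    return {rel for rel, c in counts.items() if c >= min_support}

def match_score(d1, d2, frequent):
    """Jaccard overlap of the two documents' frequent relations."""
    r1 = set(relations(d1)) & frequent
    r2 = set(relations(d2)) & frequent
    return len(r1 & r2) / max(1, len(r1 | r2))

freq = frequent_structures([doc_a, doc_b])
print(match_score(doc_a, doc_b, freq))  # shared (finance,market), (finance,stock)
```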

Author(s): DongLai Ge, Junhui Li, Muhua Zhu, Shoushan Li

Sequence-to-sequence (seq2seq) approaches formalize Abstract Meaning Representation (AMR) parsing as a translation task from a source sentence to a target AMR graph. However, previous studies generally model the source sentence as a plain word sequence, ignoring its inherent syntactic and semantic information. In this paper, we propose two effective approaches to explicitly modeling source syntax and semantics in neural seq2seq AMR parsing. The first approach linearizes the source syntactic and semantic structure into a mixed sequence of words, syntactic labels, and semantic labels; in the second, we propose a syntactic and semantic structure-aware encoding scheme based on a self-attentive model that explicitly captures syntactic and semantic relations between words. Experimental results on an English benchmark dataset show that the two approaches achieve significant improvements of 3.1% and 3.4% F1 over a strong seq2seq baseline.
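
A minimal sketch of the first (linearization) approach: interleaving words with syntactic brackets and semantic role labels into one mixed sequence. The bracketing scheme and label inventory are illustrative assumptions, not the authors' exact encoding:

```python
sentence = ["The", "boy", "wants", "to", "go"]
# Hypothetical syntactic constituents (label, start, end) and semantic roles.
constituents = [("NP", 0, 2), ("VP", 2, 5)]
sem_roles = {1: "ARG0", 4: "ARG1"}  # token index -> semantic label

def linearize(tokens, spans, roles):
    """Interleave words with syntactic brackets and semantic role labels."""
    mixed = []
    for i, tok in enumerate(tokens):
        for label, start, end in spans:
            if start == i:
                mixed.append(f"({label}")      # open constituent
        mixed.append(tok)
        if i in roles:
            mixed.append(f"[{roles[i]}]")      # semantic label after the word
        for label, start, end in spans:
            if end == i + 1:
                mixed.append(f"{label})")      # close constituent
    return mixed

print(" ".join(linearize(sentence, constituents, sem_roles)))
# -> (NP The boy [ARG0] NP) (VP wants to go [ARG1] VP)
```

The resulting mixed sequence can be fed to an unmodified seq2seq encoder in place of the raw word sequence.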


2013 · Vol 765-767 · pp. 1240-1244
Author(s): Qian Mo, Shu Zhang

Ontology plays a dominant role in a growing number of fields, such as information retrieval, artificial intelligence, the Semantic Web, and knowledge management. However, manual construction of large ontologies is not feasible. This article discusses how to automatically build a Financial Ontology from a Chinese encyclopedia resource. The Financial Ontology covers Is-A, Class-Instance, Attribute-of, and Synonym relationships. Experimental results show that the constructed Financial Ontology offers clear advantages in scale, creation cost, and richness of semantic information.
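
A minimal sketch of how the four relation types might be harvested from encyclopedia entries; the record layout (categories, infobox, redirects) is an assumption about the resource's structure, not the authors' exact procedure:

```python
# Hypothetical encyclopedia entry; field names are illustrative assumptions.
entry = {
    "title": "ICBC",
    "categories": ["Commercial bank"],        # -> Class-Instance
    "infobox": {"headquarters": "Beijing"},   # -> Attribute-of
    "redirects": ["Industrial and Commercial Bank of China"],  # -> Synonym
}
category_parents = {"Commercial bank": "Bank"}  # -> Is-A between classes

def extract_relations(entry, parents):
    """Turn one encyclopedia entry into ontology triples."""
    triples = []
    for cat in entry["categories"]:
        triples.append((entry["title"], "Class-Instance", cat))
        if cat in parents:                     # category hierarchy -> Is-A
            triples.append((cat, "Is-A", parents[cat]))
    for attr in entry["infobox"]:              # infobox keys -> attributes
        triples.append((attr, "Attribute-of", entry["title"]))
    for alias in entry["redirects"]:           # redirects -> synonyms
        triples.append((alias, "Synonym", entry["title"]))
    return triples

for triple in extract_relations(entry, category_parents):
    print(triple)
```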


2021 · Vol 21 (1) · pp. 103-118
Author(s): Qusai Y. Shambour, Nidal M. Turab, Omar Y. Adwan

Abstract: Electronic commerce has grown steadily over the last decade as a new driver of the retail industry. In fact, the growth of e-Commerce has caused a significant rise in the number of products and services offered on the Internet. This is where recommender systems come into play, effectively providing consumers with meaningful recommendations based on their needs and interests. However, recommender systems remain vulnerable to sparse rating data and to cold-start users and items. To develop an effective e-Commerce recommender system that addresses these limitations, we propose a Trust-Semantic enhanced Multi-Criteria CF (TSeMCCF) approach that exploits users' trust relations and multi-criteria ratings, together with the semantic relations of items, within the CF framework to achieve effective results when sufficient rating data are not available. The experimental results show that the proposed approach outperforms other benchmark recommendation approaches in recommendation accuracy and coverage.
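
A minimal sketch of the ingredients: multi-criteria rating similarity combined with explicit trust to weight neighbours, so predictions remain possible when rating overlap is sparse. Data and weighting are illustrative assumptions, not the TSeMCCF formulation:

```python
import math

# Hypothetical multi-criteria ratings: user -> item -> [criterion scores].
ratings = {
    "alice": {"shop1": [5, 4, 4]},
    "bob":   {"shop1": [4, 4, 3], "shop2": [5, 5, 4]},
}
trust = {("alice", "bob"): 0.8}  # explicit trust statement, assumed given

def multi_criteria_sim(u, v):
    """Average inverse distance between co-rated items' criterion vectors."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dists = [math.dist(ratings[u][i], ratings[v][i]) for i in common]
    return sum(1 / (1 + d) for d in dists) / len(dists)

def predict(u, item):
    """Trust-plus-similarity weighted neighbour average; the trust term keeps
    the prediction usable when rating overlap alone is too sparse."""
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        w = multi_criteria_sim(u, v) + trust.get((u, v), 0.0)
        overall = sum(ratings[v][item]) / len(ratings[v][item])
        num += w * overall
        den += w
    return num / den if den else None

print(predict("alice", "shop2"))  # cold item for alice, reached via trust
```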


Author(s): Shivanand M. Teli, Channamallikarjun S. Mathpati

Abstract: The novel rectangular external-loop airlift reactor design is at present the most widely used large-scale reactor for microalgae culture. Its unique feature is a large surface-to-volume ratio for exposure to the light radiation that drives the photosynthesis reaction. 3D simulations have been performed in the rectangular EL-ALR. The Eulerian-Eulerian approach has been used with a dispersed gas phase for different turbulence models. The performance and applicability of different turbulence models, i.e., K-epsilon standard, K-epsilon realizable, K-omega, and the Reynolds stress model, are compared against experimental results. All drag forces and non-drag forces (turbulent dispersion, virtual mass, and lift coefficient) are included in the model. The experimental values of overall gas hold-up and average liquid circulation velocity have been compared with simulation and literature results and show good agreement. For different elevations in the downcomer section, experimental values of liquid axial velocity, turbulent kinetic energy, and turbulent eddy dissipation have been compared with the different turbulence models. The K-epsilon realizable model gives the best agreement with the experimental results.
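
The overall gas hold-up compared above is, in typical Eulerian-Eulerian post-processing, the volume-weighted average of the simulated gas volume fraction. A minimal sketch with made-up cell data, not the paper's mesh or results:

```python
# Hypothetical CFD cells: volumes (m^3) and simulated gas volume fractions.
cell_volumes = [0.0020, 0.0030, 0.0025]
gas_fractions = [0.12, 0.08, 0.10]

def overall_gas_holdup(volumes, alphas):
    """Volume-weighted average of the gas phase fraction over all cells."""
    return sum(v * a for v, a in zip(volumes, alphas)) / sum(volumes)

print(f"overall gas hold-up = {overall_gas_holdup(cell_volumes, gas_fractions):.3f}")
```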


2021 · Vol 8 (1)
Author(s): Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline

Abstract: Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, most RSs from the literature have been compared on English content. However, research on RSs for content in other languages, such as Arabic, is minimal, and the field of Arabic RSs remains largely neglected. We therefore aim in this study to fill this research gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derives well-argued conclusions about the use of modern RSs in the Arabic context. The experimental results show that these systems maintain high performance when applied to Arabic content.
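
Arabic text preprocessing of the kind mentioned above typically includes diacritic removal and letter normalization. A minimal sketch assuming standard normalization choices; the paper's exact pipeline may differ:

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652]")  # tashkeel (short vowel) marks

def normalize_arabic(text):
    """Common Arabic normalization steps for text-based recommendation."""
    text = DIACRITICS.sub("", text)            # strip diacritics
    text = re.sub("[إأآا]", "ا", text)         # unify alef variants
    text = text.replace("ة", "ه")              # ta marbuta -> ha
    text = text.replace("ى", "ي")              # alef maqsura -> ya
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(normalize_arabic("الْكِتَابُ  الجَدِيدُ"))  # -> "الكتاب الجديد"
```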


2013 · Vol 311 · pp. 158-163
Author(s): Li Qin Huang, Li Qun Lin, Yan Huang Liu

The MapReduce framework of cloud computing offers an effective way to carry out massive text categorization. In this paper, a distributed parallel text training algorithm based on multi-class Support Vector Machines (SVM) is designed for a cloud computing environment. Map tasks distribute the samples of the various classes, and Reduce tasks perform the actual SVM training. Experimental results show that text training time decreases as the number of Reduce tasks increases. A parallel text classification method based on cloud computing, which classifies texts of unknown type, is also designed and implemented. Experimental results show that classification speed increases with the number of Map tasks.
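
A minimal sketch of the Map/Reduce split described above, with the Hadoop plumbing simulated in plain Python and a stand-in centroid-based trainer in place of a real SVM solver:

```python
from collections import defaultdict

# Toy labeled samples: (class label, feature vector).
samples = [("sports", [1.0, 0.0]), ("finance", [0.0, 1.0]),
           ("sports", [0.9, 0.1]), ("finance", [0.2, 0.8])]

def map_phase(records):
    """Map: emit (class label, feature vector) pairs for shuffling."""
    for label, features in records:
        yield label, features

def reduce_phase(label, positives, all_records):
    """Reduce: train one one-vs-rest model for this class.
    The 'model' here is just the centroid difference, standing in for SVM."""
    negatives = [f for l, f in all_records if l != label]
    dim = len(positives[0])
    pos_c = [sum(f[d] for f in positives) / len(positives) for d in range(dim)]
    neg_c = [sum(f[d] for f in negatives) / len(negatives) for d in range(dim)]
    return [p - n for p, n in zip(pos_c, neg_c)]

# Shuffle step: group mapped samples by class, as the framework would.
shuffled = defaultdict(list)
for label, feats in map_phase(samples):
    shuffled[label].append(feats)

models = {lbl: reduce_phase(lbl, fs, samples) for lbl, fs in shuffled.items()}
print(models)
```

With real Hadoop, each Reduce task would train its class model independently, which is why training time falls as Reduce tasks are added.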


2021 · Vol 12 (5) · pp. 1-25
Author(s): Shengwei Ji, Chenyang Bu, Lei Li, Xindong Wu

Graph edge partitioning, which is essential to the efficiency of distributed graph computation systems, divides a graph into several balanced partitions of bounded size while minimizing the number of vertices that must be cut. Existing graph partitioning models fall into two categories: offline and streaming. The former requires global graph information during partitioning, which is expensive in time and memory for large-scale graphs. The latter creates partitions based solely on the graph information received so far, but may yield lower partitioning quality than the offline model. This study therefore introduces a Local Graph Edge Partitioning model, which considers only local information (i.e., a portion of the graph instead of the entire graph) during partitioning. Considering only local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on this model, two local graph edge partitioning algorithms, Two-stage Local Partitioning and Adaptive Local Partitioning, are given. Experimental results on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are shown to significantly improve the efficiency of the real graph computation system GraphX.
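
For context, a minimal sketch of greedy streaming edge partitioning, the baseline family the local model refines. This is a generic heuristic, not the proposed Two-stage or Adaptive Local Partitioning algorithms:

```python
from collections import defaultdict

def stream_partition(edges, k):
    """Assign each arriving edge to the partition that already hosts its
    endpoints (to limit vertex replication), breaking ties by load."""
    load = [0] * k
    replicas = defaultdict(set)  # vertex -> partitions holding a copy of it
    assignment = {}
    for u, v in edges:
        def score(p):
            hits = (p in replicas[u]) + (p in replicas[v])
            return (-hits, load[p])  # prefer co-location, then balance
        p = min(range(k), key=score)
        assignment[(u, v)] = p
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    return assignment, replicas

edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
assignment, replicas = stream_partition(edges, k=2)
print(assignment)
print({v: len(ps) for v, ps in replicas.items()})  # per-vertex replication
```

A vertex replicated in several partitions is a "cut" vertex; the partitioning quality metric is the average replication factor, which such greedy rules try to keep low.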


2020 · Vol 34 (05) · pp. 7554-7561
Author(s): Pengxiang Cheng, Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and such models have achieved state-of-the-art results on a range of end tasks, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art, while containing only a tiny fraction of GPT-2's parameters. We also conduct a thorough analysis of different model architecture variants and supervision configurations, suggesting future directions for applying similar techniques to other problems.
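
A minimal sketch of what "coreference as auxiliary supervision for self-attention" can look like: a cross-entropy term pushing one head's attention from a mention onto its gold antecedent. The formulation below is an illustrative assumption, not the paper's exact loss:

```python
import math

def coref_attention_loss(attn_row, antecedent_idx):
    """-log of the attention mass the supervised head puts on the gold
    antecedent position (cross-entropy with a one-hot target)."""
    return -math.log(attn_row[antecedent_idx] + 1e-12)

# Hypothetical attention distribution of token 7 ("she") over earlier tokens,
# where index 2 ("Mary") is the gold antecedent from coreference annotation.
attn_row = [0.05, 0.10, 0.60, 0.05, 0.05, 0.05, 0.05, 0.05]
aux_loss = coref_attention_loss(attn_row, antecedent_idx=2)
print(f"auxiliary coreference loss = {aux_loss:.3f}")
# The total training objective would combine this with the main task loss,
# e.g. task_loss + lambda * aux_loss, with lambda a tuning assumption.
```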


2020 · Vol 34 (05) · pp. 9193-9200
Author(s): Shaolei Wang, Wanxiang Che, Qi Liu, Pengda Qin, Ting Liu, ...

Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle this training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words in unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly train a network. The pre-trained network is subsequently fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% (1,000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
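
A minimal sketch of the pseudo-data construction described above; the insertion/deletion rates and the noise vocabulary are assumptions:

```python
import random

def make_pseudo_example(tokens, vocab, p_add=0.1, p_del=0.1, rng=random):
    """Corrupt a clean sentence by random word insertion and deletion,
    keeping per-token labels for the tagging task (1 = added noise word)."""
    noisy, labels = [], []
    for tok in tokens:
        if rng.random() < p_add:       # insert a random noise word before tok
            noisy.append(rng.choice(vocab))
            labels.append(1)
        if rng.random() >= p_del:      # keep tok (else it is silently deleted)
            noisy.append(tok)
            labels.append(0)
    return noisy, labels

clean = "the market closed higher on friday".split()
vocab = ["uh", "well", "market", "the"]  # hypothetical noise vocabulary
random.seed(0)
print(make_pseudo_example(clean, vocab))
# The sentence classification task then pairs each clean sentence with its
# corrupted version and predicts which is the original.
```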

