A Novel Framework for Aspect Knowledgebase Generated Automatically from Social Media Using Pattern Rules

2021 ◽  
Vol 22 (4) ◽  
Author(s):  
Tuan Anh Tran ◽  
Jarunee Duangsuwan ◽  
Wiphada Wettayaprasit

Summarization systems that generate sentiment-based summaries from social media are one of the factors that help businesses in business intelligence. However, these systems cannot produce summaries automatically because they rely on annotated datasets. To produce sentiment summaries automatically without annotated datasets, we propose a novel framework using pattern rules. The framework has two procedures: 1) pre-processing and 2) aspect knowledgebase generation. The first procedure checks and corrects misspelt words (bigrams and unigrams) with a proposed method and tags all words with their parts of speech. The second procedure automatically generates the aspect knowledgebase that sentiment summarization systems use to produce sentiment summaries. Pattern rules and semantic similarity-based pruning are used to generate the aspect knowledgebase automatically from social media. The experiments use eight domains from benchmark review datasets. The performance evaluation shows that our proposed approach achieves high performance compared with other approaches.
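As a rough illustration of how pattern rules over part-of-speech tags can surface candidate aspects, the sketch below applies two made-up rules to an already tagged sentence. The rules, the function name `extract_aspects`, and the Penn Treebank style tags are assumptions for illustration only; the abstract does not list the framework's actual rules.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs, assumed Penn Treebank style


def extract_aspects(tagged: Tagged) -> List[str]:
    """Return candidate aspects matched by two illustrative (hypothetical) rules.

    Rule 1: adjective followed by a noun  -> the noun is a candidate aspect.
    Rule 2: noun followed by another noun -> the noun pair is a candidate aspect.
    """
    aspects = []
    # Slide over adjacent token pairs and test each pair against the rules.
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):
            aspects.append(w2.lower())
        elif t1.startswith("NN") and t2.startswith("NN"):
            aspects.append(f"{w1} {w2}".lower())
    return aspects


sentence = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("small", "JJ"),
    ("screen", "NN"), ("flickers", "VBZ"),
]
print(extract_aspects(sentence))  # ['battery life', 'screen']
```

A real system would follow this step with pruning, for example by discarding candidates that are not semantically close to the domain, which this sketch leaves out.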

Author(s):  
Tuan Anh Tran ◽  
Jarunee Duangsuwan ◽  
Wiphada Wettayaprasit

Aspect-based online information on social media plays a vital role in influencing people’s opinions when consumers are deciding whether to make a purchase or when companies seek opinions on their products or services. Determining aspect-based opinions from this online information is necessary for business intelligence to support users in reaching their objectives. In this study, we propose a new aspect extraction and scoring system with three procedures. The first procedure normalizes the sentences of the datasets and tags them with parts of speech. The second procedure extracts aspects with pattern rules. The third procedure assigns scores to aspects with SentiWordNet. In the experiments, benchmark datasets of customer reviews are used for evaluation. The performance evaluation shows that our proposed system achieves high accuracy compared with other systems.
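The scoring step can be pictured with a small sketch. It assumes a lexicon that maps each word to a positive and a negative score, in the spirit of SentiWordNet, and sums the difference for words near an aspect. The lexicon values, the window size, and the function name `score_aspect` are hypothetical and are not taken from the paper or from SentiWordNet itself.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a sentiment lexicon: word -> (positive score, negative score).
# These values are invented for illustration, not read from SentiWordNet.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.75, 0.0),
    "poor": (0.0, 0.625),
    "sharp": (0.5, 0.125),
}


def score_aspect(tokens: List[str], aspect: str, window: int = 2) -> float:
    """Sum (positive - negative) over lexicon words within `window` tokens of the aspect."""
    words = [t.lower() for t in tokens]
    total = 0.0
    for i, word in enumerate(words):
        if word != aspect:
            continue
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for neighbour in words[lo:hi]:
            pos, neg = TOY_LEXICON.get(neighbour, (0.0, 0.0))
            total += pos - neg
    return total


tokens = "The screen is sharp but the battery is poor".split()
print(score_aspect(tokens, "screen"))   # 0.375
print(score_aspect(tokens, "battery"))  # -0.625
```

The fixed window is a deliberate simplification; it ignores negation and syntax, which a full system would need to handle.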


2021 ◽  
pp. 1-12
Author(s):  
Fuqiang Zhao ◽  
Zhengyu Zhu ◽  
Ping Han

To measure semantic similarity between words, this paper presents DFRVec, a novel model that encodes multiple kinds of semantic information about a word in WordNet into a vector space. First, three sub-models are proposed: 1) DefVec, which encodes the definitions of a word in WordNet; 2) FormVec, which encodes the part-of-speech (POS) of a word in WordNet; and 3) RelVec, which encodes the relations of a word in WordNet. Then, by combining the three sub-models with an existing word embedding, a new model for generating the vector of a word is proposed. Finally, based on DFRVec and the path information in WordNet, a new method, DFRVec+Path, is presented to measure semantic similarity between words. Experiments on ten benchmark datasets show that DFRVec+Path can outperform many existing methods on semantic similarity measurement.
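The general idea of combining several sub-vectors and then comparing words can be sketched as follows. Concatenation as the combination rule, the weight `alpha`, and the function names are assumptions made for this illustration; the abstract does not state how DFRVec combines its sub-models or how the path information is weighted.

```python
import math
from typing import List, Sequence


def combine(*parts: Sequence[float]) -> List[float]:
    """Concatenate sub-vectors into one vector (an assumed combination rule)."""
    return [x for part in parts for x in part]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def blended_similarity(u: Sequence[float], v: Sequence[float],
                       path_sim: float, alpha: float = 0.5) -> float:
    """Mix vector similarity with a path-based score; `alpha` is a hypothetical weight."""
    return alpha * cosine(u, v) + (1.0 - alpha) * path_sim


# Made-up sub-vectors standing in for definition, POS, and relation encodings.
word_a = combine([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
word_b = combine([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(blended_similarity(word_a, word_b, path_sim=0.5))
```

Here `path_sim` stands for any similarity in [0, 1] derived from the WordNet graph, supplied by the caller.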


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is now mandatory because of the large amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques that are as accurate as possible in these scenarios have to be developed. Intrusion detection systems based on machine learning algorithms have already shown high accuracy. This research studies and evaluates several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, and one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. Preprocessing a specific group of characteristics yields greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.
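To make the idea of group-wise preprocessing concrete, the sketch below applies a different transform to each feature group. Min-max scaling and z-score normalization are standard techniques chosen here as examples; the abstract does not name the exact functions evaluated. The feature names, group assignments, and the helper `preprocess` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

Column = List[float]


def min_max(values: Column) -> Column:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: Column) -> Column:
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess(columns: Dict[str, Column],
               groups: Dict[str, List[str]],
               transforms: Dict[str, Callable[[Column], Column]]) -> Dict[str, Column]:
    """Apply each group's transform to its features; other features pass through."""
    out = dict(columns)
    for group, names in groups.items():
        fn = transforms.get(group)
        if fn is None:
            continue
        for name in names:
            out[name] = fn(columns[name])
    return out


# Hypothetical features and grouping, for illustration only.
data = {"duration": [0.0, 5.0, 10.0], "failed_logins": [1.0, 3.0, 1.0, ][:2] + [2.0]}
groups = {"basic": ["duration"], "content": ["failed_logins"]}
result = preprocess(data, groups, {"basic": min_max})
print(result["duration"])       # [0.0, 0.5, 1.0]
print(result["failed_logins"])  # unchanged: no transform registered for "content"
```

Keeping the transform per group, instead of per dataset, is what lets each category of traffic features be evaluated separately.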


2018 ◽  
Vol 10 (8) ◽  
pp. 80
Author(s):  
Lei Zhang ◽  
Xiaoli Zhi

Convolutional neural networks (CNNs) have made great progress in face detection. Most take computation-intensive networks as the backbone to obtain high precision, and they cannot reach a good detection speed without the support of high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-dependent ones. To alleviate this problem, we propose a lightweight face detector that takes a fast residual network as its backbone. Our method runs fast even on cheap, ordinary GPUs. To guarantee its detection precision, multi-scale features and multi-context are fully exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multi-context, including both local and global context, is added to these multi-scale features without extra computational burden. The local context is added through a depthwise separable convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method can run at about 110 fps on VGA (Video Graphics Array)-resolution images while still maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets compared with its state-of-the-art counterparts.
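Two of the building blocks named above are cheap for reasons that simple arithmetic shows. The sketch below compares the weight count of a standard convolution with that of a depthwise separable one (biases ignored) and computes global average pooling on a tiny feature map. The layer sizes are arbitrary examples, not the detector's actual configuration.

```python
from typing import List


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k filter per input channel, then a 1 x 1 convolution to c_out channels."""
    return k * k * c_in + c_in * c_out


def global_average_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Average each channel over its spatial grid; input is indexed [channel][row][col]."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


print(standard_conv_params(3, 64, 128))        # 73728
print(depthwise_separable_params(3, 64, 128))  # 8768
print(global_average_pool([[[1.0, 2.0], [3.0, 4.0]],
                           [[0.0, 0.0], [0.0, 8.0]]]))  # [2.5, 2.0]
```

For this example layer the separable form needs roughly an eighth of the weights, and global average pooling adds no learned weights at all.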


2018 ◽  
Vol 10 (6) ◽  
pp. 964 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image annotated by a single label representing the most significant semantic content of the image. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a densely labeled remote sensing dataset termed "DLRSD". DLRSD contained a total of 17 classes, and the pixels of each image were labeled using the 17 pre-defined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. These results demonstrated the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provided the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.
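The relationship between dense, multi-label, and single-label annotation can be sketched in a few lines. The example assumes that the single label is the most frequent pixel class and uses Jaccard overlap between label sets as a multi-label relevance measure. Both choices, the class names, and the function names are assumptions for illustration; the abstract does not say how DLRSD derives labels or scores retrieval.

```python
from collections import Counter
from typing import List, Set, Tuple


def labels_from_dense(mask: List[List[str]]) -> Tuple[str, Set[str]]:
    """Derive (single label, multi-label set) from a per-pixel label map.

    Assumption: the single label is the class covering the most pixels.
    """
    counts = Counter(label for row in mask for label in row)
    single = counts.most_common(1)[0][0]
    return single, set(counts)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two label sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical 2 x 3 label map.
mask = [
    ["building", "building", "tree"],
    ["building", "road", "tree"],
]
single, multi = labels_from_dense(mask)
print(single)         # building
print(sorted(multi))  # ['building', 'road', 'tree']
print(jaccard({"building", "tree"}, {"tree", "road"}))
```

Under a single-label view, two images match only if their dominant class agrees; the set overlap gives partial credit when images share some but not all content.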

