Hybrid Deep Neural Model for Duplicate Question Detection in Transliterated Bi-lingual Data

Author(s):  
Seema Rani ◽  
Avadhesh Kumar ◽  
Naresh Kumar

Background: Duplicate content often corrupts the filtering mechanism in online question answering. Moreover, as users are usually more comfortable asking questions in their native language, transliteration adds to the challenge of detecting duplicate questions. This compromises response time and increases answer overload. Thus, it has become crucial to build intelligent semantic filters that match linguistically disparate questions. Objective: Most research on duplicate question detection has been done on mono-lingual, mainly English, Q&A platforms. The aim is to build a model that extends the cognitive capabilities of machines to interpret, comprehend and learn features for semantic matching in transliterated bi-lingual Hinglish (Hindi + English) data acquired from different Q&A platforms. Method: In the proposed DQDHinglish (Duplicate Question Detection) model, language transformation (transliteration and translation) is first applied to convert the bi-lingual transliterated question into mono-lingual, English-only text. Next, a hybrid of a Siamese neural network containing two identical Long Short-Term Memory (LSTM) models and a multi-layer perceptron network is proposed to detect semantically similar question pairs. The Manhattan distance function is used as the similarity measure. Result: A dataset was prepared by scraping 100 question pairs from various social media platforms, such as Quora and TripAdvisor. The performance of the proposed model was evaluated on the basis of accuracy and F-score. The proposed DQDHinglish model achieves a validation accuracy of 82.40%. Conclusion: A deep neural model was introduced to find a semantic match between an English question and a Hinglish (Hindi + English) question, such that questions with similar intent can be combined to enable fast and efficient information processing and delivery. A dataset was created and the proposed model was evaluated on the basis of performance accuracy. 
To the best of our knowledge, this work is the first reported study on transliterated Hinglish semantic question matching.
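
As a rough illustration of how a Manhattan distance can act as a similarity measure between the two branches of a Siamese network, the sketch below maps the L1 distance between two encodings into the range (0, 1]. The abstract does not say how the distance is turned into a score, so the `exp(-d)` form, the example vectors and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def manhattan_distance(u, v):
    """L1 (Manhattan) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def manhattan_similarity(u, v):
    """Map the L1 distance into (0, 1]; the exp(-d) form is an assumption."""
    return math.exp(-manhattan_distance(u, v))


# Hypothetical encodings standing in for the outputs of the two LSTM branches.
h_english = [0.2, -0.1, 0.7]
h_hinglish = [0.25, -0.05, 0.6]

score = manhattan_similarity(h_english, h_hinglish)
is_duplicate = score >= 0.5  # threshold chosen for illustration only
```

Identical encodings give a score of exactly 1, and the score decays towards 0 as the encodings move apart.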

2021 ◽  
Vol 11 (16) ◽  
pp. 7608
Author(s):  
Jian Chen ◽  
Jianpeng Chen ◽  
Xiangrong She ◽  
Jian Mao ◽  
Gang Chen

An address is a structured description used to identify a specific place or point of interest, and it provides an effective way to locate people or objects. The standardization of Chinese place names and addresses occupies an important position in the construction of a smart city. Traditional address standardization technology often adopts methods based on text similarity or rule bases, which cannot handle complex, missing, and redundant address information well. This paper transforms the task of address standardization into calculating the similarity of address pairs, and proposes a contrastive learning address matching model based on the attention-Bi-LSTM-CNN network (ABLC). First, ABLC uses the Trie syntax tree algorithm to extract Chinese address elements. Next, based on the basic idea of contrastive learning, a hybrid neural network is applied to learn the semantic information in the address. Finally, the Manhattan distance is calculated as the similarity of the two addresses. Experiments on the self-constructed dataset with data augmentation demonstrate that the proposed model has better stability and performance compared with other baselines.
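
The Trie-based extraction step can be pictured with a minimal sketch: a prefix tree of known place names, scanned with greedy longest-match to split an address string into labelled elements. The class names, the element labels and the small dictionary below are hypothetical, and the abstract does not describe the authors' actual algorithm in this detail.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.label = None  # set when a dictionary entry ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, label):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label

    def longest_match(self, text, start):
        """Return (end_index, label) of the longest entry starting at `start`."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.label is not None:
                best = (i + 1, node.label)
        return best


def extract_elements(text, trie):
    """Greedy longest-match segmentation into (label, substring) pairs."""
    elements = []
    i = 0
    while i < len(text):
        match = trie.longest_match(text, i)
        if match is None:
            i += 1  # skip characters not covered by the dictionary
            continue
        end, label = match
        elements.append((label, text[i:end]))
        i = end
    return elements


# Hypothetical dictionary of address elements.
trie = Trie()
trie.insert("安徽省", "province")
trie.insert("芜湖", "city")
trie.insert("芜湖市", "city")
trie.insert("镜湖区", "district")

elements = extract_elements("安徽省芜湖市镜湖区", trie)
```

Longest-match is used here so that "芜湖市" wins over its prefix "芜湖"; other tie-breaking rules are equally possible.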


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 708
Author(s):  
Wenbo Liu ◽  
Fei Yan ◽  
Jiyong Zhang ◽  
Tao Deng

The quality of detected lane lines has a great influence on the driving decisions of unmanned vehicles. However, while an unmanned vehicle is driving, changes in the driving scene cause much trouble for lane detection algorithms. Unclear and occluded lane lines cannot be clearly detected by most existing lane detection models in many complex driving scenes, such as crowded scenes and poor lighting conditions. In view of this, we propose a robust lane detection model using vertical spatial features and contextual driving information in complex driving scenes. The more effective use of contextual information and vertical spatial features enables the proposed model to detect unclear and occluded lane lines more robustly through two designed blocks: a feature merging block and an information exchange block. The feature merging block can provide increased contextual information to pass to the subsequent network, which enables the network to learn more feature details to help detect unclear lane lines. The information exchange block is a novel block that combines the advantages of spatial convolution and dilated convolution to enhance the process of information transfer between pixels. The addition of spatial information allows the network to better detect occluded lane lines. Experimental results show that our proposed model can detect lane lines more robustly and precisely than state-of-the-art models in a variety of complex driving scenarios.
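
To make the idea of dilated convolution concrete, the following sketch applies a three-tap kernel to a single row of values, once densely and once with gaps between the taps. It only illustrates the general operation that the information exchange block is said to build on. It is not the authors' block, and the function name and toy input are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, w in enumerate(kernel):
            acc += w * signal[start + k * dilation]
        out.append(acc)
    return out


# Toy row of pixels with two isolated "lane" responses four positions apart.
row = [0, 0, 1, 0, 0, 0, 1, 0, 0]
kernel = [1, 1, 1]

dense = dilated_conv1d(row, kernel, dilation=1)  # each output sees 3 pixels
wide = dilated_conv1d(row, kernel, dilation=2)   # each output sees a 5-pixel span
```

With `dilation=2` a single output can combine the two distant responses, which the dense kernel of the same size cannot do. This is the sense in which dilation widens information transfer between pixels without adding weights.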


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1332
Author(s):  
Hong Fan ◽  
Wu Du ◽  
Abdelghani Dahou ◽  
Ahmed A. Ewees ◽  
Dalia Yousri ◽  
...  

Social media has become an essential facet of modern society, wherein people share their opinions on a wide variety of topics. Social media is quickly becoming indispensable for a majority of people, and many cases of social media addiction have been documented. Social media platforms such as Twitter have demonstrated over the years the value they provide, such as connecting people from all over the world with different backgrounds. However, they have also shown harmful side effects that can have serious consequences. One such harmful side effect of social media is the immense toxicity that can be found in various discussions. The word toxic has become synonymous with online hate speech, internet trolling, and sometimes outrage culture. In this study, we build an efficient model to detect and classify toxicity in social media from user-generated content using the Bidirectional Encoder Representations from Transformers (BERT). The BERT pre-trained model and three of its variants have been fine-tuned on a well-known labeled toxic comment dataset, the public Kaggle dataset from the Toxic Comment Classification Challenge. Moreover, we test the proposed models with two datasets collected from Twitter in two different periods to detect toxicity in user-generated content (tweets) using hashtags related to the UK Brexit. The results showed that the proposed model can efficiently classify and analyze toxic tweets.


2021 ◽  
Vol 15 (4) ◽  
pp. 18-30
Author(s):  
Om Prakash Samantray ◽  
Satya Narayan Tripathy

Several available malware detection techniques are based on a signature-based approach. This approach can detect known malware very effectively but may sometimes fail to detect unknown or zero-day attacks. In this article, the authors have proposed a malware detection model that uses the operation codes of malicious and benign executables as features. The proposed model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vector for the experiment. The most relevant features are selected using the extra tree classifier feature selection technique and then passed through several supervised learning algorithms, such as support vector machine, naive Bayes, decision tree, random forest, logistic regression, and k-nearest neighbour, to build classification models for malware detection. The proposed model has achieved a detection accuracy of 98.7%, which makes this model better than many of the similar works discussed in the literature.
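
An opcode-count feature vector of the kind described above might be built roughly as follows. The abstract does not specify how OPEC works, so this is only a sketch under stated assumptions: the input is taken to be disassembly text with the mnemonic as the first token of each line, and the function names, sample listing and vocabulary are hypothetical.

```python
from collections import Counter


def opcode_counts(disassembly_lines):
    """Count the mnemonic (first token) of each non-empty disassembly line."""
    counts = Counter()
    for line in disassembly_lines:
        tokens = line.split()
        if tokens:
            counts[tokens[0].lower()] += 1
    return counts


def to_feature_vector(counts, vocabulary):
    """Order the counts by a fixed opcode vocabulary; missing opcodes map to 0."""
    return [counts.get(op, 0) for op in vocabulary]


# Hypothetical disassembly of one executable.
sample = ["push ebp", "mov ebp, esp", "mov eax, 1", "", "call 0x401000", "ret"]
vocabulary = ["call", "jmp", "mov", "push", "ret"]

vector = to_feature_vector(opcode_counts(sample), vocabulary)
```

Vectors built against the same vocabulary for malicious and benign samples could then be passed to a feature selector and to the classifiers listed in the abstract.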


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Haosen Liu ◽  
Youwei Wang ◽  
Xiabing Zhou ◽  
Zhengzheng Lou ◽  
Yangdong Ye

Purpose Railway signal equipment failure diagnosis is vital to keeping the railway system operating safely. One of the main difficulties in signal equipment failure diagnosis is the uncertainty of causality between the consequence and the cause of an accident. The traditional method for solving this problem is based on the Bayesian network, which requires rigid independence assumptions and prior probability knowledge but ignores the semantic relationships in causality analysis. This paper aims to address the uncertainty of causality in signal equipment failure diagnosis in a new way that emphasises mining semantic relationships. Design/methodology/approach This study proposes a deterministic failure diagnosis (DFD) model based on a question answering system to implement railway signal equipment failure diagnosis. It includes a failure diagnosis module and a deterministic diagnosis module. In the failure diagnosis module, this paper exploits the question answering system to recognise the cause of failure consequences. The question answering system is composed of multi-layer neural networks, which extract the position and part-of-speech features of text data in the lower layers, acquire contextual features and interactive features of text data with Bi-LSTM and Match-LSTM, respectively, in the higher layers, and subsequently generate the candidate failure cause set using the proposed enhanced boundary unit. In the second module, this study ranks the candidate failure cause set with the semantic matching mechanism (SMM), choosing the candidate with the highest semantic matching degree as the deterministic failure causative factor. Findings Experiments on a real data set of railway signal equipment maintenance show that the proposed DFD model can implement the deterministic diagnosis of railway signal equipment failure. Compared with a large number of existing methods, the model achieves state-of-the-art performance in natural semantic understanding in the railway signal equipment diagnosis domain. 
Originality/value This is the first time a question answering system has been used to perform signal equipment failure diagnosis, which makes failure diagnosis more intelligent than before. The EMU enables the DFD model to understand natural semantics in long-sequence contexts. The SMM then lets the DFD model acquire a definite failure cause in the failure diagnosis of railway signal equipment.


Author(s):  
Adil Hussain Mohammed

The cloud provides support for managing, controlling and monitoring different organizations. Because of the flexible nature of the cloud, the chance of software attacks on it in the form of ransomware increases. Many researchers have proposed various models to prevent such attacks or to identify such activities. This paper proposes a ransomware detection model that uses a trained neural network. The neural network was trained on a filtered, optimized feature set obtained from a feature reduction algorithm. The paper proposes an Invasive Weed Optimization algorithm that filters a good set of features from the available input training dataset. The proposed model was tested on a real dataset containing sessions related to cloud ransomware attacks. The results show that the proposed model improves the values of the comparison parameters.


2020 ◽  
pp. 1-24
Author(s):  
Dequan Jin ◽  
Ziyan Qin ◽  
Murong Yang ◽  
Penghe Chen

We propose a novel neural model with lateral interaction for learning tasks. The model consists of two functional fields: an elementary field to extract features and a high-level field to store and recognize patterns. Each field is composed of some neurons with lateral interaction, and the neurons in different fields are connected by the rules of synaptic plasticity. The model is established on the current research of cognition and neuroscience, making it more transparent and biologically explainable. Our proposed model is applied to data classification and clustering. The corresponding algorithms share similar processes without requiring any parameter tuning and optimization processes. Numerical experiments validate that the proposed model is feasible in different learning tasks and superior to some state-of-the-art methods, especially in small sample learning, one-shot learning, and clustering.


Author(s):  
Tarek Helmy

A system that monitors the events occurring in a computer system or a network and analyzes them for signs of intrusion is known as an intrusion detection system. The performance of an intrusion detection system can be improved by combining anomaly and misuse analysis. This chapter proposes an ensemble multi-agent-based intrusion detection model. The proposed model combines anomaly, misuse, and host-based detection analysis. The agents in the proposed model use rules to check for intrusions, and adopt machine learning algorithms to recognize unknown actions and to update or create new rules automatically. Each agent in the proposed model encapsulates a specific classification technique, and gives its belief about any packet event in the network. These agents collaborate to reach a decision about any event, and have the ability to generalize and to detect novel attacks. Empirical results indicate that the proposed model is efficient, and outperforms other intrusion detection models.


2014 ◽  
Vol 5 (1) ◽  
pp. 1-13
Author(s):  
Haibo Wang ◽  
Bahram Alidaee ◽  
Wei Wang ◽  
Wei Ning

Telecommunication network infrastructures, both stationary and ad hoc, play an important role in maintaining the stability of society worldwide. Protecting these critical infrastructures and their supporting structures is highly challenging because of their complexity. Understanding the interdependency of these infrastructures is the essential step in protecting them from destruction and attacks. This paper presents a critical infrastructure detection model that discovers this interdependency based on theories from social networks and new telecommunication pathways, transforming social theory into computational constructions. The procedure and solution for protecting critical infrastructures are discussed, and computational results from the proposed model are presented.


Author(s):  
Richong Zhang ◽  
Xinyu Liu ◽  
Xinwei Chen ◽  
Zhiyuan Hu ◽  
Zhaoqing Xu ◽  
...  

Ci is a lyric poetry form that follows highly restrictive metrical structures. This makes it challenging for a computer to compose Ci subject to a specified metrical requirement. In this work, we adapt the CVAE framework to automated Ci generation under metrical constraints. Specifically, we present the first neural model that explicitly encodes the designated metrical structure for Ci generation. The proposed model is shown experimentally to generate Ci with nearly perfect metrical structures.

