A Robust Text Classifier Based on Denoising Deep Neural Network in the Analysis of Big Data

2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Wulamu Aziguli ◽  
Yuanyu Zhang ◽  
Yonghong Xie ◽  
Dezheng Zhang ◽  
Xiong Luo ◽  
...  

Text classification has always been an interesting issue in the research area of natural language processing (NLP). As we enter the era of big data, a good text classifier is critical to achieving NLP for scientific big data analytics. The ever-increasing size of text data poses important challenges in developing effective algorithms for text classification. Given the success of deep neural networks (DNNs) in analyzing big data, this article proposes a novel text classifier using a DNN, in an effort to improve the computational performance of handling big text data with hybrid outliers. Specifically, through the use of a denoising autoencoder (DAE) and a restricted Boltzmann machine (RBM), our proposed method, named the denoising deep neural network (DDNN), achieves significant improvement, with better noise resistance and feature extraction than traditional text classification algorithms. Simulations on benchmark datasets verify the effectiveness and robustness of our proposed text classifier.
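
As a rough illustration of the kind of pipeline the abstract describes, the sketch below pre-trains a denoising autoencoder on bag-of-words text features and reuses its encoder for classification (Python/Keras). The layer sizes, Gaussian corruption level, and input representation are assumptions, and the RBM stage of the DDNN is omitted; this is not the authors' exact configuration.

# Minimal sketch of the denoising-autoencoder stage of a DDNN-style text
# classifier. Dimensions, noise level, and the bag-of-words input are
# illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, hidden_dim, n_classes = 5000, 256, 4   # assumed dimensions

# Denoising autoencoder: corrupt the input, reconstruct the clean version.
inputs = keras.Input(shape=(vocab_size,))
noisy = layers.GaussianNoise(0.2)(inputs)          # corruption step
code = layers.Dense(hidden_dim, activation="relu")(noisy)
recon = layers.Dense(vocab_size, activation="sigmoid")(code)
dae = keras.Model(inputs, recon)
dae.compile(optimizer="adam", loss="binary_crossentropy")
# dae.fit(X_bow, X_bow, epochs=10, batch_size=64)  # X_bow: TF-IDF / bag-of-words matrix

# Reuse the learned encoder as the feature extractor for the classifier head.
encoder = keras.Model(inputs, code)
clf = keras.Sequential([
    encoder,
    layers.Dense(n_classes, activation="softmax"),
])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# clf.fit(X_bow, y, epochs=10, batch_size=64)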

News is a routine part of everyone's life. It helps enhance our knowledge of what happens around the world. Fake news is fictional information made up with the intention to delude, so the knowledge acquired from it is of no use. As fake news spreads extensively, it has a negative impact on society, and fake news detection has therefore become an emerging research area. This paper presents a solution to fake news detection using deep learning and natural language processing. The dataset is trained using a deep neural network. The dataset needs to be well formatted before being given to the network, which is made possible using natural language processing techniques; the system then predicts whether a news item is fake or not.
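
A minimal sketch of such a pipeline is shown below, assuming raw news texts with binary fake/real labels. The cleaning rules, TF-IDF features, and MLP architecture are illustrative choices, not the paper's exact setup.

# Sketch: NLP preprocessing of news text followed by a neural network classifier.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def preprocess(text):
    """Lowercase the text and keep letters only (assumed cleaning rule)."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

def train_fake_news_detector(texts, labels):
    """texts: list of article strings; labels: 1 = fake, 0 = real (assumed format)."""
    texts = [preprocess(t) for t in texts]
    X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)
    vec = TfidfVectorizer(stop_words="english", max_features=20000)
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)
    clf = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=50)
    clf.fit(Xtr, y_train)
    print("test accuracy:", clf.score(Xte, y_test))
    return vec, clf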


Author(s):  
Muhammad Ali Ramdhani ◽  
Dian Sa’adillah Maylawati ◽  
Teddy Mantoro

Every language has unique characteristics, structure, and grammar. Thus, different languages require different processes and yield different results in the Natural Language Processing (NLP) research area. In current NLP research, Data Mining (DM) and Machine Learning (ML) techniques are popular, especially Deep Learning (DL) methods. This research aims to classify text data in the Indonesian language using a Convolutional Neural Network (CNN), one of the DL algorithms. The CNN algorithm was modified to follow the characteristics of the Indonesian language; accordingly, in the text pre-processing phase, stopword removal and stemming are tailored to Indonesian. The experiment was conducted using 472 Indonesian news texts from various sources with four categories: 'hiburan' (entertainment), 'olahraga' (sport), 'tajuk utama' (headline news), and 'teknologi' (technology). Based on the experiment and evaluation using 377 training data and 95 testing data, producing five models with ten epochs each, the CNN achieved its best accuracy of around 90.74% with a loss value of around 29.05% for 300 hidden layers in classifying the Indonesian news data.
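
The sketch below shows a 1D-CNN text classifier of the general kind described. The vocabulary size, sequence length, filter settings, and the reading of "300 hidden" as 300 dense units are assumptions; Indonesian stopword removal and stemming (e.g., with the Sastrawi library) are assumed to have been applied to the texts beforehand.

# Illustrative Keras sketch of a CNN text classifier for four news categories.
from tensorflow import keras
from tensorflow.keras import layers

max_words, max_len, n_classes = 10000, 200, 4      # assumed hyperparameters

model = keras.Sequential([
    layers.Embedding(max_words, 128),               # token-id sequences of length max_len
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(300, activation="relu"),           # assumed: 300 hidden units
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train_seq, y_train, validation_data=(X_test_seq, y_test), epochs=10)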


2021 ◽  
Author(s):  
Toly Chen ◽  
Yu Cheng Wang

Abstract To enhance the effectiveness of projecting the cycle time range of a job in a factory, a hybrid big data analytics and Industry 4.0 (BD-I4) approach is proposed in this study. As a joint application of big data analytics and Industry 4.0, the BD-I4 approach is distinct from existing methods in this field. In the BD-I4 approach, each expert first constructs a fuzzy deep neural network (FDNN) to project the cycle time range of a job, which is an application of big data analytics (i.e., deep learning). Subsequently, fuzzy weighted intersection (FWI) is applied to aggregate the cycle time ranges projected by the experts while considering their unequal authority levels, which is an application of Industry 4.0 (i.e., artificial intelligence). After applying the BD-I4 approach to a real case, the experimental results showed that the proposed methodology improved the projection precision by up to 72%. This result implies that, instead of relying on a single expert, seeking collaboration among multiple experts may be more effective and efficient.
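
For intuition only, the sketch below aggregates several experts' fuzzy cycle-time ranges, each modelled as a triangular fuzzy number, using the standard weighted-minimum intersection operator. The paper's exact FWI definition, the experts' authority weights, and the numbers used here are assumptions, not the authors' method.

# Minimal numerical sketch of a weighted intersection of fuzzy cycle-time ranges.
import numpy as np

def tri_membership(x, l, m, u):
    """Membership of x in the triangular fuzzy number (l, m, u)."""
    return np.clip(np.minimum((x - l) / (m - l), (u - x) / (u - m)), 0.0, 1.0)

def weighted_intersection(x, ranges, weights):
    """mu(x) = min_i max(1 - w_i, mu_i(x)); w_i in [0, 1] is the expert's authority."""
    mus = [np.maximum(1.0 - w, tri_membership(x, *r)) for r, w in zip(ranges, weights)]
    return np.minimum.reduce(mus)

x = np.linspace(100, 200, 1001)                     # candidate cycle times (hours, assumed)
expert_ranges = [(120, 150, 180), (130, 155, 175), (125, 145, 170)]   # hypothetical projections
authority = [1.0, 0.7, 0.5]                                           # hypothetical weights
mu = weighted_intersection(x, expert_ranges, authority)
support = x[mu > 0]
print("aggregated range: [%.1f, %.1f]" % (support.min(), support.max()))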


2021 ◽  
pp. 1-13
Author(s):  
Ling Ding ◽  
Xiaojun Chen ◽  
Yang Xiang

Few-shot text classification aims to learn a classifier from very few labeled text data. Existing studies on this topic mainly adopt prototypical networks and focus on interactive information between the support set and query instances to learn generalized class prototypes. However, in the process of encoding, these methods only pay attention to the matching information between the support set and query instances, and ignore much useful information about intra-class similarity and inter-class dissimilarity among all support samples. Therefore, in this paper we propose a negative-supervised capsule graph neural network (NSCGNN) which explicitly makes use of the similarity and dissimilarity between samples to pull text representations of the same type closer to each other and push those of different types farther apart, leading to representative and discriminative class prototypes. We first construct a graph to obtain text representations in the form of node capsules, where both intra-cluster similarity and inter-cluster dissimilarity among all samples are explored through information aggregation and negative supervision. Then, in order to induce generalized class prototypes from the node capsules obtained from the graph neural network, the dynamic routing algorithm is utilized in our model. Experimental results demonstrate the effectiveness of our proposed NSCGNN model, which outperforms existing few-shot approaches on three benchmark datasets.
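
For context, the sketch below shows the routing-by-agreement step commonly used to induce class capsules from lower-level capsules (in the style of Sabour et al.'s dynamic routing). The NSCGNN-specific graph encoder and negative supervision are omitted, and all tensor shapes are assumed for illustration.

# Sketch of dynamic routing that turns node capsules into class prototype capsules.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: [n_input_caps, n_class_caps, dim] prediction vectors."""
    b = torch.zeros(u_hat.size(0), u_hat.size(1))          # routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=1)                            # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)           # weighted sum per class
        v = squash(s)                                      # class prototype capsules
        b = b + (u_hat * v.unsqueeze(0)).sum(dim=-1)       # agreement update
    return v

u_hat = torch.randn(10, 5, 16)    # 10 support-node capsules, 5 classes, capsule dim 16 (assumed)
prototypes = dynamic_routing(u_hat)
print(prototypes.shape)           # torch.Size([5, 16])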


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bioinformation big data have been collected by sensor-based IoT devices. The collected data are also classified into different types of health big data using various techniques. A personalized analysis technique is the basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart condition classification that combines a fast, effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model can learn from the input data to develop an approximation function, and it can help users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied in the preprocessing step. Data reduction is performed using the frequency-by-frequency ratios of the extracted power spectrum. To analyze the meaning of the preprocessed data, a neural network algorithm is applied. In particular, a deep neural network is used to analyze and evaluate the linear data. A deep neural network can stack multiple layers and establish an operational model of nodes trained with gradient descent. The completed model was trained by classifying ECG signals collected in advance into normal, control, and noise groups. Thereafter, ECG signals input in real time through the trained deep neural network system were classified into normal, control, and noise. To evaluate the performance of the proposed model, this study used the reduction ratio of the data operation cost and the F-measure. As a result, with the use of the fast Fourier transform and cumulative frequency percentages, the size of the ECG data was reduced to 1/32 of the original. According to the F-measure analysis, the deep neural network model had 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
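
The sketch below illustrates the kind of preprocessing described: take the FFT of an ECG window, form the power spectrum, and keep per-band power ratios so that the input handed to the neural network is much smaller than the raw signal. The window length and number of bands are assumptions chosen to mirror the 1:32 reduction mentioned above.

# Reduce a raw ECG window to relative-power features over frequency bands.
import numpy as np

def spectrum_ratio_features(ecg_window, n_bands=32):
    power = np.abs(np.fft.rfft(ecg_window)) ** 2       # power spectrum
    bands = np.array_split(power, n_bands)             # split into frequency bands
    band_power = np.array([b.sum() for b in bands])
    return band_power / band_power.sum()               # frequency-by-frequency ratios

raw = np.random.randn(1024)                            # 1024-sample ECG window (assumed)
features = spectrum_ratio_features(raw)
print(len(raw), "->", len(features))                   # 1024 -> 32, a 1:32 reduction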


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory's (ERDC-ITL's) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals efficiently executing a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


2021 ◽  
Vol 3 (4) ◽  
pp. 922-945
Author(s):  
Shaw-Hwa Lo ◽  
Yiqiao Yin

Text classification is a fundamental language task in Natural Language Processing. A variety of sequential models are capable of making good predictions, yet there is a lack of connection between language semantics and prediction results. This paper proposes a novel influence score (I-score), a greedy search algorithm called the Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the "dagger technique". First, the paper proposes to use the novel influence score (I-score) to detect and search for the important language semantics in text documents that are useful for making good predictions in text classification tasks. Next, the greedy search algorithm, the Backward Dropping Algorithm, is proposed to handle long-term dependencies in the dataset. Moreover, the paper proposes a novel engineering technique called the "dagger technique" that fully preserves the relationship between the explanatory variable and the response variable. The proposed techniques can be generalized to any feed-forward Artificial Neural Network (ANN) or Convolutional Neural Network (CNN), and indeed to any neural network. In a real-world application on the Internet Movie Database (IMDB), the proposed methods improve prediction performance with an 81% error reduction compared to popular peer methods that do not implement the I-score and the "dagger technique".
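
The hedged sketch below shows one common formulation of the influence score on discretized features and a greedy backward-dropping search over variable subsets. The normalization of the I-score and the greedy schedule follow the general I-score literature and may differ in detail from this paper; the data are synthetic.

# I-score of a set of discrete features and a Backward Dropping Algorithm sketch.
import numpy as np

def i_score(X, y):
    """X: (n, k) array of discrete features; y: (n,) responses."""
    y = np.asarray(y, dtype=float)
    n, y_bar = len(y), y.mean()
    cells = {}                                          # partition by feature combination
    for row, yi in zip(map(tuple, X), y):
        cells.setdefault(row, []).append(yi)
    score = sum(len(v) ** 2 * (np.mean(v) - y_bar) ** 2 for v in cells.values())
    return score / (n * y.var() + 1e-12)                # one common normalization (assumed)

def backward_dropping(X, y, start_vars):
    """Greedily drop the variable whose removal raises the I-score the most."""
    current = list(start_vars)
    best_vars, best_score = current[:], i_score(X[:, current], y)
    while len(current) > 1:
        s, drop = max((i_score(X[:, [v for v in current if v != d]], y), d) for d in current)
        current = [v for v in current if v != drop]
        if s > best_score:
            best_score, best_vars = s, current[:]
    return best_vars, best_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6))                   # 6 binary features (synthetic)
y = (X[:, 0] ^ X[:, 1]).astype(float)                   # response driven by features 0 and 1
print(backward_dropping(X, y, start_vars=[0, 1, 2, 3, 4, 5]))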


2021 ◽  
Vol 12 ◽  
Author(s):  
Yan Cheng ◽  
Yingying Cai ◽  
Haomai Chen ◽  
Zhuang Cai ◽  
Gang Wu ◽  
...  

The evaluation of the learning process is an effective way to realize personalized online learning. Real-time evaluation of learners' cognitive level during online learning helps to monitor learners' cognitive state and adjust learning strategies to improve the quality of online learning. However, most existing cognitive level evaluation methods use manual coding or traditional machine learning, which is time-consuming and laborious and cannot fully mine the implicit cognitive semantic information in unstructured text data, making cognitive level evaluation inefficient. Therefore, this study proposed a cognitive level evaluation method based on a bidirectional gated recurrent convolutional neural network combined with an attention mechanism (AM-BiGRU-CNN) and on Bloom's taxonomy of cognitive objectives, using the unstructured interactive text data posted by 9167 learners in a massive open online course (MOOC) forum as an empirical study to support the method. The study found that the AM-BiGRU-CNN method has the best evaluation performance, with an overall accuracy of 84.21% across the six cognitive levels and an F1-score of 91.77% at the creating level. The experimental results show that the deep neural network method can effectively identify the cognitive features implicit in text and can be applied to the automatic evaluation of the cognitive level of online learners. This study provides a technical reference for evaluating students' cognitive levels in online learning environments, and the automatic evaluation has high application value for realizing personalized learning strategies, teaching interventions, and resource recommendation.
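
A rough Keras sketch of an attention + BiGRU + CNN text classifier in the spirit of AM-BiGRU-CNN is shown below. The layer sizes, the use of Keras's built-in dot-product self-attention, and the six-way output are assumptions based on the abstract, not the authors' exact architecture.

# Illustrative attention + BiGRU + CNN classifier over six cognitive levels.
from tensorflow import keras
from tensorflow.keras import layers

max_words, max_len, n_classes = 20000, 100, 6          # assumed hyperparameters

tokens = keras.Input(shape=(max_len,))
x = layers.Embedding(max_words, 128)(tokens)
x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)   # contextual encoding
x = layers.Attention()([x, x])                         # dot-product self-attention over time steps
x = layers.Conv1D(128, kernel_size=3, activation="relu")(x)          # local n-gram features
x = layers.GlobalMaxPooling1D()(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)

model = keras.Model(tokens, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()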


Author(s):  
Noha Ali ◽  
Ahmed H. AbuEl-Atta ◽  
Hala H. Zayed

<span id="docs-internal-guid-cb130a3a-7fff-3e11-ae3d-ad2310e265f8"><span>Deep learning (DL) algorithms achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance the convolutional neural network (CNN) algorithm to classify cancer articles according to cancer hallmarks. The model implements a recent word embedding technique in the embedding layer. This technique uses the concept of distributed phrase representation and multi-word phrases embedding. The proposed model enhances the performance of the existing model used for biomedical text classification. The result of the proposed model overcomes the previous model by achieving an F-score equal to 83.87% using an unsupervised technique that trained on PubMed abstracts called PMC vectors (PMCVec) embedding. Also, we made another experiment on the same dataset using the recurrent neural network (RNN) algorithm with two different word embeddings Google news and PMCVec which achieving F-score equal to 74.9% and 76.26%, respectively.</span></span>

